Thread: Re: xen 3.1 problem (Re: xen 3.1.0 is there)




Re: xen 3.1 problem (Re: xen 3.1.0 is there)
user name
2007-06-20 04:38:22
Hi Daniel,

   On Jun 20, 11:16, Daniel Carosone wrote:
   > Subject: Re: xen 3.1 problem (Re: xen 3.1.0 is there)
   > 
   > On Wed, Jun 20, 2007 at 08:38:44AM +0900, Kazushi Marukawa wrote:
   > > > This would mean something disabled interrupts between copyout() and
   > > > pmap_load() and failed to reenable them, but I didn't find anything
   > > > obvious.  copyout() itself doesn't call pmap_load() so there's
   > > > probably a trap in between that isn't shown by ddb.
   > >
   > > Thanks.  I think this is very minor bug since only few of us
   > > are having this problem.
   > 
   > That puzzles me too. I wonder if it's something specific to a
   > particular driver we use that others don't?  Shall we compare kernel
   > configs?  Mine's attached.  The only things I'm using that might be
   > unusual for hardware + drivers are:

I'm using the default XEN3_DOM0.  I didn't change anything.

On the other hand, I'm using a Core Solo T2300 on a GIGABYTE
GA-8I945GMMFY-RH (a less common Intel 945GM motherboard).  I was
thinking that this uncommon motherboard/chipset/CPU combination is
causing the problem.

In addition, I'm using a 3ware 9500 (RAID card) and an Intel 100M
Ethernet card on this motherboard.

   >  - cgd
   >  - bce (the special limited dma mapping)
   > 
   > While it's a great idea to do the binary date search looking for a
   > culprit change, I suspect you'll hit the big merge as the culprit, and
   > then it would help to have something else to direct the search.

Yep.  That will probably happen.  I just thought that finding the date
would not take much time, since on weekdays I don't have enough time to
understand how the kernel works and hunt for the problem.
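
For reference, the binary date search discussed above can be sketched
as follows.  This is only an illustration: bisect_dates() and is_bad()
are hypothetical names, the dates are made up, and the real test would
be "check out the source tree as of that date, build a kernel, boot it
and see whether the problem shows up".  It also assumes that once the
bug appears it stays in every later snapshot.

```python
from datetime import date, timedelta


def bisect_dates(good, bad, is_bad):
    """Return the earliest date in (good, bad] for which is_bad() is True.

    Assumes `good` is known to work, `bad` is known to fail, and every
    date after the first failing one also fails.
    """
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_bad(mid):
            bad = mid      # culprit is at or before mid
        else:
            good = mid     # culprit is after mid
    return bad


tested = []


def is_bad(snapshot):
    """Hypothetical stand-in for building and booting that day's source."""
    tested.append(snapshot)
    return snapshot >= date(2007, 5, 10)   # made-up culprit date


first_bad = bisect_dates(date(2007, 4, 30), date(2007, 6, 24), is_bad)
print(first_bad, "found after", len(tested), "test builds")
```

A window of N days needs roughly log2(N) test builds, so even a two
month window takes only a handful of kernels.  If the search lands on
a large merge, the date alone will not say which part of the merge is
responsible.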

If there is anything you want to know about my configuration, or a
modified kernel you would like me to try, please let me know.

-- Kazushi
Saw a sign on a restaurant that said Breakfast, any time -- so I
ordered French Toast in the Renaissance.
		-- Steven Wright

Re: xen 3.1 problem (Re: xen 3.1.0 is there)
user name
2007-06-26 11:49:08
Hi,

The following patch fixed the hvm KASSERT issue for me.  I don't have a
concrete scenario for when or how the race condition occurs.

However, the related issue, slow hvm, is not solved yet.
When I run WinXP under a recent kernel, it feels much slower than
under 4.99.18.  My rough impression is below; e.g. with 4.99.21 and
Xen 3.1.0 it took 30 seconds to open an empty start menu.

Kernel   Xen    Guest      Source snapshot              Relative slowness
4.99.18  3.0.4  hvm WinXP  4/30 source                  1x (baseline)
4.99.20  3.0.4  hvm WinXP  5/18 source with this patch  5x
4.99.21  3.0.4  hvm WinXP  6/24 source with this patch  5x
4.99.21  3.1.0  hvm WinXP  6/24 source with this patch  20x

-- Kazushi

Index: arch/xen/i386/locore.S
===================================================================
RCS file: /cvsroot/src/sys/arch/xen/i386/locore.S,v
retrieving revision 1.25
diff -u -r1.25 locore.S
--- arch/xen/i386/locore.S	17 May 2007 14:51:35 -0000	1.25
+++ arch/xen/i386/locore.S	26 Jun 2007 16:39:08 -0000
@@ -659,6 +659,8 @@
 
 switch_skipsave:
 
+	CLI(%ebx)
+
 	/*
 	 * Switch to newlwp's stack.
 	 */
@@ -681,6 +683,8 @@
 
 	movl	$0,CPUVAR(RESCHED)
 
+	STI(%ebx)
+
 	/*
 	 *  Check for restartable atomic sequences (RAS)
 	 */

On a paper submitted by a physicist colleague:

This isn't right.  This isn't even wrong.
		-- Wolfgang Pauli

Re: xen 3.1 problem (Re: xen 3.1.0 is there)
user name
2007-06-26 13:06:46
On Wed, Jun 27, 2007 at 01:49:08AM +0900, Kazushi Marukawa wrote:
> Hi,
> 
> Following patch fixed hvm KASSERT issue for me.  Not have a
> concrete scenario when or how the race condition broke.

Does it still happen with yesterday's commit, which fixed a race in
cpu_idle()?  I can't see how this commit could fix the KASSERT issue,
but who knows ...

Your patch may be the fix because of the way Xen interrupts work: a
hypercall (i386_switch_context will make 2) may return to
hypervisor_callback() instead of the code that called it, and the iret
will then return to where the hypercall was made.
I wonder whether, when this happens within i386_switch_context(), the
iret could return to the wrong place (e.g. in userland code with a
process switch not fully completed yet).

Also, your patch may cause interrupts that occurred while they were
disabled to be deferred until the next interrupt happens (usually the
next clock tick).  This may be easily fixed, though.
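
To illustrate the deferral concern, here is a toy model.  It is not the
NetBSD CLI/STI macros nor the real Xen interface; the class, its fields
and both unmask variants are hypothetical names used only for this
sketch.  It assumes that an event arriving while the guest has events
masked is merely recorded as pending, and compares an unmask that
forgets to look at the pending flag with one that checks it.

```python
class ToyVcpu:
    """Toy model of guest-side event masking (illustrative names only)."""

    def __init__(self):
        self.masked = False    # guest has "interrupts" disabled
        self.pending = False   # an event arrived and was not handled yet
        self.handled = 0       # number of times the callback has run

    def deliver(self):
        """An event arrives; run the callback only if not masked."""
        self.pending = True
        if not self.masked:
            self.callback()

    def callback(self):
        self.pending = False
        self.handled += 1

    def cli(self):
        self.masked = True

    def sti_naive(self):
        """Unmask only: a pending event waits for the next delivery."""
        self.masked = False

    def sti_checked(self):
        """Unmask, then handle anything that arrived while masked."""
        self.masked = False
        if self.pending:
            self.callback()


naive = ToyVcpu()
naive.cli()
naive.deliver()              # arrives while masked
naive.sti_naive()
handled_before_tick = naive.handled
naive.deliver()              # e.g. the next clock tick
handled_after_tick = naive.handled

checked = ToyVcpu()
checked.cli()
checked.deliver()
checked.sti_checked()        # handled right at unmask time
print(handled_before_tick, handled_after_tick, checked.handled)
```

In the naive variant the event sits in `pending` until something else
triggers the callback; the checked variant handles it as soon as events
are unmasked, which is the kind of fix hinted at above.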

> 
> However, the related issue, slow hvm is not solved yet.
> When I use WinXP with recent kernel, I feel it is much
> slower than 4.99.18.  I feel something like below.
> e.g. 4.99.21 Xen3.1.0 took 30 sec to open empty start menu.
> 
> 4.99.18 Xen3.0.4 hvm WinXP(4/30 source)               base 1x slow
> 4.99.20 Xen3.0.4 hvm WinXP(5/18 src with this patch)  5x slow
> 4.99.21 Xen3.0.4 hvm WinXP(6/24 src with this patch)  5x slow
> 4.99.21 Xen3.1.0 hvm WinXP(6/24 src with this patch)  20x slow

Again, maybe yesterday's commits fixed it ... please try!

-- 
Manuel Bouyer <bouyerantioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--
