Xen crash
2007-10-08 06:08:13
Hi CentOS-devel,

I'm new to this list and joined because I am volunteering as a tech
admin for a non-profit organization called CouchSurfing (.com), where
we tried to move the web servers to Xen zones. This has proven quite
unstable: our defined zones tend to crash on a daily basis with the
latest CentOS 5 Xen updates. The physical boxes have two quad-core
1.6 GHz Xeon CPUs and 4 GB RAM; there is currently only one domain on
each box, configured with 2.2 GB RAM.
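For reference, here is a back-of-the-envelope budget for one of these boxes, written as a small Python sketch. Two quad-core sockets give 8 physical cores, and with a single 2.2 GB guest on a 4 GB host roughly 1.8 GB is left over for dom0 and the hypervisor together. The constants come from the figures above. The helper names are hypothetical, and the 1 GB = 1024 MiB conversion and the omission of hypervisor overhead are assumptions.

```python
# Rough per-box resource budget using the figures from the post.
# Assumptions (not stated in the post): 1 GB = 1024 MiB, and the Xen
# hypervisor's own memory overhead is ignored, so the dom0 figure is an
# upper bound rather than what dom0 will actually get.

SOCKETS = 2
CORES_PER_SOCKET = 4
HOST_RAM_MIB = 4 * 1024            # 4 GB physical RAM
GUEST_RAM_MIB = round(2.2 * 1024)  # the single domU, configured with 2.2 GB


def physical_cores(sockets: int, cores_per_socket: int) -> int:
    """Total physical cores on the box."""
    return sockets * cores_per_socket


def ram_left_for_dom0(host_mib: int, guest_mibs: list[int]) -> int:
    """MiB not assigned to any guest; dom0 and Xen itself share this."""
    left = host_mib - sum(guest_mibs)
    if left < 0:
        raise ValueError("guests are assigned more RAM than the host has")
    return left


if __name__ == "__main__":
    print(physical_cores(SOCKETS, CORES_PER_SOCKET))         # 8
    print(ram_left_for_dom0(HOST_RAM_MIB, [GUEST_RAM_MIB]))  # 1843
```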

Domain 0:
[root@nd10254 ~]# rpm -qa | grep xen
xen-libs-3.0.3-25.0.4.el5
kernel-xen-2.6.18-8.1.8.el5
kernel-xen-2.6.18-8.1.14.el5
xen-3.0.3-25.0.4.el5

Web1:
[root@web1 ~]# rpm -qa | grep xen
kernel-xen-2.6.18-8.1.8.el5
kernel-xen-2.6.18-8.1.14.el5
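Both domains carry the same two kernel-xen builds, and these version strings are easy to misorder: compared as plain text, `8.1.8` sorts after `8.1.14`. The Python sketch here picks the newest build from `rpm -qa` output by comparing numeric segments as numbers. `version_key` and `newest_kernel` are hypothetical names, and the comparison is a simplification of rpm's own version rules, not a reimplementation of them.

```python
import re

# Hypothetical helper, not part of rpm or yum: a simplified version compare
# that splits a version-release string into numeric and alphabetic segments,
# so that 8.1.14 ranks above 8.1.8. rpm's real algorithm (rpmvercmp) has more
# rules; this sketch only covers strings shaped like the ones in this post.


def version_key(vr: str) -> tuple:
    segments = re.findall(r"\d+|[A-Za-z]+", vr)
    # Tag each segment so an int is never compared with a str:
    # numbers compare numerically, letters compare as text.
    return tuple((0, int(s)) if s.isdigit() else (1, s) for s in segments)


def newest_kernel(packages: list[str], name: str = "kernel-xen") -> str:
    """Return the highest version-release of `name` among `rpm -qa` lines."""
    prefix = name + "-"
    versions = [
        p[len(prefix):]
        for p in packages
        # require a digit right after the prefix, so e.g. kernel-xen-devel-*
        # is not mistaken for kernel-xen itself
        if p.startswith(prefix) and p[len(prefix):][:1].isdigit()
    ]
    if not versions:
        raise ValueError(f"no {name} package found")
    return max(versions, key=version_key)


if __name__ == "__main__":
    dom0 = [
        "xen-libs-3.0.3-25.0.4.el5",
        "kernel-xen-2.6.18-8.1.8.el5",
        "kernel-xen-2.6.18-8.1.14.el5",
        "xen-3.0.3-25.0.4.el5",
    ]
    print(newest_kernel(dom0))  # 2.6.18-8.1.14.el5
    # A plain string comparison picks the wrong one:
    print(max(["2.6.18-8.1.8.el5", "2.6.18-8.1.14.el5"]))  # 2.6.18-8.1.8.el5
```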

The Domain0 zone is indeed rock stable, while Web1 and the other
guests crash daily with the 2.6.18-8.1.14 Xen kernel. The stack trace
we see after a few hours is as follows:

BUG: soft lockup detected on CPU#5!

Call Trace:
 <IRQ>  [<ffffffff802a76ad>] softlockup_tick+0xdb/0xed
 [<ffffffff8026ba66>] timer_interrupt+0x396/0x3f2
 [<ffffffff80210a87>] handle_IRQ_event+0x2d/0x60
 [<ffffffff802a79ec>] __do_IRQ+0xa4/0x105
 [<ffffffff802699b3>] do_IRQ+0xe7/0xf5
 [<ffffffff8038dde8>] evtchn_do_upcall+0x86/0xe0
 [<ffffffff8025cc1a>] do_hypervisor_callback+0x1e/0x2c
 <EOI>  [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff8026afe2>] raw_safe_halt+0x84/0xa8
 [<ffffffff802684f8>] xen_idle+0x38/0x4a
 [<ffffffff80247bcd>] cpu_idle+0x97/0xba

BUG: soft lockup detected on CPU#7!

And so on. It does not crash the Xen zones directly, but it clogs up
the web1 console. We did not see this when running the 2.6.18-8.1.8
Xen kernel; instead, the Xen zones crashed less frequently, with an
out-of-memory problem, as follows:

Call Trace:
 [<ffffffff802aeefc>] out_of_memory+0x4e/0x1d3
 [<ffffffff8020efe8>] __alloc_pages+0x229/0x2b2
 [<ffffffff8023fd5b>] __lock_page+0x5e/0x64
 [<ffffffff80232637>] read_swap_cache_async+0x42/0xd1
 [<ffffffff802b32a2>] swapin_readahead+0x4e/0x77
 [<ffffffff8020929d>] __handle_mm_fault+0xae3/0xf46
 [<ffffffff80260709>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff80262fe8>] do_page_fault+0xe48/0x11dc
 [<ffffffff80207138>] kmem_cache_free+0x77/0xca
 [<ffffffff8025cb6f>] error_exit+0x0/0x6e
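Both traces use the same `[<address>] symbol+offset/size` frame format. As a minimal sketch, assuming only that format, the Python snippet here extracts the function names so the call path can be read at a glance; `parse_trace` and `passes_through` are hypothetical helpers, not part of any kernel or Xen tool. Run on the out-of-memory trace, it shows the failing allocation (`__alloc_pages` calling `out_of_memory`) being reached from the swap-in path (`swapin_readahead`, `read_swap_cache_async`) during a page fault.

```python
import re
from typing import NamedTuple

# Matches one frame of a 2.6-era x86_64 kernel call trace, such as
#   [<ffffffff802aeefc>] out_of_memory+0x4e/0x1d3
# i.e. [<address>] symbol+offset/size, with offset and size in hex.
# \s+ also matches a newline, so frames wrapped across two lines still parse.
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


class Frame(NamedTuple):
    address: int
    symbol: str
    offset: int  # bytes into the function
    size: int    # total size of the function in bytes


def parse_trace(text: str) -> list[Frame]:
    """Return every frame found in `text`, in printed order (innermost first)."""
    return [
        Frame(int(addr, 16), sym, int(off, 16), int(size, 16))
        for addr, sym, off, size in FRAME_RE.findall(text)
    ]


def passes_through(frames: list[Frame], symbols: set[str]) -> bool:
    """True if any frame belongs to one of the given functions."""
    return any(f.symbol in symbols for f in frames)


if __name__ == "__main__":
    oom_trace = """
     [<ffffffff802aeefc>] out_of_memory+0x4e/0x1d3
     [<ffffffff8020efe8>] __alloc_pages+0x229/0x2b2
     [<ffffffff80232637>] read_swap_cache_async+0x42/0xd1
     [<ffffffff802b32a2>] swapin_readahead+0x4e/0x77
     [<ffffffff80262fe8>] do_page_fault+0xe48/0x11dc
    """
    frames = parse_trace(oom_trace)
    print([f.symbol for f in frames])
    print(passes_through(frames, {"swapin_readahead", "read_swap_cache_async"}))  # True
```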

Judging from some fuzzy posts we stumbled on around the net, we think
the whole problem is how the kernel fails to handle resource clogging
(too many interrupts, heavy CPU and memory usage in the defined zones,
etc.). For example:

http://article.gmane.org/gmane.comp.emulators.xen.user/26617

Is this problem known to you, or is it new? Any ideas on how to
resolve it?


Regards,
Nicolas Sahlqvist
CouchSurfing.com
_______________________________________________
CentOS-devel mailing list
CentOS-devel@centos.org
http://lists.centos.org/mailman/listinfo/centos-devel

Re: Xen crash
United Kingdom
2007-10-08 10:50:06
Hi,

Nicolas Sahlqvist wrote:
> I'm new to this list and joined because I am volunteering as a tech
> admin for a non-profit organization called CouchSurfing (.com), where
> we tried to move the web servers to Xen zones. This has proven quite
> unstable: our defined zones tend to crash on a daily basis with the
> latest CentOS 5 Xen updates. The physical boxes have two quad-core
> 1.6 GHz Xeon CPUs and 4 GB RAM; there is currently only one domain on
> each box, configured with 2.2 GB RAM.

Wrong list.

Try the CentOS users list. If you come up with a patch/fix, feel free
to drop a note with the patch included.

- KB

