
Thread: Re: Gentoo crashing?




2007-05-14 05:50:31
"Peter Davoust" <worldgnatgmail.com> posted
7c08b4dd0705132304h5eccea49k22513343959aff52mail.gmail.com, excerpted
below, on  Mon, 14 May 2007 02:04:30 -0400:

> I agree, it could be the heat, and that was the first thing that came to
> my mind, but Vista boots and runs for long periods of time with no
> issues. I'll check it out with the new kernel in the morning and see
> what it does.

Note that Gentoo tends to push hardware to its limits rather more than
most OSs, MSWormOS and other Linux distributions alike.  Vista is so new,
and /does/ stress at least the video hardware rather more (if Aero is on,
anyway), that I don't know if anyone can rightly say yet how it compares,
but certainly with older MS platforms, it hasn't been uncommon at /all/
for Gentoo to cause problems where MS didn't, and even where other Linux
distributions didn't.

Part of the reason is that Gentoo tends to be compiled/optimized for the
specific CPU it's running on, so it makes more efficient use of it,
including CPU functionality that distributions (and MS) compiled for
generic hardware simply don't use.  Add to that the simple fact that when
the CPU is busy, it's often getting more done in the same time, so it IS
working harder and therefore stressing the hardware more.

Anyway, just because another OS doesn't have problems on a computer
doesn't mean Gentoo won't, and there are quite a number of folks on the
forums and on the gentoo-user list that will tell you the same thing --
learned from hard experience.

Meanwhile, you mention specifically that one of the crashes was during a
bz2 decompress.  As someone who has HAD memory issues in the past, I can
DEFINITELY tell you that bz2 DOES often trigger memory errors, if
ANYTHING will!  If the issues with bz2 turn out to be common, CHECK THAT
MEMORY, and check it again!  You mentioned you have 2 gigs.  Hopefully
it's in the form of 2 or more sticks.  If so, you should be able to take
part of it out and see if the problem persists.  Then test the other
memory.  If the problem happens with one set but not the other, you have
your problem.  Do note, however, that the problem continuing to occur
with either memory set doesn't necessarily mean it's not the memory,
particularly if the sticks are the same brand and size, purchased from
the same place at the same time, and so are likely from the same lot.
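
As a rough way to repeat the bz2 check on each memory set, here is a
minimal Python sketch.  It is only an illustration, not a substitute for
a real memory tester, and the function name and sample data are made up
for the example.  It round-trips a buffer through bz2 a few times and
compares checksums; on healthy hardware it should never report a
mismatch.

```python
import bz2
import hashlib
import os


def bz2_roundtrip_ok(payload: bytes, rounds: int = 3) -> bool:
    """Compress and decompress payload repeatedly, comparing SHA-256 digests.

    Returns False if any round yields different data or bz2 rejects the
    stream, either of which would be suspicious on otherwise sane software.
    """
    expected = hashlib.sha256(payload).hexdigest()
    for _ in range(rounds):
        try:
            restored = bz2.decompress(bz2.compress(payload, compresslevel=9))
        except (OSError, ValueError):
            # bz2 raises these when the compressed stream is corrupt or cut short.
            return False
        if hashlib.sha256(restored).hexdigest() != expected:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical sample: 1 MiB of random bytes plus a highly repetitive run.
    sample = os.urandom(1 << 20) + b"gentoo" * 100_000
    print("round trip ok" if bz2_roundtrip_ok(sample) else "MISMATCH: suspect hardware")
```

Run it in a loop with a larger buffer while the box is otherwise busy; a
single clean pass proves very little.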

In my case, I had purchased generic memory that couldn't quite do its
rated PC3200 (clocked at 200 MHz x 2, since it was DDR).  I ran memtest
and it passed with flying colors, because the memory cells themselves
worked fine, and memtest apparently only tests the cells without really
stressing the memory timings.  However, I was crashing in operation;
sometimes just the app would die, sometimes the entire kernel would
panic.  I turned on the kernel's MCE (machine check exception) reporting,
and the memory was indeed the problem (google MCEs, there's an app
available that you can run, feeding it the numbers, and it'll spit out
the error in English).  Only I wasn't quite sure whether it was the
memory itself, or the mobo causing perfectly good memory to generate
errors upon data delivery because it couldn't reliably get the data to
the CPU.

While I didn't have the necessary BIOS settings at the time, sometime
later a BIOS update gave me additional memory settings, and I found that
reducing the memory clock by a single notch, to 183 MHz (DDR doubled to
366), effectively PC3000 memory, did the trick.  I was even able to tweak
some of the individual wait-state settings to get back a bit of the
performance I lost with the under-clocking.  The memory and the entire
machine were rock-stable at the 183 MHz PC3000 memory setting.
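
For anyone wondering where the PC numbers come from, the arithmetic can
be sketched in Python as below.  This assumes the usual 64-bit (8-byte)
wide DDR module; the function name is just for illustration.  The rating
is the peak transfer rate in MB/s: clock times two transfers per cycle
times eight bytes per transfer.

```python
def ddr_rating(clock_mhz: float, bus_bytes: int = 8) -> float:
    """Peak transfer rate in MB/s for DDR memory.

    clock_mhz: memory clock in MHz (200 for PC3200).
    bus_bytes: module width in bytes; 8 is assumed here (64-bit DIMM).
    """
    transfers_per_cycle = 2  # DDR moves data on both clock edges
    return clock_mhz * transfers_per_cycle * bus_bytes


if __name__ == "__main__":
    for clock in (200, 183):
        print(f"{clock} MHz -> about {ddr_rating(clock):.0f} MB/s")
```

So 200 MHz works out to 3200 MB/s (PC3200), and 183 MHz to roughly
2930 MB/s, which is why that setting is effectively PC3000.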

Later I upgraded from my then two 512 MB sticks to four 2 GB sticks, 8
gigs memory total.  It was indeed the memory, not the board, as the new
memory was just as stable at PC3200 as the old memory had been at the
under-clocked PC3000 speed.

Anyway, the way bzip2 works is apparently extremely stressful on memory,
as more than anything else, that would trigger the errors.  Compiles were
frustrating too, but sometimes I could compile for quite some time
without issues.  That's why I didn't think it was the CPUs even before I
got the program to read the MCE numbers and tell me what they were.  They
confirmed that it was memory related: the errors were on data as the CPU
got it.  I just didn't know until I actually changed memory whether it
was the mobo generating errors on the data in transit, or the memory
itself.  It turned out to be the memory.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." 
Richard Stallman

-- 
gentoo-amd64gentoo.org mailing list

