
Thread: NMI errors in messages files




NMI errors in messages files
United Kingdom
2007-04-23 08:05:01
I have a HP Proliant DL740 G1 server running Red Hat
Enterprise Linux AS release 3 (Taroon Update 4)

The following kernel is running on the DL740 G1 server.

[root]# uname -a
Linux nodeP 2.4.21-27.ELsmp #1 SMP Wed Dec 1 21:59:02 EST 2004 i686 i686 i386 GNU/Linux


The server apparently crashed. 

After rebooting it I checked the messages file and noticed
the following errors:-

Apr 22 06:18:03 praxidike kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
Apr 22 06:18:03 praxidike kernel: You probably have a hardware problem with your RAM chips
Apr 22 06:54:23 praxidike ntpd[1033]: time reset 2.496282 s
Apr 22 06:54:23 praxidike ntpd[1033]: synchronisation lost
Apr 22 07:28:52 praxidike ntpd[1033]: time reset 1.283675 s
Apr 22 07:28:52 praxidike ntpd[1033]: synchronisation lost
Apr 22 07:50:19 praxidike ntpd[1033]: time reset -0.569282 s


Then the same error was later reported in the messages file:-

Apr 23 10:04:13 praxidike kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
Apr 23 10:04:13 praxidike kernel: You probably have a hardware problem with your RAM chips
Apr 23 10:17:53 praxidike ntpd[1016]: time reset 0.892751 s
Apr 23 10:17:53 praxidike ntpd[1016]: synchronisation lost
Apr 23 10:38:32 praxidike ntpd[1016]: time reset -0.595013 s
Apr 23 10:38:32 praxidike ntpd[1016]: synchronisation lost


Any ideas what the NMI errors are and how to fix them?

Also how can I fix the ntpd synchronisation lost errors?



             

--
nahant-list mailing list
nahant-listredhat.com
https://www.redhat.com/mailman/listinfo/nahant-list

Re: NMI errors in messages files
United States
2007-04-23 08:52:30
On Mon, 2007-04-23 at 14:05 +0100, d.qureshimdx.ac.uk wrote:
> Then the same error was later reported in the messages file:-
> 
> Apr 23 10:04:13 praxidike kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
> Apr 23 10:04:13 praxidike kernel: You probably have a hardware problem with your RAM chips
> Apr 23 10:17:53 praxidike ntpd[1016]: time reset 0.892751 s
> Apr 23 10:17:53 praxidike ntpd[1016]: synchronisation lost
> Apr 23 10:38:32 praxidike ntpd[1016]: time reset -0.595013 s
> Apr 23 10:38:32 praxidike ntpd[1016]: synchronisation lost
> 
> 
> Any ideas what the NMI errors are and how to fix them?

First, you might want to actually post this on the RHEL3 (Taroon) list
rather than the RHEL4 list, since that's what you're asking about.

Is this a system that has been running reliably for some time?  When was
U4 installed?  If it's been running for some time, and suddenly you are
seeing this message, then you probably really do need to look at the
hardware, or at least at any recent changes made to the machine.  RAM is
a valid guess, but it could be a multitude of things.

Does the system happen to be running the HP Server Management Agents?
Normally these agents should help record additional information about
the NMI to help you identify the defective component; however, there are
some known issues with the older Server Management Agents and the newer
RHEL3 kernels.  Take a look at:

http://h200003.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=110&prodSeriesId=327496&prodTypeId=15351&prodSeriesId=327496&objectID=PSD_EU040315_CW01

Updating/installing the HP Server Management Agents may allow you to get
more insight into exactly what is going wrong.

As far as the ntp issues, I'd worry about getting the system stable
first; then, if you're still having NTP issues, that would be worth
looking at.
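
If you do want a quick picture of how far the clock is being stepped in the meantime, something along these lines would pull the offsets out of the messages file. This is only a minimal Python sketch: it assumes the ntpd log lines look exactly like the ones quoted above, and the function name and sample list are made up for illustration.

```python
import re

# Matches lines such as:
#   Apr 22 06:54:23 praxidike ntpd[1033]: time reset 2.496282 s
RESET_PATTERN = re.compile(r"ntpd\[\d+\]: time reset (-?\d+\.\d+) s")


def time_resets(lines):
    """Return the clock step (in seconds) from every ntpd 'time reset' line."""
    offsets = []
    for line in lines:
        match = RESET_PATTERN.search(line)
        if match:
            offsets.append(float(match.group(1)))
    return offsets


# Sample lines copied from the original post.
sample = [
    "Apr 22 06:54:23 praxidike ntpd[1033]: time reset 2.496282 s",
    "Apr 22 06:54:23 praxidike ntpd[1033]: synchronisation lost",
    "Apr 22 07:28:52 praxidike ntpd[1033]: time reset 1.283675 s",
    "Apr 22 07:50:19 praxidike ntpd[1033]: time reset -0.569282 s",
]

offsets = time_resets(sample)
largest_step = max(offsets, key=abs)
print(offsets, largest_step)
```

To run it against the real log, pass an open file instead of the sample list, for example `time_resets(open("/var/log/messages"))`. Steps that keep recurring after the hardware has been sorted out would be the point at which the NTP setup itself deserves a look.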

Later,
Tom



Re: NMI errors in messages files
United States
2007-04-23 13:52:02
On Mon, 23 Apr 2007, d.qureshimdx.ac.uk wrote:

[...]
> Apr 22 06:18:03 praxidike kernel: You probably have a hardware problem with your RAM chips
[...]

> Any ideas what the NMI errors are and how to fix them?

Right there in what you posted. It is telling you that you probably have
some bad memory. Run memtest86 on the machine to verify it.
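
It is also worth checking how often the NMI has actually shown up before taking the box down for a memory test. A rough Python sketch for that follows; it assumes the usual syslog layout seen in the lines posted above, and the function name and sample data are purely illustrative.

```python
import re
from collections import Counter

# Captures the "Mon DD" date prefix of kernel NMI lines such as:
#   Apr 22 06:18:03 praxidike kernel: Uhhuh. NMI received. Dazed and ...
NMI_PATTERN = re.compile(
    r"^(\w{3}\s+\d+) \d{2}:\d{2}:\d{2} \S+ kernel: Uhhuh\. NMI received"
)


def count_nmi_per_day(lines):
    """Count 'NMI received' kernel messages per syslog date."""
    counts = Counter()
    for line in lines:
        match = NMI_PATTERN.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the original post.
sample = [
    "Apr 22 06:18:03 praxidike kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue",
    "Apr 22 06:18:03 praxidike kernel: You probably have a hardware problem with your RAM chips",
    "Apr 23 10:04:13 praxidike kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue",
]

nmi_counts = count_nmi_per_day(sample)
print(dict(nmi_counts))
```

Feeding it the rotated messages files as well (messages.1 and so on, if they are still around) would show whether this started recently or has been going on for a while.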

-- 
Benjamin Franz

"It is moronic to predict without first establishing an error rate
  for a prediction and keeping track of one’s past record of accuracy."
                     -- Nassim Nicholas Taleb, Fooled By Randomness
