Thread: Seemingly random SIGILL in SMP

Seemingly random SIGILL in SMP
United States
2007-10-05 10:06:29
Hello,

in both -current and 4.0 I occasionally see processes die with SIGILL,
apparently at random. Looking at the core files revealed that the
faulting instruction was always part of the PLT; apparently the PLT
entries are not always flushed out of the cache after they are written.
I can't reliably trigger the fault, but building something non-trivial
(like a userland) usually runs into it at some point.

So,
- does anyone else see this?
- if so, in SMP or in UP as well? I've never seen this with a
  uniprocessor kernel.

I changed the powerpc-specific part of ld.elf_so to flush the cache in a
more consistent way, and since then I haven't seen any SIGILL; my G4 has
been building stuff from pkgsrc all night.

If you see those SIGILLs on a recent -current, please try my patched
ld.elf_so (just drop it into /libexec; you might have to use install
instead of cp, though) and see if they go away. The binary, built from
yesterday's sources, is here:
ftp://ftp.netbsd.org/pub/NetBSD/misc/macallan/macppc/ld.elf_so

If that indeed fixes the SIGILL problem, it needs to go into 4.0 fast.

have fun
Michael
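A minimal sketch of what "flushing the cache after writing the PLT" means, assuming the classic lazy-binding PowerPC PLT in which each slot starts out as "li r11,<index>; b <resolver>" (the pattern shown later in this thread). The helpers encode_li_r11, encode_b and write_plt_slot are hypothetical and are not ld.elf_so's actual code; the immediate simply stands in for whatever index the resolver expects. The instruction-cache sync is delegated to the GCC/Clang builtin __builtin___clear_cache, called on exactly the bytes just written.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not ld.elf_so's actual code. */

/* "li r11,imm" is the extended mnemonic for "addi r11,0,imm":
 * primary opcode 14, rD = 11, rA = 0, 16-bit signed immediate. */
static uint32_t encode_li_r11(int16_t imm)
{
    return (14u << 26) | (11u << 21) | (uint16_t)imm;
}

/* "b target": primary opcode 18 with a word-aligned displacement relative
 * to the branch's own address; AA = 0 (relative) and LK = 0 (no link).
 * Both pointers must point into the same array. */
static uint32_t encode_b(const uint32_t *from, const uint32_t *to)
{
    ptrdiff_t disp = (const char *)to - (const char *)from;
    return (18u << 26) | ((uint32_t)disp & 0x03fffffcu);
}

/* Write one lazy-binding PLT slot ("li r11,index; b resolver") and sync the
 * instruction cache for exactly the bytes just written, before anything can
 * branch to the slot. */
static void write_plt_slot(uint32_t *slot, int16_t index,
                           const uint32_t *resolver)
{
    slot[0] = encode_li_r11(index);
    slot[1] = encode_b(&slot[1], resolver);

    /* On PowerPC the sync must amount to dcbst on each modified data cache
     * block, sync, icbi on each instruction cache block, then sync and
     * isync. On x86 the hardware keeps the instruction cache coherent and
     * this builtin is typically a no-op. */
    __builtin___clear_cache((char *)slot, (char *)(slot + 2));
}
```

Keeping the sync inside the same helper that writes the instructions is one way to make sure no code path that patches a slot can forget it.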

Re: Seemingly random SIGILL in SMP
United States
2007-10-05 10:42:27
Michael,
-> in both -current and 4.0 I occasionally see processes dies with
-> SIGILL, apparently at random. Looking at the core files revealed that
-> the faulting instruction was always part of a PLT table, apparently
-> they're not always flushed out after writing them. I can't reliably
-> trigger the fault but building something non-trivial ( like a
-> userland ) usually runs into it at some point.
-> So,
-> - does anyone else see this?

Yes, I do whenever I try to build the userland. It usually takes me five
or six tries before it completes. I also get the occasional SIGSTOP and
SIGBUS, I believe. It's very random; it almost never happens again at the
same place.

Since I couldn't find anything about this on the net, I just assumed that
it was bad hardware, either the CPU or the RAM.
-> - if so, in SMP or in UP as well? I've never seen this with an
-> uniprocessor kernel.
->

I'll build the userland with a UP kernel tonight and let you know. My
iMac G3 has never had this problem and is extremely stable.
-> I changed the powerpc-specific part of ld.elf_so to flush the cache
-> in a more consistent way and since then I haven't seen any SIGILL and
-> my G4's been building stuff from pkgsrc all night.
->
-> If you see those SIGILL on a recent -current please try my patched
-> ld.elf_so ( just dump it into /libexec, you might have to use install
-> instead of cp though ) and see if they go away. The binary is here:
-> ftp://ftp.netbsd.org/pub/NetBSD/misc/macallan/macppc/ld.elf_so
-> built from yesterday's sources.
->

I'll test the new ld.elf_so as well.
Allen
--
You have received an email. Please reboot for the changes to take effect.
8:20AM up 13 days, 11:19, 2 users, load averages: 0.72, 0.69, 0.61

Re: Seemingly random SIGILL in SMP
United States
2007-10-05 10:53:59
Hello,

On Oct 5, 2007, at 11:42, Allen Wong wrote:
> -> in both -current and 4.0 I occasionally see processes dies with
> -> SIGILL, apparently at random. Looking at the core files revealed that
> -> the faulting instruction was always part of a PLT table, apparently
> -> they're not always flushed out after writing them. I can't reliably
> -> trigger the fault but building something non-trivial ( like a
> -> userland ) usually runs into it at some point.
> -> So,
> -> - does anyone else see this?
>
> Yes, I do whenever I try to build the userland. It usually takes me five or
> six tries before completion. I also get the occasional SIGSTOP and SIGBUS,
> I believe. It's very random, it will almost never happen again at the same
> place.

To verify it's the same problem, please load the core file into gdb and
disassemble what's at the fault address:

  gdb -c whatever.core /path/to/whatever
  disassemble 0xwhereveritborked

If the disassembly dump looks like this:

  li r11,something
  b somewhere
  li r11,somethingelse
  b elsewhere

or something like that (just a long list of loads and branches), then
it's the same problem.

> -> - if so, in SMP or in UP as well? I've never seen this with an
> -> uniprocessor kernel.
> ->
>
> I'll build the userland in a UP kernel tonight and let you know. My iMac G3
> has never had this problem and is extremely stable.

That does indeed sound like the problem I'm talking about.

> -> I changed the powerpc-specific part of ld.elf_so to flush the cache
> -> in a more consistent way and since then I haven't seen any SIGILL and
> -> my G4's been building stuff from pkgsrc all night.
> ->
> -> If you see those SIGILL on a recent -current please try my patched
> -> ld.elf_so ( just dump it into /libexec, you might have to use install
> -> instead of cp though ) and see if they go away. The binary is here:
> -> ftp://ftp.netbsd.org/pub/NetBSD/misc/macallan/macppc/ld.elf_so
> -> built from yesterday's sources.
> ->
>
> I'll test the new ld.elf_so as well.

It probably won't work in 4.0; I'll build you one that does.

have fun
Michael
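The "long list of loads and branches" check can be sketched in C, assuming you have already extracted the 32-bit instruction words around the fault address from the core file by whatever means you prefer. The helpers is_li_r11, is_plain_b, b_displacement and looks_like_plt are hypothetical, written only for illustration; the decoding follows the PowerPC encodings (li r11 is addi with rD = 11 and rA = 0; b is primary opcode 18 with AA = LK = 0). It only tests for the shape of the pattern, so a match is a hint, not proof.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if insn is "li r11,imm", i.e. "addi r11,0,imm"
 * (opcode 14, rD = 11, rA = 0, so the upper halfword is 0x3960). */
static int is_li_r11(uint32_t insn)
{
    return (insn & 0xffff0000u) == 0x39600000u;
}

/* Nonzero if insn is a plain relative branch "b" (opcode 18, AA = 0, LK = 0). */
static int is_plain_b(uint32_t insn)
{
    return (insn >> 26) == 18u && (insn & 3u) == 0u;
}

/* Byte displacement encoded in a "b", sign-extended from 26 bits. */
static int32_t b_displacement(uint32_t insn)
{
    int32_t li = (int32_t)(insn & 0x03fffffcu);
    return (li & 0x02000000) ? li - 0x04000000 : li;
}

/* Hypothetical heuristic: nonzero if words[] holds at least min_pairs
 * consecutive "li r11,...; b ..." pairs, aligned on even word indices. */
static int looks_like_plt(const uint32_t *words, size_t n, size_t min_pairs)
{
    size_t run = 0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        if (is_li_r11(words[i]) && is_plain_b(words[i + 1])) {
            if (++run >= min_pairs)
                return 1;
        } else {
            run = 0;
        }
    }
    return 0;
}
```

For example, the words 0x39600000 followed by 0x4bfffff4 decode as "li r11,0" and "b .-12", one pair of the pattern.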

Re: Seemingly random SIGILL in SMP
United States
2007-10-06 00:20:25
Hello,

On Oct 5, 2007, at 11:53, Michael Lorenz wrote:
>> I'll test the new ld.elf_so as well.
>
> It probably won't work in 4.0, I'll build you one that does.

Here's a patched 4.0 ld.elf_so:
ftp://ftp.netbsd.org/pub/NetBSD/misc/macallan/macppc/ld_so_4_0.tar.gz

have fun
Michael

Re: Seemingly random SIGILL in SMP
United States
2007-10-06 01:49:39
Michael,
-> To verify it's the same problem please load the core file into gdb
-> and disassemble what's at the fault address:
-> gdb -c whatever.core /path/to/whatever
-> disassemble 0xwhereveritborked
->
-> If the disassembly dump looks like this:
-> li r11,something
-> b somewhere
-> li r11, somethingelse
-> b elsewhere
-> or something like that ( just a long list of loads and branches )
-> then it's the same problem.
->

I don't get anything from gdb:
# gdb -c /usr/src//lib/libcrypto/sh.core /bin/sh
GNU gdb 5.3nb1
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "powerpc--netbsd"...(no debugging symbols found)...
Core was generated by `sh'.
Program terminated with signal 4, Illegal instruction.
Reading symbols from /lib/libedit.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libedit.so.2
Reading symbols from /lib/libtermcap.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libtermcap.so.0
Reading symbols from /lib/libc.so.12...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.12
Reading symbols from /libexec/ld.elf_so...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld.elf_so
#0 0xeff7cf74 in ?? () from /lib/libc.so.12
(gdb) disassemble 0xeff7cf74
No function contains specified address.
-> >I'll build the userland in a UP kernel tonight and let you know.
-> >My iMac G3 has never had this problem and is extremely stable.
->
-> That sounds indeed like the problem I'm talking about.
->
A UP kernel builds the userland with no problems.
-> It probably won't work in 4.0, I'll build you one that does.
->

Can you please build one for 3.1 as well? I'm still working on booting
4.0 from an IDE drive. Thanks!
Allen
--
If all about you there are people panicking while you remain
calm and serene, maybe you don't belong on Wall Street.
Alan Abelson, editor Barron's
11:40PM up 3:25, 2 users, load averages: 0.07, 0.24, 0.24

Re: Seemingly random SIGILL in SMP
United States
2007-10-06 02:46:00
Hello,
On Oct 6, 2007, at 02:49, Allen Wong wrote:
> -> To verify it's the same problem please load the core file into gdb
> -> and disassemble what's at the fault address:
> -> gdb -c whatever.core /path/to/whatever
> -> disassemble 0xwhereveritborked
> ->
> -> If the disassembly dump looks like this:
> -> li r11,something
> -> b somewhere
> -> li r11, somethingelse
> -> b elsewhere
> -> or something like that ( just a long list of loads and branches )
> -> then it's the same problem.
> ->
>
> I don't get anything from gdb:
...
> Program terminated with signal 4, Illegal instruction.
...
> #0 0xeff7cf74 in ?? () from /lib/libc.so.12
> (gdb) disassemble 0xeff7cf74
> No function contains specified address.
Might be a different issue.
Then again, a miscached PLT entry doesn't necessarily cause a SIGILL
right away; the cache might contain some valid instruction that could
do pretty much anything before faulting somewhere. Is the stack valid
(e.g., does the bt command give a useful stack trace)?
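Why a stale cache line need not fault right away can be illustrated with a deliberately simplified toy model: one memory word, a write-back data cache, and a separate instruction cache that is not kept coherent with it. The struct toy_cpu and its functions are hypothetical; the functions named dcbst and icbi only mimic the effect of those PowerPC instructions on this toy state, and ordering (sync/isync) and multiple CPUs are not modeled at all. The point is that instruction fetch keeps returning the old, perfectly valid word until the new one has been written back to memory and the stale cached copy has been invalidated.

```c
#include <stdint.h>

/* Toy model, not real hardware: one word of memory seen through a
 * write-back data cache and a non-coherent instruction cache. */
struct toy_cpu {
    uint32_t memory;       /* what is in RAM */
    uint32_t dcache;       /* copy written by the most recent store */
    int      dcache_dirty; /* store not yet written back to RAM */
    uint32_t icache;       /* what instruction fetch will see */
    int      icache_valid;
};

/* A store only updates the data cache. */
static void store_word(struct toy_cpu *c, uint32_t v)
{
    c->dcache = v;
    c->dcache_dirty = 1;
}

/* Toy "dcbst": write the dirty data cache copy back to RAM. */
static void dcbst(struct toy_cpu *c)
{
    if (c->dcache_dirty) {
        c->memory = c->dcache;
        c->dcache_dirty = 0;
    }
}

/* Toy "icbi": discard the cached instruction. */
static void icbi(struct toy_cpu *c)
{
    c->icache_valid = 0;
}

/* Instruction fetch: refill from RAM only on an instruction cache miss. */
static uint32_t fetch(struct toy_cpu *c)
{
    if (!c->icache_valid) {
        c->icache = c->memory;
        c->icache_valid = 1;
    }
    return c->icache;
}
```

In this model, rewriting a slot and then only invalidating the instruction cache still fetches the old word, because RAM was never updated; both the write-back and the invalidation are needed.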
> -> >I'll build the userland in a UP kernel tonight and let you know.
> -> >My iMac G3 has never had this problem and is extremely stable.
> ->
> -> That sounds indeed like the problem I'm talking about.
> ->
>
> A UP kernel builds the userland with no problems.
Yeah, there are probably more cache syncing problems left.
> -> It probably won't work in 4.0, I'll build you one that does.
> ->
>
> Can you please build one for 3.1 as well? I'm still working on
> booting 4.0 on an ide drive. Thanks!
That might take some time; I need to download the source first. If
you have the source handy, the patch is small and changes only a handful
of lines in a single file. See attachment.
have fun
Michael
Attachment: approximate file size 965 bytes

Re: Seemingly random SIGILL in SMP
United States
2007-10-06 09:13:16
Michael,
-> Might be a different issue.
-> Although, a miscached PLT entry doesn't necessarily give a SIGILL
-> right away, the cache might contain some valid instruction that could
-> do pretty much anything before faulting somewhere. Is the stack valid
-> ( eg, does the bt command give a useful stack trace? )
->
# gdb -c /usr/src//lib/libcrypto/sh.core /bin/sh
<snip>
Program terminated with signal 4, Illegal instruction.
Reading symbols from /lib/libedit.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libedit.so.2
Reading symbols from /lib/libtermcap.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libtermcap.so.0
Reading symbols from /lib/libc.so.12...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.12
Reading symbols from /libexec/ld.elf_so...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld.elf_so
#0 0xeff7cf74 in ?? () from /lib/libc.so.12
(gdb) bt
#0 0xeff7cf74 in ?? () from /lib/libc.so.12
#1 0x0180ed20 in waitproc ()
#2 0x0180ea94 in dowait ()
#3 0x0180ea18 in waitforjob ()
#4 0x018055bc in evalcommand ()
#5 0x01803c84 in evaltree ()
#6 0x01803c74 in evaltree ()
#7 0x01803b88 in evalstring ()
#8 0x0180f968 in main ()
#9 0x01801990 in _start ()
#10 0xefff3428 in ?? () from /libexec/ld.elf_so
-> >Can you please build one for 3.1 as well? I'm still working on
-> >booting 4.0 on an ide drive. Thanks!
->
-> That might take some time, I need to download the source first. If
-> you have the source handy, the patch is small, changes only a handful
-> lines in a single file. See attachment.
->

I've applied the patch manually and I'm building it now. I'll let you know.
Allen
--
My friends visited Silicon Valley and all I got was this lousy sig.
7:00AM up 10:46, 2 users, load averages: 0.74, 0.32, 0.28

Re: Seemingly random SIGILL in SMP
United States
2007-10-06 13:40:38
Hello,

On Oct 6, 2007, at 10:13, Allen Wong wrote:
> -> Might be a different issue.
> -> Although, a miscached PLT entry doesn't necessarily give a SIGILL
> -> right away, the cache might contain some valid instruction that could
> -> do pretty much anything before faulting somewhere. Is the stack valid
> -> ( eg, does the bt command give a useful stack trace? )
> ->
>
> # gdb -c /usr/src//lib/libcrypto/sh.core /bin/sh
...
> #0 0xeff7cf74 in ?? () from /lib/libc.so.12
> (gdb) bt
> #0 0xeff7cf74 in ?? () from /lib/libc.so.12
> #1 0x0180ed20 in waitproc ()

It could still be a hosed PLT entry that just pointed into nirvana
instead of faulting right away.

have fun
Michael

Re: Seemingly random SIGILL in SMP
United States
2007-10-06 20:53:37
Michael,
-> -> That might take some time, I need to download the source first. If
-> -> you have the source handy, the patch is small, changes only a handful
-> -> lines in a single file. See attachment.
-> ->
->
-> I've applied the patch manually and I'm building it now. I'll let you know.
->

The patch seems to be working. The userland has been building for six
hours with no SIGILL, or any other signal for that matter. I'll build the
userland a few more times to verify.
Allen
--
Documentation is a lot like sex. When it's good, it's very, very good. And
when it's bad, it's still better than nothing.
6:40PM up 22:25, 2 users, load averages: 0.44, 0.36, 0.29

Re: Seemingly random SIGILL in SMP
United States
2007-10-07 11:22:57
Hello,

On Oct 6, 2007, at 21:53, Allen Wong wrote:
> -> -> That might take some time, I need to download the source first. If
> -> -> you have the source handy, the patch is small, changes only a handful
> -> -> lines in a single file. See attachment.
> -> ->
> ->
> -> I've applied the patch manually and I'm building it now. I'll let you know.
> ->
>
> The patch seems to be working. The userland has been building for six hours
> and no SIGILL or any other signals, for that matter. I'll build the userland
> a few more times to verify.

Nice!

Please let me know if you run into any other trouble with this. My dual
G4 has been stable so far, and I've been torturing it for the last few
days.

have fun
Michael