Thread: 4.0rc1 less stable than 3.1?

4.0rc1 less stable than 3.1?
user name
2007-10-07 11:13:50
Hello,

First off, I admit I have bad hardware: a 1st gen B&W G3 with the buggy
IDE chip.  I generally compile my kernel with

 wd*   at atabus? drive ? flags 0x0f00

which should disable UDMA and thus _should_ prevent errors during IDE
transfers.  (This means that I can't use GENERIC kernels.)
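
A rough sketch of how that flags value decodes. It assumes the nibble layout described in wd(4): the lowest nibble is PIO, the next is DMA, the third is Ultra-DMA, the top bit of each nibble enables the setting, and 0xf in the DMA or UDMA nibble means "disable". The function name decode_wd_flags is made up for illustration.

```python
def decode_wd_flags(flags):
    """Decode a wd(4) 'flags' value into PIO/DMA/UDMA settings.

    Assumed layout: bits 0-3 PIO, bits 4-7 DMA, bits 8-11 Ultra-DMA.
    In each nibble the top bit enables the setting and the low three
    bits give the mode; 0xf means 'disable' for DMA and UDMA.
    """
    result = {}
    for name, shift in (("pio", 0), ("dma", 4), ("udma", 8)):
        nibble = (flags >> shift) & 0xF
        if nibble & 0x8 == 0:
            result[name] = "default"      # enable bit clear: driver picks
        elif nibble == 0xF and name != "pio":
            result[name] = "disabled"     # 0xf disables DMA or UDMA
        else:
            result[name] = nibble & 0x7   # explicit mode number
    return result


print(decode_wd_flags(0x0F00))   # the value used in the config line above
print(decode_wd_flags(0x0FAC))   # PIO and DMA pinned, UDMA off
```

Under that reading, 0x0f00 leaves PIO and DMA at whatever the driver negotiates and only turns Ultra-DMA off.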

I have 2 disks on the primary bus, out of necessity for disk space.

However, the almost-generic kernel:
bash-3.2$ diff GENERIC VINYAMAR
29c29
< options       ALTIVEC         # Include AltiVec support
---
> #options      ALTIVEC         # Include AltiVec support
200c200
< config                netbsd  root on ? type ?
---
> config                netbsd  root on wd0a type ?
420c420
< wd*   at atabus? drive ? flags 0x0000
---
> wd*   at atabus? drive ? flags 0x0f00

appears to be less stable than 3.1 - I haven't been able to compile
userland 4.0 yet, from anoncvs updated with tag netbsd-4 about a week
ago.  The latest failure was:

trap: kernel write DSI trap dd3006f4 by 0x32c924 (DSISR 0x42000000,
err=14), lr 0x32d070
panic: trap
stopped in pid 855,1 (du) at netbsd: cpu_debugger+0x10: lwz r0,r1,0x14

So this happened during a du, so there was a fair bit of disk activity,
and commonly I notice that heavy disk activity can bring the machine
(or if I'm lucky, just individual tasks) down - even under 3.1, but
seemingly more so under 4.0.

Is there a way I can check if it is just my hardware?  Is it possible
to make a kernel that is more robust to this kind of IDE trouble?
At this point, I'm considering replacing the machine (which is my
primary home server) with a boring i386 box - but that means trying to
find a cheap one that is reasonably quiet...

Joe.

Re: 4.0rc1 less stable than 3.1?
user name
United States
2007-10-07 11:21:36
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

On Oct 7, 2007, at 12:13, Joachim Thiemann wrote:

> appears to be less stable than 3.1 - I haven't been able to compile
> userland 4.0 yet, from anoncvs updated with tag netbsd-4 about a week
> ago.  The latest failure was:
>
> trap: kernel write DSI trap dd3006f4 by 0x32c924 (DSISR 0x42000000,
> err=14), lr 0x32d070
> panic: trap
> stopped in pid 855,1 (du) at netbsd: cpu_debugger+0x10: lwz r0,r1,0x14

That's not very useful without a stack trace.

> Is there a way I can check if it is just my hardware?

Sort of. Compare the panics and their stack traces: if most of them
come from the same kernel subsystem, it's not likely your hardware. If
they come from all over the place, it probably is.
My beige G3 has no such troubles, and neither do any of my other
machines. It might help to change your IDE cables though; faster
modes not working reliably is usually a sign of a bad cable.

have fun
Michael
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQEVAwUBRwkHkcpnzkX8Yg2nAQJkiwf/XNCvnSo9N36FOF04Vc1Xid53ClhdLL+V
zOUuABeDy7BVTK1eBi8DKdQmsmzwkHVUfT2of5D9r7/ZTxvwDZxmJu1gNTaMQBke
OpMFwr4tDKkdCRXYyJZ5NRD0ddOCgELQRz4wGBOIzd8sm3abxcfzA8wp6A+fuDTp
q9CvhIVPaSPcIySB95M8asl4eQjCnz44DUTEu7i+9JTStV5Uv11DfBtCOw7YdMwf
M9pljxjnwwu9Dgi/F/hV7pJgY9L/BwkE7PF0/98tUPf2OiOJKl7q/k4zo2FhDbQ5
fJTFgiAmurq/I7dEqTEPDhyi9LWeyXSX1ocrcJUZzoXzKha3ewlJNA==
=kGsh
-----END PGP SIGNATURE-----

Re: 4.0rc1 less stable than 3.1?
user name
2007-10-07 14:29:52
(Sorry, forgot to also reply to port-macppc the last time)

On 07/10/2007, Michael Lorenz <macallannetbsd.org> wrote:
> > > That's not very useful without a stack trace.

Ok, another crash, this time with stack trace:
panic: pool_put: dino2pl: page header missing
Stopped in pid 85.1 (cvs) at netbsd: cpu_Debugger+0x10: lwz r0,r1,0x14

The backtrace:
0xd5737980: at panic+0x1b4
...: at pool_do_put+0x198
...: at pool_put+28
at ffs_reclaim+0x6c
at VOP_RECLAIM+0x30
at vclean+0x98
at vgonel+0x5c
at getcleanvnode+0xe0
at getnewvnode+0xe0
at ffs_vget+0xfc
at ufs_lookup+0x734
at VOP_LOOKUP+0x34
at lookup+0x320
at namei+0x138
at compat_30_sys___stat13+0x54
at syscall_plain+0x1fc
0xd5737f40: user SC trap #278 by 0xefcb9920: srr1=0xf032 r1=0xffffdic0
cr=0x22008048 xer=0 ctr=0xefcb9918

>
> That's a cmdide, isn't it? Sucky but should work at least with
> PIO4/DMA2.

Yes, my dmesg is:
cmdide0 at pci1 dev 1 function 0
cmdide0: CMD Technology PCI0646 (rev. 0x05)
cmdide0: bus-master DMA support present
cmdide0: primary channel configured to native-PCI mode
cmdide0: using irq 26 for native-PCI interrupt
atabus0 at cmdide0 channel 0
cmdide0: secondary channel configured to native-PCI mode
cmdide0: secondary channel ignored (disabled)
...
wdc0 at obio0 offset 0x20000 irq 13: DMA transfer
atabus1 at wdc0 channel 0
...
wd0 at atabus0 drive 0: <WDC WD2500JB-00REA0>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 232 GB, 484521 cyl, 16 head, 63 sec, 512 bytes/sect x 488397168 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd1 at atabus0 drive 1: <IBM-DTTA-350640>
wd1: drive supports 16-sector PIO transfers, LBA addressing
wd1: 6197 MB, 12592 cyl, 16 head, 63 sec, 512 bytes/sect x 12692736 sectors
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
wd0(cmdide0:0:0): using PIO mode 4, DMA mode 2 (using DMA)
wd1(cmdide0:0:1): using PIO mode 4, DMA mode 2 (using DMA)
atapibus0 at atabus1: 2 targets
cd0 at atapibus0 drive 1: <MATSHITA CR-587, , 7S14> cdrom removable
cd0: drive supports PIO mode 4, DMA mode 2
cd0(wdc0:0:1): using PIO mode 4, DMA mode 2 (using DMA)

> On the other hand, the integrated IDE channels do that as well,
> Heathrow IDE is a bit weird ( mine spits an error when probing the
> drives but all that does is a 3 second delay ) but it works just fine
> if you have good cables. At least the Heathrow in my beige G3 was
> extremely sensitive - when I first got the machine I had to run it
> without IDE DMA because of all the errors, data corruption etc. I got
> on both channels. Didn't think of swapping the cables right away
> because I didn't expect them both to be bad. Got new cables - no more
> problems.

Yes, I'll definitely try swapping cables.  However, 3.1 did seem
more stable, so I'll wait for the next crash (shouldn't take too long)
and see if the backtrace is similar.

Enjoy!
Joe.

Re: 4.0rc1 less stable than 3.1?
user name
United States
2007-10-07 18:04:42
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

On Oct 7, 2007, at 15:29, Joachim Thiemann wrote:

> (Sorry, forgot to also reply to port-macppc the last time)
>
> On 07/10/2007, Michael Lorenz <macallannetbsd.org> wrote:
>>>> That's not very useful without a stack trace.
>
> Ok, another crash, this time with stack trace:
> panic: pool_put: dino2pl: page header missing
> Stopped in pid 85.1 (cvs) at netbsd: cpu_Debugger+0x10: lwz r0,r1,0x14
>
> The backtrace:
> 0xd5737980: at panic+0x1b4
> ...: at pool_do_put+0x198
> ...: at pool_put+28
> at ffs_reclaim+0x6c
> at VOP_RECLAIM+0x30

Hmm, that's file system code. Does it always come from there?

>> That's a cmdide, isn't it? Sucky but should work at least with
>> PIO4/DMA2.
>
> Yes, my dmesg is:
> cmdide0 at pci1 dev 1 function 0
> cmdide0: CMD Technology PCI0646 (rev. 0x05)
> cmdide0: bus-master DMA support present
> cmdide0: primary channel configured to native-PCI mode
> cmdide0: using irq 26 for native-PCI interrupt
> atabus0 at cmdide0 channel 0
> cmdide0: secondary channel configured to native-PCI mode
> cmdide0: secondary channel ignored (disabled)
> ...
> wdc0 at obio0 offset 0x20000 irq 13: DMA transfer
> atabus1 at wdc0 channel 0
> ...
> wd0 at atabus0 drive 0: <WDC WD2500JB-00REA0>
> wd0: drive supports 16-sector PIO transfers, LBA48 addressing
> wd0: 232 GB, 484521 cyl, 16 head, 63 sec, 512 bytes/sect x 488397168 sectors
> wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
> wd1 at atabus0 drive 1: <IBM-DTTA-350640>
> wd1: drive supports 16-sector PIO transfers, LBA addressing
> wd1: 6197 MB, 12592 cyl, 16 head, 63 sec, 512 bytes/sect x 12692736 sectors
> wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
> wd0(cmdide0:0:0): using PIO mode 4, DMA mode 2 (using DMA)
> wd1(cmdide0:0:1): using PIO mode 4, DMA mode 2 (using DMA)
> atapibus0 at atabus1: 2 targets
> cd0 at atapibus0 drive 1: <MATSHITA CR-587, , 7S14> cdrom removable
> cd0: drive supports PIO mode 4, DMA mode 2
> cd0(wdc0:0:1): using PIO mode 4, DMA mode 2 (using DMA)

Since you're running everything in PIO4/DMA2 anyway - could you
please check if anything changes when the disks are hooked up to wdc
at obio? If that works without errors we know it's the cmdide; if
they persist I'd suspect the hardware.

>> On the other hand, the integrated IDE channels do that as well,
>> Heathrow IDE is a bit weird ( mine spits an error when probing the
>> drives but all that does is a 3 second delay ) but it works just fine
>> if you have good cables. At least the Heathrow in my beige G3 was
>> extremely sensitive - when I first got the machine I had to run it
>> without IDE DMA because of all the errors, data corruption etc. I got
>> on both channels. Didn't think of swapping the cables right away
>> because I didn't expect them both to be bad. Got new cables - no more
>> problems.
>
> Yes, I'll definitely try swapping cables.  However, 3.1 did seem
> more stable, so I'll wait for the next crash (shouldn't take too long)
> and see if the backtrace is similar.

Well, if 3.1 didn't crash at all that would be an indication.

have fun
Michael
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQEVAwUBRwlmCspnzkX8Yg2nAQItCAf+JLMv2CC5Z+hWBvY5WlhkzLJxEcPAOQEY
jsfNV59Jf1ILS8WFyC4ZJiUhTDlWHpsxBTQfOrsVeGkXf5YvSfR3BnU9xHLinrdS
rceZ8Z+Fcqu6U3dDha2k8KXUERB3BG+cQT09pw1uhouLgLUDVulFDyRTOAv+fWlP
q2pGCPhpAjLvCKIGzzkrx74Di6PscdYSdKJW/PcV5yn2OoVPTL9ar/lE+7ARrczH
3aN57ESpfecRoXoiOiHamHX6wGWZ2PlXvyU3dMiYov/FyaRUnRh0+QFN6gGnVfLp
bSDb7/ujFT6X3qALMENIR8n7g/1t3YTn0H4kZottx6+OrUiYKtXHeA==
=A5Qj
-----END PGP SIGNATURE-----

Re: 4.0rc1 less stable than 3.1?
user name
2007-10-08 09:57:34
On 07/10/2007, Michael Lorenz <macallannetbsd.org> wrote:
> Hmm, that's file system code. Does it always come from there?

Well, I now have a second stack trace.  Note that in the meantime, I
did the following:  I updated from cvs, and recompiled (while under
4.0rc1).  The resulting kernel booted and ran as well as the previous
one, but when it crashed I didn't get a debugging console - in fact,
no indication of any sort other than unresponsiveness.  I rebooted
with the 3.1 kernel, rebuilt tools and kernel, and this one is back to
"normal".  (The compile under 4.0rc1 crashed once or twice, I think,
with "internal compiler error" or something - again I would think this
points to data corruption from the drive.)
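
One rough way to test the data-corruption theory from userland is to write a known pattern to the suspect disk, read it back several times, and compare checksums. The sketch below assumes Python 3 is available on the machine; the function name write_and_verify and the path in the usage note are made up for illustration. Re-reads may be served from the buffer cache rather than the disk, so the file should be larger than RAM (or the filesystem remounted between passes) for the result to mean much.

```python
import hashlib
import os


def write_and_verify(path, size_mb=64, passes=3, seed=b"verify"):
    """Write deterministic data to `path`, then re-read it `passes` times.

    Returns the number of read passes whose SHA-256 digest did not match
    the digest of the data as it was generated in memory.
    """
    block = hashlib.sha256(seed).digest() * 32768    # 1 MiB pattern block
    expected = hashlib.sha256()
    with open(path, "wb") as f:
        for i in range(size_mb):
            # Stamp each 1 MiB chunk with its index so chunks differ.
            chunk = i.to_bytes(8, "big") + block[8:]
            f.write(chunk)
            expected.update(chunk)
        f.flush()
        os.fsync(f.fileno())                         # push data to the disk
    mismatches = 0
    for _ in range(passes):
        actual = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                actual.update(chunk)
        if actual.digest() != expected.digest():
            mismatches += 1
    return mismatches
```

A call such as write_and_verify("/var/tmp/verify.bin", size_mb=2048) would then return 0 on a healthy path from disk to memory; any nonzero result points at the drive, cable, or controller rather than at a particular kernel subsystem.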

Anyways, the crash with the 4.0rc2 kernel:
panic: pool_get: ncachepl: page empty
0xd565fb40: at panic+0x14
... at cache_enter+0x2b4
at ufs_lookup+0x768
at VOP_LOOKUP+0x34
at lookup+0x320
at namei+0x138
at compat_30_sys___lstat13+0x54
at syscall_plain+0x1fc
at user SC trap #280 by 0xeff99940 srr1=0xf032 r1=0xfffdad0
cr=0x20028048 xer=0 ctr=0
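
To compare traces along the lines Michael suggested, the frames can be tallied by subsystem. This is only a sketch: the prefix table is a guess based on the function names in the traces posted so far, not an authoritative map of the kernel, and classify is a made-up helper.

```python
import re
from collections import Counter

# Hypothetical prefix-to-subsystem table; extend it as new traces come in.
SUBSYSTEMS = [
    (("ffs_", "ufs_"), "filesystem"),
    (("VOP_", "vclean", "vgonel", "getcleanvnode", "getnewvnode",
      "lookup", "namei", "cache_enter"), "vfs"),
    (("pool_",), "pool allocator"),
]

# Matches the function name in frames such as "at ufs_lookup+0x768".
FRAME_RE = re.compile(r"\bat\s+([A-Za-z_][A-Za-z0-9_]*)\+")


def classify(trace_text):
    """Count backtrace frames per (assumed) kernel subsystem."""
    counts = Counter()
    for func in FRAME_RE.findall(trace_text):
        for prefixes, name in SUBSYSTEMS:
            if func.startswith(prefixes):
                counts[name] += 1
                break
        else:
            counts["other"] += 1      # no prefix matched
    return counts


TRACE = """\
0xd565fb40: at panic+0x14
... at cache_enter+0x2b4
at ufs_lookup+0x768
at VOP_LOOKUP+0x34
at lookup+0x320
at namei+0x138
at compat_30_sys___lstat13+0x54
at syscall_plain+0x1fc
"""

print(classify(TRACE))
```

If the tallies for successive panics keep landing in the same buckets, that argues for a software problem; if they scatter, hardware becomes the more likely suspect.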

> Since you're running everything in PIO4/DMA2 anyway - could you
> please check if anything changes when the disks are hooked up to wdc
> at obio? If that works without errors we know it's the cmdide, if
> they persist I'd suspect the hardware.

I can try that, do I do that by disabling the
wdc*    at mediabay? flags 0
line?

> > Yes, I'll definitely try swapping cables.  However, 3.1 did seem
> > more stable, so I'll wait for the next crash (shouldn't take too long)
> > and see if the backtrace is similar.
>
> Well, if 3.1 didn't crash at all that would be an indication.

Well, it did occasionally, just not quite as reliably.

Joe.

Re: 4.0rc1 less stable than 3.1?
user name
United States
2007-10-08 14:14:24
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

On Oct 8, 2007, at 10:57, Joachim Thiemann wrote:

> On 07/10/2007, Michael Lorenz <macallannetbsd.org> wrote:
>> Hmm, that's file system code. Does it always come from there?
>
> Well, I now have a second stack trace.  Note that in the meantime, I
> did the following:  I updated from cvs, and recompiled (while under
> 4.0rc1).  The resulting kernel booted and ran as well as the previous
> one, but when it crashed I didn't get a debugging console - in fact,
> no indication of any sort other than unresponsiveness.  I rebooted
> with the 3.1 kernel, rebuilt tools and kernel, and this one is back to
> "normal".  (The compile under 4.0rc1 crashed once or twice, I think,
> with "internal compiler error" or something - again I would think this
> points to data corruption from the drive)
>
> Anyways, the crash with the 4.0rc2 kernel:
> panic: pool_get: ncachepl: page empty
> 0xd565fb40: at panic+0x14
> ... at cache_enter+0x2b4
> at ufs_lookup+0x768
> at VOP_LOOKUP+0x34
> at lookup+0x320
> at namei+0x138
> at compat_30_sys___lstat13+0x54
> at syscall_plain+0x1fc
> at user SC trap #280 by 0xeff99940 srr1=0xf032 r1=0xfffdad0
> cr=0x20028048 xer=0 ctr=0

Also filesystem stuff.

>> Since you're running everything in PIO4/DMA2 anyway - could you
>> please check if anything changes when the disks are hooked up to wdc
>> at obio? If that works without errors we know it's the cmdide, if
>> they persist I'd suspect the hardware.
>
> I can try that, do I do that by disabling the
> wdc*    at mediabay? flags 0
> line?

You can remove that line - it's for removable IDE devices found in
some early PowerBooks. wdc at obio should allow DMA by default.

>>> Yes, I'll definitely try swapping cables.  However, 3.1 did seem
>>> more stable, so I'll wait for the next crash (shouldn't take too
>>> long) and see if the backtrace is similar.
>>
>> Well, if 3.1 didn't crash at all that would be an indication.
>
> Well, it did occasionally, just not quite as reliably


Since nobody else seems to see this kind of problem I'd strongly
suspect the cables.

have fun
Michael

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQEVAwUBRwqBkMpnzkX8Yg2nAQJDFAf+MeVVWd0k9GncG44iwCow/6oj3pBQFQM8
3yFbcj5UgxqPC2rmfEN58ItriRg43HpEek3tlNRODGWUtLbUxSVIi70dXDgyqnvf
Dj0z2M4cuFtv3mm0z2DbZxHksqStKyHVOgKdtKhPFaNYAebga8MvyvWdZi6D+Fiy
npZCC2bccjywYJoSJ15+PLBcWwB1fW58mwUaTvr9mYXM4sYIjkSQ1iQf5uA9ZJxz
rTjEjk/qICWCSbA/JS0UsYPFwAsb37toZeF6Wq9NuSuvMR3zJwu3to37BXPX0oYL
s48hiRSd2yRHyN0vQODmHnX92suNUwxtxg07yUIN6Bt/3lU26wJvsA==
=5aZ/
-----END PGP SIGNATURE-----
