Thread: wd interface CRC errors




wd interface CRC errors
user name
2006-10-20 16:27:51
I have a NetBSD 2.0.2, i386, with an uptime of 238 days. The install
is about a year and a half old, running 24/7 since May 2005.

It has three WDs, and wd1/wd2 are a mirror. Nothing has changed
physically in the system, so the usual 'cable problem' suggestion
doesn't seem to apply.

These started in September, and have become common:

Sep 19 08:58:28 mail /netbsd: wd0a: error writing fsbn 1114304 of 1114304-1114319 (wd0 bn 1114367; cn 1105 tn 8 sn 23), retrying
Sep 19 08:58:29 mail /netbsd: wd0: (aborted command, interface CRC error)
Sep 19 08:58:29 mail /netbsd: wd0: soft error (corrected)

They show up on wd0 and wd1, which share a controller and a cable.
All errors have been corrected so far (18 on wd0, 19 on wd1).

All of the errors occur while writing, and there's no locality
among the sectors involved in the failed writes.
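
For anyone wanting to check the locality claim against their own logs, here is a minimal sketch. It assumes each syslog entry has been unwrapped onto a single line and follows the format of the excerpt above; the names `WRITE_ERR`, `parse_errors` and `gaps` are made up for this illustration and are not part of any NetBSD tool.

```python
import re

# Matches the retry message in the form shown in this thread's excerpt.
# Assumption: other kernels or releases may word this line differently.
WRITE_ERR = re.compile(
    r"(?P<part>wd\d+[a-p]): error (?P<op>reading|writing) fsbn (?P<fsbn>\d+) "
    r"of (?P<first>\d+)-(?P<last>\d+) \((?P<disk>wd\d+) bn (?P<bn>\d+);"
)

def parse_errors(lines):
    """Return (disk, operation, absolute block number) for each matching line."""
    found = []
    for line in lines:
        m = WRITE_ERR.search(line)
        if m:
            found.append((m.group("disk"), m.group("op"), int(m.group("bn"))))
    return found

def gaps(blocks):
    """Distances between neighbouring failing blocks after sorting.

    Many small gaps would hint at a bad region on the platter; widely
    scattered blocks point away from the media itself.
    """
    ordered = sorted(blocks)
    return [b - a for a, b in zip(ordered, ordered[1:])]

sample = [
    "Sep 19 08:58:28 mail /netbsd: wd0a: error writing fsbn 1114304 of "
    "1114304-1114319 (wd0 bn 1114367; cn 1105 tn 8 sn 23), retrying",
]
errors = parse_errors(sample)
print(errors)
print(gaps([bn for _, _, bn in errors]))
```

With only one sample line there are no gaps to report; fed the full set of "error writing" lines for one drive, the gap list gives a rough feel for whether the failing blocks cluster.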

The SMART status shows a high count on wd0 for raw read error rate and
hardware ECC recovered errors, so I'm inclined to replace that drive.

There's also no obvious pattern to the frequency.

Comments welcome, in case there's something else I've forgotten to
consider. Thanks.

Sep 19 08:58:29 mail /netbsd: wd0: (aborted command, interface CRC error)
Sep 20 20:07:08 mail /netbsd: wd1: (aborted command, interface CRC error)
Sep 22 10:03:57 mail /netbsd: wd0: (aborted command, interface CRC error)
Sep 25 15:57:21 mail /netbsd: wd1: (aborted command, interface CRC error)
Sep 27 13:03:27 mail /netbsd: wd1: (aborted command, interface CRC error)
Sep 30 03:16:07 mail /netbsd: wd0: (aborted command, interface CRC error)
Oct  3 04:02:46 mail /netbsd: wd0: (aborted command, interface CRC error)
Oct  3 11:18:42 mail /netbsd: wd1: (aborted command, interface CRC error)
Oct  5 02:25:58 mail /netbsd: wd0: (aborted command, interface CRC error)
Oct  5 12:49:38 mail /netbsd: wd0: (aborted command, interface CRC error)
Oct  6 07:19:08 mail /netbsd: wd1: (aborted command, interface CRC error)
Oct  6 07:52:20 mail /netbsd: wd0: (aborted command, interface CRC error)
Oct  7 03:58:23 mail /netbsd: wd1: (aborted command, interface CRC error)
Oct  7 04:30:05 mail /netbsd: wd1: (aborted command, interface CRC error)
Oct  7 08:14:08 mail /netbsd: wd1: (aborted command, interface CRC error)
Oct  7 19:30:41 mail /netbsd: wd1: (aborted command, interface CRC error)
Oct  7 23:45:10 mail /netbsd: wd1: (aborted command, interface CRC error)
Oct 10 02:20:02 mail /netbsd: wd0: (aborted command, interface CRC error)
Oct 12 00:25:14 mail /netbsd: wd0: (aborted command, interface CRC error)
Oct 12 00:35:35 mail /netbsd: wd0: (aborted command, interface CRC error)
Oct 12 22:45:23 mail /netbsd: wd1: (aborted command, interface CRC error)
Oct 13 00:53:35 mail /netbsd: wd0: (aborted command, interface CRC error)
Oct 13 02:45:54 mail /netbsd: wd1: (aborted command, interface CRC error)
Oct 13 10:10:00 mail /netbsd: wd0: (aborted command, interface CRC error)
Oct 13 21:03:06 mail /netbsd: wd0: (aborted command, interface CRC error)
Oct 13 23:17:26 mail /netbsd: wd1: (aborted command, interface CRC error)
Oct 14 01:37:59 mail /netbsd: wd1: (aborted command, interface CRC error)
Oct 14 04:33:07 mail /netbsd: wd1: (aborted command, interface CRC error)
Oct 14 07:55:02 mail /netbsd: wd0: (aborted command, interface CRC error)
Oct 14 12:12:11 mail /netbsd: wd1: (aborted command, interface CRC error)
Oct 15 12:32:50 mail /netbsd: wd1: (aborted command, interface CRC error)
Oct 16 07:37:27 mail /netbsd: wd0: (aborted command, interface CRC error)
Oct 17 11:59:01 mail /netbsd: wd0: (aborted command, interface CRC error)
Oct 17 17:51:38 mail /netbsd: wd1: (aborted command, interface CRC error)
Oct 17 21:03:26 mail /netbsd: wd0: (aborted command, interface CRC error)
Oct 19 20:29:45 mail /netbsd: wd1: (aborted command, interface CRC error)
Oct 20 07:00:37 mail /netbsd: wd0: (aborted command, interface CRC error)

-- 
David Maxwell, davidvex.net|davidmaxwell.net -->
An organization gets what it rewards.
			      - Perry Metzger

wd interface CRC errors
user name
2006-10-20 16:55:14
David Maxwell writes:
> 
> I have a NetBSD 2.0.2, i386, with an uptime of 238 days. The install
> is about a year and a half old, running 24/7 since May 2005.
> 
> It has three WDs, and wd1/wd2 are a mirror. Nothing has changed
> physically in the system, so the usual 'cable problem' suggestion
> doesn't seem to apply.

If you haven't already, I'd check that the cables are seated all the way...
(I've had them work their way out a little over time, and cause these
sorts of issues...)

> These started in September, and have become common:
> 
> Sep 19 08:58:28 mail /netbsd: wd0a: error writing fsbn 1114304 of 1114304-1114319 (wd0 bn 1114367; cn 1105 tn 8 sn 23), retrying
> Sep 19 08:58:29 mail /netbsd: wd0: (aborted command, interface CRC error)
> Sep 19 08:58:29 mail /netbsd: wd0: soft error (corrected)
> 
> They show up on wd0 and wd1, which share a controller, and a cable.
> All errors are corrected so far. (18 on wd0, 19 on wd1)
> 
> All of the errors occur while writing, and there's no locality
> amongst the sectors involved in the errored writes.

Any "time-of-day" correlations (e.g. when /etc/daily is running?)
which might point to a heavy disk load (and hence power draw), and
possibly to a power supply that is starting to fail?
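
One rough way to look for such a correlation is to bucket the CRC error lines by hour of day. The sketch below assumes one syslog entry per line in the format quoted in this thread; `CRC` and `tally` are names invented for the example, and the four sample lines are copied from the original post.

```python
import re
from collections import Counter

# Assumption: entries look like the ones in the original post, with the
# hostname followed by "/netbsd:" and then the drive name.
CRC = re.compile(
    r"^(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d+)\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+"
    r"\S+\s+/netbsd:\s+(?P<disk>wd\d+): \(aborted command, interface CRC error\)"
)

def tally(lines):
    """Count CRC errors per drive and per hour of day (0-23)."""
    by_disk, by_hour = Counter(), Counter()
    for line in lines:
        m = CRC.match(line)
        if m:
            by_disk[m.group("disk")] += 1
            by_hour[int(m.group("hour"))] += 1
    return by_disk, by_hour

sample = [
    "Oct  7 03:58:23 mail /netbsd: wd1: (aborted command, interface CRC error)",
    "Oct  7 04:30:05 mail /netbsd: wd1: (aborted command, interface CRC error)",
    "Oct 12 00:25:14 mail /netbsd: wd0: (aborted command, interface CRC error)",
    "Oct 12 00:35:35 mail /netbsd: wd0: (aborted command, interface CRC error)",
]
by_disk, by_hour = tally(sample)

# Crude text histogram: one '#' per error in that hour.
for hour in range(24):
    print(f"{hour:02d} {'#' * by_hour[hour]}")
```

A spike around whenever /etc/daily is scheduled on that machine (or any other regular heavy job) would support the load/power theory; a flat spread would not rule it out, but would make it less likely.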
 
> The smart status shows a high count on wd0 for raw read error rate and
> hardware ECC recovered errors, so I'm inclined to replace that drive.

Be careful with these numbers from the SMART info... I've got a few Seagate
drives where the raw read error rate and hardware ECC recovered error rate
move in lock-step, at a rate of 6/second when the drive is idle (and much
higher when the drive is active). I don't have the URLs handy, but
this is apparently a 'known issue' with some Seagate drives...

You might run some of the tools from sysutils/smartmontools to see if
they give any more info (and/or run the SMART diagnostic bits...).

Later...

Greg Oster


wd interface CRC errors
user name
2006-10-21 15:57:14
On Fri, Oct 20, 2006 at 12:27:51PM -0400, David Maxwell wrote:
> 
> I have a NetBSD 2.0.2, i386, with an uptime of 238 days. The install
> is about a year and a half old, running 24/7 since May 2005.
> 
> It has three WDs, and wd1/wd2 are a mirror. Nothing has changed
> physically in the system, so the usual 'cable problem' suggestion
> doesn't seem to apply.

Well, it can come from other sources too. Greg suggested the power supply,
and I have seen this once. PSU performance can degrade over time,
I assume because of aging electrolytic capacitors. It's also possible that
drives need more power as they get older (more friction in the mechanics).
CRC errors can also be caused by changes in the electromagnetic
environment of the box. Connectors can also cause issues after some time
(I got this once with SCSI devices: a box which had run fine for years
started to show SCSI communication issues after the room's air cooling
failed for a few hours. I couldn't get it stable again and had to
replace the SCSI cable).
So I would still try replacing the cables first; if that doesn't help,
try a stronger power supply.

> 
> These started in September, and have become common:
> 
> Sep 19 08:58:28 mail /netbsd: wd0a: error writing fsbn 1114304 of 1114304-1114319 (wd0 bn 1114367; cn 1105 tn 8 sn 23), retrying
> Sep 19 08:58:29 mail /netbsd: wd0: (aborted command, interface CRC error)
> Sep 19 08:58:29 mail /netbsd: wd0: soft error (corrected)
> 
> They show up on wd0 and wd1, which share a controller, and a cable.
> All errors are corrected so far. (18 on wd0, 19 on wd1)
> 
> All of the errors occur while writing, and there's no locality
> amongst the sectors involved in the errored writes.
> 
> The smart status shows a high count on wd0 for raw read error rate and
> hardware ECC recovered errors, so I'm inclined to replace that drive.

This could also be a sign of a power supply problem.

-- 
Manuel Bouyer <bouyerantioche.eu.org>
     NetBSD: 26 years of experience will always make the difference