Dear List,
I have been having data corruption problems on 7 servers for the last
two months.
After extensive testing, I have finally narrowed the problem down to
the Debian Etch 2.6.18-5 kernel combined with the 3ware PCI controller.
The same machine using the onboard SATA controller does not corrupt
data.
The machines would also hang occasionally, with no errors displayed on
screen.
I upgraded to a 2.6.23-13 kernel.org kernel 24 hours ago and have not
been able to reproduce these problems since then. Previously it would
take about 10 minutes for the problem to appear.
I could reproduce these problems by using a Java program to insert
logs (30,000,000 records) into a local PostgreSQL 8.2.5 database.
After this I would see messages like

  DETAIL: Could not open file "pg_clog/0495": No such file or directory.

in my postgres logs.
I had also managed to corrupt my SVN repository: the MD5 sums of the
files no longer matched what was in the SVN database (checked with
svnadmin verify /path/to/repository).
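For anyone who wants to try a similar check without postgres or SVN,
a rough write-then-verify test could look like this. It is only a
minimal Python sketch, not the Java program I used; the function
names, file count, file size and target directory are all arbitrary
assumptions.

```python
import hashlib
import os
import tempfile


def write_files(directory, num_files=10, size_bytes=1024 * 1024):
    """Write files of random data and return {path: expected MD5 hex digest}."""
    expected = {}
    for i in range(num_files):
        data = os.urandom(size_bytes)
        path = os.path.join(directory, "chunk_%05d.bin" % i)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the kernel to push the data down to the device
        expected[path] = hashlib.md5(data).hexdigest()
    return expected


def verify_files(expected):
    """Re-read every file and return the paths whose MD5 no longer matches."""
    mismatches = []
    for path, digest in expected.items():
        with open(path, "rb") as f:
            if hashlib.md5(f.read()).hexdigest() != digest:
                mismatches.append(path)
    return mismatches


if __name__ == "__main__":
    # Hypothetical run in a throwaway directory; point it at the
    # 3ware-backed filesystem to actually test that controller.
    with tempfile.TemporaryDirectory() as d:
        exp = write_files(d)
        print("mismatched files:", verify_files(exp))
```

Re-reading straight after writing will usually be served from the page
cache, so to really exercise the disk and controller you would need to
drop the caches between the two phases (as root:
echo 3 > /proc/sys/vm/drop_caches) or write more data than the machine
has RAM.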
Has anyone seen these problems?
Below are the details of my RAID controller.
Regards
Andrew
---
03:05.0 RAID bus controller: 3ware Inc 7xxx/8xxx-series PATA/SATA-RAID (rev 01)
        Subsystem: 3ware Inc 7xxx/8xxx-series PATA/SATA-RAID
        Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 (2250ns min), Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 22
        Region 0: I/O ports at e800 [size=16]
        Region 1: Memory at febffc00 (32-bit, non-prefetchable) [size=16]
        Region 2: Memory at fe000000 (32-bit, non-prefetchable) [size=8M]
        Expansion ROM at f0100000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 1
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-