|
Thread: Horrendous RAID-6 performance
|
|
| Horrendous RAID-6 performance |
  United States |
2008-03-28 16:34:54 |
My system has a 1.2 TB RAID-6 formatted as a single ext3 partition. All
was well for a while, then I had a minor crash and had to repair the
RAID (a manual fsck kind of repair).

Now I get absolutely horrible RAID performance:

selene:/var/log# hdparm -tT /dev/md10

/dev/md10:
 Timing cached reads:   2472 MB in  2.00 seconds = 1236.69 MB/sec
 Timing buffered disk reads:   10 MB in  3.23 seconds =   3.10 MB/sec

WTF? Can anyone provide a clue on how to debug this?

I've been all through the system: nothing shows any sort of error, there is
nothing in the logs, mdadm is happy, smartd is happy, and the partition
fscks clean.

Could this be because the system is getting full? (I'm clutching at straws
here.)

selene:/var/log# df /dev/md10
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md10           1538311980 1279051236 243632396  84% /data

Is there some way to verify / rebuild the RAID in place? I could get most
of the data off and reformat, but obviously I'd rather not....
--Yan
--
Windows is like a canary in a coal mine, it's the first thing to die on
your network.
_______________________________________________
mythtv-users mailing list
mythtv-users mythtv.org
http://mythtv.org/cgi-bin/mailman/listinfo/mythtv-users
|
|
| Re: Horrendous RAID-6 performance |

|
2008-03-28 16:44:03 |
can you post the output of cat /proc/mdstat
John
|
|
| Re: Horrendous RAID-6 performance |
  United States |
2008-03-28 16:51:10 |
John Drescher wrote:
> can you post the output of cat /proc/mdstat

OK, see below. All of the system stuff sits on two SCSI drives. The
array with bad performance consists of six 400 GB SATA drives. They only
have a single 1.2 TB partition.

selene:/var/log# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
md10 : active raid6 sdd1[0] sdi1[5] sdh1[4] sdg1[3] sdf1[2] sde1[1]
      1562834944 blocks level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
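
The "[6/6] [UUUUUU]" status is the part that says every member is up. A minimal sketch of checking that from a script, assuming the /proc/mdstat layout shown above; the name degraded_arrays and the regular expressions are illustrative, not anything shipped with mdadm.

```python
import re

# Hypothetical helper: list md arrays whose status is not fully "up".
# Assumes the /proc/mdstat layout shown in this thread:
#   md10 : active raid6 sdd1[0] ...
#         ... blocks level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
def degraded_arrays(mdstat_text):
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        name = re.match(r"^(md\d+)\s*:", line)
        if name:
            current = name.group(1)
            continue
        # [n/m]: n devices in the array, m of them active; "_" marks a missing one.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            total = int(status.group(1))
            active = int(status.group(2))
            flags = status.group(3)
            if active < total or "_" in flags:
                degraded.append(current)
            current = None
    return degraded

# Sample text copied from the output above.
SAMPLE = """Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
md10 : active raid6 sdd1[0] sdi1[5] sdh1[4] sdg1[3] sdf1[2] sde1[1]
      1562834944 blocks level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

An empty list only means md believes all members are present; it says nothing about how fast each member is.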

selene:/var/log# uname -a
Linux selene 2.6.24 #1 SMP Fri Jan 25 14:59:10 PST 2008 x86_64 GNU/Linux

lspci:
00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a1)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a2)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a2)
00:01.2 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:06.1 Audio device: nVidia Corporation MCP55 High Definition Audio (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a2)
00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a2)
00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)
00:0e.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:06.0 VGA compatible controller: nVidia Corporation NV34GL [Quadro NVS 280 PCI] (rev a1)
01:07.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
01:07.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
01:08.0 Multimedia video controller: Conexant CX23880/1/2/3 PCI Video and Audio Decoder (rev 05)
01:08.1 Multimedia controller: Conexant CX23880/1/2/3 PCI Video and Audio Decoder [Audio Port] (rev 05)
01:08.2 Multimedia controller: Conexant CX23880/1/2/3 PCI Video and Audio Decoder [MPEG Port] (rev 05)
01:0b.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
02:00.0 PCI bridge: PLX Technology, Inc. PEX 8114 PCI Express-to-PCI/PCI-X Bridge (rev bc)
03:04.0 SCSI storage controller: Adaptec ASC-29320ALP U320 (rev 10)
05:00.0 VGA compatible controller: nVidia Corporation G70 [GeForce 7600 GT] (rev a1)
|
|
| Re: Horrendous RAID-6 performance |

|
2008-03-28 17:09:15 |
On Fri, Mar 28, 2008 at 5:51 PM, Yan Seiner <yan seiner.com> wrote:
> John Drescher wrote:
> > can you post the output of cat /proc/mdstat
>
> OK, see below. All of the system stuff sits on two SCSI drives. The
> array with bad performance consists of 6-400 GB SATA drives. They only
> have a single 1.2 TB partition.
>
> selene:/var/log# cat /proc/mdstat
> Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
> md10 : active raid6 sdd1[0] sdi1[5] sdh1[4] sdg1[3] sdf1[2] sde1[1]
>       1562834944 blocks level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]

My first guess was that the system was resyncing the array. That is
obviously not the case.

Here at work I have a few similar AMD systems with six 320 GB SATA2
Seagate 7200.10 drives in Linux software RAID 6, and I get around
260 MB/s where you get 3.

# hdparm -tT /dev/md1

/dev/md1:
 Timing cached reads:   2388 MB in  2.00 seconds = 1193.69 MB/sec
 Timing buffered disk reads:  786 MB in  3.00 seconds = 261.88 MB/sec

Can you post the output of cat /proc/interrupts? And blockdev --report?

Are you sure that nothing else is using the disks at the moment you
ran hdparm?

John
|
|
| Re: Horrendous RAID-6 performance |
  United States |
2008-03-28 17:19:38 |
John Drescher wrote:
> On Fri, Mar 28, 2008 at 5:51 PM, Yan Seiner <yan seiner.com> wrote:
>> John Drescher wrote:
>> > can you post the output of cat /proc/mdstat
>>
>> OK, see below. All of the system stuff sits on two SCSI drives. The
>> array with bad performance consists of 6-400 GB SATA drives. They only
>> have a single 1.2 TB partition.
>>
>> selene:/var/log# cat /proc/mdstat
>> Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
>> md10 : active raid6 sdd1[0] sdi1[5] sdh1[4] sdg1[3] sdf1[2] sde1[1]
>>       1562834944 blocks level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
>>
> My first guess was the system was resyncing the array. That is
> obviously not the case.

This has been going on for a couple of weeks. I've been trying to get a
handle on it...

> Here at work I have a few similar amd systems with 6 320GB sata2
> Seagate 7200.10 drives in linux software raid 6 and I get around
> 260MB/s where you get 3.
>
> # hdparm -tT /dev/md1
>
> /dev/md1:
>  Timing cached reads:   2388 MB in  2.00 seconds = 1193.69 MB/sec
>  Timing buffered disk reads:  786 MB in  3.00 seconds = 261.88 MB/sec

Well, the cached reads are similar but the buffered reads suck... ISTR
numbers similar to yours before this happened.

> Can you post the output of cat /proc/interrupts?

selene:/usr/src# cat /proc/int*
           CPU0       CPU1
   0:        88       1140   IO-APIC-edge      timer
   1:         2          0   IO-APIC-edge      i8042
   4:        90        529   IO-APIC-edge      serial
   7:         0          0   IO-APIC-edge      parport0
   8:         0          1   IO-APIC-edge      rtc
   9:         0          0   IO-APIC-fasteoi   acpi
  12:         3          1   IO-APIC-edge      i8042
  16:    292347   12974317   IO-APIC-fasteoi   aic79xx, nvidia, nvidia
  17:         3         22   IO-APIC-fasteoi   bttv0, Bt87x audio
  18:    394086   19557292   IO-APIC-fasteoi   cx88[0], cx88[0], cx88[0]
  19:        54       3071   IO-APIC-fasteoi   firewire_ohci
  20:        23        345   IO-APIC-fasteoi   HDA Intel
  21:  12167868   62393338   IO-APIC-fasteoi   sata_nv
  22:  13425241   68595156   IO-APIC-fasteoi   ehci_hcd:usb2, sata_nv
  23:  14856088   73484861   IO-APIC-fasteoi   ohci_hcd:usb1, sata_nv
1275:    412298   19696213   PCI-MSI-edge      eth1
1276:   2776027  193344440   PCI-MSI-edge      eth0
 NMI:         0          0   Non-maskable interrupts
 LOC:  45656488   45974623   Local timer interrupts
 RES:  55443525    5019739   Rescheduling interrupts
 CAL:    109933      20132   function call interrupts
 TLB:     33458      54083   TLB shootdowns
 TRM:         0          0   Thermal event interrupts
 THR:         0          0   Threshold APIC interrupts
 SPU:         0          0   Spurious interrupts
 ERR:         0
>
> And blockdev --report
>
rw 256 512 4096 0 781422768 /dev/sdd
rw 256 512 1024 63 781417602 /dev/sdd1
rw 256 512 4096 0 781422768 /dev/sde
rw 256 512 1024 63 781417602 /dev/sde1
rw 256 512 512 0 781420655 /dev/sdf
rw 256 512 1024 63 781417602 /dev/sdf1
rw 256 512 4096 0 781422768 /dev/sdg
rw 256 512 1024 63 781417602 /dev/sdg1
rw 256 512 512 0 781420655 /dev/sdh
rw 256 512 1024 63 781417602 /dev/sdh1
rw 256 512 4096 0 781422768 /dev/sdi
rw 256 512 1024 63 781417602 /dev/sdi1
rw 1024 512 4096 0 3125669888 /dev/md10
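
The report above is easier to compare across drives once it is parsed. A rough sketch, assuming the seven-column layout printed here (RO, RA, SSZ, BSZ, StartSec, Size, Device) and assuming both RA and Size are counted in 512-byte sectors, which matches this output; newer util-linux releases may print Size in bytes, so check before trusting the numbers. parse_blockdev_report is a made-up name.

```python
from collections import namedtuple

# Field names are illustrative; they are not blockdev's own.
BlockDev = namedtuple(
    "BlockDev", "ro readahead_kib sector_size block_size start size_gb device"
)

def parse_blockdev_report(text):
    """Parse `blockdev --report` rows into BlockDev tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 7 or not parts[1].isdigit():
            continue  # skip the header line, blank lines, or anything unexpected
        ro, ra, ssz, bsz, start, size, dev = parts
        rows.append(BlockDev(
            ro,
            int(ra) * 512 // 1024,      # readahead: 512-byte sectors -> KiB
            int(ssz),
            int(bsz),
            int(start),
            int(size) * 512 / 1e9,      # ASSUMPTION: size is in 512-byte sectors
            dev,
        ))
    return rows

# Three rows copied from the report above.
REPORT = """rw 256 512 4096 0 781422768 /dev/sdd
rw 256 512 512 0 781420655 /dev/sdf
rw 1024 512 4096 0 3125669888 /dev/md10"""

if __name__ == "__main__":
    for row in parse_blockdev_report(REPORT):
        print(f"{row.device}: readahead {row.readahead_kib} KiB, "
              f"block size {row.block_size}, {row.size_gb:.1f} GB")
```

Laid out this way it is easier to see that sdf and sdh report a slightly different size and block size from the other four disks, and that the array's readahead is 1024 sectors against 256 on each member.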

> Are you sure that nothing else is using the disks at the moment you
> ran hdparm?

Well, now that you mention it.... Nothing is using the system, but it's
still running at 67% wait states? WTF? What's it waiting for? How do I
find the process that's waiting?

top - 15:18:08 up 2 days,  7:34,  2 users,  load average: 3.12, 3.66, 4.23
Tasks: 219 total,   2 running, 217 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.1%us,  1.0%sy,  0.0%ni, 28.7%id, 66.8%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:   2062908k total,  2046844k used,    16064k free,    18712k buffers
Swap:  3903712k total,   574648k used,  3329064k free,   720916k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 7137 motion    20   0  259m  12m  792 S    6  0.6 369:57.08 motion
 7376 root      20   0 1126m 100m  11m S    1  5.0  24:15.67 mythbackend
 7109 root      20   0  129m  50m 7148 S    1  2.5   4:21.79 Xorg
 4960 root      15  -5     0    0    0 S    0  0.0   4:34.77 bond0
 7399 root      15  -5     0    0    0 S    0  0.0   1:06.07 kdvb-fe-0
    1 root      20   0 10392  664  632 S    0  0.0   0:04.14 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.02 kthreadd
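
On the question of finding what is waiting: on Linux, I/O wait generally goes with tasks sitting in uninterruptible sleep (state D), and top sorted by CPU will not show them. A minimal sketch that scans /proc/<pid>/stat for that state; the helper names are invented here, it is Linux-only, and it looks at processes rather than individual threads.

```python
import glob

def parse_stat_line(stat_line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' rather than on whitespace.
    head, _, tail = stat_line.rpartition(")")
    pid, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid), comm, state

def blocked_tasks():
    """List (pid, comm) for processes currently in state D."""
    blocked = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were reading it
        if state == "D":
            blocked.append((pid, comm))
    return blocked

if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

A single run is only a snapshot, so it is worth running a few times; a name that keeps turning up in state D is a reasonable place to start looking.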
|
|
| Re: Horrendous RAID-6 performance |

|
2008-03-28 17:24:39 |
> Well, now that you mention it.... Nothing is using the system, but it's
> still running at 67% wait states? WTF? What's it waiting for? How do I
> find the process that's waiting?
>
> Swap:  3903712k total,   574648k used,  3329064k free,   720916k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
>  7137 motion    20   0  259m  12m  792 S    6  0.6 369:57.08 motion

What is the motion process? And do you see that your system is using
574 MB of swap?

John
|
|
| Re: Horrendous RAID-6 performance |
  United States |
2008-03-28 17:37:09 |
John Drescher wrote:
> What is the motion process? And do you see your system is using 574MB of
> swap?

motion is a motion detector program. It monitors cameras and makes movies
when it detects motion.

I get no change to the wait states even with motion and myth dead....

I'm not sure why the system is using that much swap.

Could it be a tickless kernel problem? I just went through the kernel
config and that's the only thing I can think of that looks vaguely
unusual.

--Yan
|
|
| Re: Horrendous RAID-6 performance |

|
2008-03-28 17:47:47 |
I could try to quote all of this in-line, and probably fail...

Have you tried changing the tuning params listed here?

http://www.mythtv.org/wiki/index.php/LVM_on_RAID#Performance_Enhancements_.2F_Tuning

D.
|
|
| Re: Horrendous RAID-6 performance |

|
2008-03-28 17:44:21 |
> Could it be a tickless kernel problem? I just went through the kernel
> config and that's the only thing I can think of that looks vaguely
> unusual.

That is definitely possible, and it could be causing an interrupt
problem that leads to poor disk performance. I would try building a kernel
with tickless off, or use an older known-working kernel.

John
|
|
| Re: Horrendous RAID-6 performance |

|
2008-03-28 20:29:34 |
On Fri, Mar 28, 2008 at 5:51 PM, Yan Seiner <yan seiner.com> wrote:
> John Drescher wrote:
> > can you post the output of cat /proc/mdstat
>
> selene:/var/log# cat /proc/mdstat
> Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
> md10 : active raid6 sdd1[0] sdi1[5] sdh1[4] sdg1[3] sdf1[2] sde1[1]
>       1562834944 blocks level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]

Can you try hdparm -tT /dev/sdd1 (and the same for all your drives)?

(PS: I'm not sure if that's totally safe, but I did it on the members
of my RAID 1 first to see if it let me. It's just reading, right?)

I wonder if you have a failing drive, controller, or channel and it's
not being detected.
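
If the per-drive runs get pasted into one file, something like the following could pick out a member that is far slower than the rest. It is only a sketch: it assumes hdparm output in the form shown earlier in this thread (unwrapped, one result per line), the function names are made up, and the "half the median" cutoff is an arbitrary choice.

```python
import re
import statistics

def buffered_read_rates(hdparm_output):
    """Map device -> buffered read rate in MB/sec from `hdparm -t` / `-tT` text."""
    rates = {}
    device = None
    for line in hdparm_output.splitlines():
        line = line.strip()
        if line.startswith("/dev/") and line.endswith(":"):
            device = line[:-1]          # e.g. "/dev/sdd1:" -> "/dev/sdd1"
            continue
        m = re.search(r"buffered disk reads:.*=\s*([\d.]+)\s*MB/sec", line)
        if m and device:
            rates[device] = float(m.group(1))
    return rates

def slow_devices(rates, fraction=0.5):
    """Devices slower than `fraction` of the median rate (arbitrary threshold)."""
    if not rates:
        return []
    median = statistics.median(rates.values())
    return [dev for dev, rate in rates.items() if rate < fraction * median]

# Output for the array itself, copied from the first post in this thread.
SAMPLE = """/dev/md10:
 Timing cached reads:   2472 MB in  2.00 seconds = 1236.69 MB/sec
 Timing buffered disk reads:   10 MB in  3.23 seconds =   3.10 MB/sec
"""

if __name__ == "__main__":
    rates = buffered_read_rates(SAMPLE)
    print(rates)
    print(slow_devices(rates))
```

With only the array's own result there is nothing to compare against, so the second list comes back empty; the comparison becomes useful once all six member results are in the input.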
Good luck
Billy
|
|
|
|