|
List Info
Thread: slowness
|
|
| slowness |

|
2006-06-13 20:12:39 |
I've got a similar problem which may or may not be from a
similar cause.
Every once in a while, I get into a state where some
messages go through
and others are delayed for more than 10 minutes. Since this
only happens
intermittently, it's been hard to track down.
If I take David's suggestion to run 'vmstat' and
'iostat', what do I
look for?
# vmstat
procs -----------memory---------- ---swap-- -----io----
--system--
----cpu----
r b swpd free buff cache si so bi bo in
cs us sy
id wa
0 1 160 88640 333860 634732 0 0 2 19 2
0 1 0
98 1
I presume it would be the "bi" and
"bo" values that I'm interested in,
but what's a high value that would indicate an I/O
bottleneck?
# iostat
Linux 2.6.9-34.ELsmp (nscan1.ucar.edu) 06/13/2006
avg-cpu: %user %nice %sys %iowait %idle
0.55 0.00 0.21 1.49 97.75
Device: tps Blk_read/s Blk_wrtn/s Blk_read
Blk_wrtn
sda 4.55 17.91 153.60 72855596
624837774
sda1 0.00 0.01 0.00 46786
918
sda2 0.90 4.90 5.74 19933003
23353264
sda3 1.52 0.68 11.97 2752722
48690672
sda4 0.00 0.00 0.00 2
0
sda5 0.02 0.13 0.07 514123
284912
sda6 2.35 0.00 18.77 15364
76357176
sda7 0.00 0.00 0.00 1329
320
sda8 14.76 12.19 117.05 49590987
476149472
I ask because my systems also have a hardware RAID
controller, so there
is at least a slight chance that I have the same problem.
--Greg
_______________________________________________
Maia-users mailing list
Maia-users renaissoft.com
http://www.renaissoft.com/mailman/listinfo/maia-users
|
|
| slowness |

|
2006-06-13 20:47:48 |
Hi Greg
iostat -x 1
this showed that "%util" was at 100% when
writing to the drives, also
"wa" was 98% all the time.
We are running 3ware 8006-2LP controllers with Maxtor
6V160E0 drives
Regards
Wayne
On Tue, 2006-06-13 at 14:12 -0600, Greg Woods wrote:
> I've got a similar problem which may or may not be
from a similar cause.
> Every once in a while, I get into a state where some
messages go through
> and others are delayed for more than 10 minutes. Since
this only happens
> intermittently, it's been hard to track down.
>
> If I take David's suggestion to run 'vmstat' and
'iostat', what do I
> look for?
>
> # vmstat
> procs -----------memory---------- ---swap-- -----io----
--system--
> ----cpu----
> r b swpd free buff cache si so bi bo
in cs us sy
> id wa
> 0 1 160 88640 333860 634732 0 0 2 19
2 0 1 0
> 98 1
>
> I presume it would be the "bi" and
"bo" values that I'm interested in,
> but what's a high value that would indicate an I/O
bottleneck?
>
>
> # iostat
> Linux 2.6.9-34.ELsmp (nscan1.ucar.edu) 06/13/2006
>
> avg-cpu: %user %nice %sys %iowait %idle
> 0.55 0.00 0.21 1.49 97.75
>
> Device: tps Blk_read/s Blk_wrtn/s
Blk_read Blk_wrtn
> sda 4.55 17.91 153.60
72855596 624837774
> sda1 0.00 0.01 0.00
46786 918
> sda2 0.90 4.90 5.74
19933003 23353264
> sda3 1.52 0.68 11.97
2752722 48690672
> sda4 0.00 0.00 0.00
2 0
> sda5 0.02 0.13 0.07
514123 284912
> sda6 2.35 0.00 18.77
15364 76357176
> sda7 0.00 0.00 0.00
1329 320
> sda8 14.76 12.19 117.05
49590987 476149472
>
> I ask because my systems also have a hardware RAID
controller, so there
> is at least a slight chance that I have the same
problem.
>
> --Greg
>
>
_______________________________________________
Maia-users mailing list
Maia-users renaissoft.com
http://www.renaissoft.com/mailman/listinfo/maia-users
|
|
| slowness |

|
2006-06-13 21:26:25 |
On Tue, 2006-06-13 at 21:47 +0100, Wayne Lee wrote:
> Hi Greg
>
> iostat -x 1
>
> this showed that "%util" was at 100% when
writing to the drives, also
> "wa" was 98% all the time.
Thank you, that should help me determine if the RAID
controller is the
culprit.
What I see from my logs, and from being able to catch it in
the act
once, is that the messages are in the postqueue and still
locked (they
show up with a "*"), but amavis doesn't
actually start until quite a
while later. Once it gets to amavisd, it goes through
quickly. I have
tried monitoring connections (running netstat every 5
minutes and saving
the results so I can correlate with maillog entries), and
all the amavis
servers that I have configured are not even in use (I see
only 5
connections to the amavis port, but I have 10 servers
configured), so I
have not yet determined why this delay occurs.
Here's an edited example of log entries to illustrate what
I mean. The
entire 12 minute delay occurs after the message is received
by Postfix
and before amavisd gets hold of it. After amavisd, the
connection to the
downstream MTA and subsequent delivery to the mailbox host
is immediate.
Jun 13 14:33:15 nscan4 postfix/cleanup[15819]: D98123F1C033:
message-id=<mailman.0.1150230794.11792.test-mm3 nlr.net>
Jun 13 14:33:16 nscan4 postfix/qmgr[4752]: D98123F1C033:
from=<test-mm3-bounces nlr.net>, size=1981,
nrcpt=1 (queue active)
Jun 13 14:45:52 nscan4 amavis[16636]: (16636-10)
ESMTP::10024 /var/amavis/tmp/amavis-20060613T144527-16636:
<test-mm3-bounces nlr.net> -> <woods mailman.ucar.edu> Received:
SIZE=1981 from nscan4.ucar.edu ([127.0.0.1]) by localhost
(nscan2.u
car.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id
16636-10
for <woods mailman.ucar.edu>; Tue, 13 Jun 2006
14:45:52 -0600 (MDT)
Jun 13 14:45:52 nscan4 amavis[16636]: (16636-10) Checking:
[128.117.64.12] <test-mm3-bounces nlr.net> ->
<woods mailman.ucar.edu>
Jun 13 14:46:03 nscan4 amavis[16636]: (16636-10)
mail_via_smtp: 250
2.6.0 Ok, id=16636-10, from MTA: 250 Ok: queued as
786213F1C063
Jun 13 14:46:03 nscan4 amavis[16636]: (16636-10) Passed
CLEAN,
[128.117.64.12] [
128.117.64.12] <test-mm3-bounces nlr.net> ->
<woods mailman.ucar.edu>,
Message-ID: <mailman.0.1150230794.11792.test-mm3 nlr.net>, Hits: -2.642,
11492 ms
Jun 13 14:46:04 nscan4 postfix/smtp[15435]: D98123F1C033:
to=<woods mailman.ucar.edu>, orig_to=<woods ucar.edu>,
relay=127.0.0.1[127.0.0.1], delay=769, status=sent (250
2.6.0 Ok,
id=16636-10, from MTA: 250 Ok: queued as 786213F1C063)
Jun 13 14:46:04 nscan4 postfix/qmgr[4752]: D98123F1C033:
removed
--Greg
_______________________________________________
Maia-users mailing list
Maia-users renaissoft.com
http://www.renaissoft.com/mailman/listinfo/maia-users
|
|
| slowness |

|
2006-06-13 21:44:16 |
what happens when you force the queue or requeue them, what
postfix
errors if any are in the logs?
do you have smtp-amavis unix set to 10 also in the
master.cf?
Wayne
On Tue, 2006-06-13 at 15:26 -0600, Greg Woods wrote:
> On Tue, 2006-06-13 at 21:47 +0100, Wayne Lee wrote:
> > Hi Greg
> >
> > iostat -x 1
> >
> > this showed that "%util" was at 100%
when writing to the drives, also
> > "wa" was 98% all the time.
>
> Thank you, that should help me determine if the RAID
controller is the
> culprit.
>
> What I see from my logs, and from being able to catch
it in the act
> once, is that the messages are in the postqueue and
still locked (they
> show up with a "*"), but amavis doesn't
actually start until quite a
> while later. Once it gets to amavisd, it goes through
quickly. I have
> tried monitoring connections (running netstat every 5
minutes and saving
> the results so I can correlate with maillog entries),
and all the amavis
> servers that I have configured are not even in use (I
see only 5
> connections to the amavis port, but I have 10 servers
configured), so I
> have not yet determined why this delay occurs.
>
> Here's an edited example of log entries to illustrate
what I mean. The
> entire 12 minute delay occurs after the message is
received by Postfix
> and before amavisd gets hold of it. After amavisd, the
connection to the
> downstream MTA and subsequent delivery to the mailbox
host is immediate.
>
>
> Jun 13 14:33:15 nscan4 postfix/cleanup[15819]:
D98123F1C033:
> message-id=<mailman.0.1150230794.11792.test-mm3 nlr.net>
> Jun 13 14:33:16 nscan4 postfix/qmgr[4752]:
D98123F1C033:
> from=<test-mm3-bounces nlr.net>, size=1981,
nrcpt=1 (queue active)
>
> Jun 13 14:45:52 nscan4 amavis[16636]: (16636-10)
> ESMTP::10024
/var/amavis/tmp/amavis-20060613T144527-16636:
> <test-mm3-bounces nlr.net> ->
<woods mailman.ucar.edu> Received:
> SIZE=1981 from nscan4.ucar.edu ([127.0.0.1]) by
localhost (nscan2.u
> car.edu [127.0.0.1]) (amavisd-new, port 10024) with
ESMTP id 16636-10
> for <woods mailman.ucar.edu>; Tue, 13 Jun 2006
14:45:52 -0600 (MDT)
> Jun 13 14:45:52 nscan4 amavis[16636]: (16636-10)
Checking:
> [128.117.64.12] <test-mm3-bounces nlr.net> -> <woods mailman.ucar.edu>
> Jun 13 14:46:03 nscan4 amavis[16636]: (16636-10)
mail_via_smtp: 250
> 2.6.0 Ok, id=16636-10, from MTA: 250 Ok: queued as
786213F1C063
> Jun 13 14:46:03 nscan4 amavis[16636]: (16636-10) Passed
CLEAN,
> [128.117.64.12] [
> 128.117.64.12] <test-mm3-bounces nlr.net> -> <woods mailman.ucar.edu>,
> Message-ID: <mailman.0.1150230794.11792.test-mm3 nlr.net>, Hits: -2.642,
> 11492 ms
>
> Jun 13 14:46:04 nscan4 postfix/smtp[15435]:
D98123F1C033:
> to=<woods mailman.ucar.edu>,
orig_to=<woods ucar.edu>,
> relay=127.0.0.1[127.0.0.1], delay=769, status=sent (250
2.6.0 Ok,
> id=16636-10, from MTA: 250 Ok: queued as 786213F1C063)
> Jun 13 14:46:04 nscan4 postfix/qmgr[4752]:
D98123F1C033: removed
>
>
> --Greg
>
> _______________________________________________
> Maia-users mailing list
> Maia-users renaissoft.com
> http://www.renaissoft.com/mailman/listinfo/maia-users
_______________________________________________
Maia-users mailing list
Maia-users renaissoft.com
http://www.renaissoft.com/mailman/listinfo/maia-users
|
|
| slowness |

|
2006-06-13 21:45:21 |
Greg
Also can you telnet to 127.0.0.1 10024 while this is
happening, if so is
there any delay,
Wayne
On Tue, 2006-06-13 at 15:26 -0600, Greg Woods wrote:
> On Tue, 2006-06-13 at 21:47 +0100, Wayne Lee wrote:
> > Hi Greg
> >
> > iostat -x 1
> >
> > this showed that "%util" was at 100%
when writing to the drives, also
> > "wa" was 98% all the time.
>
> Thank you, that should help me determine if the RAID
controller is the
> culprit.
>
> What I see from my logs, and from being able to catch
it in the act
> once, is that the messages are in the postqueue and
still locked (they
> show up with a "*"), but amavis doesn't
actually start until quite a
> while later. Once it gets to amavisd, it goes through
quickly. I have
> tried monitoring connections (running netstat every 5
minutes and saving
> the results so I can correlate with maillog entries),
and all the amavis
> servers that I have configured are not even in use (I
see only 5
> connections to the amavis port, but I have 10 servers
configured), so I
> have not yet determined why this delay occurs.
>
> Here's an edited example of log entries to illustrate
what I mean. The
> entire 12 minute delay occurs after the message is
received by Postfix
> and before amavisd gets hold of it. After amavisd, the
connection to the
> downstream MTA and subsequent delivery to the mailbox
host is immediate.
>
>
> Jun 13 14:33:15 nscan4 postfix/cleanup[15819]:
D98123F1C033:
> message-id=<mailman.0.1150230794.11792.test-mm3 nlr.net>
> Jun 13 14:33:16 nscan4 postfix/qmgr[4752]:
D98123F1C033:
> from=<test-mm3-bounces nlr.net>, size=1981,
nrcpt=1 (queue active)
>
> Jun 13 14:45:52 nscan4 amavis[16636]: (16636-10)
> ESMTP::10024
/var/amavis/tmp/amavis-20060613T144527-16636:
> <test-mm3-bounces nlr.net> ->
<woods mailman.ucar.edu> Received:
> SIZE=1981 from nscan4.ucar.edu ([127.0.0.1]) by
localhost (nscan2.u
> car.edu [127.0.0.1]) (amavisd-new, port 10024) with
ESMTP id 16636-10
> for <woods mailman.ucar.edu>; Tue, 13 Jun 2006
14:45:52 -0600 (MDT)
> Jun 13 14:45:52 nscan4 amavis[16636]: (16636-10)
Checking:
> [128.117.64.12] <test-mm3-bounces nlr.net> -> <woods mailman.ucar.edu>
> Jun 13 14:46:03 nscan4 amavis[16636]: (16636-10)
mail_via_smtp: 250
> 2.6.0 Ok, id=16636-10, from MTA: 250 Ok: queued as
786213F1C063
> Jun 13 14:46:03 nscan4 amavis[16636]: (16636-10) Passed
CLEAN,
> [128.117.64.12] [
> 128.117.64.12] <test-mm3-bounces nlr.net> -> <woods mailman.ucar.edu>,
> Message-ID: <mailman.0.1150230794.11792.test-mm3 nlr.net>, Hits: -2.642,
> 11492 ms
>
> Jun 13 14:46:04 nscan4 postfix/smtp[15435]:
D98123F1C033:
> to=<woods mailman.ucar.edu>,
orig_to=<woods ucar.edu>,
> relay=127.0.0.1[127.0.0.1], delay=769, status=sent (250
2.6.0 Ok,
> id=16636-10, from MTA: 250 Ok: queued as 786213F1C063)
> Jun 13 14:46:04 nscan4 postfix/qmgr[4752]:
D98123F1C033: removed
>
>
> --Greg
>
> _______________________________________________
> Maia-users mailing list
> Maia-users renaissoft.com
> http://www.renaissoft.com/mailman/listinfo/maia-users
_______________________________________________
Maia-users mailing list
Maia-users renaissoft.com
http://www.renaissoft.com/mailman/listinfo/maia-users
|
|
| slowness |

|
2006-06-14 15:14:25 |
On Tue, 2006-06-13 at 22:44 +0100, Wayne Lee wrote:
> what happens when you force the queue or requeue them,
what postfix
> errors if any are in the logs?
Since the messages being delayed are already active, forcing
the queue
doesn't do anything.
>
> do you have smtp-amavis unix set to 10 also in the
master.cf?
Always good to check the obvious and the stupid first, but
yes,
$max_servers in amavisd.conf and the value in master.cf are
both set to
10.
--Greg
_______________________________________________
Maia-users mailing list
Maia-users renaissoft.com
http://www.renaissoft.com/mailman/listinfo/maia-users
|
|
| slowness |

|
2006-06-14 19:30:37 |
On Tue, 2006-06-13 at 21:47 +0100, Wayne Lee wrote:
> iostat -x 1
>
> this showed that "%util" was at 100% when
writing to the drives, also
> "wa" was 98% all the time.
Amazingly enough, it looks like I am very close to the same
place you
were.
First, I increased the number of amavisd servers (in both
master.cf and
amavisd.conf) from 10 to 50. I then restarted amavisd and
reloaded
Postfix. This caused an immediate spike in load average up
to 25 or so,
and succeeded in draining the active queue. While in this
mode, I ran
the iostat command, and sure enough, the util% is at or near
100% while
this is going on. However, vmstat never shows a high
"wa", so I'm not
sure I've got the same problem. But I might, so I ask next:
how did you
adjust the RAID controller? Are there software tools to do
that, or do
you have to reboot and get into the RAID controller BIOS? I
also have
3ware hardware RAID, but with SCSI drives:
scsi0 : 3ware 9000 Storage Controller
Wayne also said:
> The raid card had write cache disabled and also set for
50% background and
> 50% I/O tasks
I presume this means you enabled the write cache. What else
did you do
about background vs. I/O? As you can tell, I know very
little about
hardware RAID controllers, this is the first time I have
ever had them
on any of my systems.
--Greg
_______________________________________________
Maia-users mailing list
Maia-users renaissoft.com
http://www.renaissoft.com/mailman/listinfo/maia-users
|
|
| slowness |

|
2006-06-14 20:12:49 |
Sorry the "wa" output is from top (debian sarge)
Our 3ware cards have an onboard webserver
run 3dmd from the cli (you need the 3dmd apps) and connect
to IP:1080
Wayne
On Wed, 2006-06-14 at 13:30 -0600, Greg Woods wrote:
> On Tue, 2006-06-13 at 21:47 +0100, Wayne Lee wrote:
>
> > iostat -x 1
> >
> > this showed that "%util" was at 100%
when writing to the drives, also
> > "wa" was 98% all the time.
>
> Amazingly enough, it looks like I am very close to the
same place you
> were.
>
> First, I increased the number of amavisd servers (in
both master.cf and
> amavisd.conf) from 10 to 50. I then restarted amavisd
and reloaded
> Postfix. This caused an immediate spike in load average
up to 25 or so,
> and succeeded in draining the active queue. While in
this mode, I ran
> the iostat command, and sure enough, the util% is at or
near 100% while
> this is going on. However, vmstat never shows a high
"wa", so I'm not
> sure I've got the same problem. But I might, so I ask
next: how did you
> adjust the RAID controller? Are there software tools to
do that, or do
> you have to reboot and get into the RAID controller
BIOS? I also have
> 3ware hardware RAID, but with SCSI drives:
>
> scsi0 : 3ware 9000 Storage Controller
>
>
> Wayne also said:
>
>
> > The raid card had write cache disabled and also
set for 50% background and
> > 50% I/O tasks
>
> I presume this means you enabled the write cache. What
else did you do
> about background vs. I/O? As you can tell, I know very
little about
> hardware RAID controllers, this is the first time I
have ever had them
> on any of my systems.
>
> --Greg
>
>
_______________________________________________
Maia-users mailing list
Maia-users renaissoft.com
http://www.renaissoft.com/mailman/listinfo/maia-users
|
|
| slowness |

|
2006-06-14 20:34:19 |
On Wed, 2006-06-14 at 13:30 -0600, Greg Woods wrote:
> Amazingly enough, it looks like I am very close to the
same place you
> were.
I may have spoken too soon. I am again experiencing delays
of up to 20
minutes, but there is very little activity on the system.
Only about 7
or so of my 50 amavisd processes are active. Messages are
getting stuck
in the postfix Active queue, just as before. I am certain
that I have
kept the master.cf maxproc value and the $max_servers
setting in
amavisd.conf in lock step with one another, but for some
reason the qmgr
daemon is not processing the messages in the active queue
even though
there are amavisd servers available. If anyone has any ideas
about this
(I'm going to go to the Postfix list next) I would greatly
appreciate it
as the users are beating my door down.
One interesting thing also is that while the delays are
occurring, the
web server is very slow to respond. This all suggests that
the slowness
has to do with the MySQL database as that is the only common
element
between Postfix and the Mailguard web system, but I cannot
find any
reason for it. MySQL has been configured to allow 1000
connections and I
only see 200 or so. The database server also doesn't appear
to be bogged
down, so I am at a loss to explain what I am seeing.
Could this have anything to do with the fact that, somehow
during the
upgrade from RC5 to 1.0.1, all my tables got converted to
MyISAM? They
were InnoDB before and I am planning to convert them back
(with ALTER
TABLES statements) but I can't do that without taking the
entire mail
system out of service, so that has to wait until the
weekend.
--Greg
_______________________________________________
Maia-users mailing list
Maia-users renaissoft.com
http://www.renaissoft.com/mailman/listinfo/maia-users
|
|
| slowness |

|
2006-06-14 20:59:44 |
Are you able to connect manually to 127.0.0.1 10024 when
this is
happening?
Wayne
On Wed, 2006-06-14 at 14:34 -0600, Greg Woods wrote:
> On Wed, 2006-06-14 at 13:30 -0600, Greg Woods wrote:
>
> > Amazingly enough, it looks like I am very close to
the same place you
> > were.
>
> I may have spoken too soon. I am again experiencing
delays of up to 20
> minutes, but there is very little activity on the
system. Only about 7
> or so of my 50 amavisd processes are active. Messages
are getting stuck
> in the postfix Active queue, just as before. I am
certain that I have
> kept the master.cf maxproc value and the $max_servers
setting in
> amavisd.conf in lock step with one another, but for
some reason the qmgr
> daemon is not processing the messages in the active
queue even though
> there are amavisd servers available. If anyone has any
ideas about this
> (I'm going to go to the Postfix list next) I would
greatly appreciate it
> as the users are beating my door down.
>
> One interesting thing also is that while the delays are
occurring, the
> web server is very slow to respond. This all suggests
that the slowness
> has to do with the MySQL database as that is the only
common element
> between Postfix and the Mailguard web system, but I
cannot find any
> reason for it. MySQL has been configured to allow 1000
connections and I
> only see 200 or so. The database server also doesn't
appear to be bogged
> down, so I am at a loss to explain what I am seeing.
>
> Could this have anything to do with the fact that,
somehow during the
> upgrade from RC5 to 1.0.1, all my tables got converted
to MyISAM? They
> were InnoDB before and I am planning to convert them
back (with ALTER
> TABLES statements) but I can't do that without taking
the entire mail
> system out of service, so that has to wait until the
weekend.
>
> --Greg
>
>
_______________________________________________
Maia-users mailing list
Maia-users renaissoft.com
http://www.renaissoft.com/mailman/listinfo/maia-users
|
|
|
|