Kurt Buff wrote:
> Update below - the problem isn't resolved.
>
> Kurt Buff wrote:
> > Pete Barnwell wrote:
> > > Kurt Buff wrote:
> > > > In addition to the problem with (some)
quarantined emails being
> > > > unretrievable, we had to reboot the
server today.
> > > >
> > > > I am unable to track the reason why it
locked up, but
> > > thought I'd put what I
> > > > know in front of folks to see if I can
get a response.
> > > >
> > > > The web pages were hanging, that is,
people would click on
> > > a link (any link)
> > > > and the page would just sit there and go
no further . I
> > > tried restarting
> > > > apache with apachectl, but while it did
restart, the
> > > problem didn't go away.
> > > >
> > > > I'm not finding much in the logs, except
some PHP errors. a
> > > sample from my
> > > > syslog is attached.
> > > >
> > > >
> > >
> > >
> > > 2007-06-27 09:15:15 Local7.Notice zmx1 Jun 27
09:15:15
> > > httpd[4228]: PHP Fatal error: Allowed memory
size of 8388608
> > > bytes exhausted (tried to allocate 1464539
bytes) in
> > > /usr/local/share/pear/Mail/mimeDecode.php on
line 401
> > >
> > >
> > > Try increasing the amount of memory allowed
to php by changing
> > >
> > > memory_limit =
> > >
> > > in /etc/php.ini I'm finding 60M is enough
YMMV etc.
> > >
> > > Rgds
> > >
> > > Pete
> >
> >
> > That was it!
> >
> > Thanks!
> >
> > Kurt
>
> I was forced again to reboot the server - Monday
morning and
> this morning.
>
> Today I was able to narrow down (I hope) the time when
the
> problem occurred.
> I'm hoping someone can give me a clue where to start
looking,
> because I'm
> not finding a lot in the logs.
>
> At around 10:20 to 10:40 today, the web pages for Maia
> stopped responding. I
> was unable to log in to the web site, as the page for
login
> didn't present
> itself. Also, I noticed that the 10:00 hourly cron job
didn't
> run. This job
> does process-quarantine.pl, send-quarantine-digests.pl,
and
> stats-snapshot.pl. After the reboot, the 11:00 cron
ran, and
> the 9:00 job
> ran before the reboot.
>
> I did do 'amavisd restart' and 'apachectl restart', to
no
> avail. I rebooted
> the server at roughly 10:45, after I found over 300
messages
> in the postfix
> queues. This, of course, means that postfix was
accepting
> mail, but that
> somewhere between there and delivery to our Exchange
server,
> things were
> getting fubar'ed.
>
> I have found nothing in the httpd-*.logs.
>
> I notice in the maillog that postfix stopped relaying
messages to our
> Exchange server at shortly after 10:00 - the last
message
> that shows a relay
> to our Exchange server is at 10:04, which explains the
> buildup of messages
> in the postfix queues.
>
> However, it doesn't look to me as if amavis has
actually
> finished talking
> with MySQL, as shown by these eventlog entries, at the
time I
> rebooted the
> machine (I assume that they are actually caused by the
reboot!):
>
> 2007-07-18 10:45:16 System4.Info zmx1 Jul 18
> 10:45:15 kernel:
> Stopping mysql.
> 2007-07-18 10:45:19 Mail.Warning zmx1 Jul 18
10:45:17
> amavis[58566]: (58566-08) SA TIMED OUT, backtrace: at
>
/usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/Bayes
St
> ore/MySQL.
> pm line 492nteval {...} called at
>
/usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/Bayes
St
> ore/MySQL.pm
> line
>
492ntMail::SpamAssassin::BayesStore::MySQL::tok_touch_all(
'M
> ail::SpamAss
> assin::BayesStore::MySQL=HASH(0xa16e77c)',
> 'ARRAY(0x92d9bd4)', 1184778316)
> called at
>
/usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/Bayes
.pm
> line 1278ntMail::SpamAssassin::Bayes::s
> can('Mail::SpamAssassin::Bayes=HASH(0xa15335c)',
> 'Mail::SpamAssassin::PerMsgStatus=HASH(0xa473b38)',
> 'Mail::SpamAssassin::Message=HASH(0xa94def8)') called
at
> /usr/local/lib/perl5/site_perl/5
> .8.8/Mail/SpamAssassin/EvalTests.pm line
>
2517ntMail::SpamAssassin::PerMsgStatus::check_bayes('Mail:
:S
> pamAssassin::P
> erMsgStatus=HASH(0xa473b38)', 'ARRAY(0xa956858)', 0.20,
0.40)
> called at /
> usr/local/lib/perl5/site_perl/5.8.8/Mail/Spam...
> 2007-07-18 10:45:19 Mail.Warning zmx1 Jul 18
10:45:17
> amavis[58562]: (58562-09) SA TIMED OUT, backtrace: at
>
/usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/Bayes
St
> ore/MySQL.
> pm line 492nteval {...} called at
>
/usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/Bayes
St
> ore/MySQL.pm
> line
>
492ntMail::SpamAssassin::BayesStore::MySQL::tok_touch_all(
'M
> ail::SpamAss
> assin::BayesStore::MySQL=HASH(0xa16e77c)',
> 'ARRAY(0x92d9bd4)', 1184778318)
> called at
>
/usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/Bayes
.pm
> line 1278ntMail::SpamAssassin::Bayes::s
> can('Mail::SpamAssassin::Bayes=HASH(0xa15335c)',
> 'Mail::SpamAssassin::PerMsgStatus=HASH(0xaef53b8)',
> 'Mail::SpamAssassin::Message=HASH(0xad7b4a8)') called
at
> /usr/local/lib/perl5/site_perl/5
> .8.8/Mail/SpamAssassin/EvalTests.pm line
>
2517ntMail::SpamAssassin::PerMsgStatus::check_bayes('Mail:
:S
> pamAssassin::P
> erMsgStatus=HASH(0xaef53b8)', 'ARRAY(0xada2650)', 0.20,
0.40)
> called at /
> usr/local/lib/perl5/site_perl/5.8.8/Mail/Spam...
> 2007-07-18 10:45:19 Mail.Error zmx1 Jul 18
10:45:17
> amavis[58566]: (58566-08) TROUBLE in check_mail:
spam_scan FAILED:
> DBD::mysql::st execute failed: MySQL server has gone
away a
> t /usr/local/sbin/amavisd line 3219, <GEN7> line
4533.
> 2007-07-18 10:45:19 Mail.Warning zmx1 Jul 18
10:45:17
> amavis[59593]: (59593-02) connect_to_sql: unable to
connect to DSN
> 'DBI:mysql:maia:localhost': Can't connect to local
MySQL se
> rver through socket '/tmp/mysql.sock' (2)
> 2007-07-18 10:45:19 Mail.Warning zmx1 Jul 18
10:45:17
> amavis[59594]: (59594-01) connect_to_sql: unable to
connect to DSN
> 'DBI:mysql:maia:localhost': Can't connect to local
MySQL se
> rver through socket '/tmp/mysql.sock' (2)
> 2007-07-18 10:45:19 Mail.Warning zmx1 Jul 18
10:45:17
> amavis[59598]: (59598-01) connect_to_sql: unable to
connect to DSN
> 'DBI:mysql:maia:localhost': Can't connect to local
MySQL se
> rver through socket '/tmp/mysql.sock' (2)
> 2007-07-18 10:45:19 Mail.Warning zmx1 Jul 18
10:45:17
> amavis[59601]: (59601-01) connect_to_sql: unable to
connect to DSN
> 'DBI:mysql:maia:localhost': Can't connect to local
MySQL se
> rver through socket '/tmp/mysql.sock' (2)
> 2007-07-18 10:45:19 Mail.Warning zmx1 Jul 18
10:45:17
> amavis[59603]: (59603-01) connect_to_sql: unable to
connect to DSN
> 'DBI:mysql:maia:localhost': Can't connect to local
MySQL se
> rver through socket '/tmp/mysql.sock' (2)
> 2007-07-18 10:45:19 Mail.Error zmx1 Jul 18
10:45:17
> amavis[58562]: (58562-09) TROUBLE in check_mail:
spam_scan FAILED:
> DBD::mysql::st execute failed: MySQL server has gone
away a
> t /usr/local/sbin/amavisd line 3219, <GEN7> line
54780.
Let's try that again, shall we?
It did happen again. This time, I notice that there were
message from my
hourly cron jobs that got hung in queue, that look like
this:
2007-07-24 16:00:00 Maia: [process-quarantine] Another
instance [94281]
is currently running.
DBD::mysql::st execute failed: Lost connection to MySQL
server during
query at
/var/amavisd/maia/scripts/send-quarantine-digests.pl
line 141.
2007-07-24 18:11:38 Maia: [send-quarantine-digests]
FATAL ERROR:
Couldn't
execute query: Lost connection to MySQL server during
query
DBI connect('maia:localhost:3306','vscan',...) failed:
Can't connect to
local
MySQL server through socket '/tmp/mysql.sock' (2) at
/var/amavisd/maia/scripts/stats-snapshot.pl line 100
2007-07-24 18:11:39 Maia: [stats-snapshot] FATAL ERROR:
Can't connect
to the
Maia database (verify $dsn, $username, and $password in
maia.conf)
and this:
DBD::mysql::st execute failed: Lost connection to MySQL
server during
query at
/var/amavisd/maia/scripts/process-quarantine-sub.pl
line 407.
2007-07-24 15:00:01 Maia: [process-quarantine-sub]
Warning: --report
not
specified, reporting is disabled.
2007-07-24 18:11:38 Maia: [process-quarantine-sub]
FATAL ERROR:
Couldn't execute
query: Lost connection to MySQL server during query
2007-07-24 18:11:38 Maia: [process-quarantine] FATAL
ERROR: Subroutine
terminated
with a fatal error.
DBI connect('maia:localhost:3306','vscan',...) failed:
Can't connect to
local
MySQL server through socket '/tmp/mysql.sock' (2) at
/var/amavisd/maia/scripts/send-quarantine-digests.pl
line 122
2007-07-24 18:11:39 Maia: [send-quarantine-digests]
FATAL ERROR: Can't
connect to
the Maia database (verify $dsn, $username, and
$password in maia.conf)
DBI connect('maia:localhost:3306','vscan',...) failed:
Can't connect to
local
MySQL server through socket '/tmp/mysql.sock' (2) at
/var/amavisd/maia/scripts/stats-snapshot.pl line 100
2007-07-24 18:11:39 Maia: [stats-snapshot] FATAL ERROR:
Can't connect
to the Maia
database (verify $dsn, $username, and $password in
maia.conf)
Also, when trying to stop amavisd, I get a response that it
can't stop,
because there is no pid file.
_______________________________________________
Maia-users mailing list
Maia-users renaissoft.com
http://www.renaissoft.com/mailman/listinfo/maia-users
|
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Kurt Buff wrote:
> Update below - the problem isn't resolved.
- From what you're describing, it sounds like your problem
may be with a
corrupt MySQL table or index. Have you checked your MySQL
logs for
indications of any errors?
What happens (disturbingly often, it seems, with MySQL) is
that the
bayes_token table gets corrupted--it's the busiest table in
the
database, so it's the most likely to get corrupted--and when
that
happens MySQL shuts down and tries to restart itself, over
and over
again. What you get is intermittent database connectivity:
the
connection attempts that hit the server during the seconds
or minutes
that it's alive and running go through, but the ones that
arrive while
mysqld is restarting can't connect.
The fact that you're seeing database connection failures
from
amavisd-maia, the process-quarantine script, and the PHP
scripts all
point to intermittent database failures. I'd investigate
your MySQL
logs, as that seems to be the heart of the problem.
If it does turn out to be a table/index corruption, you can
use
mysqldump to salvage what information you can from the
affected table
and then drop and recreate the table and reload that data.
If it's any
of the bayes_* tables, though, just drop and recreate all
of
them--they'll regenerate on their own over the course of a
day or two.
This was discussed a few weeks ago on this list, so there's
more advice
on this in the archives if you need it.
As for why tables get corrupted in the first place, that's a
matter of
debate. Some people blame bugs in MySQL, while the MySQL
devs claim
InnoDB tables aren't corruptible unless your hardware (disk
or RAM) is
faulty. Personally, I've seen it happen with bad RAM, which
causes the
data to be corrupted in the write cache before it's handed
to the
filesystem. Clearly it could also be caused by a faulty
drive, but
that's usually much easier to diagnose.
- --
Robert LeBlanc <rjl renaissoft.com>
Renaissoft, Inc.
Maia Mailguard <http://www.maiamail
guard.com/>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
iD8DBQFGp9P1GmqOER2NHewRApzmAJ9uUBPgwKLs5Jy1JJ8HbuIHtOit7QCe
PHZg
AIPKOWfpHHZ4yg6aw2zlsKk=
=+Lgy
-----END PGP SIGNATURE-----
_______________________________________________
Maia-users mailing list
Maia-users renaissoft.com
http://www.renaissoft.com/mailman/listinfo/maia-users
|