Thread: SA BAYES TIMING INFO




SA BAYES TIMING INFO
user name
2006-08-15 08:37:06
We have 4 boxes: 2 relays, one DB server, and one box that does admin
stuff (web interface + sa-learn, ...)

Relays :
Loaded with amavis + clamav + maia
2x AMD Athlon MP
1GB RAM
IDE HDD

DB Server
We currently have performance problems with this one, probably related
to software RAID (new LSI RAID cards are on order)
PostgreSQL 8.1.4
AMD Opteron 3GHz
1GB RAM
2x IDE HDD, software raid

Web + admin
This box runs process-quarantine.pl once per hour
Once a day, a massive DB cleanup: sa-learn --force-expire, then a
vacuumdb (full + analyze) on the database
AMD Opteron 3GHz
1GB RAM
IDE HDD



Software versions : 
Distrib : Gentoo, NPTL
Kernel : 2.6.15 and 16
PostgreSQL 8.1.4
Spamassassin : 3.1.3
Perl : 5.8.8 + threads


SA config :
use_bayes                               1
bayes_auto_learn                        0
bayes_sql_override_username             amavis
bayes_auto_expire                       0



sa-learn --dump magic :
0.000          0          3          0  non-token data: bayes db version
0.000          0      69192          0  non-token data: nspam
0.000          0      16634          0  non-token data: nham
0.000          0     124718          0  non-token data: ntokens
0.000          0 1155203541          0  non-token data: oldest atime
0.000          0 1155627458          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1155590460          0  non-token data: last expiry atime
0.000          0     325082          0  non-token data: last expire atime delta
0.000          0     159258          0  non-token data: last expire reduction count
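
For anyone decoding the dump: the atime values are Unix epoch seconds. A minimal Python sketch that converts them to UTC and works out how much time the token database spans. The variable and function names here are my own for illustration, not anything from SpamAssassin.

```python
from datetime import datetime, timezone

# Values copied from the `sa-learn --dump magic` output in this message.
oldest_atime = 1155203541
newest_atime = 1155627458
last_expiry_atime = 1155590460


def to_utc(epoch_seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Time covered by the tokens currently in the database.
span_seconds = newest_atime - oldest_atime
span_days = span_seconds / 86400

print("oldest atime:     ", to_utc(oldest_atime).isoformat())
print("newest atime:     ", to_utc(newest_atime).isoformat())
print("last expiry atime:", to_utc(last_expiry_atime).isoformat())
print(f"token atime span:  {span_days:.1f} days")
```

On these numbers the token atimes span a little under five days, with the oldest token dated 2006-08-10 and the newest 2006-08-15 (UTC).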



Timing details :

2006-08-15 09:57:55 Maia: [process-quarantine-sub] TIMING [total 24368 ms] - msg-prep: 2 (0%), train-bayes: 23700 (97%), delete-mail: 666 (3%), rundown: 0 (0%)
2006-08-15 09:57:59 Maia: [process-quarantine-sub] Learned mail item 189436 as non-spam
2006-08-15 09:57:59 Maia: [process-quarantine-sub] Deleted spam/non-spam recipient references to mail item 189436
2006-08-15 09:58:00 Maia: [process-quarantine-sub] Deleted mail item 189436
2006-08-15 09:58:00 Maia: [process-quarantine-sub] TIMING [total 4439 ms] - msg-prep: 2 (0%), train-bayes: 3855 (87%), delete-mail: 581 (13%), rundown: 0 (0%)
2006-08-15 09:58:20 Maia: [process-quarantine-sub] Learned mail item 189476 as non-spam
2006-08-15 09:58:20 Maia: [process-quarantine-sub] Deleted spam/non-spam recipient references to mail item 189476
2006-08-15 09:58:21 Maia: [process-quarantine-sub] Deleted mail item 189476
2006-08-15 09:58:21 Maia: [process-quarantine-sub] TIMING [total 20659 ms] - msg-prep: 2 (0%), train-bayes: 20136 (97%), delete-mail: 521 (3%), rundown: 0 (0%)
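
To put one number on the three TIMING lines, here is a rough Python sketch that parses them and totals the time spent in train-bayes. The regular expressions are my own guess at the line layout, based only on the three samples in this message; they are not taken from Maia's code.

```python
import re

# TIMING lines copied from the log in this message (one entry per line).
log_lines = [
    "2006-08-15 09:57:55 Maia: [process-quarantine-sub] TIMING [total 24368 ms] - "
    "msg-prep: 2 (0%), train-bayes: 23700 (97%), delete-mail: 666 (3%), rundown: 0 (0%)",
    "2006-08-15 09:58:00 Maia: [process-quarantine-sub] TIMING [total 4439 ms] - "
    "msg-prep: 2 (0%), train-bayes: 3855 (87%), delete-mail: 581 (13%), rundown: 0 (0%)",
    "2006-08-15 09:58:21 Maia: [process-quarantine-sub] TIMING [total 20659 ms] - "
    "msg-prep: 2 (0%), train-bayes: 20136 (97%), delete-mail: 521 (3%), rundown: 0 (0%)",
]

TOTAL_RE = re.compile(r"TIMING \[total (\d+) ms\]")
PHASE_RE = re.compile(r"([\w-]+): (\d+) \(\d+%\)")


def parse_timing(line):
    """Return (total_ms, {phase: ms}) for a TIMING line, or None for other lines."""
    total = TOTAL_RE.search(line)
    if total is None:
        return None
    # Only look for phases after the "[total ... ms]" marker.
    phases = {name: int(ms) for name, ms in PHASE_RE.findall(line[total.end():])}
    return int(total.group(1)), phases


parsed = [p for p in map(parse_timing, log_lines) if p is not None]
total_ms = sum(total for total, _ in parsed)
bayes_ms = sum(phases["train-bayes"] for _, phases in parsed)

print(f"messages: {len(parsed)}")
print(f"total: {total_ms} ms, train-bayes: {bayes_ms} ms ({bayes_ms / total_ms:.1%})")
```

Over these three messages, train-bayes accounts for 47691 of 49466 ms.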





_______________________________________________
Maia-users mailing list
Maia-users@renaissoft.com
http://www.renaissoft.com/mailman/listinfo/maia-users
SA BAYES TIMING INFO
user name
2006-08-15 13:47:28
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

(This message is now CC'd to both the maia-users and spamassassin
mailing lists.)
(Continuing the thread on the SpamAssassin ML, "RE: slow sql bayes store".)

Alexandre Ghisoli wrote:

> DB Server
> Actually, we got perfs problem with this one, probably related to
> Software RAID - new LSI Raid cards ordered
> PostgreSQL 8.1.4
> AMD Opteron 3GHz
> 1GB RAM
> 2x IDE HDD, software raid

> 0.000          0     124718          0  non-token data: ntokens

> 2006-08-15 09:57:55 Maia: [process-quarantine-sub] TIMING [total 24368
> ms] - msg-prep: 2 (0%), train-bayes: 23700 (97%), delete-mail: 666 (3%),
> rundown: 0 (0%)


OK.  This looks like the best example yet of what I'm looking for.  Good job
presenting that data.

Furthermore, from the parts I have quoted above, I think I can say without a
doubt that *something* is messed up here.  Even with software RAID, that box
should be able to learn a message in far less than 24 seconds.  In fact,
unless you get a very good card, the Opteron might handle the RAID work
better than many hardware RAID cards.

124k rows should not be a problem for a database.  I really suspect there's
an algorithmic problem within the Bayes learning code: it's making too many
SQL calls, or has a big-O problem... something.
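
To illustrate the "too many SQL calls" pattern I have in mind, here is a toy Python sketch. It is purely hypothetical: it uses an in-memory SQLite database and a made-up `tokens` table, not SpamAssassin's schema or code, and I am not claiming the Bayes store works like the first function. It only shows the difference between one query pair and one commit per token versus a couple of batched calls in a single transaction.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical token table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tokens (token TEXT PRIMARY KEY, "
        "ham_count INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def learn_per_token(conn, tokens):
    """One SELECT, one INSERT or UPDATE, and one commit for every token."""
    calls = 0
    for tok in tokens:
        row = conn.execute(
            "SELECT ham_count FROM tokens WHERE token = ?", (tok,)
        ).fetchone()
        if row is None:
            conn.execute(
                "INSERT INTO tokens (token, ham_count) VALUES (?, 1)", (tok,)
            )
        else:
            conn.execute(
                "UPDATE tokens SET ham_count = ham_count + 1 WHERE token = ?", (tok,)
            )
        conn.commit()  # each commit forces the database to flush to disk
        calls += 2
    return calls


def learn_batched(conn, tokens):
    """Two batched client calls and a single commit for the whole message."""
    params = [(tok,) for tok in tokens]
    with conn:  # one transaction, committed when the block exits
        conn.executemany(
            "INSERT OR IGNORE INTO tokens (token, ham_count) VALUES (?, 0)", params
        )
        conn.executemany(
            "UPDATE tokens SET ham_count = ham_count + 1 WHERE token = ?", params
        )
    return 2


message_tokens = ["meeting", "invoice", "agenda", "meeting"]

slow_db, fast_db = make_db(), make_db()
slow_calls = learn_per_token(slow_db, message_tokens)
fast_calls = learn_batched(fast_db, message_tokens)

query = "SELECT token, ham_count FROM tokens ORDER BY token"
print(slow_calls, fast_calls)
print(slow_db.execute(query).fetchall() == fast_db.execute(query).fetchall())
```

Both functions leave the table in the same state; the first just pays a client call pair and a commit for every token. With a few hundred tokens per message and slow disks, per-token commits alone could plausibly add up to seconds, which is why I would like to see what the learning code actually sends to the database.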

(SpamAssassin folks, the original full message is archived at
http://www.renaissoft.com/pipermail/maia-users/2006-August/007188.html )

To the SpamAssassin mailing list:  These results seem typical of the reports
I have seen.  They span both MySQL and PostgreSQL, several OSes, SCSI and
IDE, RAID or not.  The only consistent thing is that it is slow.

There is also a general consensus that it got really slow around the time
3.x was installed, though we haven't yet had any solid reports from anyone
switching back and forth to test that empirically.

- --
David Morton
Maia Mailguard                        - http://www.maiamailguard.com
Morton Software Design and Consulting - http://www.dgrmm.net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org


iD8DBQFE4dBwUy30ODPkzl0RAmqTAKCfXa7x3A9d/n93RYswkqkRVK+eNwCdFeQS
ZG+cxXgJ1I/jvIXEbhb8onc=
=S7Jk
-----END PGP SIGNATURE-----