Aaron Bennett wrote:
> I started using Razor2 late last week and it's helped my spam
> detection; however enabling reporting for it has KILLED my quarantine
> processing time. I had to disable reporting in order to make things
> work -- one of my goals is to keep the learn/report process running at
> least once per hour and I stopped this mornings' run after six hours.
> Without reporting, it will run in probably 45 minutes. I run it so
> often because I've got a dedicated database server which runs the
> learn/report run so it doesn't slow down the relays and having it run
> often gives my users the benefit of rapid learning.
>
> Is there anything to be done to speed up razor2? Alternately, it would
> be a nice feature if the learn/report cycles could be broken up. Say,
> allow the --learn to run but don't purge the database until the
> --report has been run; then I could run the --report at like 10:00 PM
> and still do the learning often.

To a point, reporting will always be at the mercy of network latency
and server availability on the part of the report servers. That said,
you /do/ have some control over which services you report to. Assuming
you've applied the patches from Ticket #288
<https://secure.renaissoft.com/maia/ticket/288>, you should be able to
selectively enable/disable reporting to services as you see fit, either
from the command-line or in your maia.conf file.

Without these patches, SpamAssassin will try to report to any services
you have installed and use for lookups. What this means is that if you
have the Razor, Pyzor, DCC, and SpamCop plugins installed and you use
them to score your mail, SpamAssassin will try to send its reports to
all of them, no matter what you've tried to tell it in the maia.conf
file. The patches are needed to make SpamAssassin handle reporting as
advertised. The SpamAssassin devs finally accepted my patches last
week, so hopefully this should be resolved in SpamAssassin 3.1.5.

Presuming you've applied the patches, then, I'd advise you to disable
reporting to Pyzor, since that's the one that's least reliable and
suffers from the most server downtime. Not having to wait an extra 10
seconds for Pyzor servers for every report makes a vast improvement.

The other thing you can do, of course, is reduce the timeout values (in
seconds) in your local.cf file for the services you're reporting to,
e.g.

    dcc_timeout 10
    pyzor_timeout 5
    razor_timeout 10

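To get a feel for how much these timeouts can matter, here's a rough
Python sketch of the worst case, where every report to every service
runs into its timeout. It's only back-of-the-envelope arithmetic: the
item count is made up, and it assumes the services are contacted one
after another for each item. Only the timeout values come from the
example settings above.

```python
def worst_case_seconds(items, timeouts):
    """Upper bound on waiting time if every report to every service
    times out (assumes services are tried one after another)."""
    return items * sum(timeouts.values())


# Timeout values (seconds) taken from the local.cf example above.
timeouts = {"dcc": 10, "pyzor": 5, "razor": 10}

# Hypothetical number of quarantined items handled in one run.
items = 2000

all_services = worst_case_seconds(items, timeouts)
without_pyzor = worst_case_seconds(
    items, {name: t for name, t in timeouts.items() if name != "pyzor"}
)

print(f"all services:  {all_services / 3600:.1f} hours")
print(f"without pyzor: {without_pyzor / 3600:.1f} hours")
```

Real runs should be nowhere near this bad, since most reports complete
well before the timeout, but it shows how per-report delays multiply
across a large quarantine.
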
As for separating learning and reporting tasks, it's certainly
conceivable as a future feature, but bear in mind that for the same
reasons that you want to do your Bayes training frequently, the
reporting services would prefer you to do your reporting frequently as
well, since a large part of their benefit comes from having received a
large number of spam reports about an item before you receive your copy
of it. If you delay your reporting until late in the evening, millions
of people will have already received copies of the spam without the
benefit of your report to help classify it. That's why reporting and
Bayes training really go hand in hand--just as the local learning helps
improve the effectiveness of your Bayes database, the reporting helps
improve the effectiveness of Razor/Pyzor/DCC/SpamCop.

Don't get overly concerned about the fact that a process-quarantine run
takes hours to run, either. By all means schedule it to run hourly: it
won't start a new run until the current run completes, and the current
run periodically checks for new items to process as it goes, so you're
not really "falling behind" as much as you might think. At very busy
sites, in fact, the process-quarantine run is effectively a continuous,
24/7 process--a run that never ends, constantly learning and reporting
items as it manages its queue in a slow but steady stream. During hours
with lighter traffic it gets to "catch up" a bit, while it "falls
behind" during peak traffic hours. As long as it manages to get through
a full day's items within 24 hours or less, it all balances out. It's
only a problem if it starts taking consistently longer than a day to
process a day's worth of items.

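The "balances out" argument can be illustrated with a small Python
sketch that tracks the queue hour by hour. Everything here is
hypothetical: the traffic pattern and the processing capacities are
invented numbers, and the model simply assumes a fixed number of items
can be learned and reported per hour.

```python
def simulate_backlog(arrivals_per_hour, capacity_per_hour, start=0):
    """Return the queue length at the end of each hour."""
    backlog = start
    history = []
    for arrived in arrivals_per_hour:
        # The queue grows by new arrivals and shrinks by what gets
        # processed that hour; it can never go below zero.
        backlog = max(0, backlog + arrived - capacity_per_hour)
        history.append(backlog)
    return history


# Hypothetical day: 8 quiet hours, 10 peak hours, 6 quiet hours.
arrivals = [50] * 8 + [200] * 10 + [50] * 6  # 2700 items per day

# Capacity of 120/hour = 2880/day, more than a day's worth of items.
day1 = simulate_backlog(arrivals, 120)
day2 = simulate_backlog(arrivals, 120, start=day1[-1])
print("keeping up:", max(day1), day1[-1], day2[-1])

# Capacity of 100/hour = 2400/day, less than a day's worth of items.
slow1 = simulate_backlog(arrivals, 100)
slow2 = simulate_backlog(arrivals, 100, start=slow1[-1])
print("overloaded:", slow1[-1], slow2[-1])
```

In the first case the queue swells during peak hours but ends each day
at the same level, so nothing accumulates. In the second case the
end-of-day backlog keeps growing, which is the situation to worry
about.
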
--
Robert LeBlanc <rjl renaissoft.com>
Renaissoft, Inc.
Maia Mailguard <http://www.maiamailguard.com/>
_______________________________________________
Maia-users mailing list
Maia-users renaissoft.com
http://www.renaissoft.com/mailman/listinfo/maia-users