On Thu, 2007-08-09 at 09:25 +0200, Matthias Andree wrote:
> On Wed, 08 Aug 2007, David Relson forwarded this message:
> > The question is: The wordlist.db is becoming too big and growing fast,
> > about 50 MB per week. It's already 500 MB. I would like to know if there
> > is any limitation on it, recommendation, or anything, so it doesn't
> > affect my performance.
>
> If using the "-u" option, try without -- and note you'll probably have
> to adjust training scripts (turn -Ns into -s and turn -Sn into -n).
Or you could just do nothing and let it play out. The growth of your
wordlist should approximate a logistic function... that is, it will grow
roughly exponentially at first, and then slow and level off as you begin
to exhaust the possible token space. Like Thomas Malthus' unfounded fears
about human population growth, your concern about a rapidly growing
wordlist is likely unfounded. Its growth should begin to slow rapidly.
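
To make the logistic shape concrete, here is a minimal Python sketch. The
ceiling of 2000 MB, the growth rate, and the midpoint week are made-up
values chosen only for illustration; they are not measurements of any real
wordlist, and the function names are hypothetical.

```python
import math

def logistic_size(week, ceiling_mb=2000.0, rate=0.35, midpoint_week=12.0):
    """Hypothetical wordlist size in MB after `week` weeks.

    All three parameters are illustrative assumptions, not bogofilter facts.
    """
    return ceiling_mb / (1.0 + math.exp(-rate * (week - midpoint_week)))

def weekly_growth(week, **kw):
    """Size gained between `week` and the following week."""
    return logistic_size(week + 1, **kw) - logistic_size(week, **kw)

if __name__ == "__main__":
    # Growth per week rises until the midpoint, then shrinks toward zero.
    for week in range(0, 41, 4):
        print(f"week {week:2d}: {logistic_size(week):7.1f} MB "
              f"(+{weekly_growth(week):5.1f} MB/week)")
```

The point of the sketch is only that weekly growth peaks near the midpoint
and then shrinks, so a steady 50 MB per week today does not imply 50 MB per
week forever.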
Undermining bogofilter's ability to update the wordlist with appropriate
tokens will only hamper accuracy. That said, it couldn't hurt to purge
old hapaxes (tokens seen only once) occasionally, and using thresh_update
to slow wordlist growth once accuracy is high might also serve you well.
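
The two ideas in that last paragraph can be sketched in Python. This is not
bogofilter's code: the function names, the dictionary layout, and the
90-day cutoff are hypothetical, and the sketch assumes that thresh_update
means "skip auto-updating when the score is already within that distance of
0.0 or 1.0". For real maintenance, use bogoutil and check its man page for
the actual options.

```python
from datetime import date, timedelta

def should_autoupdate(spamicity, thresh_update=0.01):
    """Return True if a message is uncertain enough to learn from.

    Assumption: scores closer than thresh_update to 0.0 or 1.0 are
    treated as already well classified and are not registered.
    """
    return thresh_update < spamicity < 1.0 - thresh_update

def purge_old_hapaxes(wordlist, today, max_age_days=90):
    """Drop tokens seen at most once and not seen since the cutoff date.

    `wordlist` is a hypothetical dict mapping
    token -> (spam_count, ham_count, last_seen_date).
    """
    cutoff = today - timedelta(days=max_age_days)
    return {
        token: (spam, ham, last_seen)
        for token, (spam, ham, last_seen) in wordlist.items()
        if not (spam + ham <= 1 and last_seen < cutoff)
    }

if __name__ == "__main__":
    words = {
        "viagra": (120, 0, date(2007, 8, 1)),
        "xqzzt": (1, 0, date(2007, 1, 5)),    # old hapax: purged
        "meeting": (0, 1, date(2007, 8, 8)),  # recent hapax: kept
    }
    print(sorted(purge_old_hapaxes(words, today=date(2007, 8, 9))))
    print(should_autoupdate(0.5), should_autoupdate(0.999))
```

Both steps trade a little learning for a smaller database: confidently
scored messages add few useful tokens, and stale one-off tokens rarely
appear again.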
Tom
_______________________________________________
Bogofilter mailing list
Bogofilter bogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter