Thread: Fw: Wordlist too big




Fw: Wordlist too big
2007-08-08 21:04:28

From: Andrey Bibiano Jardim <andreyjardimufsj.edu.br>
To: bogofilterbogofilter.org
Subject: Wordlist too big
Date: Tue, 07 Aug 2007 11:46:56 -0300
User-Agent: Thunderbird 1.5.0.7 (X11/20061008)

Hi list.
As a starting point, I'm using a single wordlist.db for everyone.
Training is done by forwarding spam to a designated address, and
likewise for ham.

The question is: the wordlist.db is getting too big and growing fast,
about 50 MB per week. It is already 500 MB. I would like to know whether
there is any limit, recommendation, or anything else I should be aware
of, so that it doesn't affect performance.

Thanks in advance;
    Andrey

_______________________________________________
Bogofilter mailing list
Bogofilterbogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter

Re: Fw: Wordlist too big
2007-08-09 02:25:07
On Wed, 08 Aug 2007, David Relson forwarded this message:

> From: Andrey Bibiano Jardim <andreyjardimufsj.edu.br>
> To: bogofilterbogofilter.org
> Subject: Wordlist too big
> Date: Tue, 07 Aug 2007 11:46:56 -0300
> User-Agent: Thunderbird 1.5.0.7 (X11/20061008)
> 
> Hi list.
> As a starting point, I'm using a single wordlist.db for everyone.
> Training is done by forwarding spam to a designated address, and
> likewise for ham.
>
> The question is: the wordlist.db is getting too big and growing fast,
> about 50 MB per week. It is already 500 MB. I would like to know
> whether there is any limit, recommendation, or anything else I should
> be aware of, so that it doesn't affect performance.

If you are using the "-u" option, try running without it; note that
you'll probably have to adjust your training scripts (turn -Ns into -s
and turn -Sn into -n).

-- 
Matthias Andree
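
The flag change suggested above can be sketched as a small helper for a
training script. This is only an illustration: the function name
correction_flags is hypothetical, and it assumes the usual bogofilter
meanings of the flags (-s/-n register a message as spam/non-spam, -S/-N
undo a previous spam/non-spam registration, -u registers each message
automatically as it is classified).

```python
def correction_flags(true_class, autoupdate):
    """Return the bogofilter flag list for training on a misclassified message.

    true_class: "spam" or "ham", the class the message really belongs to.
    autoupdate: True if classification runs with -u, so the message was
                already registered under the wrong class.
    """
    if true_class not in ("spam", "ham"):
        raise ValueError("true_class must be 'spam' or 'ham'")
    if autoupdate:
        # Undo the wrong automatic registration, then register correctly.
        return ["-Ns"] if true_class == "spam" else ["-Sn"]
    # Without -u nothing was registered, so there is nothing to undo.
    return ["-s"] if true_class == "spam" else ["-n"]
```

A training script would pass the returned flags to bogofilter together
with the forwarded message on standard input. Without -u, only messages
that are explicitly trained add tokens to the wordlist.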

Re: Fw: Wordlist too big
2007-08-09 07:12:43
On Thu, 2007-08-09 at 09:25 +0200, Matthias Andree wrote:
> On Wed, 08 Aug 2007, David Relson forwarded this message:
> > The question is: the wordlist.db is getting too big and growing
> > fast, about 50 MB per week. It is already 500 MB. I would like to
> > know whether there is any limit, recommendation, or anything else I
> > should be aware of, so that it doesn't affect performance.
>
> If you are using the "-u" option, try running without it; note that
> you'll probably have to adjust your training scripts (turn -Ns into
> -s and turn -Sn into -n).

Or you could just do nothing and let it play out.  The growth of your
wordlist should approximate a logistic function: it will grow roughly
exponentially at first, and then level off as you begin to exhaust the
possible token space.  Like Thomas Malthus' fears about human
population growth, your concern about a rapidly growing wordlist is
likely unfounded; its growth should soon begin to slow.  Undermining
bogofilter's ability to update the wordlist with appropriate tokens
will only hamper accuracy.  That said, it couldn't hurt to purge old
hapaxes (tokens seen only once) occasionally, and using thresh_update
to slow wordlist growth once accuracy is high might also serve you
well.

Tom
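
As a rough illustration of the logistic growth described above, the
sketch below models wordlist size over time. All three parameters are
assumptions chosen for illustration, not measurements from this thread:
a hypothetical ceiling of 1000 MB, a rate of 0.2 per week, and week 0
taken as the point where the list is half its ceiling. They were picked
only so that the curve passes through about 500 MB while growing by
about 50 MB per week.

```python
import math

CAPACITY_MB = 1000.0   # hypothetical ceiling on wordlist size
RATE_PER_WEEK = 0.2    # hypothetical growth rate
MIDPOINT_WEEK = 0.0    # week at which size is half the ceiling


def logistic_size(week, capacity=CAPACITY_MB, rate=RATE_PER_WEEK,
                  midpoint=MIDPOINT_WEEK):
    """Wordlist size in MB at a given week under a logistic model."""
    return capacity / (1.0 + math.exp(-rate * (week - midpoint)))


def weekly_growth(week):
    """Size increase in MB between this week and the next."""
    return logistic_size(week + 1) - logistic_size(week)


if __name__ == "__main__":
    for week in range(-20, 21, 5):
        print(f"week {week:+3d}: size {logistic_size(week):7.1f} MB, "
              f"growth over next week {weekly_growth(week):5.1f} MB")
```

Under these assumed numbers the weekly growth peaks around the midpoint
and shrinks on either side of it. Where the real ceiling lies depends on
the mail being processed and cannot be read off the figures in this
thread.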


