Thread: Re: Floating point errors?




Re: Floating point errors?
2007-07-23 15:29:44
On Tue, 17 Jul 2007, Ingomar Wesp wrote:

> For some reason, when manually marking spam or ham, bogofilter was
> always called with the -N and -S options respectively, even if the
> message was not previously registered at all.

Ugh. Perhaps Bogofilter should provide some protection against this kind
of mistake. Would it make sense to complain when a message that has never
been registered is being unregistered? (It would be quite easy to
implement imho: compute a hash of token list generated from the message,
turn it into a quasitoken like .MSG_COUNT, increment its count during
registration, check and decrement it during unregistration.)
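
A minimal sketch of the idea in Python, for illustration only. This is not
bogofilter code: the Wordlist class, the message_key helper, the ".MSG_"
key prefix and the choice of SHA-1 are all assumptions, and the sketch
tracks a single class of messages (say, spam) in memory.

```python
import hashlib
from collections import Counter


def message_key(tokens):
    """Build a hypothetical quasitoken name from a hash of the token list."""
    # Sort and de-duplicate so the key does not depend on token order.
    joined = "\n".join(sorted(set(tokens)))
    return ".MSG_" + hashlib.sha1(joined.encode("utf-8")).hexdigest()


class Wordlist:
    """Toy in-memory wordlist for one class; not bogofilter's storage."""

    def __init__(self):
        self.counts = Counter()

    def register(self, tokens):
        # Assumption: each token is counted once per message.
        for token in set(tokens):
            self.counts[token] += 1
        self.counts[".MSG_COUNT"] += 1
        self.counts[message_key(tokens)] += 1

    def unregister(self, tokens):
        key = message_key(tokens)
        # Complain instead of silently driving the counts out of sync.
        if self.counts[key] <= 0:
            raise ValueError("message was never registered")
        for token in set(tokens):
            self.counts[token] -= 1
        self.counts[".MSG_COUNT"] -= 1
        self.counts[key] -= 1
```

With this guard, unregistering a message that was never registered raises
an error and leaves every count untouched.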

> I assume that this lead to a condition where the individual spam count of
> several tokens were larger than the overall spam message count.

This is quite likely.

--Pavel Kankovsky aka Peak  [ Boycott Microsoft--http://www.vcnet.com/bms ]
"Resistance is futile. Open your source code and prepare for assimilation."

_______________________________________________
Bogofilter mailing list
Bogofilter@bogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter

Re: Floating point errors?
2007-07-23 17:57:27
On Mon, 23 Jul 2007 22:29:44 +0200 (CEST)
Pavel Kankovsky wrote:

> On Tue, 17 Jul 2007, Ingomar Wesp wrote:
> 
> > For some reason, when manually marking spam or ham, bogofilter was
> > always called with the -N and -S options respectively, even if the
> > message was not previously registered at all.
> 
> Ugh. Perhaps Bogofilter should provide some protection against this
> kind of mistake. Would it make sense to complain when a message that
> has never been registered is being unregistered? (It would be quite
> easy to implement imho: compute a hash of token list generated from
> the message, turn it into a quasitoken like .MSG_COUNT, increment its
> count during registration, check and decrement it during
> unregistration.)
> 
> > I assume that this lead to a condition where the individual spam
> > count of several tokens were larger than the overall spam message
> > count.
> 
> This is quite likely.

Hi Pavel,

My .MSG_COUNT values are approx 550,000 and 140,000.  Adding a "dot"
token for each message would add many, many tokens to my wordlist.  As I
don't believe I need them, this seems wasteful.  On the other hand, if
you (or someone else) wants to implement such a capability, it could be
an option.

When a ham/spam count exceeds .MSG_COUNT it's an indication that
something is b0rked.  Generating an error message might be
appropriate.  The idea results in a new issue -- how to make the
problem known when bogofilter is running in the background.

As a more modest proposal, checking each token's ham and spam counts
against .MSG_COUNT wouldn't use much computing power and might be
helpful...
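
Roughly, the check could look like the Python sketch below.  It is only
an illustration, not bogofilter code: the function name and the plain
dict of (spam, ham) pairs are assumptions standing in for the real
wordlist.

```python
def find_inconsistent_tokens(token_counts, msg_count):
    """Return tokens whose spam or ham count exceeds the message totals.

    token_counts: dict mapping token -> (spam_count, ham_count)
    msg_count:    (spam_messages, ham_messages), i.e. the .MSG_COUNT pair
    """
    spam_total, ham_total = msg_count
    bad = []
    for token, (spam, ham) in token_counts.items():
        # A token counted once per message can never exceed the totals.
        if spam > spam_total or ham > ham_total:
            bad.append(token)
    return sorted(bad)
```

The caller decides what to do with the result, for example write it to a
log when bogofilter is running in the background.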

Regards,

David
