Thread: spam learning....




spam learning....
user name
2006-11-26 18:40:52
I'm trying to put together public folders for all the SPAM/HAM (good email) to be
dumped into for bayes learning.

Seems to me that it would be MUCH easier if I could pull out all the messages
via SQL than IMAP.  I got this far...  I'm not sure how to reconstruct the
entire email message into a string for me to dump into my bayesian "learner"
(bogofilter).  It should be the same for SA or any other.

How do you get all the physical message blocks in order?
(of course, for bogofilter, this doesn't really matter.)



select phymsg.id, messageblk
from
dbmail_users u,
dbmail_mailboxes mb,
dbmail_messages msg,
dbmail_physmessage phymsg,
dbmail_messageblks blks
where
u.user_idnr = mb.owner_idnr
and u.userid = '__public__'
and mb.name = 'SPAM'
and mb.mailbox_idnr = msg.mailbox_idnr
and msg.deleted_flag = 0
and msg.physmessage_id = phymsg.id
and phymsg.id = blks.physmessage_id
order by phymsg.id
;
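
One possible answer to the ordering question, as a sketch rather than a tested recipe: sort on the block key as well as the message id. The column name messageblk_idnr is an assumption about the DBMail 2.x schema (a serial key on dbmail_messageblks), and the function name ordered_blocks_sql is made up for this example. The mailbox name is pasted into the SQL unescaped, so only pass trusted values.

```sh
#!/bin/sh
# Sketch only: print a query that returns the blocks of each message in order.
# Assumption: dbmail_messageblks has a serial key named messageblk_idnr whose
# ascending order matches the order in which the blocks were stored.

ordered_blocks_sql() {
        mailbox="$1"    # mailbox name, e.g. SPAM
        cat <<EOF
select phymsg.id, blks.messageblk
from dbmail_users u
join dbmail_mailboxes mb on mb.owner_idnr = u.user_idnr
join dbmail_messages msg on msg.mailbox_idnr = mb.mailbox_idnr
join dbmail_physmessage phymsg on phymsg.id = msg.physmessage_id
join dbmail_messageblks blks on blks.physmessage_id = phymsg.id
where u.userid = '__public__'
and mb.name = '$mailbox'
and msg.deleted_flag = 0
order by phymsg.id, blks.messageblk_idnr;
EOF
}
```

The output of `ordered_blocks_sql SPAM` can be piped into whichever SQL client is in use. Concatenating the messageblk values of one phymsg.id in that order should give back the full message.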
_______________________________________________
DBmail mailing list
DBmail@dbmail.org
https://mailman.fastxs.nl/mailman/listinfo/dbmail
spam learning....
user name
2006-11-26 19:54:57
Tom Allison wrote:
> How do you get all the physical message blocks in order?
> (of course, for bogofilter, this doesn't really matter.)
> 
> 
> 
> select phymsg.id, messageblk
> from
> dbmail_users u,
> dbmail_mailboxes mb,
> dbmail_messages msg,
> dbmail_physmessage phymsg,
> dbmail_messageblks blks
> where
> u.user_idnr = mb.owner_idnr
> and u.userid = '__public__'
> and mb.name = 'SPAM'
> and mb.mailbox_idnr = msg.mailbox_idnr
> and msg.deleted_flag = 0
> and msg.physmessage_id = phymsg.id
> and phymsg.id = blks.physmessage_id
> order by phymsg.id
> ;

If you only want to read the Public/Spam box, why do you want to read
the box through SQL? Use fetchmail I'd say.


-- 
 
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl
spam learning....
user name
2006-11-26 20:36:34
Paul J Stevens wrote:
> Tom Allison wrote:
>> How do you get all the physical message blocks in order?
>> (of course, for bogofilter, this doesn't really matter.)
>>
>>
>>
>> select phymsg.id, messageblk
>> from
>> dbmail_users u,
>> dbmail_mailboxes mb,
>> dbmail_messages msg,
>> dbmail_physmessage phymsg,
>> dbmail_messageblks blks
>> where
>> u.user_idnr = mb.owner_idnr
>> and u.userid = '__public__'
>> and mb.name = 'SPAM'
>> and mb.mailbox_idnr = msg.mailbox_idnr
>> and msg.deleted_flag = 0
>> and msg.physmessage_id = phymsg.id
>> and phymsg.id = blks.physmessage_id
>> order by phymsg.id
>> ;
> 
> If you only want to read the Public/Spam box, why do you want to read
> the box through SQL? Use fetchmail I'd say.
> 
> 

I don't have pop configured and I am not sure about the next question:  who do
you authenticate as in order to access PUBLIC folders?  I'll assume this depends
on who has rights (ACL) to the folder.
spam learning....
user name
2006-11-26 21:17:20
On Sun, 2006-11-26 at 15:36 -0500, Tom Allison wrote:

> I don't have pop configured

It's not so hard. I'd recommend dbmail.conf settings like this:

[POP]
PORT=110
BINDIP=127.0.0.1
NCHILDREN=1
MAXCHILDREN=3
MINSPARECHILDREN=1
MAXSPARECHILDREN=2

This will limit dbmail-pop3d to running only 1-3 processes, and only
accepting connections from localhost. I figure you'd only be accessing
1-2 spam mailboxes at a time, so this should be plenty.

>  and I am not sure about the next question:  who do
> you authenticate as in order to access PUBLIC folders?  I'll assume this depends
> on who has rights (ACL) to the folder.

Exactly. Create a spam user and give them full access to those public
mailboxes. Give the 'anyone' user read/write access so that all other
users can copy messages over to that mailbox via IMAP.

Aaron

spam learning....
user name
2006-11-26 21:51:27
Tom Allison wrote:
> Paul J Stevens wrote:
>>
>> If you only want to read the Public/Spam box, why do you want to read
>> the box through SQL? Use fetchmail I'd say.
>>
>>
> 
> I don't have pop configured and I am not sure about the next question:
> who do you authenticate as in order to access PUBLIC folders?  I'll
> assume this depends on who has rights (ACL) to the folder.

Who's talking about POP? Fetchmail talks IMAP fluently.

Set up a user 'spamadmin' who has full access to the Public/Spam folder.
Subscribe all users to Public/Spam.
Set up a fetchmailrc that feeds the contents to sa-learn daily.


--- acl ---

insert into dbmail_acl values
(<spamadmin:user_idnr>,<Spam:mailbox_idnr>,1,1,1,1,1,1,1,1,1);
insert into dbmail_acl values
(<spamadmin:user_idnr>,<NoSpam:mailbox_idnr>,1,1,1,1,1,1,1,1,1);
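
The nine positional flags in those inserts are easy to miscount, so here is a hedged sketch that prints the same kind of statement with the columns named. The column names are an assumption about the DBMail 2.x dbmail_acl table (check them against your own schema), and the function name acl_insert_sql and the labels "full" and "insert-only" are invented for this example. The insert-only variant sets only the lookup and insert rights; whether DBMail and a given mail client behave well with such a mailbox is untested here.

```sh
#!/bin/sh
# Sketch only: print an ACL insert with the columns named.
# Assumption: the column names below match the DBMail 2.x dbmail_acl table;
# verify them against your own schema before running the output.

acl_insert_sql() {
        user_idnr="$1"
        mailbox_idnr="$2"
        rights="$3"     # "full" or "insert-only" (labels invented for this sketch)
        # flag order: lookup, read, seen, write, insert, post, create, delete, administer
        case "$rights" in
                full)        flags="1,1,1,1,1,1,1,1,1" ;;
                insert-only) flags="1,0,0,0,1,0,0,0,0" ;;
                *)           return 1 ;;
        esac
        echo "insert into dbmail_acl (user_id, mailbox_id, lookup_flag, read_flag, seen_flag, write_flag, insert_flag, post_flag, create_flag, delete_flag, administer_flag) values ($user_idnr, $mailbox_idnr, $flags);"
}
```

For example, `acl_insert_sql 5 9 full` prints a statement granting every right on mailbox 9 to user 5; the ids are placeholders.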


--- /etc/fetchmailrc ---
poll localhost with protocol imap
  user spamadmin
  mda "/usr/bin/sa-learn --spam --single"
  password sekret;
  folder "#Public/Spam"
  keep

poll localhost with protocol imap
  user spamadmin
  mda "/usr/bin/sa-learn --ham --single"
  password sekret;
  folder "#Public/NoSpam"
  keep


--- autosubscriber ---

#!/bin/sh

# auto-subscribe all users to the mailboxes '#Public/Spam' and '#Public/NoSpam'

# run one SQL statement against the dbmail database; -N suppresses column names
query() {
        echo "$@" | mysql -N --batch dbmail
}

# every regular user, skipping DBMail's internal accounts and spamadmin
get_all_userids() {
        query "select user_idnr from dbmail_users where userid not in
('__!internal_delivery_user!__','__public__','anyone','spamadmin')"
}

get_user_idnr() {
        [ -n "$1" ] || return 1
        query "select user_idnr from dbmail_users where userid='$1'"
}

get_mailboxid() {
        [ -n "$1" ] || return 1
        [ -n "$2" ] || return 1
        mailbox="$1"
        owner="$2"
        query "select mailbox_idnr from dbmail_mailboxes where
name='$mailbox' and owner_idnr='$owner'"
}

subscribe() {
        user_idnr="$1"
        mailbox_idnr="$2"
        [ -n "$mailbox_idnr" ] || return 1
        # output and errors (e.g. an existing subscription) are discarded
        query "insert into dbmail_subscription values
('$user_idnr','$mailbox_idnr')" >/dev/null 2>&1
}


main() {
        public_idnr=`get_user_idnr "__public__"`
        spambox_idnr=`get_mailboxid "Spam" $public_idnr`
        nospambox_idnr=`get_mailboxid "NoSpam" $public_idnr`
        for user_idnr in `get_all_userids`; do
                subscribe $user_idnr $spambox_idnr
                subscribe $user_idnr $nospambox_idnr
        done
}

main
------------------------------


-- 
 
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl
spam learning....
user name
2006-11-27 07:04:04
Tom Allison wrote:
> Paul J Stevens wrote:
>> Tom Allison wrote:
>>> How do you get all the physical message blocks in order?
>>> (of course, for bogofilter, this doesn't really matter.)
>>>
>>>
>>>
>>> select phymsg.id, messageblk
>>> from
>>> dbmail_users u,
>>> dbmail_mailboxes mb,
>>> dbmail_messages msg,
>>> dbmail_physmessage phymsg,
>>> dbmail_messageblks blks
>>> where
>>> u.user_idnr = mb.owner_idnr
>>> and u.userid = '__public__'
>>> and mb.name = 'SPAM'
>>> and mb.mailbox_idnr = msg.mailbox_idnr
>>> and msg.deleted_flag = 0
>>> and msg.physmessage_id = phymsg.id
>>> and phymsg.id = blks.physmessage_id
>>> order by phymsg.id
>>> ;
>>
>> If you only want to read the Public/Spam box, why do you want to read
>> the box through SQL? Use fetchmail I'd say.
>>
>>
> 
> I don't have pop configured and I am not sure about the next question:
> who do you authenticate as in order to access PUBLIC folders?  I'll
> assume this depends on who has rights (ACL) to the folder.


Remember that the bayes database needs to be trained with at least 200
SPAMs AND 200 HAMs before it can score new messages.

To get the most out of bayes you would also want to train it with HAMs on
a regular basis.

What is your approach for feeding HAMs into bayes? I suppose you are not
creating public folders for HAM...!?

-- 
Med Hilsen/Best regards
Geir Voll Nielsen
spam learning....
user name
2006-11-27 11:17:40
Geir Voll Nielsen wrote:

> Remember that the bayes database needs to be trained with at least 200
> SPAMs AND 200 HAMs before it can score new messages.
> 
> To get the most out of bayes you would also want to train it with HAMs on
> a regular basis.
> 
> What is your approach for feeding HAMs into bayes? I suppose you are not
> creating public folders for HAM...!?
> 


I'm using my 4 year old database from bogofilter.  Together we've seen a lot of
spam.

But at this point I have it set to train on error only.  So a public HAM/SPAM
mailbox is going to be just right for me.  I think.  Is there some way that I
can set the folders to be write only, but you can't delete, copy, move, or read?

The other option is to give everyone the same folder names and use SQL instead
of the aforementioned fetchmail to access the files.  Not sure which is best.
Right now I'm working off a common list for bogofilter that I'm running through
procmail.
spam learning....
user name
2006-11-28 01:08:12
Paul J Stevens wrote:
> Tom Allison wrote:
>> Paul J Stevens wrote:
>>> If you only want to read the Public/Spam box, why do you want to read
>>> the box through SQL? Use fetchmail I'd say.
>>>
>>>
>> I don't have pop configured and I am not sure about the next question:
>> who do you authenticate as in order to access PUBLIC folders?  I'll
>> assume this depends on who has rights (ACL) to the folder.
> 
> Who's talking about POP? Fetchmail talks IMAP fluently.
> 
> Set up a user 'spamadmin' who has full access to the Public/Spam folder.
> Subscribe all users to Public/Spam.
> Set up a fetchmailrc that feeds the contents to sa-learn daily.
> 


These are some great scripts, thanks.
(minor corrections for me to sort out:
bogofilter instead of sa-learn,
postgresql instead of mysql
but the concept is all there in technicolor!)
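
For the PostgreSQL half of that list, a minimal sketch of a stand-in for the script's query() helper. It assumes the database is named dbmail and that psql can connect without prompting (for example through ~/.pgpass or peer authentication); neither is stated anywhere in this thread. The SQL passed in must use single-quoted string literals, because PostgreSQL reads double quotes as identifier quoting.

```sh
#!/bin/sh
# Sketch only: PostgreSQL stand-in for the autosubscriber's query() helper.
# Assumptions: the database is named dbmail and psql can log in
# non-interactively as the calling user.

query() {
        # -A: unaligned output, -t: rows only (no header or footer),
        # -q: suppress status messages, -c: run the given statement
        psql -A -t -q -d dbmail -c "$*"
}
```

With this in place, a call such as `query "select user_idnr from dbmail_users where userid='__public__'"` should print just the id, one value per line, which is what the rest of the script expects.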

I think I started on the wrong foot.

I set up a system user for the ownership of bogofilter's wordlist.db.
Unfortunately, I can't actually run anything as that user...

So, I can either set the /etc/passwd shell from /bin/false to /bin/(something)
or change the user to a non-system user...

Thoughts?
spam learning....
user name
2006-11-28 08:55:14

Tom Allison wrote:
> I think I started on the wrong foot.
> 
> I set up a system user for the ownership of bogofilter's wordlist.db.
> Unfortunately, I can't actually run anything as that user...
> 
> So, I can either set the /etc/passwd shell from /bin/false to
> /bin/(something)
> or change the user to a non-system user...
> 
> Thoughts?

Like I said, I use bogofilter in addition to amavis/spamassassin/dspam
on some of my systems. I use the amavis user to run bogofilter in the
postfix content filter chain, where the amavis user has /bin/sh.

From fetchmailrc (set to run as user 'amavis' in /etc/init.d/fetchmail)
I call bogofilter as mda to read the Public/Ham and Public/Spam mailboxes.


-- 
 
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl