Thread: questions about import email for training




questions about import email for training
user name
2006-06-08 06:00:11
Hello,

I am a student and plan to attend TREC (Text Retrieval
Conference) this year. I want to enhance a spam filtering
system by adding some extra functions. I chose SpamBayes
as the filter and am trying to add some other functions for
my experiment.

Now I want to import the training data into SpamBayes for my
experiment. My training email corpora were downloaded from
the TREC website:
http://plg.uwaterloo.ca/~gvcormac/treccorpus/

I keep the files on my E:\ drive. I also downloaded
Python-2.3.5.exe, Win32all-163.exe and SpamBayes1.0.4.exe.

I looked at your company's website and found that there's
information about how to import by using the command line
(4.4 in the frequently asked questions). I entered the
command (python sb_mboxtrain.py -g ~/tmp/newham -s
~/tmp/newspam) into the Windows command prompt, but it
doesn't work. Later I typed this command into the Python
2.3.5 command line, and it still doesn't work.

Could you please tell me how I can import email into
SpamBayes? Could you please describe your steps in detail,
since I am not very familiar with the command-line
environment?

Thanks a lot! Hope to hear from you soon!

Lin Chen
_______________________________________________
SpamBayes@python.org
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
questions about import email for training
user name
2006-06-08 06:36:16
> I looked at your company's website

LOL... afaik spambayes isn't a company, but a volunteer project. Just like
most great software.

> and found that there's information about how to import by using the
> command line (4.4 in the frequently asked questions). I entered the
> command (python sb_mboxtrain.py -g ~/tmp/newham -s ~/tmp/newspam) into
> the Windows command prompt, but it doesn't work. Later I typed this
> command into the Python 2.3.5 command line, and it still doesn't work.

I'm only a regular user, not one of the developers, but I'll try to help
as well as I can.
I use Linux, not "Windows", and I'm quite familiar with the command line.
Don't worry: as far as spambayes and python are concerned, the command
line is identical for "Windows" and Linux.

The syntax you used looks correct for what I would use on Linux. However, I
notice you use ~, which on Linux is a substitution for the current user's
home directory. "Windows" is a different system, and I don't know
if you can use the ~ there. Don't blame spambayes, don't blame
"Windows". It's just different, just like driving on the left in the UK.
You should try again with the absolute path names of the mailbox files.
Something like "python sb_mboxtrain.py -g
E:\directory\where\you\copied\the\mbox\files\tmp\newham" (and similar for
the spam). Inside the python command line you obviously don't have to use
"python sb_mboxtrain.py" but just "sb_mboxtrain.py".

Also, it would be very, *very* helpful if you would give any error or
status messages you get. We don't have a crystal ball, so we cannot see
what is happening on your screen. Don't be afraid to be verbose.
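
On the ~ question above, here is a minimal Python sketch of turning a
path like ~/tmp/newham into an absolute one before handing it to
sb_mboxtrain.py. It assumes a current Python 3 for the helper itself, and
the helper names (resolve_mailbox, build_train_command) are made up for
illustration. Only the -g and -s options quoted in this thread are used.
The Windows command prompt does not expand ~ on its own, whereas Python's
os.path.expanduser does.

```python
import os.path


def resolve_mailbox(path):
    """Expand a leading ~ (if any) and return an absolute path."""
    return os.path.abspath(os.path.expanduser(path))


def build_train_command(ham_path, spam_path, python_exe="python"):
    """Build the argument list for the sb_mboxtrain.py call from the thread.

    python_exe is an assumption: whichever interpreter SpamBayes is
    installed for.
    """
    return [
        python_exe, "sb_mboxtrain.py",
        "-g", resolve_mailbox(ham_path),   # ham mailbox
        "-s", resolve_mailbox(spam_path),  # spam mailbox
    ]
```

The resulting list could be passed to subprocess.call, or joined with
spaces and typed at the operating system's command prompt. Python's
interactive >>> prompt only accepts Python statements, so the command
itself belongs at the system prompt.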

Good luck!

Amedee Van Gasse

-- 
Disclaimer:
By sending an email to ANY of my addresses you are agreeing that:

   1. I am by definition, "the intended recipient"
   2. All information in the email is mine to do with as I see fit and
      make such financial profit, political mileage, or good joke as it
      lends itself to. In particular, I may quote it on usenet.
   3. I may take the contents as representing the views of your company.
   4. This overrides any disclaimer or statement of confidentiality that
      may be included on your message.

questions about import email for training
user name
2006-06-08 09:59:25
> I also download Python-2.3.5.exe,

Python 2.4 has a much better email package; it would be well worth
using that.

> Win32all-163.exe

This is *extremely* out-of-date. You don't actually need it for any
command-line work with SpamBayes (although whether it is installed
does affect the default database location), but if you use it for
something else, you should get an up-to-date version.

> I entered the command (python sb_mboxtrain.py -g ~/tmp/newham -s
> ~/tmp/newspam) into the Windows command prompt, but it doesn't work.
> Later I typed this command into the Python 2.3.5 command line, and it
> still doesn't work.

As Amedee said, it would really help to know what error message you
got, and you should check that ~/tmp/newham is actually a valid path
to a ham mailbox (possible if you are using Cygwin, extremely
unlikely otherwise).
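
One way to check a path before training is sketched below in Python. It
is only a rough check under stated assumptions: describe_mailbox_path is
a made-up helper name, and the "mbox" test relies on the convention that
an mbox file begins with a "From " line. It says nothing about what
SpamBayes itself accepts.

```python
import os


def describe_mailbox_path(path):
    """Return a rough description of what a mailbox path points at."""
    path = os.path.expanduser(path)  # handle a leading ~ if present
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        return "directory"
    # Assumption: a file in mbox format starts with a "From " line.
    with open(path, "rb") as handle:
        first_line = handle.readline()
    return "mbox" if first_line.startswith(b"From ") else "file"
```

A result of "missing" for ~/tmp/newham would suggest the example path
from the FAQ was copied without being replaced by a real location.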

> Could you please tell me how I can import email into SpamBayes?

Train as ham: sb_filter.py -g < path\to\ham\message

Train as spam: sb_filter.py -s < path\to\spam\message

These are probably the best commands if you are using messages like the
ones in the 2005 TREC corpus. They will use the default values for
your database.
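
Since the two commands above handle one message at a time, a small
driver script may help for a whole corpus. The following Python sketch
feeds each message to sb_filter.py on standard input with -g or -s. It
rests on assumptions to verify against your own copy of the corpus: that
there is an index file whose lines look like "spam ../data/000/000"
(label, then a path relative to the index), and that "python" and
"sb_filter.py" can be found from the directory you run it in. All
function names are hypothetical.

```python
import os
import subprocess


def parse_index_line(line):
    """Split a line such as 'spam ../data/000/000' into (label, path)."""
    label, path = line.split(None, 1)
    return label.lower(), path.strip()


def train_flag(label):
    """Map a label to the sb_filter.py option quoted in the thread."""
    return {"ham": "-g", "spam": "-s"}[label]


def iter_labelled_messages(index_path):
    """Yield (label, message_path) pairs listed in the assumed index file."""
    base = os.path.dirname(os.path.abspath(index_path))
    with open(index_path) as index:
        for line in index:
            if line.strip():
                label, rel_path = parse_index_line(line)
                yield label, os.path.normpath(os.path.join(base, rel_path))


def train_message(label, message_path,
                  python_exe="python", script="sb_filter.py"):
    """Run sb_filter.py with the message on stdin; return its exit code."""
    with open(message_path, "rb") as message:
        return subprocess.call([python_exe, script, train_flag(label)],
                               stdin=message)
```

Looping over iter_labelled_messages(...) and calling train_message for
each pair would train the default database one message at a time. That
is slow for a large corpus, but it mirrors the two commands above exactly.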

=Tony.Meyer

-- 
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.

