
Thread: nutch-user@lucene.apache.org

2007-11-07 08:36:20
Hi Milan,

We have developed a Nutch plugin that could be used for this; it uses our
text classification library. The plugin consists of a Nutch indexer, which
creates a special field for the documents, and a searcher, which allows you
to switch the filter on.
We have used it for classifying spam on forums, but I am sure it would work
just as well for porn. You can find more details on our Text Classification
API at http://www.digitalpebble.com/solutionsTC.html. The Nutch plugin is
just a wrapper for that library.
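
To illustrate the general pattern described above (classify once at index time, store the verdict in a field, and let the searcher switch the filter on or off), here is a minimal sketch in plain Java. It does not use the Nutch or Lucene APIs and is not the plugin's actual code; the field name "adult", the class name and the toy classifier are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class FlagFilterSketch {
    // Hypothetical field name; the real plugin's field name is not given in this thread.
    static final String FLAG_FIELD = "adult";

    /** Index time: run the classifier once and store its verdict as a field. */
    static Map<String, String> indexDocument(String url, String text,
                                             Predicate<String> classifier) {
        Map<String, String> doc = new HashMap<>();
        doc.put("url", url);
        doc.put("content", text);
        doc.put(FLAG_FIELD, Boolean.toString(classifier.test(text)));
        return doc;
    }

    /** Search time: when filterOn is true, drop documents whose flag is set. */
    static List<Map<String, String>> search(List<Map<String, String>> index,
                                            String term, boolean filterOn) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : index) {
            boolean matches = doc.get("content").toLowerCase()
                                 .contains(term.toLowerCase());
            boolean flagged = Boolean.parseBoolean(doc.get(FLAG_FIELD));
            if (matches && !(filterOn && flagged)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Toy stand-in for a real text classifier.
        Predicate<String> toyClassifier = text -> text.toLowerCase().contains("xxx");

        List<Map<String, String>> index = new ArrayList<>();
        index.add(indexDocument("http://a.example/", "Free xxx pictures", toyClassifier));
        index.add(indexDocument("http://b.example/", "Free holiday pictures", toyClassifier));

        System.out.println("filter off: " + search(index, "pictures", false).size());
        System.out.println("filter on:  " + search(index, "pictures", true).size());
    }
}
```

Storing the flag at index time means the classifier runs once per document rather than once per query, so the search-time filter is only a cheap field check.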

Best,

Julien

-- 
http://www.digitalpebble.com
Open Source Solutions for Text Engineering

-------- Original Message --------
Subject: SaveSearch or Adult Filter
Date: Wed, 07 Nov 2007 14:24:37 +0000
From: Milan Krendzelak <mkrendzelakmtld.mobi>
Reply-To: nutch-user@lucene.apache.org
To: nutch-user@lucene.apache.org

Hi,

Does anybody have any idea how to implement safe search in Nutch?

I think it would be cool to use a Bayesian technique to classify a web site
as adult (porn) and store a flag in the index. Of course, other techniques
could also be used, such as regexes, blacklists, etc.
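
As a rough illustration of the Bayesian idea, the following is a minimal naive Bayes sketch in plain Java with add-one (Laplace) smoothing. It is only a sketch under assumptions: the class name, the tokenizer and the tiny training set are hypothetical, and a real filter would need far more training data. The boolean it returns is the kind of flag that could be stored in the index.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NaiveBayesSketch {
    private final Map<String, Integer> adultCounts = new HashMap<>();
    private final Map<String, Integer> cleanCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int adultTokens = 0, cleanTokens = 0;
    private int adultDocs = 0, cleanDocs = 0;

    // Simplistic tokenizer (an assumption): lower-case, split on non-alphanumerics.
    private static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z0-9]+");
    }

    /** Add one labelled example to the model. */
    public void train(String text, boolean isAdult) {
        Map<String, Integer> counts = isAdult ? adultCounts : cleanCounts;
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum);
            vocabulary.add(token);
            if (isAdult) adultTokens++; else cleanTokens++;
        }
        if (isAdult) adultDocs++; else cleanDocs++;
    }

    /** Compare log-posteriors of the two classes; true means "adult". */
    public boolean isAdult(String text) {
        int v = vocabulary.size();
        double totalDocs = adultDocs + cleanDocs;
        // Smoothed class priors.
        double logAdult = Math.log((adultDocs + 1.0) / (totalDocs + 2.0));
        double logClean = Math.log((cleanDocs + 1.0) / (totalDocs + 2.0));
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            // Add-one smoothing so unseen words do not zero out a class.
            logAdult += Math.log((adultCounts.getOrDefault(token, 0) + 1.0)
                                 / (adultTokens + v));
            logClean += Math.log((cleanCounts.getOrDefault(token, 0) + 1.0)
                                 / (cleanTokens + v));
        }
        return logAdult > logClean;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("hot xxx adult videos", true);
        nb.train("xxx adult pictures", true);
        nb.train("apache nutch search engine", false);
        nb.train("open source search library", false);

        System.out.println(nb.isAdult("adult xxx site"));
        System.out.println(nb.isAdult("nutch search plugin"));
    }
}
```

Working in log space avoids numeric underflow on long pages, where the product of many small word probabilities would otherwise round to zero.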

Cheers,
Milan Krendzelak
Senior Software Developer
