Thread: MoreLikeThis and stopword stemming




United States
2007-10-10 12:52:55
What is the appropriate way to get both stopword removal and stemming
of stopwords when the MoreLikeThis class is used? My analyzer (set with
MoreLikeThis.setAnalyzer) uses the Snowball filter and is initialized
with a stopwords set:

analyzer = new StandardAnalyzer(stopwords) {
    public TokenStream tokenStream(String fieldName, java.io.Reader reader) {
        return new SnowballFilter(super.tokenStream(fieldName, reader), "English");
    }
};



If I do NOT supply a separate stopwords list to the MoreLikeThis object
(that is, if I do not call MoreLikeThis.setStopWords), will "the right
thing" happen? That is, will my input text to the MoreLikeThis object be
stemmed and (stemmed) stopwords removed before a query is formed? It
seems that MoreLikeThis.setStopWords uses a simple lookup of words in
the stop words list (no stemming), which is not what I want.
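
To make the concern concrete, here is a toy sketch in plain Java (no
Lucene involved). Everything in it is hypothetical: toyStem is a crude
stand-in for the Snowball stemmer, and stemAll and stemThenFilter are
names made up for this illustration. It only shows why looking up
stemmed tokens in an unstemmed stop list can miss stop words; it makes
no claim about what MoreLikeThis does internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only: none of these names are Lucene API.
class StopwordStemDemo {

    // Hypothetical stand-in for a real stemmer such as Snowball:
    // strips a trailing "ing" or "s" from sufficiently long words.
    static String toyStem(String word) {
        if (word.length() > 5 && word.endsWith("ing")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.length() > 3 && word.endsWith("s")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Stem every entry of a stop list so it can be compared with stemmed tokens.
    static Set<String> stemAll(Set<String> stopwords) {
        Set<String> stemmed = new HashSet<>();
        for (String w : stopwords) {
            stemmed.add(toyStem(w));
        }
        return stemmed;
    }

    // Stem each token, then drop it if the stemmed form is in the given stop set.
    static List<String> stemThenFilter(List<String> tokens, Set<String> stopSet) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            String s = toyStem(t);
            if (!stopSet.contains(s)) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopwords = new HashSet<>(Arrays.asList("the", "during", "others"));
        List<String> tokens = Arrays.asList("the", "searching", "during", "others", "indexes");

        // Plain lookup against the unstemmed list misses "dur" and "other".
        System.out.println(stemThenFilter(tokens, stopwords));          // [search, dur, other, indexe]

        // Lookup against the stemmed list removes them.
        System.out.println(stemThenFilter(tokens, stemAll(stopwords))); // [search, indexe]
    }
}
```

In this sketch the difference comes purely from whether the stop list
has been passed through the same stemmer as the tokens; the alternative
of filtering stop words before stemming would sidestep the mismatch.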

Thanks in advance
Donna


Donna L. Gresh
Services Research, Mathematical Sciences Department
IBM T.J. Watson Research Center
(914) 945-2472
http://www.research.ibm.com/people/g/donnagresh
greshus.ibm.com