I also said, "Stopword removal is a reasonable default because it works
fairly well for a general text corpus." Ultraseek keeps stopwords, but
most engines don't. I think it is fine as a default. I also think you
have to understand stopwords at some point.
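
For anyone who wants to see the effect in miniature, here is a rough
Python sketch. It is a toy model, not Lucene or Solr code: the stopword
list, the three documents, and the function names are all made up for
illustration, and it assumes that a query clause which analyzes to
nothing is simply dropped, which is the behavior described in this
thread.

```python
# Toy model only: not Lucene or Solr code. All names and data are hypothetical.
STOPWORDS = {"a", "an", "the", "of"}  # made-up stopword list


def analyze(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def build_index(docs):
    """Map each analyzed term of the 'name' field to the ids of docs containing it."""
    index = {}
    for doc_id, doc in docs.items():
        for term in analyze(doc["name"]):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(docs, index, rating, name_query):
    """AND a rating filter with a name query.

    Assumption: if every term of the name query is a stopword, the clause
    contributes no terms and therefore restricts nothing.
    """
    hits = {doc_id for doc_id, doc in docs.items() if doc["rating"] == rating}
    for term in analyze(name_query):
        hits &= index.get(term, set())
    return hits


# Made-up sample data.
DOCS = {
    1: {"name": "An Ordinary Day", "rating": "PG"},
    2: {"name": "Robot Picnic", "rating": "PG"},
    3: {"name": "An Open Door", "rating": "R"},
}
INDEX = build_index(DOCS)

print(search(DOCS, INDEX, "PG", "an"))      # every PG doc, with or without "an"
print(search(DOCS, INDEX, "PG", "picnic"))  # only the doc whose name has "picnic"
```

Because "an" never reaches the index and is also stripped from the
query, 'rating:PG AND name:an' behaves like 'rating:PG' alone in this
sketch.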
wunder
On 11/5/07 9:59 PM, "Chris Hostetter" <hossman_lucene fucit.org> wrote:
>
> : This isn't a problem in Lucene or Solr. It is a result of the analyzers
> : you have chosen to use. If you choose to remove stopwords, you will not
> : be able to match stopwords.
>
> I believe paul's point was that this use of stopwords is in the "text"
> fieldtype in the example schema.xml ... which many people use as is.
>
> I'm personally of the mindset that it's fine like it is. While people who
> understand that "an" is a stop word might ask "why does 'rating:PG AND
> name:an' match 40K movies, it should match 0?" there is another (probably
> larger) group of people who won't know how the search is implemented, or
> that "an" is a stop word, and they will look at the same results and ask
> "why am i getting 40K results? most of these don't have 'an' in the title?
> i should only be getting X results."
>
> That second group of people aren't going to be any happier if you
> give them 0 results instead -- at least this way people get some results
> to work with.
>
> -Hoss