Walter:
Thanks for the feedback.
On 2/19/07, Walter Underwood <wunderwood netflix.com> wrote:
> Lucene/Solr does this automatically. That is how a tf.idf
> engine works, it boosts rare words.
>
> Do you have examples of problems or are you worrying about
> something that might happen?
Actually, my use case is the following. Let's say, hypothetically, you
have a field containing sentence-long titles, 125 in total. If you read
those titles, you can group 100 of them into 5 subject areas; the other
25 cannot be grouped. A hypothetical example:
22 titles are about: How good is Person X
14 titles are about: How bad is Product Y
10 titles are about: bond weather
36 titles are about: How cool is the movie Z
18 titles are about: The next big MS virus
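To illustrate the "boosts rare words" point with the group sizes above, here is a minimal sketch. It assumes the classic Lucene-style IDF, 1 + ln(numDocs / (docFreq + 1)), and it assumes (hypothetically) that each group shares one distinctive term present in every title of that group; the term names are made up for illustration.

```python
import math

def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene-style IDF: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

NUM_TITLES = 125

# Hypothetical: one distinctive term per group, found in every title of that group.
group_doc_freq = {
    "person": 22,
    "product": 14,
    "bond": 10,
    "movie": 36,
    "virus": 18,
}

# Print from rarest to most common term; rarer terms get a higher IDF.
for term, df in sorted(group_doc_freq.items(), key=lambda kv: kv[1]):
    print(f"{term:8s} df={df:3d} idf={classic_idf(df, NUM_TITLES):.3f}")
```

Under these assumptions the term shared by the 10 "bond weather" titles gets the highest IDF of the five, so plain tf.idf weighs it up rather than down: IDF measures rarity, not whether a subject is interesting.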
What I am trying to achieve is to weed out "bond weather" as a group,
because it is not interesting for my use case; let's say it is noise,
not signal. So I thought I could use a list of "common words".
Furthermore, I was thinking that with such a list of common words I
could boost certain fields. For example, if Person X is a known person,
such as a "Prime minister" or a "movie star", then a certain word
attached to another known word would mean the title is important.
Maybe I have defined my problem wrongly, but I hope the above gives you
an overview.
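The filter-and-boost idea described above can be sketched outside Solr in plain Python. This is only an illustration of the logic, not Solr configuration: the NOISE_TERMS and KNOWN_ROLES lists, the sample titles and the 2.0 boost factor are all hypothetical, and "noise" is assumed to mean "contains every noise term".

```python
import re

# Hypothetical term lists; real ones would be built from your own titles.
NOISE_TERMS = {"bond", "weather"}
KNOWN_ROLES = ("prime minister", "movie star")

def normalize(title: str) -> str:
    """Lower-case the title and keep only word characters, space-separated."""
    return " ".join(re.findall(r"[a-z]+", title.lower()))

def is_noise(title: str) -> bool:
    """Treat a title as noise if it contains every noise term."""
    return NOISE_TERMS <= set(normalize(title).split())

def boost(title: str, factor: float = 2.0) -> float:
    """Return `factor` if the title mentions a known role phrase as whole words."""
    padded = f" {normalize(title)} "
    return factor if any(f" {role} " in padded for role in KNOWN_ROLES) else 1.0

titles = [
    "Bond weather ahead for the markets",
    "How good is the Prime Minister?",
    "How cool is the movie Z",
]

# Drop noise titles, attach a boost weight to the rest.
kept = [(t, boost(t)) for t in titles if not is_noise(t)]
for title, weight in kept:
    print(weight, title)
```

Matching on whole-word phrases (the padded spaces) avoids boosting a title just because a role phrase appears inside a longer word.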
Regards