List Info

Thread: indexing and searching the document title question




indexing and searching the document title question
country flaguser name
United States
2007-02-27 11:59:35
Hi,
According to the FAQ, by indexing the title of the document
and performing a search against the shorter field will
automatically give it a higher weight than matches against
the document content.  That is what I am trying to
accomplish with a "NAME" field.  If someone enters
a close match of the name of a document (example Names:
"Color Me Mine" ,"Pittsburgh and Its
Countryside"), I want that document to get a hit.  The
search is user entered, so I want it to be case-insensitive.
 I also don't want it to have to be an exact match.  Search
terms such as "Pittsburgh Countryside" should
match up against a name of "Pittsburgh and Its
Countryside".


Here I am adding the name field to my document:
String value= "Color Me Mine";
document.add(new Field("NAME", value,
Field.Store.YES,
				Field.Index.TOKENIZED));

Performing a search:
NAME:color me mine ->returns no results
NAME:color -> returns the document

I tried indexing the document without the value tokenized:
document.add(new Field("NAME", value,
Field.Store.YES,
				Field.Index.UN_TOKENIZED));

This caused the search to be case sensitive.

I am about to modify my indexing/searching code to use a
secondary field, "name_lowercase", this field
would of course contain the name of the object in lowercase
and I would lowercase my search terms in I construct my
TermQuery for this field.  

Is this a valid approach, or am I missing something?

Thanks!  




------------------------------------------------------------
---------
To unsubscribe, e-mail: java-user-unsubscribelucene.apache.org
For additional commands, e-mail: java-user-helplucene.apache.org


Re: indexing and searching the document title question
user name
2007-02-27 12:13:45
You've probably got it right. But I'd add a couple of
things....

1> by using the correct analyzer at index and query time,
the
casing will be taken care of for you.

2> you don't want UN_TOKENIZED for fields you search on
in general because there's no parsing. So if you indexed
"This is a String", searching on "This"
or "this" wouldn't match.

3> In your code fragment, you didn't show what Analyzer
you
use. This is way more important than you think.

4> get a copy of Luke (google lucene luke). It'll let you
examine
your index and save you a world of hurt. There have been
some
very nice improvements lately along with 2.1 compatability.

5> If you want searches and indexing to use different
analyzers
on different fields, see PerFieldAnalyzerWrapper.

6> You'll probably find yourself storing the same data
multiple
times, once for searching and once for displaying. So you'll
search
on the lowercased, indexed field and display the
UN_TOKENIZED
version since it'll retain the capitalization.

7> I think your underlying problem is that the syntax of
the search
isn't correct. You're really searching on
NAME:color
defaultfield:me
defaultfield:mine

You want something like +NAME:color +NAME:me +NAME:mine

Best
Erick

On 2/27/07, Phillip Rhodes <spamsucksrhoderunner.com> wrote:
>
> Hi,
> According to the FAQ, by indexing the title of the
document and performing
> a search against the shorter field will automatically
give it a higher
> weight than matches against the document content.  That
is what I am trying
> to accomplish with a "NAME" field.  If
someone enters a close match of the
> name of a document (example Names: "Color Me
Mine" ,"Pittsburgh and Its
> Countryside"), I want that document to get a hit. 
The search is user
> entered, so I want it to be case-insensitive.  I also
don't want it to have
> to be an exact match.  Search terms such as
"Pittsburgh Countryside" should
> match up against a name of "Pittsburgh and Its
Countryside".
>
>
> Here I am adding the name field to my document:
> String value= "Color Me Mine";
> document.add(new Field("NAME", value,
Field.Store.YES,
>                                
Field.Index.TOKENIZED));
>
> Performing a search:
> NAME:color me mine ->returns no results
> NAME:color -> returns the document
>
> I tried indexing the document without the value
tokenized:
> document.add(new Field("NAME", value,
Field.Store.YES,
>                                
Field.Index.UN_TOKENIZED));
>
> This caused the search to be case sensitive.
>
> I am about to modify my indexing/searching code to use
a secondary field,
> "name_lowercase", this field would of course
contain the name of the object
> in lowercase and I would lowercase my search terms in I
construct my
> TermQuery for this field.
>
> Is this a valid approach, or am I missing something?
>
> Thanks!
>
>
>
>
>
------------------------------------------------------------
---------
> To unsubscribe, e-mail: java-user-unsubscribelucene.apache.org
> For additional commands, e-mail: java-user-helplucene.apache.org
>
>
[1-2]

about | contact  Other archives ( Real Estate discussion Medical topics )