Thread: StandardAnalyzer question ...




StandardAnalyzer question ...
user name
2006-02-20 16:05:17
Hi,

When StandardAnalyzer is used to index documents, aren't the terms,
among other things, lower-cased and stored that way in the index?

I have an index field that I index like this:

....
ramWriter = new IndexWriter(ramDir, standardAnalyzer, true);
....
...
...
doc.add(Field.Text("categoryNames", categoryNames));
...
...

(I periodically write contents from the RAM directory to the file
system directory.)

When I search this field via Luke using the standard analyzer, I find
words like this:
....
Digital Cameras
Digital Camera Batteries
....

Shouldn't the words indexed look like:

....
digital cameras
digital camera batteries
....

If I understand this right, when using the standard analyzer, shouldn't
the terms be indexed in lower case?

Thanks,


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

StandardAnalyzer question ...
user name
2006-02-20 16:21:00
Hello,

Not yet an expert in the field, but as I understand it, the terms are
indexed as you specify them (that is, after passing through the
analyzer's filters), while the original contents are stored, or not,
depending on how you add the field (Field.UnStored(), for example,
which happens to be on its way to being deprecated).

So maybe you are searching the lower-cased terms but getting the
original, cased value back as the result in this very CASE.
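
To make the indexed-versus-stored distinction concrete, here is a small toy model in plain Java. It is not Lucene code: the `analyze` method and the `IndexedVsStored` class are hypothetical stand-ins that only mimic tokenizing and lower-casing, so treat it as a sketch of the idea rather than of what StandardAnalyzer does internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only (an assumption for illustration, not Lucene's implementation).
class IndexedVsStored {

    // Hypothetical stand-in for an analyzer: split on anything that is not
    // a letter or digit, then lower-case each token.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // The stored value is kept exactly as it was handed to the field.
        String stored = "Digital Cameras";
        // The indexed terms are what a term search is matched against.
        List<String> indexed = analyze(stored);

        System.out.println("stored value : " + stored);
        System.out.println("indexed terms: " + indexed);
    }
}
```

In this model a search for `digital` hits the document through its lower-cased terms, while the value displayed for the hit is still the stored `Digital Cameras`.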

/oskar 

On Mon, 2006-02-20 at 09:05 -0700, Mufaddal Khumri wrote:
> Hi,
> 
> When StandardAnalyzer is used to index documents, aren't the terms,
> among other things, lower-cased and stored that way in the index?
> 
> I have an index field that I index like this:
> 
> ....
> ramWriter = new IndexWriter(ramDir, standardAnalyzer, true);
> ....
> ...
> ...
> doc.add(Field.Text("categoryNames", categoryNames));
> ...
> ...
> 
> (I periodically write contents from the RAM directory to the file
> system directory.)
> 
> When I search this field via Luke using the standard analyzer, I find
> words like this:
> ....
> Digital Cameras
> Digital Camera Batteries
> ....
> 
> Shouldn't the words indexed look like:
> 
> ....
> digital cameras
> digital camera batteries
> ....
> 
> If I understand this right, when using the standard analyzer,
> shouldn't the terms be indexed in lower case?
> 
> Thanks,
> 
> 

exact match ..
user name
2006-02-20 17:17:39
Let's say I do this while indexing:

doc.add(Field.Text("categoryNames", categoryNames));

Now, while searching categoryNames, I do a search for "digital cameras".
I only want to match documents whose categoryNames field contains
exactly the phrase "digital cameras". I do not want documents with
"digital camera batteries" as part of the result.

What's the best way to accomplish this?

Thanks.
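
The thread records no answer to this, so the following is only a sketch of the distinction being asked about, again as a plain-Java toy model rather than Lucene code. The names `analyze`, `anyTermMatch`, `exactMatch` and the class `ExactFieldMatch` are hypothetical. The assumption illustrated is that a whole-value comparison, for instance against an extra untokenized copy of the field (something like Field.Keyword in the Lucene API of this period), behaves differently from matching on individual analyzed terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only (an assumption for illustration, not Lucene's implementation).
class ExactFieldMatch {

    // Hypothetical stand-in for an analyzer: tokenize and lower-case.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Term-level match: true if the field shares at least one term with the query.
    static boolean anyTermMatch(String fieldValue, String query) {
        List<String> fieldTerms = analyze(fieldValue);
        for (String term : analyze(query)) {
            if (fieldTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // Whole-value match: true only if the normalized field equals the normalized query.
    static boolean exactMatch(String fieldValue, String query) {
        return analyze(fieldValue).equals(analyze(query));
    }

    public static void main(String[] args) {
        String query = "digital cameras";
        for (String value : new String[] {"Digital Cameras", "Digital Camera Batteries"}) {
            System.out.println(value
                    + " | anyTerm=" + anyTermMatch(value, query)
                    + " | exact=" + exactMatch(value, query));
        }
    }
}
```

With the two category names from the earlier message, both values share the term `digital` with the query, but only `Digital Cameras` equals it as a whole value.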

