Thread: FuzzyQuery using termDocs() to reduce count of Boolean Queries




2007-11-07 03:51:32
Hi!

I already asked this on the user mailing list, but maybe it's more
appropriate here:

As a simple example, imagine every document in your index has the
fields "language" and "country". A tuple of language+country is what
I call a context.

You want to search context-specifically, i.e. language+country is
always part of the query (QueryFilter).

FuzzyTermEnum doesn't know about these contexts, so it builds a
BooleanQuery of all similar terms. E.g. "hello" is "hallo" in German,
only one character difference. But when searching in the context
english+USA I don't care about German terms, so I don't want or need
"hallo" in the BooleanQuery in this case.
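
To make the "one character difference" concrete, here is a minimal
sketch of a plain Levenshtein edit distance. It is not Lucene's own
code; the class and method names (EditDistanceSketch, editDistance) are
made up for illustration, and how FuzzyTermEnum turns a distance into a
similarity score is not shown here.

```java
class EditDistanceSketch {
    // Classic two-row dynamic-programming Levenshtein distance.
    // Hypothetical helper for illustration, not part of Lucene.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Swap rows so prev always holds the last completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(editDistance("hello", "hallo")); // prints 1
    }
}
```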

So I came up with the idea of using reader.termDocs() instead of
terms() in FuzzyTermEnum. By means of a QueryFilter (or rather its
BitSet) for each context, I could determine whether a fuzzy term
should be included in the BooleanQuery or not.
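
The idea can be sketched roughly as follows. This is a self-contained
simulation in plain Java, not a patch against FuzzyTermEnum: the
in-memory map of term to document IDs stands in for what iterating
reader.termDocs() would yield, the BitSet stands in for the bits of the
context's QueryFilter, and the names ContextTermFilter and
termsInContext are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ContextTermFilter {
    // Keeps only those candidate fuzzy terms that occur in at least one
    // document of the context. "postings" (term -> doc IDs) is an assumed
    // stand-in for walking the term's postings in a real index.
    static List<String> termsInContext(List<String> candidates,
                                       Map<String, int[]> postings,
                                       BitSet contextDocs) {
        List<String> kept = new ArrayList<String>();
        for (String term : candidates) {
            int[] docs = postings.get(term);
            if (docs == null) {
                continue; // term not in the index at all
            }
            for (int doc : docs) {
                if (contextDocs.get(doc)) {
                    kept.add(term); // one hit is enough, stop scanning
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new LinkedHashMap<String, int[]>();
        postings.put("hello", new int[] {0, 2}); // english+USA docs
        postings.put("hallo", new int[] {1, 3}); // german docs
        postings.put("hullo", new int[] {2});

        BitSet englishUsa = new BitSet();
        englishUsa.set(0);
        englishUsa.set(2);

        List<String> candidates = Arrays.asList("hello", "hallo", "hullo");
        // "hallo" is dropped because none of its docs are in the context.
        System.out.println(termsInContext(candidates, postings, englishUsa));
    }
}
```

Note that in this sketch a term that does not belong to the context is
only rejected after its whole postings list has been scanned, so the
check itself costs reads per candidate term.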

This (potentially) results in a smaller BooleanQuery, but I wonder
whether this approach would bring any noticeable performance
advantage (maybe reduced IO?).

Thanks for feedback
Timo
