Hi!
I already asked this on the user mailing list, but maybe it's more
appropriate here:
As a simple example, imagine every document in your index has the fields
"language" and "country". A tuple of language+country is what I call a
context.
You want to search context-specifically, i.e. language+country is always
part of the query (via a QueryFilter).
FuzzyTermEnum doesn't know about these contexts, so it builds a
BooleanQuery of all similar terms. For example, the English "hello" is
"hallo" in German, only one character different. But when searching in
the context english+USA I don't care about German terms, so I don't want
or need "hallo" in the BooleanQuery in this case.
So I came up with the idea of using reader.termDocs() instead of terms()
in FuzzyTermEnum. By means of a QueryFilter (or rather its BitSet) for
each context, I could determine whether a fuzzy term makes sense to
include in the BooleanQuery or not.
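To make the idea concrete, here is a minimal sketch in plain Java. It does not use any Lucene classes: a sorted map stands in for the term dictionary and its postings, a java.util.BitSet stands in for the filter's bits, and a textbook Levenshtein distance stands in for the similarity check. All names (ContextFuzzyFilter, expand, editDistance) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names are Lucene API.
class ContextFuzzyFilter {

    // Textbook Levenshtein distance, used here as a stand-in for the
    // similarity test that a fuzzy term enumeration performs.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the similar terms that occur in at least one document of the
    // given context. postings maps term -> document ids (toy postings list);
    // contextDocs has a bit set for every document in the context.
    static List<String> expand(String queryTerm, int maxEdits,
                               SortedMap<String, int[]> postings,
                               BitSet contextDocs) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            if (editDistance(queryTerm, entry.getKey()) > maxEdits) {
                continue; // not similar enough, never a candidate
            }
            // Walk the term's postings until one document falls in the
            // context. In a real index this read is itself I/O, which is
            // the cost to weigh against the smaller query.
            for (int doc : entry.getValue()) {
                if (contextDocs.get(doc)) {
                    kept.add(entry.getKey());
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        SortedMap<String, int[]> postings = new TreeMap<>();
        postings.put("hallo", new int[] {0, 1}); // docs 0,1: german+Germany
        postings.put("hello", new int[] {2, 3}); // docs 2,3: english+USA
        postings.put("help", new int[] {3});

        BitSet englishUsa = new BitSet();
        englishUsa.set(2);
        englishUsa.set(3);

        // "hallo" is one edit away from "hello" but is dropped because
        // none of its documents are in the english+USA context.
        System.out.println(expand("hello", 1, postings, englishUsa));
    }
}
```

Note that the context check has to read each candidate's postings up to the first hit, so whether it pays off presumably depends on how many candidates it eliminates compared with how much it reads to do so.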
This (potentially) results in a smaller BooleanQuery, but I wonder
whether this approach would bring any noticeable performance advantage
(maybe reduced I/O?).
Thanks for any feedback,
Timo