Thread: Re: Wildcard vs Term query




Re: Wildcard vs Term query
United Kingdom
2007-09-26 04:21:53
Are you using the out-of-the-box Lucene QueryParser? It automatically
detects wildcard queries by the presence of * or ? characters. If the
user input does not contain these characters, a plain TermQuery is used.

BooleanQuery.setMaxClauseCount can be used to control the upper limit on
the number of terms produced by Wildcard/Fuzzy queries. If this limit is
exceeded (e.g. when searching for something like "a*"), an exception is
thrown.
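
To make the clause limit concrete, here is a rough sketch of the idea in plain Java. It is a toy model, not Lucene's code: ClauseLimitSketch, expandPrefix and TooManyClausesException are hypothetical names, and the "index" is just an array of strings. It only shows why a broad pattern such as "a*" can hit the limit while a narrower one does not.

```java
// Toy model of wildcard expansion with a clause limit. These classes are
// illustrative stand-ins and are not Lucene's implementation.
final class ClauseLimitSketch {

    /** Stand-in for the exception raised when the limit is exceeded. */
    static final class TooManyClausesException extends RuntimeException {
        TooManyClausesException(String message) {
            super(message);
        }
    }

    /**
     * Counts the index terms that start with the given prefix, failing as soon
     * as the count exceeds maxClauses.
     */
    static int expandPrefix(String[] indexTerms, String prefix, int maxClauses) {
        int clauses = 0;
        for (String term : indexTerms) {
            if (term.startsWith(prefix)) {
                clauses++;
                if (clauses > maxClauses) {
                    throw new TooManyClausesException(
                        "more than " + maxClauses + " terms match " + prefix + "*");
                }
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "apply", "apricot", "banana"};
        // Two terms start with "app", which is within a limit of two.
        System.out.println(expandPrefix(terms, "app", 2));
        try {
            // Three terms start with "a", which exceeds a limit of two.
            expandPrefix(terms, "a", 2);
        } catch (TooManyClausesException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In a real application the limit would be set through BooleanQuery.setMaxClauseCount rather than passed as a parameter as it is here.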

Cheers
Mark
----- Original Message ----
From: John Byrne <john.byrnepropylon.com>
To: java-userlucene.apache.org
Sent: Wednesday, 26 September, 2007 9:48:17 AM
Subject: Wildcard vs Term query

Hi,

I'm working my way through the Lucene in Action book, and there is one
thing I need explained that I didn't find there.

While wildcard queries are potentially slower than ordinary term
queries, are they slower even if they don't contain a wildcard?
Significantly slower?

The reason I ask is this: if we are going to allow wildcards in a search
engine but want to optimize for the cases where they are NOT used, do we
have to check for the presence of "*" or "?" in the term and create the
most appropriate query? Or can I assume that, when no wildcard is
present, a WildcardQuery will be as fast (or almost as fast) as a plain
term query?

Thanks in advance!
John B.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribelucene.apache.org
For additional commands, e-mail: java-user-helplucene.apache.org






     


Re: Wildcard vs Term query
Ireland
2007-09-26 04:45:16
I'm not using the QueryParser at all. I need to do a little more with
the terms, so I'm explicitly creating a Query from a single term. What I
was hoping to avoid was something like this:
...
if (term.contains("*") || term.contains("?")) {
    return new WildcardQuery(...
}
else {
    return new TermQuery(...
}
...

and instead just do this:
...
return new WildcardQuery(...
...
on the basis that the WildcardQuery would only be slower if it does
contain a wildcard character. But as you pointed out, the QueryParser
makes this optimization, so I suppose I should too.
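
For reference, the check described above can be written as a small self-contained helper. This is only a sketch: QueryKindChooser and its Kind enum are hypothetical names, not Lucene classes, and it returns a label for the query type rather than constructing a real query, since the constructor arguments are elided in the snippet above.

```java
// Hypothetical helper, not part of Lucene: decides which query type to build.
final class QueryKindChooser {

    /** The two query types discussed in this thread. */
    enum Kind { TERM, WILDCARD }

    /** Returns true if the text contains either wildcard character, '*' or '?'. */
    static boolean containsWildcard(String text) {
        return text.indexOf('*') >= 0 || text.indexOf('?') >= 0;
    }

    /** Picks WILDCARD only when a wildcard character is actually present. */
    static Kind choose(String text) {
        return containsWildcard(text) ? Kind.WILDCARD : Kind.TERM;
    }

    public static void main(String[] args) {
        System.out.println(choose("lucene"));
        System.out.println(choose("luc*ne"));
        System.out.println(choose("lucen?"));
    }
}
```

Keeping the decision in one method means the calling code has a single place to change if other special characters need handling later.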



Re: Wildcard vs Term query
United States
2007-09-26 05:02:25
WildcardQuery won't be slower than TermQuery if there are no wildcard
characters. Beyond what QueryParser does, WildcardQuery itself reverts
to a TermQuery:

   public Query rewrite(IndexReader reader) throws IOException {
       if (this.termContainsWildcard) {
           return super.rewrite(reader);
       }

       return new TermQuery(getTerm());
   }

I personally would optimize which query gets created, but
performance-wise you won't pay a penalty for just using WildcardQuery.
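
The same fallback pattern can be tried without Lucene on the classpath. The sketch below is a toy model under that assumption: SketchQuery, SketchTermQuery and SketchWildcardQuery are hypothetical stand-ins, and the base-class rewrite simply returns the query itself where the real superclass would do the term expansion.

```java
// Toy model of the rewrite step. SketchQuery, SketchTermQuery and
// SketchWildcardQuery are hypothetical stand-ins, not Lucene classes.
abstract class SketchQuery {
    final String term;

    SketchQuery(String term) {
        this.term = term;
    }

    /** In this model the default rewrite returns the query unchanged. */
    SketchQuery rewrite() {
        return this;
    }
}

final class SketchTermQuery extends SketchQuery {
    SketchTermQuery(String term) {
        super(term);
    }
}

final class SketchWildcardQuery extends SketchQuery {
    private final boolean termContainsWildcard;

    SketchWildcardQuery(String term) {
        super(term);
        // Decide once, at construction time, whether there is anything to expand.
        this.termContainsWildcard = term.indexOf('*') >= 0 || term.indexOf('?') >= 0;
    }

    /** Falls back to a plain term query when the term has no wildcard. */
    @Override
    SketchQuery rewrite() {
        if (termContainsWildcard) {
            return super.rewrite(); // stand-in for the superclass expansion step
        }
        return new SketchTermQuery(term);
    }
}
```

With this model, rewriting new SketchWildcardQuery("lucene") yields a SketchTermQuery, while a query built from "luc*" keeps its wildcard form.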

	Erik



