Thread: Re: Search performance using BooleanQueries in BooleanQueries




Re: Search performance using BooleanQueries in BooleanQueries
Canada
2007-11-06 16:14:01
On 29-Oct-07, at 9:43 AM, Paul Elschot wrote:

> On Friday 26 October 2007 09:36:58 Ard Schrijvers wrote:
>> +prop1:a +prop2:b +prop3:c +prop4:d +prop5:e
>>
>> is much faster than
>>
>> (+(+(+(+prop1:a +prop2:b) +prop3:c) +prop4:d) +prop5:e)
>>
>> where the second one is a result from BooleanQuery in
>> BooleanQuery, and all have Occur.MUST.
>>
>
> Simplifying boolean queries like this is not available in Lucene,
> but it would have a positive effect on search performance,
> especially when prop1:a and prop2:b have a high document frequency.
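
Since Lucene does not offer this simplification, here is a toy sketch of what it amounts to. The `Node`, `Term` and `And` types are hypothetical stand-ins, not Lucene's `BooleanQuery` API; every clause is assumed to be required (`Occur.MUST`), which is the case where hoisting nested clauses into one flat conjunction matches the same documents. Scoring is not modelled. It needs Java 16+ (records, pattern-matching `instanceof`).

```java
import java.util.ArrayList;
import java.util.List;

public class FlattenMustQueries {

    /** Hypothetical query node; every clause is implicitly required (like Occur.MUST). */
    public interface Node {}

    /** A single term clause such as "prop1:a". */
    public record Term(String text) implements Node {
        @Override public String toString() { return text; }
    }

    /** A conjunction: all clauses must match. */
    public record And(List<Node> clauses) implements Node {
        @Override public String toString() {
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < clauses.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append('+').append(clauses.get(i));
            }
            return sb.append(')').toString();
        }
    }

    /** Hoists the clauses of nested conjunctions into one flat conjunction. */
    public static And flatten(And query) {
        List<Node> flat = new ArrayList<>();
        collect(query, flat);
        return new And(flat);
    }

    private static void collect(Node node, List<Node> out) {
        if (node instanceof And conj) {
            for (Node child : conj.clauses()) collect(child, out); // descend into nested conjunction
        } else {
            out.add(node);                                         // keep leaf terms in order
        }
    }

    public static And and(Node... clauses) { return new And(List.of(clauses)); }
    public static Term t(String text) { return new Term(text); }

    public static void main(String[] args) {
        // the nested query from the thread
        And nested = and(and(and(and(t("prop1:a"), t("prop2:b")), t("prop3:c")), t("prop4:d")), t("prop5:e"));
        System.out.println(nested);          // (+(+(+(+prop1:a +prop2:b) +prop3:c) +prop4:d) +prop5:e)
        System.out.println(flatten(nested)); // (+prop1:a +prop2:b +prop3:c +prop4:d +prop5:e)
    }
}
```

A real rewrite would have to leave alone nested queries with SHOULD or MUST_NOT clauses (or non-default boosts); the toy model represents neither.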

Wait, shouldn't the outermost BooleanQuery provide most of this
speedup already (since it should be skipTo'ing between the nested
BooleanQueries and the outermost)? Is it the indirection and
sub-query management that is causing the performance difference,
or differences in skipTo behaviour?

-Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Search performance using BooleanQueries in BooleanQueries
Netherlands
2007-11-06 17:02:06
On Tuesday 06 November 2007 23:14:01 Mike Klaas wrote:
> On 29-Oct-07, at 9:43 AM, Paul Elschot wrote:
> > On Friday 26 October 2007 09:36:58 Ard Schrijvers wrote:
> >> +prop1:a +prop2:b +prop3:c +prop4:d +prop5:e
> >>
> >> is much faster than
> >>
> >> (+(+(+(+prop1:a +prop2:b) +prop3:c) +prop4:d) +prop5:e)
> >>
> >> where the second one is a result from BooleanQuery in
> >> BooleanQuery, and all have Occur.MUST.
> >
> > Simplifying boolean queries like this is not available in Lucene,
> > but it would have a positive effect on search performance,
> > especially when prop1:a and prop2:b have a high document frequency.
>
> Wait, shouldn't the outermost BooleanQuery provide most of this
> speedup already (since it should be skipTo'ing between the nested
> BooleanQueries and the outermost)? Is it the indirection and
> sub-query management that is causing the performance difference,
> or differences in skipTo behaviour?

The usual Lucene answer to performance questions: it depends.

After every hit, next() needs to be called on a subquery before
skipTo() can be used to find the next hit. It is currently not
defined which subquery will be used for this first next().

The structure of the scorers normally follows the structure of
the BooleanQueries, so the indirection over the deeply nested
subquery scorers could well be relevant to performance, too.
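
To make the next()/skipTo() interplay and the nesting concrete, here is a self-contained toy sketch. The `DocIterator` interface and the `ArrayIterator` and `Conjunction` classes are hypothetical, only loosely modelled on the scorers discussed here; always advancing the first sub-iterator with next() is an arbitrary assumption (as noted above, the real choice is not defined), and skipTo() is a linear scan rather than a real skip-data jump. It runs the flat and the nested form of the query from this thread over made-up postings and counts calls at the leaf and conjunction levels.

```java
import java.util.ArrayList;
import java.util.List;

public class ConjunctionSketch {
    /** Hypothetical iterator, loosely modelled on the next()/skipTo()/doc() calls discussed above. */
    public interface DocIterator {
        boolean next();             // move to the next matching doc; false when exhausted
        boolean skipTo(int target); // move to the first doc >= target; false when exhausted
        int doc();                  // current doc id, valid only after a call returned true
    }

    /** Leaf iterator over a sorted array of doc ids, standing in for one term's postings. */
    public static class ArrayIterator implements DocIterator {
        static int leafCalls = 0;   // next()/skipTo() calls on all leaves
        private final int[] docs;
        private int pos = -1;
        public ArrayIterator(int... docs) { this.docs = docs; }
        public boolean next() { leafCalls++; return ++pos < docs.length; }
        public boolean skipTo(int target) {
            leafCalls++;
            if (pos < 0) pos = 0;
            // linear scan for simplicity; a real index would jump ahead using skip data
            while (pos < docs.length && docs[pos] < target) pos++;
            return pos < docs.length;
        }
        public int doc() { return docs[pos]; }
    }

    /** All-required conjunction; it is itself a DocIterator, so conjunctions can be nested. */
    public static class Conjunction implements DocIterator {
        static int conjCalls = 0;   // next()/skipTo() calls on all conjunctions
        private final List<DocIterator> subs;
        private boolean firstTime = true, exhausted = false;
        private int doc = -1;
        public Conjunction(DocIterator... subs) { this.subs = List.of(subs); }
        public boolean next() {
            conjCalls++;
            if (exhausted) return false;
            if (firstTime) {
                firstTime = false;
                for (DocIterator sub : subs) if (!sub.next()) return exhaust();
                return align();
            }
            // after a hit, one sub-iterator must move with next(); here simply the first one
            if (!subs.get(0).next()) return exhaust();
            return align();
        }
        public boolean skipTo(int target) {
            conjCalls++;
            if (exhausted) return false;
            if (firstTime) {
                firstTime = false;
                for (DocIterator sub : subs) if (!sub.skipTo(target)) return exhaust();
                return align();
            }
            if (doc >= target) return true; // already on or past the target
            if (!subs.get(0).skipTo(target)) return exhaust();
            return align();
        }
        public int doc() { return doc; }
        /** Leapfrog: skipTo() every lagging sub-iterator to the largest current doc until all agree. */
        private boolean align() {
            int target = Integer.MIN_VALUE;
            for (DocIterator sub : subs) target = Math.max(target, sub.doc());
            boolean raised = true;
            while (raised) {
                raised = false;
                for (DocIterator sub : subs) {
                    if (sub.doc() >= target) continue;
                    if (!sub.skipTo(target)) return exhaust();
                    if (sub.doc() > target) { target = sub.doc(); raised = true; }
                }
            }
            doc = target;
            return true;
        }
        private boolean exhaust() { exhausted = true; return false; }
    }

    /** Drains an iterator and returns all matching doc ids. */
    public static List<Integer> collect(DocIterator it) {
        List<Integer> hits = new ArrayList<>();
        while (it.next()) hits.add(it.doc());
        return hits;
    }

    public static void main(String[] args) {
        // made-up postings for prop1:a ... prop5:e
        int[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, b = {2, 4, 6, 8, 10, 12},
              c = {3, 6, 9, 12}, d = {6, 7, 12}, e = {6, 10, 12};
        ArrayIterator.leafCalls = 0; Conjunction.conjCalls = 0;
        List<Integer> flat = collect(new Conjunction(new ArrayIterator(a), new ArrayIterator(b),
                new ArrayIterator(c), new ArrayIterator(d), new ArrayIterator(e)));
        System.out.println("flat   " + flat + " leaf=" + ArrayIterator.leafCalls + " conj=" + Conjunction.conjCalls);
        ArrayIterator.leafCalls = 0; Conjunction.conjCalls = 0;
        List<Integer> nested = collect(new Conjunction(new Conjunction(new Conjunction(
                new Conjunction(new ArrayIterator(a), new ArrayIterator(b)), new ArrayIterator(c)),
                new ArrayIterator(d)), new ArrayIterator(e)));
        System.out.println("nested " + nested + " leaf=" + ArrayIterator.leafCalls + " conj=" + Conjunction.conjCalls);
    }
}
```

Both forms should report the hits [6, 12] for these made-up postings; how the call counts compare depends on the postings themselves. Counting calls in memory says nothing about disk I/O or the OS cache.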

Which of these factors actually dominates performance is hard
to predict in advance. The point of skipTo() is that it tries to
avoid disk I/O as much as possible the first time a query is
executed. Later executions are much more likely to hit the OS
cache, and then the indirections become more relevant to
performance.

I'd like to have a good way to do a performance test on a first
query execution, in the sense that its skipTo() calls do not hit
the OS cache, but I have not found one yet.

Regards,
Paul Elschot


