Thread: search performance degrades by order of magnitude when using SortField.




search performance degrades by order of magnitude when using SortField.
user name
2006-05-29 04:57:45
Hi Lucene experts,

I'm having a problem with Sort performance during searches. I'm using
Lucene 1.9.1.

I need to sort by a date field in the document. When I use the default
Sort.RELEVANCE, query response time is ~6ms. However, when I specify a
sort, e.g. Searcher.search( query, new Sort( "mydatefield" ) ), the
query response time gets multiplied by a factor of 10 or 20. Also, CPU
usage shoots up to nearly 90%. Is this expected behavior? I thought the
default sort and sort by field should perform roughly the same when the
values are cached in memory, since they both have to do a top-K ranking
over the same number of raw hits. The performance gets
disproportionately worse as I increase the number of parallel threads
that query the same Searcher object.

Also, in my previous experience with sorting by a field in Lucene, I
seem to remember there being a preload time when you first search with
a sort by field, sometimes taking 30 seconds or so to load all of the
field's values into the in-memory cache associated with the Searcher
object. This initial preload time doesn't seem to be happening in my
case -- does that mean that for some reason Lucene is not caching the
field values?

I have an index of 1 million documents, taking up about 1.7G of disk
space. I specify -Xmx2000m when running my Java search application.

Any advice or insight would be much appreciated.

Thanks,
~Heng

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

search performance degrades by order of magnitude when using SortField.
user name
2006-05-29 21:56:25
: default Sort.RELEVANCE, query response time is ~6ms. However, when I
: specify a sort, e.g. Searcher.search( query, new Sort( "mydatefield" )
: ), the query response time gets multiplied by a factor of 10 or 20.
	...
: do a top-K ranking over the same number of raw hits. The performance
: gets disproportionately worse as I increase the number of parallel
: threads that query the same Searcher object.

How many sequential queries are you running against the same Searcher
instance? ... the performance drop you are seeing may be a result of
each of those threads trying to build the same FieldCache on your sort
field in parallel.

being 10x or 20x slower sounds like a lot .. but 10x 6ms is still only
60ms .. have you timed how long it takes just to build a FieldCache on
that field?

: Also, in my previous experience with sorting by a field in Lucene, I
: seem to remember there being a preload time when you first search
: with a sort by field, sometimes taking 30 seconds or so to load all
: of the field's values into the in-memory cache associated with the
: Searcher object. This initial preload time doesn't seem to be
: happening in my case -- does that mean that for some reason Lucene is
: not caching the field values?

that's the FieldCache initialization i was referring to -- it's based on
reusing the same instance of IndexReader (or IndexSearcher), as long as
you are using the same instance over and over you'll reuse the
FieldCache and only pay that cost once (or maybe N times if you have N
parallel query threads and they all try to hit the FieldCache
immediately).
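
To make the "once, or maybe N times" point concrete, here is a minimal plain-Java sketch. It does not use Lucene at all: the class WarmOnceSketch, its check-then-build cache and the fake per-document values are hypothetical stand-ins, and the assumption that a cold cache can be built by several threads at once is taken from the remark above rather than from Lucene's source. The idea it shows is to touch the cache from a single thread before the query threads start, so that the build cost is paid exactly once.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a per-field value cache; NOT Lucene's FieldCache.
class WarmOnceSketch {
    // Counts how many times the expensive build ran.
    static final AtomicInteger builds = new AtomicInteger();

    private static final Map<String, int[]> cache =
            Collections.synchronizedMap(new HashMap<String, int[]>());

    // Simulates loading one value per document for the given field.
    static int[] buildValues(String field) {
        builds.incrementAndGet();
        int[] values = new int[100_000];
        for (int doc = 0; doc < values.length; doc++) {
            values[doc] = doc % 365; // fake "date" value, for illustration only
        }
        return values;
    }

    // Naive check-then-build: threads racing on a cold cache may each build.
    static int[] valuesFor(String field) {
        int[] values = cache.get(field);
        if (values == null) {
            values = buildValues(field);
            cache.put(field, values);
        }
        return values;
    }

    // Starts threadCount threads that all read the cache, waits for them,
    // and returns the total number of builds so far.
    static int runQueries(int threadCount, String field) {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> valuesFor(field));
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return builds.get();
    }

    public static void main(String[] args) {
        valuesFor("mydatefield");                 // warm up from one thread first
        int total = runQueries(8, "mydatefield"); // then let the query threads in
        System.out.println("cache builds: " + total);
    }
}
```

Because the warm-up call completes before any query thread starts, every thread finds the values already cached. Without that call the number of builds depends on thread timing.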

30 seconds sounds extremely long though ... you may be remembering
incorrectly how significant the penalty was.

: I have an index of 1 million documents, taking up about 1.7G of
: diskspace. I specify -Xmx2000m when running my java search
: application.

the big issue when sorting on a field is what type of data is in that
field: is it an int? a long? a String? .. if it is a String how often
does the same String value appear for multiple documents? .. these all
affect how much RAM the FieldCache takes up. you mentioned sorting by
date, did you store the date as a String? in what format? with what
precision?
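
The format and precision questions can be illustrated with a small plain-Java sketch. It uses no Lucene code: the class DatePrecisionSketch, the two key patterns and the sample timestamps (two of which are simply this thread's message times) are assumptions chosen for illustration. It shows that a coarser date key gives fewer distinct values, that fixed-width keys sort lexicographically in chronological order, and that a day-precision date also fits in an int.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for comparing date encodings; not part of Lucene.
class DatePrecisionSketch {
    static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");
    static final DateTimeFormatter SECOND = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Day precision: all timestamps on the same day share one key.
    static String dayKey(LocalDateTime t) {
        return t.format(DAY);
    }

    // Second precision: nearly every document gets its own key.
    static String secondKey(LocalDateTime t) {
        return t.format(SECOND);
    }

    // Day precision packed into an int, e.g. 2006-05-29 becomes 20060529.
    static int dayAsInt(LocalDateTime t) {
        return t.getYear() * 10000 + t.getMonthValue() * 100 + t.getDayOfMonth();
    }

    // Number of distinct keys the given format produces for these timestamps.
    static int distinctCount(List<LocalDateTime> times, DateTimeFormatter format) {
        Set<String> keys = new HashSet<>();
        for (LocalDateTime t : times) {
            keys.add(t.format(format));
        }
        return keys.size();
    }

    public static void main(String[] args) {
        List<LocalDateTime> times = Arrays.asList(
                LocalDateTime.of(2006, 5, 29, 4, 57, 45),
                LocalDateTime.of(2006, 5, 29, 21, 56, 25),
                LocalDateTime.of(2006, 5, 30, 8, 0, 0));
        System.out.println("distinct day keys:    " + distinctCount(times, DAY));
        System.out.println("distinct second keys: " + distinctCount(times, SECOND));
        System.out.println("first day as int:     " + dayAsInt(times.get(0)));
    }
}
```

With a million documents the gap between the two counts can be large, which is why the precision matters for how much RAM a String-valued sort field needs.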




-Hoss

