Thread: Slow response




Slow response
United States
2007-09-06 16:04:13
I am pretty new to Solr and this is my first post to this list so please
forgive me if I make any glaring errors.

Here's my problem. When I do a search using the Solr admin interface for
a term that I know does not exist in my index the QTime is about 1ms.
However, if I add facets to the search the response takes more than 20
seconds (and sometimes longer) to return. Here is the slow URL -

 

/select?qf=AUTHOR_t+SUBJECT_t+TITLE_t&wt=xml&f.AUTHOR_facet.facet.sort=true&f.FORMAT_t.facet.limit=25&start=0&facet=true&facet.mincount=1&q=frak&f.FORMAT_t.facet.mincount=1&f.ITYPE_facet.facet.mincount=1&f.SUBJECT_facet.facet.limit=25&facet.field=AUTHOR_facet&facet.field=FORMAT_t&facet.field=LANGUAGE_t&facet.field=PUBDATE_t&facet.field=SUBJECT_facet&facet.field=AGENCY_facet&facet.field=ITYPE_facet&f.AGENCY_facet.facet.sort=true&f.AGENCY_facet.facet.limit=-1&rows=10&f.ITYPE_facet.facet.limit=-1&f.ITYPE_facet.facet.sort=true&f.AUTHOR_facet.facet.limit=25&f.LANGUAGE_t.facet.sort=true&f.PUBDATE_t.facet.limit=-1&f.AGENCY_facet.facet.mincount=1&f.AUTHOR_facet.facet.mincount=1&fl=*&fl=score&qt=dismax&version=2.2&f.SUBJECT_facet.facet.sort=true&f.SUBJECT_facet.facet.mincount=1&f.PUBDATE_t.facet.sort=false&f.FORMAT_t.facet.sort=true&f.LANGUAGE_t.facet.limit=25&f.LANGUAGE_t.facet.mincount=1&f.PUBDATE_t.facet.mincount=1
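
As a rough way to see what a request like this asks Solr to do, the query string can be decoded with Python's standard urllib.parse. The sketch below uses an abbreviated copy of the parameters (only three of the seven facet fields, an assumption made for brevity), so it illustrates the approach rather than reproducing the full request.

```python
from urllib.parse import parse_qs

# Abbreviated copy of the slow query string (assumption: trimmed for brevity).
query = (
    "qt=dismax&q=frak&rows=10&facet=true&facet.mincount=1"
    "&facet.field=AUTHOR_facet&facet.field=SUBJECT_facet&facet.field=AGENCY_facet"
    "&f.AUTHOR_facet.facet.limit=25&f.SUBJECT_facet.facet.limit=25"
    "&f.AGENCY_facet.facet.limit=-1"
)

params = parse_qs(query)
facet_fields = params["facet.field"]

# Per-field limit override, if any; a negative limit asks for every term.
limits = {
    field: params.get(f"f.{field}.facet.limit", ["(default)"])[0]
    for field in facet_fields
}

for field, limit in limits.items():
    print(f"{field}: facet.limit={limit}")
```

Listing the fields this way makes it easier to spot which facets are unbounded (limit=-1) and which are capped.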

 

I am pretty sure I can't be the first to ask this question but I can't
seem to find anything online with the answer. Thanks for your help.

 

Aaron

Re: Slow response
2007-09-06 16:16:43
On 9/6/07, Aaron Hammond <aaron.hammondsirsidynix.com> wrote:
> I am pretty new to Solr and this is my first post to this list so please
> forgive me if I make any glaring errors.
>
> Here's my problem. When I do a search using the Solr admin interface for
> a term that I know does not exist in my index the QTime is about 1ms.
> However, if I add facets to the search the response takes more than 20
> seconds (and sometimes longer) to return. Here is the slow URL -

Faceting on multi-valued fields is more a function of the number of
terms in the field (and their distribution) than of the number of
hits for a query.  That said, perhaps faceting should be able to bail
out if there are no hits.

Is your question more about why faceting takes so long in general, or
why it takes so long if there are no results?  If you haven't, try
optimizing your index; that helps faceting in general.  How many docs do
you have in your index?

As a side note, the way multi-valued faceting currently works, it's
actually normally faster if the query returns a large number of hits.

-Yonik

RE: Slow response
United States
2007-09-06 17:16:15
Thank you for your response; this does shed some light on the subject.
Our basic question was why we were seeing slower responses the smaller
our result set got.

Currently we are searching about 1.2 million documents, with each source
document about 2KB, though we do duplicate some of the data. I bumped up
my filterCache to 5 million, and the second search I did for a
non-indexed term came back in 2.1 seconds, so that is much improved. I am
a little concerned about having this value so high, but this is our
problem and we will play with it.
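
For a rough sense of why a filterCache of 5 million is worth being cautious about, here is a back-of-the-envelope sketch. It assumes the cache size counts entries (not bytes) and that every cached filter is a full bitset with one bit per document. Solr may store sparse sets more compactly, and the cache only grows as filters are actually added, so this is an upper bound under those assumptions, not a measurement.

```python
def bitset_bytes(num_docs: int) -> int:
    """Bytes needed for one bit per document, rounded up to whole bytes."""
    return (num_docs + 7) // 8

NUM_DOCS = 1_200_000        # index size mentioned in this message
CACHE_ENTRIES = 5_000_000   # filterCache size mentioned in this message

per_filter = bitset_bytes(NUM_DOCS)
worst_case = per_filter * CACHE_ENTRIES

print(f"one full bitset: {per_filter / 1000:.0f} KB")
print(f"cache completely full of bitsets: {worst_case / 1e9:.0f} GB")
```

In practice the number of cached filters is bounded by the number of distinct facet terms and filter queries actually seen, which is likely far below the configured maximum.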

I do have a few follow-up questions. First, regarding the filterCache:
once a single search has been done and facets requested, as long as new
facets aren't requested and the size is large enough, the filters will
remain in the cache, correct?

Also, you mention that faceting is more a "function of the number of
terms in the field". The two fields causing our problems are Authors and
Subjects. If we divided up the data behind these facets into more
specific fields (primary author, secondary author, etc.), would this
perform better? The number of facet fields would increase, but the
number of unique terms for a given facet should be smaller.

Thanks again for all your help.

Aaron



Re: Slow response
Canada
2007-09-06 17:25:08
On 6-Sep-07, at 3:16 PM, Aaron Hammond wrote:

> Thank-you for your response, this does shed some light on the subject.
> Our basic question was why were we seeing slower responses the smaller
> our result set got.
>
> Currently we are searching about 1.2 million documents with the source
> document about 2KB, but we do duplicate some of the data. I bumped up my
> filterCache to 5 million and the 2nd search I did for an non-indexed
> term came back in 2.1 seconds so that is much improved. I am a little
> concerned about having this value so high but this is our problem and we
> will play with it.
>
> I do have a few follow-up questions. First, in regards to the
> filterCache once a single search has been done and facets requested, as
> long as new facets aren't requested and the size is large enough then
> the filters will remain in the cache, correct?
>
> Also, you mention that faceting is more a "function of the number of the
> number of terms in the field". The 2 fields causing our problems are
> Authors and Subjects. If we divided up the data that made these facets
> into more specific fields (Primary author, secondary author, etc.) would
> this perform better? So the number of facet fields would increase but
> the unique terms for a given facet should be less.

There are essentially two facet computation strategies:

1. cached bitsets: a bitset for each term is generated and intersected
with the query result bitset.  This is more general and performs well
up to a few thousand terms.

2. field enumeration: cache the field contents, and generate counts
using this data.  Relatively independent of the number of unique terms,
but requires at most a single facet value per field per document.

So, if you factor author into Primary author/Secondary author, where
each is guaranteed to only have one value per doc, this could greatly
accelerate your faceting.  There are probably fewer unique subjects,
so strategy 1 is likely fine.

To use strategy 2, just make sure that multiValued="false" is set for
those fields in schema.xml
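
The difference between the two strategies can be sketched in a few lines of Python. This is a toy model, not Solr's implementation: plain sets stand in for bitsets, a dict stands in for the cached field contents, and the function and variable names are made up for illustration.

```python
from collections import Counter

def facet_by_term_sets(term_to_docs, hits):
    """Strategy 1 (toy): intersect each term's doc set with the hits."""
    counts = {}
    for term, docs in term_to_docs.items():  # one intersection per unique term
        n = len(docs & hits)
        if n:
            counts[term] = n
    return counts

def facet_by_field_values(doc_to_value, hits):
    """Strategy 2 (toy): tally the single cached value of each hit."""
    return dict(Counter(doc_to_value[doc] for doc in hits if doc in doc_to_value))

# Hypothetical single-valued author field: doc id -> author.
doc_to_author = {0: "adams", 1: "baker", 2: "adams", 3: "clark"}

# The same data inverted: author -> set of doc ids.
author_to_docs = {}
for doc, author in doc_to_author.items():
    author_to_docs.setdefault(author, set()).add(doc)

hits = {0, 2, 3}
print(facet_by_term_sets(author_to_docs, hits))
print(facet_by_field_values(doc_to_author, hits))
```

Both functions return the same counts, but the work differs: the first loops over every unique term even when the result set is empty, while the second loops only over the hits.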

-Mike

Re: Slow response
Canada
2007-09-06 17:27:26

I forgot to mention that strategy 2 also requires a single token for
each doc (see
http://wiki.apache.org/solr/FAQ#head-14f9f2d84fb2cd1ff389f97f19acdb6ca55e4cd3)

-Mike

Re: Slow response
Canada
2007-09-14 18:05:18
On 14-Sep-07, at 3:38 PM, Tom Hill wrote:

> Hi Mike,
>
> Thanks for clarifying what has been a bit of a black box to me.
>
> A couple of questions, to increase my understanding, if you don't mind.
>
> If I am only using fields with multiValued="false", with a type of
> "string" or "integer" (untokenized), does solr automatically use
> approach 2? Or is this something I have to actively configure?

It'll happen automatically.

> And is approach 2 better than 1? Or vice versa? Or is the answer "it
> depends"?

It depends 

> If, as I suspect, the answer was "it depends", are there any general
> guidelines on when to use one approach or the other?

Yeah, it usually depends on how many unique facet values there are,
how many documents are returned in the query, and how much memory you
have.  1 is usually faster when there are few terms; 2 is usually
faster when there are many terms.

Things can be further complicated by additional parameters, like
facet.enum.cache.minDf
(http://wiki.apache.org/solr/SimpleFacetParameters#head-3ea6fc5d1056447295c38c9675e35ce06fd95f97)

-Mike


