Thread: optimizing single document searches
|
|
| optimizing single document searches |
  United States |
2007-02-27 17:25:15 |
I am using Lucene in a slightly unusual way: instead of searching all the
documents for a specific query, I am searching a single document for many
specific queries.

On a single document of 10k characters, doing about 40k searches takes about
5 seconds. This is not bad, but I was wondering if I can somehow speed this
up. It also takes about 5 seconds to generate the searchTerms (which is fine,
since I will do it once and cache it).

I'm not sure what information would be needed, but my queries look something
like this:

"Brooklyn NY"

I am currently using SpanNearQuery with a slop of 0 and inOrder set to false.
Is there perhaps another type of Query I can use to speed things up?
TermQuery doesn't work since I have multiple terms, and PhraseQuery seems to
take around the same time and is not compatible with SpanNearQuery (I later
merge this query with another in a SpanNearQuery).

I can live without merging this into the SpanNearQuery, as long as I can find
something that can do the 40k searches faster.

Russ
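
To make the query semantics described above concrete, here is a minimal plain-Java sketch of a single-document positional index that answers only whether the query terms occur next to each other in any order (roughly what an unordered SpanNearQuery with slop 0 asks for when the terms are distinct). It is an illustration, not Lucene code: the class SingleDocIndex, its methods, and the lowercase/split tokenizer are hypothetical stand-ins, and a real analyzer such as StandardAnalyzer tokenizes differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy single-document positional index (hypothetical; not part of Lucene). */
final class SingleDocIndex {
    private final String[] tokens;  // document tokens in order
    private final Map<String, List<Integer>> positions = new HashMap<>();

    SingleDocIndex(String text) {
        tokens = tokenize(text);
        for (int i = 0; i < tokens.length; i++) {
            positions.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(i);
        }
    }

    /** Assumed stand-in for an analyzer: lowercase, split on non-alphanumerics. */
    static String[] tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out.toArray(new String[0]);
    }

    /** True if all distinct query terms occupy adjacent positions, in any order. */
    boolean matchesAdjacent(String query) {
        Set<String> wanted = new HashSet<>(Arrays.asList(tokenize(query)));
        int n = wanted.size();
        if (n == 0) {
            return false;
        }
        // Cheap rejection: a term missing from the document means no match.
        for (String term : wanted) {
            if (!positions.containsKey(term)) {
                return false;
            }
        }
        // Only windows containing an occurrence of one query term can match.
        String anchor = wanted.iterator().next();
        for (int p : positions.get(anchor)) {
            for (int start = Math.max(0, p - n + 1);
                 start <= p && start + n <= tokens.length; start++) {
                if (windowCovers(start, n, wanted)) {
                    return true;  // a yes/no answer is enough, so stop at the first hit
                }
            }
        }
        return false;
    }

    /** True if the n tokens starting at 'start' are exactly the wanted terms. */
    private boolean windowCovers(int start, int n, Set<String> wanted) {
        Set<String> seen = new HashSet<>();
        for (int i = start; i < start + n; i++) {
            if (!wanted.contains(tokens[i])) {
                return false;
            }
            seen.add(tokens[i]);
        }
        return seen.size() == n;
    }
}
```

With the document "He moved from Brooklyn, NY to Albany, NY last year.", matchesAdjacent("NY Brooklyn") is true because the two terms sit side by side, while matchesAdjacent("Brooklyn Albany") is false because they are not adjacent. The index is built once and each query touches only the positions of its own terms, which is the general reason a single-document in-memory structure can answer tens of thousands of such checks quickly.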
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe lucene.apache.org
For additional commands, e-mail: java-user-help lucene.apache.org
|
|
| Re: optimizing single document searches |
  Sweden |
2007-02-27 17:37:55 |
On 28 Feb 2007, at 00:25, Ruslan Sivak wrote:
> On a single document of 10k characters, doing about 40k searches
> takes about 5 seconds. This is not bad, but I was wondering if I
> can somehow speed this up.

Your corpus contains only one document? Try contrib/memory, an index
optimized for that scenario.
--
karl
|
|
| Re: optimizing single document searches |

|
2007-02-27 17:49:45 |
Which is very, very cool. I wound up using it for hit counting and it works
like a charm....
|
|
| Re: optimizing single document searches |
  Canada |
2007-02-27 17:49:06 |
Thanks, I will try it tomorrow... Is it significantly different from using a
standard index on a ramdir?
Russ
|
|
| Re: optimizing single document searches |
  Sweden |
2007-02-27 18:09:20 |
On 28 Feb 2007, at 00:49, Russ wrote:
> Thanks, I will try it tomorrow... Is it significantly different
> from using a standard index on a ramdir?

A bit different.

You can also try LUCENE-550. It has about the same speed as contrib/memory,
but it can handle multiple documents and works with a reader, writer and
searcher like any other index.
--
karl
|
|
| Re: optimizing single document searches |
  Canada |
2007-02-27 18:01:54 |
I will definitely check it out tomorrow.

I also forgot to mention that I am not interested in the hits themselves,
only whether or not there was a hit. Is there something I can use that's
optimized for this scenario, or should I look into rewriting the search
method of the IndexSearcher? Currently I just check hits.size().
Russ
|
|
| Re: optimizing single document searches |
  Netherlands |
2007-02-28 13:41:08 |
On Wednesday 28 February 2007 01:01, Russ wrote:
> I will definitely check it out tomorrow.
>
> I also forgot to mention that I am not interested in the hits themselves,
> only whether or not there was a hit. Is there something I can use that's
> optimized for this scenario, or should I look into rewriting the search
> method of the IndexSearcher? Currently I just check hits.size().

For a single document: get the Scorer from the Query via Weight. Then check
the return value of Scorer.next(); it will indicate whether the only doc
matches the query.

Regards,
Paul Elschot.
|
|
| Re: optimizing single document searches |
  United States |
2007-02-28 15:28:55 |
karl wettin wrote:
> On 28 Feb 2007, at 00:49, Russ wrote:
>
>> Thanks, I will try it tomorrow... Is it significantly different from
>> using a standard index on a ramdir?
>
> A bit different.
>
> You can also try LUCENE-550. It has about the same speed as
> contrib/memory but can handle multiple documents and use reader,
> writer and searcher as any other index.
>
> --karl

Karl,

Thank you. I tried contrib/memory and it's awesome. It got my search time
down from 5 seconds to 300ms.

I'm still having some performance issues with the setup. I can probably live
with them, as I'll be caching these terms, but maybe I can optimize it
somehow. It currently takes about 3.5 seconds to set up. I am basically
creating 40k SpanNearQueries. Here is my method that creates them. Is there
anything I can improve?

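    import java.io.IOException;
    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    private static final Analyzer analyzer = new StandardAnalyzer();

    public static SpanNearQuery createSpanNearQuery(String string, int slop,
            boolean inOrder)
    {
        // One SpanTermQuery per analyzed token, capped at 10 terms.
        List<SpanQuery> terms = new ArrayList<SpanQuery>();
        TokenStream tokenizer = analyzer.tokenStream("body", new StringReader(string));
        try {
            Token token;
            while (terms.size() < 10 && (token = tokenizer.next()) != null) {
                terms.add(new SpanTermQuery(new Term("body", token.termText())));
            }
        } catch (IOException e) {
            // Stop tokenizing on an I/O error and keep the terms read so far.
            e.printStackTrace();
        }
        SpanQuery[] termsArray = terms.toArray(new SpanQuery[terms.size()]);
        return new SpanNearQuery(termsArray, slop, inOrder);
    }
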
Russ
|
|
|
|