Thread: WildcardQuery and SpanQuery

WildcardQuery and SpanQuery
2007-07-17 22:58:44

Hi everybody,

We now need to support the wildcard characters "*" and "?"
together with SpanQuery, but there seems to be no SpanWildcardQuery
available.

After looking into the Lucene source code for a while, I guess we can
either:

1. Use SpanRegexQuery, or

2. Write our own SpanWildcardQuery and implement its
rewrite(IndexReader) method to rewrite the query into a SpanOrQuery
of SpanTermQuery clauses.
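
Option 1 requires translating the wildcard pattern into a regular expression before handing it to SpanRegexQuery (contrib/regex). Below is a minimal plain-Java sketch of that translation. It assumes java.util.regex semantics and assumes the whole term has to match. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: translates a wildcard pattern ('*', '?') into a java.util.regex pattern. */
final class WildcardToRegex {
  private WildcardToRegex() {}

  /**
   * '*' becomes ".*" (any run of characters) and '?' becomes "." (exactly one
   * character). Runs of other characters are quoted, so regex metacharacters
   * such as '.' or '+' in the pattern are matched literally.
   */
  static String toRegex(String wildcard) {
    StringBuilder regex = new StringBuilder();
    StringBuilder literal = new StringBuilder();
    for (int i = 0; i < wildcard.length(); i++) {
      char c = wildcard.charAt(i);
      if (c == '*' || c == '?') {
        flushLiteral(literal, regex);
        regex.append(c == '*' ? ".*" : ".");
      } else {
        literal.append(c);
      }
    }
    flushLiteral(literal, regex);
    return regex.toString();
  }

  /** Appends the pending literal run as a quoted regex fragment and clears it. */
  private static void flushLiteral(StringBuilder literal, StringBuilder regex) {
    if (literal.length() > 0) {
      regex.append(Pattern.quote(literal.toString()));
      literal.setLength(0);
    }
  }

  /** True if the whole term matches the wildcard pattern. */
  static boolean matches(String wildcard, String term) {
    return Pattern.matches(toRegex(wildcard), term);
  }
}
```

With a helper like this, the option-1 query would be built from a Term whose text is the output of toRegex. Check which regex flavour your SpanRegexQuery is configured with before relying on the quoting.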

Of the two approaches, option 1 seems easier, but I am rather
concerned about the performance of regular expressions. On the
other hand, I am not sure whether there are other concerns I am
not aware of for option 2 (i.e. is there a reason why there is no
SpanWildcardQuery in the first place?)
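
The performance question mostly depends on how many index terms have to be tested against the pattern. Lucene keeps a field's terms in sorted order. An expansion that knows the pattern's literal prefix (the part before the first "*" or "?") can seek to that prefix and stop as soon as terms no longer start with it. A matcher without a prefix has to test every term in the field. The sketch below illustrates that difference on a SortedSet standing in for the term dictionary. It is not Lucene's code, and the class names are hypothetical. Whether a given regex implementation can supply such a prefix is worth checking in the contrib/regex source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.regex.Pattern;

/** Hypothetical illustration: expanding a pattern over a sorted term list, with and without a prefix seek. */
final class PrefixSeekDemo {
  private PrefixSeekDemo() {}

  /** Result of one expansion: the matching terms and how many terms were tested. */
  static final class Expansion {
    final List<String> matches = new ArrayList<>();
    int examined;
  }

  /** No usable prefix: every term in the dictionary is tested against the pattern. */
  static Expansion fullScan(SortedSet<String> terms, Pattern pattern) {
    Expansion e = new Expansion();
    for (String t : terms) {
      e.examined++;
      if (pattern.matcher(t).matches()) {
        e.matches.add(t);
      }
    }
    return e;
  }

  /** Seek to the literal prefix and stop at the first term that no longer starts with it. */
  static Expansion prefixSeek(SortedSet<String> terms, String prefix, Pattern pattern) {
    Expansion e = new Expansion();
    for (String t : terms.tailSet(prefix)) {
      if (!t.startsWith(prefix)) {
        break; // sorted order: no later term can share the prefix
      }
      e.examined++;
      if (pattern.matcher(t).matches()) {
        e.matches.add(t);
      }
    }
    return e;
  }
}
```

Both strategies return the same terms. The prefix seek only tests the contiguous block of terms that share the prefix. This is roughly why "te?t*" can expand more cheaply than "*est".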

Any advice?

Cedric
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe lucene.apache.org
For additional commands, e-mail: java-user-help lucene.apache.org

Re: WildcardQuery and SpanQuery
Netherlands
2007-07-18 01:52:51
On Wednesday 18 July 2007 05:58, Cedric Ho wrote:
> Any advice?

The basic problem you are facing is that in Lucene the expansion of
the terms is tightly coupled to the generation of a combination query
using the expanded terms.

In contrib/surround the term expansion and query generation are
decoupled using a visitor pattern for the terms. The code is here:
http://svn.apache.org/viewvc/lucene/java/trunk/contrib/surround/src/java/org/apache/lucene/queryParser/surround/query

In surround a wildcard term can provide either an OR of normal term
queries or a SpanOrQuery of span term queries. This query generation
is in class SimpleTerm, which has one method for a normal boolean OR
query over the terms and one for a span query over the terms.
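
The decoupling works like this: the expansion step only enumerates matching terms and hands each one to a visitor, and the visitor decides which kind of combined query to build. Below is a minimal stdlib-only sketch of that idea. The interface and class names are hypothetical and do not reproduce surround's classes. The "queries" are plain strings standing in for BooleanQuery/TermQuery and SpanOrQuery/SpanTermQuery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch: term expansion decoupled from query construction via a visitor. */
final class DecoupledExpansion {
  private DecoupledExpansion() {}

  /** Receives each expanded term; knows nothing about how the terms were found. */
  interface TermVisitor {
    void visitMatchingTerm(String term);
  }

  /** Expansion step: enumerates the matching terms and only reports them to the visitor. */
  static void expand(List<String> termDictionary, Predicate<String> matcher, TermVisitor visitor) {
    for (String term : termDictionary) {
      if (matcher.test(term)) {
        visitor.visitMatchingTerm(term);
      }
    }
  }

  /** Combination step: collects the visited terms and renders them as an OR of one clause kind. */
  static final class OrBuilder implements TermVisitor {
    private final String clauseName; // e.g. "TermQuery" or "SpanTermQuery"
    private final List<String> clauses = new ArrayList<>();

    OrBuilder(String clauseName) {
      this.clauseName = clauseName;
    }

    public void visitMatchingTerm(String term) {
      clauses.add(clauseName + "(" + term + ")");
    }

    String build(String orName) {
      return orName + "[" + String.join(", ", clauses) + "]";
    }
  }
}
```

The same expand call feeds either builder. So adding a span variant means adding a visitor, not another copy of the expansion code.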

In both cases surround uses a regular expression to expand the
matching terms, but that could be changed to use another wildcard
expansion mechanism than the ones in SrndPrefixQuery and
SrndTruncQuery, which are subclasses of SimpleTerm.
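
One alternative to regex-based expansion is a direct wildcard matcher that never builds a regular expression. Below is a sketch of the common iterative "*"/"?" matching algorithm, which backtracks to the most recent star. This technique is swapped in for illustration and is not surround's or Lucene's code. The class name is hypothetical.

```java
/** Hypothetical non-regex matcher: '*' matches any run of characters, '?' exactly one. */
final class WildcardMatcher {
  private WildcardMatcher() {}

  static boolean matches(String pattern, String text) {
    int p = 0;
    int t = 0;
    int starP = -1; // pattern index of the most recent '*', or -1 if none seen yet
    int starT = 0;  // text index where that '*' currently stops absorbing characters
    while (t < text.length()) {
      if (p < pattern.length() && pattern.charAt(p) == '*') {
        starP = p++; // first let '*' match the empty string
        starT = t;
      } else if (p < pattern.length()
          && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
        p++;
        t++;
      } else if (starP >= 0) {
        p = starP + 1; // backtrack: let the last '*' absorb one more character
        t = ++starT;
      } else {
        return false;
      }
    }
    // The text is consumed; any remaining pattern characters must all be '*'.
    while (p < pattern.length() && pattern.charAt(p) == '*') {
      p++;
    }
    return p == pattern.length();
  }
}
```

This needs no regex compilation and only backtracks to the last star. Its worst case is still proportional to pattern length times term length.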

With the term expansion and the query combination split, it is also
necessary to limit the maximum number of expanded terms in a different
way than Lucene does. In surround the classes BasicQueryFactory and
TooManyBasicQueries are used for that.
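
Lucene enforces its own limit when clauses are added to a BooleanQuery (BooleanQuery.TooManyClauses). A span query combined from separately expanded terms does not go through that check. The limit therefore has to live wherever the per-term queries are created. Below is a minimal sketch of a counting factory in the spirit of surround's BasicQueryFactory/TooManyBasicQueries. The class, method and exception names are hypothetical, and the "queries" are string stand-ins.

```java
/** Hypothetical factory that caps how many per-term queries one parsed query may create. */
final class CountingQueryFactory {
  /** Hypothetical exception thrown once the cap is exceeded. */
  static final class TooManyTermQueriesException extends RuntimeException {
    TooManyTermQueriesException(int max) {
      super("Exceeded maximum of " + max + " expanded term queries");
    }
  }

  private final int maxQueries;
  private int created;

  CountingQueryFactory(int maxQueries) {
    this.maxQueries = maxQueries;
  }

  /** Creates one per-term query (a string stand-in), counting it against the cap. */
  String newSpanTermQuery(String term) {
    if (created >= maxQueries) {
      throw new TooManyTermQueriesException(maxQueries);
    }
    created++;
    return "SpanTermQuery(" + term + ")";
  }

  int created() {
    return created;
  }
}
```

Share one factory instance across the whole query. The cap then bounds the total across all wildcard terms, not each term separately.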
Regards,
Paul Elschot

Re: WildcardQuery and SpanQuery
2007-07-18 05:30:13
Thanks for the quick response, Paul =)

However, I am lost while looking at the surround package. Are you
suggesting I can solve my problem at hand using the surround package?

Re: WildcardQuery and SpanQuery
United States
2007-07-18 05:51:11
You could give this a shot (from my Qsol query parser):

package com.mhs.qsol.spans;

/**
 * Copyright 2006 Mark Miller (markrmiller gmail.com)
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Set;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.search.spans.SpanOrQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.search.spans.Spans;

/**
 * Span query for a wildcard pattern ("*", "?"). It is always rewritten
 * into a SpanOrQuery with one SpanTermQuery per index term that matches.
 *
 * @author mark miller
 */
public class SpanWildcardQuery extends SpanQuery {
  private final Term term;

  public SpanWildcardQuery(Term term) {
    this.term = term;
  }

  public Term getTerm() {
    return term;
  }

  public Query rewrite(IndexReader reader) throws IOException {
    // Let WildcardQuery expand the pattern against the index. Like
    // WildcardQuery itself, this throws BooleanQuery.TooManyClauses when more
    // terms match than BooleanQuery.getMaxClauseCount() allows.
    Query expanded = new WildcardQuery(term).rewrite(reader);

    // The expansion is normally a BooleanQuery of TermQuery clauses, but it
    // can also be a single TermQuery (rewriting a one-clause BooleanQuery
    // again yields the bare clause), so do not cast blindly.
    SpanQuery[] sqs;
    if (expanded instanceof TermQuery) {
      sqs = new SpanQuery[] { toSpanTermQuery((TermQuery) expanded) };
    } else {
      BooleanClause[] clauses = ((BooleanQuery) expanded).getClauses();
      sqs = new SpanQuery[clauses.length];
      for (int i = 0; i < clauses.length; i++) {
        sqs[i] = toSpanTermQuery((TermQuery) clauses[i].getQuery());
      }
    }

    SpanOrQuery query = new SpanOrQuery(sqs);
    // Carry over the boost of this query, not the default boost of the
    // temporary WildcardQuery.
    query.setBoost(getBoost());
    return query;
  }

  /** Converts one expanded TermQuery into the equivalent SpanTermQuery. */
  private static SpanQuery toSpanTermQuery(TermQuery tq) {
    SpanQuery sq = new SpanTermQuery(tq.getTerm());
    sq.setBoost(tq.getBoost());
    return sq;
  }

  public Spans getSpans(IndexReader reader) throws IOException {
    throw new UnsupportedOperationException("Query should have been rewritten");
  }

  public String getField() {
    return term.field();
  }

  /**
   * @deprecated use extractTerms instead
   * @see #extractTerms(Set)
   */
  public Collection getTerms() {
    Collection terms = new ArrayList();
    terms.add(term);
    return terms;
  }

  public void extractTerms(Set terms) {
    terms.add(term);
  }

  public String toString(String field) {
    StringBuffer buffer = new StringBuffer();
    buffer.append("spanWildcardQuery(");
    buffer.append(term);
    buffer.append(")");
    // buffer.append(ToStringUtils.boost(getBoost()));
    return buffer.toString();
  }
}

Re: WildcardQuery and SpanQuery
Netherlands
2007-07-18 13:54:22
On Wednesday 18 July 2007 12:30, Cedric Ho wrote:
> Thanks for the quick response Paul =)
>
> However I am lost while looking at the surround package.

That is not really surprising: the code is factored to the bone, and it
is hardly documented. You could have a look at the test code to start.
The surround.txt file in the contrib/surround directory should also be
helpful.

> Are you
> suggesting I can solve my problem at hand using the surround package?

If the surround syntax fits what you need, you might use the surround
package. You could also use your own parser and target the
o.a.l.queryParser.surround.query package.

The code posted by Mark Miller may solve your problem, too.
Regards,
Paul Elschot

Re: WildcardQuery and SpanQuery
2007-07-19 20:41:29
Thanks so much for helping ~ I will try it out tomorrow.
Regards,
Cedric