Thread: QueryParser and NGrams

QueryParser and NGrams
Sweden
2007-10-11 12:47:19
I don't understand why the following code creates 2 phrase queries instead of 20 term queries. I'm quite sure I've previously had QueryParser doing the latter.
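import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ngram.NGramTokenFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.queryParser.QueryParser;

// Tokenize with StandardTokenizer, then expand each token into its 2..5-character n-grams.
System.out.println(new QueryParser("f", new Analyzer() {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new NGramTokenFilter(new StandardTokenizer(reader), 2, 5);
  }
}).parse("hello world"));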
f:"he el ll lo hel ell llo hell ello hello" f:"wo or rl ld wor orl rld worl orld world"
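To check where the count of 20 comes from, here is a minimal plain-Java sketch (no Lucene dependency) that lists every 2..5-character substring of each word, shortest grams first. The class name NGramModel is hypothetical, and the ordering is chosen to mirror the printed query above; it is not taken from NGramTokenFilter's source.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not Lucene code) of the n-grams in the printed query:
 * every substring of length minGram..maxGram, shortest grams first.
 */
public class NGramModel {

  static List<String> ngrams(String word, int minGram, int maxGram) {
    List<String> grams = new ArrayList<>();
    for (int size = minGram; size <= maxGram; size++) {
      // Slide a window of the current size across the word.
      for (int start = 0; start + size <= word.length(); start++) {
        grams.add(word.substring(start, start + size));
      }
    }
    return grams;
  }

  public static void main(String[] args) {
    int total = 0;
    for (String word : "hello world".split(" ")) {
      List<String> grams = ngrams(word, 2, 5);
      total += grams.size();
      System.out.println(word + " -> " + String.join(" ", grams));
    }
    // 4 + 3 + 2 + 1 grams per five-letter word, two words.
    System.out.println("total grams: " + total);
  }
}
```

Each five-letter word yields 4 + 3 + 2 + 1 = 10 grams, so the two words give the 20 terms the question expects.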
--
karl
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe lucene.apache.org
For additional commands, e-mail: java-user-help lucene.apache.org

Re: QueryParser and NGrams
Sweden
2007-10-11 13:40:00
I now realize that the phrase queries make sense, and that it was another "feature" in my code that confused me.
So, forget about it.
Bada bing, bada bom.
--
karl
In reply to Karl Wettin's message of 11 Oct 2007, 19:47.

Re: QueryParser and NGrams
Sweden
2007-10-11 13:51:10
No, sorry, I'm still confused. Shouldn't it be term queries?
--
the flooding troll
In reply to Karl Wettin's message of 11 Oct 2007, 20:40.

Re: QueryParser and NGrams
2007-10-11 20:09:46
: No, sorry, I'm still confused. Shouldn't it be term queries?
: > > System.out.println(new QueryParser("f", new Analyzer() {
: > >   public TokenStream tokenStream(String string, Reader reader) {
: > >     return new NGramTokenFilter(new StandardTokenizer(reader), 2, 5);
: > >   }
: > > }).parse("hello world"));
QueryParser does one pass looking for operators. It sees two pieces of input, "hello" and "world", and hands each one individually to your analyzer.
For each word, your analyzer produces multiple tokens, and they are not at the same position (they have non-zero position increments). QueryParser sees multiple tokens at consecutive positions and constructs a phrase query (per word). If at least one token had a positionIncrement of 0, it would have created a MultiPhraseQuery (per word); if all had a positionIncrement of 0, it would have constructed a BooleanQuery (per word) containing TermQueries.
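The position-increment rule above can be sketched as a small decision function. This is a model, not Lucene code: the class name QueryTypeRule is hypothetical, it only returns the name of the query type, and two details are assumptions rather than statements from the thread: a single token yields a plain TermQuery, and "all at one position" is detected by summing the non-zero increments. In a real Lucene 2.x filter, stacking grams would presumably mean calling Token.setPositionIncrement(0) on every gram after the first.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of which query type one chunk's tokens lead to. */
public class QueryTypeRule {

  static String queryTypeFor(List<Integer> positionIncrements) {
    if (positionIncrements.isEmpty()) return "none";
    if (positionIncrements.size() == 1) return "TermQuery"; // assumption: single token
    int positionCount = 0;
    boolean sharedPosition = false;
    for (int inc : positionIncrements) {
      if (inc == 0) {
        sharedPosition = true;   // stacked on the previous token's position
      } else {
        positionCount += inc;    // advances to a new position
      }
    }
    if (!sharedPosition) return "PhraseQuery";                      // all consecutive
    if (positionCount <= 1) return "BooleanQuery of TermQueries";   // all share one position
    return "MultiPhraseQuery";                                      // some stacked, some not
  }

  public static void main(String[] args) {
    // What happened in this thread: ten grams, each advancing the position.
    System.out.println(queryTypeFor(Arrays.asList(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)));
    // Some grams stacked on the previous position.
    System.out.println(queryTypeFor(Arrays.asList(1, 0, 1, 0)));
    // Every gram after the first stacked on it.
    System.out.println(queryTypeFor(Arrays.asList(1, 0, 0, 0, 0, 0, 0, 0, 0, 0)));
  }
}
```

The three calls print PhraseQuery, MultiPhraseQuery and BooleanQuery of TermQueries, matching the three cases described above.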
-Hoss