Thread: QueryParser and NGrams




QueryParser and NGrams
Sweden
2007-10-11 12:47:19
I don't understand, why does the following code create 2 phrase
queries instead of 20 term queries? I'm quite sure I've previously
had QueryParser doing the latter.

System.out.println(new QueryParser("f", new Analyzer() {
  public TokenStream tokenStream(String string, Reader reader) {
    return new NGramTokenFilter(new StandardTokenizer(reader), 2, 5);
  }
}).parse("hello world"));


f:"he el ll lo hel ell llo hell ello hello" f:"wo or rl ld wor orl rld worl orld world"
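
For reference, the 20 terms in question are the 10 n-grams of each word. The following is a minimal plain-Java sketch of how those grams can be enumerated, assuming they are emitted smallest size first as in the output above. NGramSketch and ngrams are hypothetical names used for illustration; this is not Lucene's NGramTokenFilter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; this is not Lucene's NGramTokenFilter.
final class NGramSketch {

    // Returns all character n-grams of `word` for sizes minGram..maxGram,
    // smallest size first, left to right within each size.
    static List<String> ngrams(String word, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= word.length(); start++) {
                grams.add(word.substring(start, start + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> hello = ngrams("hello", 2, 5);
        System.out.println(hello.size() + " " + hello);
        // prints: 10 [he, el, ll, lo, hel, ell, llo, hell, ello, hello]
    }
}
```

With minGram 2 and maxGram 5, a five-letter word yields 4 + 3 + 2 + 1 = 10 grams, so "hello world" yields 20 in total.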


-- 
karl

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: QueryParser and NGrams
Sweden
2007-10-11 13:40:00
I now realize that that phrase makes sense, and that it was another
"feature" in my code that confused me.

So, forget about it.

Bada bing, bada bom.


-- 
karl

On 11 Oct 2007, at 19:47, Karl Wettin wrote:

> I don't understand, why does the following code create 2 phrase
> queries instead of 20 term queries? I'm quite sure I've previously
> had QueryParser doing the latter.
>
> System.out.println(new QueryParser("f", new Analyzer() {
>   public TokenStream tokenStream(String string, Reader reader) {
>     return new NGramTokenFilter(new StandardTokenizer(reader), 2, 5);
>   }
> }).parse("hello world"));
>
> f:"he el ll lo hel ell llo hell ello hello" f:"wo or rl ld wor orl rld worl orld world"
>
> --
> karl


Re: QueryParser and NGrams
Sweden
2007-10-11 13:51:10
No, sorry, I'm still confused. It ought to be term queries?

--  
the flooding troll


On 11 Oct 2007, at 20:40, Karl Wettin wrote:

> I now realize that that phrase makes sense, and that it was another
> "feature" in my code that confused me.
>
> So, forget about it.
>
> Bada bing, bada bom.
>
> --
> karl


Re: QueryParser and NGrams
2007-10-11 20:09:46
: No, sorry, I'm still confused. It ought to be term queries?

: > > System.out.println(new QueryParser("f", new Analyzer() {
: > >   public TokenStream tokenStream(String string, Reader reader) {
: > >     return new NGramTokenFilter(new StandardTokenizer(reader), 2, 5);
: > >   }
: > > }).parse("hello world"));

QueryParser does one pass looking for operators. It sees two pieces of
input, "hello" and "world", and hands each one individually to your
analyzer.

For each word, your analyzer produces multiple tokens which are not at
the same position (they have a non-zero positionIncrement). QueryParser
sees the multiple tokens at consecutive positions and constructs a
PhraseQuery (per word). If at least one token had a positionIncrement
of 0, it would have created a MultiPhraseQuery (per word); if all of
the tokens were at the same position (positionIncrement of 0), it would
have constructed a BooleanQuery (per word) containing TermQueries.
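
The decision described above can be sketched in plain Java. This is a rough model under stated assumptions, not QueryParser's actual source: QueryShapeSketch and classify are hypothetical names, the input is just the list of position increments for the tokens of one word, and the single-token TermQuery case is added as an assumption since it is not covered in the explanation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the per-word decision described above;
// it does not use or reproduce Lucene's QueryParser code.
final class QueryShapeSketch {

    // Given the position increment of each token produced for one word,
    // returns the name of the query type the description above implies.
    static String classify(List<Integer> positionIncrements) {
        if (positionIncrements.isEmpty()) {
            return "none";
        }
        if (positionIncrements.size() == 1) {
            return "TermQuery"; // assumption: a single token becomes a plain term query
        }
        int distinctPositions = 0;
        boolean severalTokensAtSamePosition = false;
        for (int increment : positionIncrements) {
            if (increment == 0) {
                severalTokensAtSamePosition = true; // stacked on the previous token
            } else {
                distinctPositions++;                // token starts a new position
            }
        }
        if (!severalTokensAtSamePosition) {
            return "PhraseQuery";
        }
        return distinctPositions <= 1 ? "BooleanQuery of TermQueries" : "MultiPhraseQuery";
    }

    public static void main(String[] args) {
        // Ten n-grams of "hello", each advancing the position by one.
        System.out.println(classify(Arrays.asList(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)));
        // prints: PhraseQuery
        // The same ten tokens, all stacked at one position.
        System.out.println(classify(Arrays.asList(1, 0, 0, 0, 0, 0, 0, 0, 0, 0)));
        // prints: BooleanQuery of TermQueries
    }
}
```

Under this model, getting term queries out of the n-gram analyzer would require every gram after the first to carry a position increment of 0.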





-Hoss

