
Thread: Using wildcard with accented words




Using wildcard with accented words
2007-10-22 14:45:20
I have a problem searching for accented words with a wildcard.
I have configured the schema with
<filter class="solr.ISOLatin1AccentFilterFactory"/>
in both the index and query parts.
Searching for q=chrétien works and finds documents containing
"chretien", and q=chre* works fine, but q=chré* does not work.
Is this a bug, or am I doing something wrong?

-- 
View this message in context:
http://www.nabble.com/Using-wildcard-with-accented-words-tf4673239.html#a13351144
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using wildcard with accented words
2007-10-22 16:43:35
On Oct 22, 2007, at 4:06 PM, Erik Hatcher wrote:
> On Oct 22, 2007, at 3:45 PM, kshadkhast wrote:
>> I have a problem searching for accented words with a wildcard.
>> I have configured the schema with
>> <filter class="solr.ISOLatin1AccentFilterFactory"/>
>> in both the index and query parts.
>> Searching for q=chrétien works and finds documents containing
>> "chretien", and q=chre* works fine, but q=chré* does not work.
>> Is this a bug, or am I doing something wrong?
>
> It's a bit tricky here... Lucene's QueryParser, the heart of Solr's
> query parsing, does not analyze wildcard query parts. Consider
> stemmed words, for example, to see why analyzing them would be a
> problem. In this case it does make sense to run the term through a
> filter that normalizes diacritics on characters, but unfortunately
> Solr doesn't support what you need at this point.

Further on this, QueryParser does have some settings specific to
wildcard queries, such as lowercasing the prefix part.

Perhaps this is a case that Solr could address with a third analyzer
configuration (it already has "query" and "index" differentiation)
that could be incorporated for wildcard queries. Thoughts on that?

	Erik
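
To make the behaviour described above concrete, here is a minimal
sketch in Python. It is not Solr or Lucene code: the tiny inverted
index, the analyze() stand-in, and the accent folding via Unicode
decomposition are all assumptions made for illustration (the real
ISOLatin1AccentFilter uses its own character mappings). It only shows
why an unanalyzed prefix such as chré cannot match a term that was
folded to chretien at index time.

```python
import unicodedata

def fold_accents(text):
    """Approximate accent folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def analyze(text):
    """Stand-in for the field analyzer: split on whitespace, lowercase, fold."""
    return [fold_accents(token.lower()) for token in text.split()]

def build_index(docs):
    """Map each analyzed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def term_query(index, word):
    """Ordinary terms are analyzed before lookup."""
    hits = set()
    for term in analyze(word):
        hits |= index.get(term, set())
    return hits

def prefix_query(index, raw_prefix):
    """Wildcard prefixes skip the analyzer; here they are only lowercased."""
    prefix = raw_prefix.lower()
    hits = set()
    for term, doc_ids in index.items():
        if term.startswith(prefix):
            hits |= doc_ids
    return hits

index = build_index({1: "Jean Chrétien", 2: "chretien de Troyes"})
print(term_query(index, "chrétien"))  # {1, 2}
print(prefix_query(index, "chré"))    # set()
print(prefix_query(index, "chre"))    # {1, 2}
```

The index only ever contains the folded form, so the accented prefix
has nothing to match, while the unaccented prefix matches both
documents.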


Re: Using wildcard with accented words
2007-10-22 20:03:29
On 10/22/07, Erik Hatcher <erikehatchersolutions.com> wrote:
> Perhaps this is a case that Solr could address with a third analyzer
> configuration (it already has "query" and "index" differentiation)
> that could be incorporated for wildcard queries. Thoughts on that?

I've actually thought about it previously... it would be nice for it
to all work automatically for the user. It seems the implementation
should work at the TokenFilter level, so that things like synonym
filters, stemmers, etc. would do nothing.

Perhaps add some new methods to BaseTokenFilterFactory to do prefix,
wildcard, etc. transformations?
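
One way to picture the filter-level idea is the Python sketch below.
All class and method names here (TokenFilter, apply_to_wildcard, and
so on) are hypothetical and are not part of Solr's
BaseTokenFilterFactory. The assumption is simply that each filter gets
a second hook for wildcard fragments that defaults to doing nothing,
and only character-level filters override it.

```python
import unicodedata

class TokenFilter:
    """Hypothetical base class, not Solr's API."""
    def apply(self, token):
        return token

    def apply_to_wildcard(self, fragment):
        # Default: leave wildcard fragments untouched.
        return fragment

class LowerCaseFilter(TokenFilter):
    def apply(self, token):
        return token.lower()

    def apply_to_wildcard(self, fragment):
        # Lowercasing is safe on partial words.
        return fragment.lower()

class AccentFilter(TokenFilter):
    def apply(self, token):
        decomposed = unicodedata.normalize("NFD", token)
        return "".join(c for c in decomposed if not unicodedata.combining(c))

    def apply_to_wildcard(self, fragment):
        # Character-level folding is also safe on partial words.
        return self.apply(fragment)

class ToyStemFilter(TokenFilter):
    """Toy stemmer: strips a trailing 's'. Inherits the no-op wildcard hook."""
    def apply(self, token):
        if len(token) > 3 and token.endswith("s"):
            return token[:-1]
        return token

def analyze_term(chain, token):
    for token_filter in chain:
        token = token_filter.apply(token)
    return token

def analyze_wildcard(chain, fragment):
    for token_filter in chain:
        fragment = token_filter.apply_to_wildcard(fragment)
    return fragment

chain = [LowerCaseFilter(), AccentFilter(), ToyStemFilter()]
print(analyze_term(chain, "Chrétiens"))  # chretien
print(analyze_wildcard(chain, "Chré"))   # chre
```

With this shape, the stemmer never touches a wildcard fragment, while
lowercasing and accent folding still apply to it.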

Another gotcha is handling multiple tokens.
What happens if someone queries for myfield:foo-bar* with a letter
tokenizer or a word-delimiter filter? It's not a simple prefix query
at all!

-Yonik
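
The multiple-token gotcha can be illustrated with a short Python
sketch. The letter_tokenize() function below is only an approximation
of a letter tokenizer (it splits on anything that is not a letter),
and analyze_prefix() and is_simple_prefix() are hypothetical helper
names, not Solr functions.

```python
import re

def letter_tokenize(text):
    """Split text into maximal runs of letters."""
    return re.findall(r"[^\W\d_]+", text)

def analyze_prefix(raw):
    """Tokenize the part of a prefix query that precedes the trailing '*'."""
    if not raw.endswith("*"):
        raise ValueError("expected a prefix query ending in '*'")
    return letter_tokenize(raw[:-1])

def is_simple_prefix(raw):
    """True only when the prefix part survives analysis as a single token."""
    return len(analyze_prefix(raw)) == 1

print(analyze_prefix("foo-bar*"))    # ['foo', 'bar']
print(is_simple_prefix("foo-bar*"))  # False
print(is_simple_prefix("foobar*"))   # True
```

Once the prefix part yields two tokens, there is no single term to
attach the wildcard to, so any automatic handling would have to decide
what query to build from them.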

Re: Using wildcard with accented words
2007-10-23 12:19:43
On 10/23/07, Yonik Seeley <yonikapache.org> wrote:
> I've actually thought about it previously... it would be nice for it
> to all work automatically for the user. It seems the implementation
> should work at the TokenFilter level, so that things like synonym
> filters, stemmers, etc. would do nothing.

I agree that it would be really useful. Currently we have to
reimplement the ISOLatin1AccentFilterFactory filter on the client
side of our applications. It would be great to have this in Solr
directly, and it would be more consistent with the behaviour of
non-wildcard queries.

Regards,

-- 
Guillaume
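
A client-side workaround of the kind described above might look like
the following Python sketch. It is an assumption-laden illustration,
not the code used in those applications: fold_wildcard_clauses() is a
hypothetical helper, the accent folding relies on Unicode
decomposition rather than the exact ISOLatin1AccentFilter mappings,
and the naive whitespace split ignores quoted phrases.

```python
import unicodedata
from urllib.parse import urlencode

def fold_accents(text):
    """Approximate accent folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def fold_wildcard_clauses(query):
    """Fold accents only in clauses containing '*' or '?'.

    Other clauses are left alone because Solr analyzes them itself.
    """
    clauses = []
    for clause in query.split():
        if "*" in clause or "?" in clause:
            clause = fold_accents(clause)
        clauses.append(clause)
    return " ".join(clauses)

q = fold_wildcard_clauses("chré* Montréal")
print(q)  # chre* Montréal

# URL-encoded query string to append to the search request.
print(urlencode({"q": q}))
```

Only the wildcard clause is rewritten, so the query sent to Solr
starts with the same folded prefix that the index contains.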
