Thread: Score of exact matches




Score of exact matches
2007-11-05 23:05:12
Hi all,

I use Solr 1.2 on a job advertising site. I started from the default
setup that runs all documents and queries through
EnglishPorterFilterFactory. As a result for example an ad with
"accounts" in its title is matched when someone runs a query for
"accountant" because both are stemmed to the "account" word and then
they match.

Is it somehow possible to give a higher score to exact matches and
sort them before matches from stemmed terms?

Close to this is a problem with accents - I can remove accents from
both documents and from queries and then run the query on non-accented
terms. But I'd like to give higher score to documents where the search
term matches exactly (i.e. including accents and possibly letter
capitalization, etc) and sort them before more fuzzy searches.

To me it looks like I have to run multiple sub-queries for each query,
one for exact match, one for accents removed and one for stemmed words
and then combine the results and compute the final score for each
match. Is that possible?

Thanks!

PaPa

Re: Score of exact matches
United States
2007-11-05 23:23:56
On 5-Nov-07, at 9:05 PM, Papalagi Pakeha wrote:

> Hi all,
>
> I use Solr 1.2 on a job advertising site. I started from the default
> setup that runs all documents and queries through
> EnglishPorterFilterFactory. As a result for example an ad with
> "accounts" in its title is matched when someone runs a query for
> "accountant" because both are stemmed to the "account" word and then
> they match.
>
> Is it somehow possible to give a higher score to exact matches and
> sort them before matches from stemmed terms?
>
> Close to this is a problem with accents - I can remove accents from
> both documents and from queries and then run the query on non-accented
> terms. But I'd like to give higher score to documents where the search
> term matches exactly (i.e. including accents and possibly letter
> capitalization, etc) and sort them before more fuzzy searches.
>
> To me it looks like I have to run multiple sub-queries for each query,
> one for exact match, one for accents removed and one for stemmed words
> and then combine the results and compute the final score for each
> match. Is that possible?

One way to do this is to index both alternatives at every term
position. So when stemming, you'd store (account accountant)
(account accounts), etc., when filtering, (epee épée)
(fantome fantôme), etc.

Now when querying, transform your query into
<canonicalized version> <original version>^10:

épée -> epee épée^10
accountant -> account accountant^10

A bit of work to do in general, though.

-Mike
Re: Score of exact matches
2007-11-05 23:37:40
This is fairly straightforward and works well with the DisMax
handler. Index the text into three different fields with three
different sets of analyzers. Use something like this in the
request handler:

 <requestHandler name="multimatch" class="solr.DisMaxRequestHandler" >
   <lst name="defaults">
     <float name="tie">0.01</float>
     <str name="qf">
       exact^16 noaccent^4 stemmed
     </str>
     <str name="pf">
       exact^16 noaccent^4 stemmed
     </str>
   </lst>
 </requestHandler>

You will probably need to adjust the weights for your content,
though I expect these are a good starting place.

Per-field analyzers are very easy to use in Solr and are
extremely powerful. I wish we'd thought of that in Ultraseek.
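
For reference, the three fields named in the handler above might be
declared along these lines in schema.xml. This is only a minimal sketch,
not a schema taken from this thread: the type names (text_exact,
text_noaccent, text_stemmed), the choice of WhitespaceTokenizerFactory,
and the copyField wiring are all assumptions, and it presumes that
ISOLatin1AccentFilterFactory is available in the Solr release in use.

```xml
<!-- Fragment of schema.xml; place the pieces inside the existing
     <types> and <fields> sections. All names here are illustrative. -->
<types>
  <!-- Exact: tokens kept as typed, including case and accents -->
  <fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
  </fieldType>

  <!-- No accents: accents folded and case lowered, but no stemming -->
  <fieldType name="text_noaccent" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- Stemmed: as above, plus the Porter stemmer from the default setup -->
  <fieldType name="text_stemmed" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory"/>
    </analyzer>
  </fieldType>
</types>

<fields>
  <field name="exact"    type="text_exact"    indexed="true" stored="true"/>
  <field name="noaccent" type="text_noaccent" indexed="true" stored="false"/>
  <field name="stemmed"  type="text_stemmed"  indexed="true" stored="false"/>
</fields>

<!-- Feed the same source text into all three fields at index time -->
<copyField source="exact" dest="noaccent"/>
<copyField source="exact" dest="stemmed"/>
```

With a layout like this, a document only needs its text sent to the
exact field; copyField passes the raw text to the other two fields,
where each field's own analyzer is applied. A whitespace tokenizer
leaves punctuation attached to words, so a different tokenizer may
suit real ad titles better.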

wunder
==
Search Guy, Netflix
Formerly: Architect, Ultraseek


RE: Score of exact matches
United States
2007-11-06 13:34:15
What is the performance profile of this against merely searching
against one field? My situation is millions of small records with an
average of 200 bytes/text field.

Lance 

-----Original Message-----
From: Walter Underwood [mailto:wunderwoodnetflix.com] 
Sent: Monday, November 05, 2007 9:38 PM
To: solr-userlucene.apache.org
Subject: Re: Score of exact matches


