On 5-Nov-07, at 9:05 PM, Papalagi Pakeha wrote:
> Hi all,
>
> I use Solr 1.2 on a job advertising site. I started from the default
> setup, which runs all documents and queries through
> EnglishPorterFilterFactory. As a result, an ad with "accounts" in its
> title is matched when someone runs a query for "accountant", because
> both are stemmed to the word "account" and then they match.
>
> Is it somehow possible to give a higher score to exact matches and
> sort them before matches from stemmed terms?
>
> Close to this is a problem with accents: I can remove accents from
> both documents and queries and then run the query on non-accented
> terms. But I'd like to give a higher score to documents where the
> search term matches exactly (i.e. including accents and possibly
> letter capitalization, etc.) and sort them before fuzzier matches.
>
> To me it looks like I have to run multiple sub-queries for each query,
> one for exact match, one for accents removed and one for stemmed words,
> and then combine the results and compute the final score for each
> match. Is that possible?

One way to do this is to index both alternatives at every term
position. So when stemming, you'd store (account accountant),
(account accounts), etc.; when removing accents, (epee épée),
(fantome fantôme), etc.
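The index-time idea can be modelled in a few lines. This is a minimal sketch, not Solr code: each term position is represented as a list of terms, the canonical form first and the original added at the same position only when it differs (in Lucene terms, the extra token would carry a position increment of 0). `TOY_STEMS` is a hypothetical lookup table standing in for a real stemmer such as Porter, and `index_positions` is an illustrative name, not a Solr or Lucene API.

```python
import unicodedata

# Hypothetical stand-in for a real stemmer (e.g. Porter); illustration only.
TOY_STEMS = {"accountant": "account", "accounts": "account"}


def strip_accents(term: str) -> str:
    # Decompose to NFD and drop combining marks: "épée" -> "epee".
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def canonicalize(term: str) -> str:
    # Lowercase, remove accents, then apply the toy stemmer.
    base = strip_accents(term.lower())
    return TOY_STEMS.get(base, base)


def index_positions(tokens: list[str]) -> list[list[str]]:
    """For each term position, return the terms to index at that position.

    The canonical form always goes in; the original token is added at the
    same position only when it differs from the canonical form.
    """
    positions = []
    for tok in tokens:
        canon = canonicalize(tok)
        positions.append([canon] if canon == tok else [canon, tok])
    return positions
```

For example, `index_positions(["accountant", "épée", "account"])` yields `[["account", "accountant"], ["epee", "épée"], ["account"]]`, so both the fuzzy and the exact form are searchable at each position.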
Now, when querying, transform your query into
<canonicalized version> <original version>^10:

  épée -> epee épée^10
  accountant -> account accountant^10
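The query rewrite can be sketched the same way. This assumes the same hypothetical `TOY_STEMS` stand-in for a stemmer, and `rewrite_query` is an illustrative helper, not part of Solr. It emits the canonical term plus the original boosted by `^10` only when the two differ.

```python
import unicodedata

# Hypothetical stand-in for a real stemmer (e.g. Porter); illustration only.
TOY_STEMS = {"accountant": "account", "accounts": "account"}


def canonicalize(term: str) -> str:
    # Lowercase, strip combining accents, then apply the toy stemmer.
    decomposed = unicodedata.normalize("NFD", term.lower())
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return TOY_STEMS.get(base, base)


def rewrite_query(query: str, boost: int = 10) -> str:
    # Turn each whitespace-separated term into "<canonical> <original>^boost",
    # or leave it alone when it is already in canonical form.
    clauses = []
    for tok in query.split():
        canon = canonicalize(tok)
        clauses.append(tok if canon == tok else f"{canon} {tok}^{boost}")
    return " ".join(clauses)
```

`rewrite_query("épée")` gives `epee épée^10`, and `rewrite_query("accountant")` gives `account accountant^10`. This sketch assumes the field's query-time analyzer does not stem or strip accents again (otherwise `accountant^10` would collapse back to `account`), and that clauses are combined with OR rather than AND.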
A bit of work to do in general, though.
-Mike