Thread: Best way to change weighting based on the presence of a field

Best way to change weighting based on the presence of a field
2007-10-05 16:06:03
Howdy all,

We are attempting to provide access to about 8 million records of highly variable quality and length. In a nutshell, we are trying to find a way to deprioritize "suspect" records without discriminating against useful records that happen to be short. We do not wish to eliminate suspect records from the results -- just deprioritize them a bit.

We have been indexing a field that marks a record as likely to be good or bad, and I'm trying to figure out the most efficient way to use it (should I be trying this at all?). As a newbie, my first inclination was to OR the search terms with the same terms combined with a "good record" marker that carries a modest boost.

However, this method seems really clunky, and I'm wondering if there's a better way to accomplish what we're trying to do. Thanks,

kyle

Re: Best way to change weighting based on the presence of a field
Canada
2007-10-05 16:12:27
On 5-Oct-07, at 2:06 PM, Kyle Banerjee wrote:

> Howdy all,
>
> We are attempting to provide access to about 8 million records of highly variable quality and length. In a nutshell, we are trying to find a way to deprioritize "suspect" records without discriminating against useful records that happen to be short. We do not wish to eliminate suspect records from the results -- just deprioritize them a bit.
>
> We have been indexing a field that marks a record as likely to be good or bad, and I'm trying to figure out the most efficient way to use it (should I be trying this at all?). As a newbie, my first inclination was to OR the search terms with the same terms combined with a "good record marker" with a modest boost.
>
> However, this method seems really clunky, and I'm wondering if there's a better way to accomplish what we're trying to do. Thanks,

If you know at index time that the document is shady, the easiest way to de-emphasize it globally is to set the document boost to some value other than one.

<doc boost="0.5">...
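A minimal Python sketch of generating such an update message, assuming Solr's XML update format, in which `<doc>` accepts an optional `boost` attribute (default 1.0). The field names `id` and `text`, the `is_suspect` flag, and the helper `build_add_message` are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_add_message(records, suspect_boost=0.5):
    """Return a Solr XML <add> message in which suspect records carry a
    lower document boost, so they rank lower without being dropped.

    `records` is a list of dicts with hypothetical keys
    "id", "text", and "is_suspect".
    """
    add = ET.Element("add")
    for rec in records:
        doc = ET.SubElement(add, "doc")
        if rec["is_suspect"]:
            # A boost below 1.0 de-emphasizes the whole document;
            # good records omit the attribute and keep the default.
            doc.set("boost", str(suspect_boost))
        for name in ("id", "text"):
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(rec[name])
    return ET.tostring(add, encoding="unicode")


print(build_add_message([
    {"id": 1, "text": "Short but useful", "is_suspect": False},
    {"id": 2, "text": "garbled record", "is_suspect": True},
]))
```

Because the boost is fixed at index time, trying a different value means reindexing the affected documents.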

cheers,
-Mike

Re: Best way to change weighting based on the presence of a field
2007-10-05 17:01:49
> If you know at index time that the document is shady, the easiest way to de-emphasize it globally is to set the document boost to some value other than one.
>
> <doc boost="0.5">...

I considered that, but assumed we'd get the values wrong at first and have to do a lot of tinkering before we got it right. Is there a good way to do this at query time, or do you really need to do this when loading? It would be feasible to boost at load time, but recovery times from bad decisions are longer than I was hoping for.

kyle

Re: Best way to change weighting based on the presence of a field
Canada
2007-10-05 18:50:00
On 5-Oct-07, at 3:01 PM, Kyle Banerjee wrote:

>> If you know at index time that the document is shady, the easiest way to de-emphasize it globally is to set the document boost to some value other than one.
>>
>> <doc boost="0.5">...
>
> I considered that, but assumed we'd get the values wrong at first and have to do a lot of tinkering before we got it right. Is there a good way to do this at query time, or do you really need to do this when loading? It would be feasible to boost at load time, but recovery times from bad decisions are longer than I was hoping for.

The other option is to use a function query on the value stored in a field (which could represent a range of 'badness'). This can be used directly in the dismax handler via the bf (boost function) query parameter.
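A rough Python sketch of such a request, under several assumptions: a Solr instance at the hypothetical URL `http://localhost:8983/solr/select`, the dismax handler selected with `qt=dismax` (later Solr releases use `defType=dismax` instead), a hypothetical search field `text`, and a hypothetical numeric field `quality` where higher means more trustworthy. In function-query syntax a bare field name evaluates to that field's value; the `0.3` weight is an arbitrary starting point to tune.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port, and path for a real install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def dismax_quality_url(user_query: str, weight: float = 0.3) -> str:
    """Build a dismax search URL with a query-time boost function on a
    hypothetical numeric `quality` field.

    The weight travels with each request, so it can be retuned without
    reindexing anything.
    """
    params = {
        "q": user_query,
        "qt": "dismax",              # pick the dismax handler (older Solr style)
        "qf": "text",                # hypothetical field(s) to search
        "bf": f"quality^{weight}",   # boost function: weighted field value
    }
    # urlencode percent-encodes '^' as %5E and spaces as '+'.
    return f"{SOLR_SELECT}?{urlencode(params)}"


print(dismax_quality_url("civil war diaries"))
```

Dismax adds the bf value to the relevance score rather than multiplying it, so the weight's effect depends on the scale of the field values relative to typical scores.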

-Mike

Re: Best way to change weighting based on the presence of a field
2007-10-06 07:37:06
> In the near future, you can do a real query-time boost (score multiplication) by another field or function
> https://issues.apache.org/jira/browse/SOLR-334
>
> And even quickly update all the values of the field being used as the boost:
> https://issues.apache.org/jira/browse/SOLR-351

Thanks; all the feedback people are providing is very helpful. For the short term, it looks like the ticket might be to use a function query on the value stored in a field that represents the quality of the record.

kyle
