Thread: Payloads for multiValued fields?




Payloads for multiValued fields?
United States
2007-08-16 11:20:49
When searching a multiValued field, is it possible to know which of
the multiple fields the match was in?

For example if I have an index of documents, each of which has
multiple image captions stored in separate fields, I'd like to be
able to link from the search results to the caption in the original
document.

One possibility could be attaching metadata to a field, similar to
payloads for terms. At the moment all I can think of is adding
metadata inside the stored field and stripping that out when it's
indexed and displayed, but that's not ideal.

alf.

Re: Payloads for multiValued fields?
United States
2007-08-16 11:28:30
On 16 Aug 2007, at 17:20, Alf Eaton wrote:

> When searching a multiValued field, is it possible to know which of
> the multiple fields the match was in?
>
> For example if I have an index of documents, each of which has
> multiple image captions stored in separate fields, I'd like to be
> able to link from the search results to the caption in the original
> document.
>
> One possibility could be attaching metadata to a field, similar to
> payloads for terms. At the moment all I can think of is adding
> metadata inside the stored field and stripping that out when it's
> indexed and displayed, but that's not ideal.

Actually on reflection all this would need would be for the
Highlighter to add a field to the response, saying which item of the
multiValued field the match was in. Is that possible?

alf.



Re: Payloads for multiValued fields?
2007-08-16 11:34:17
On 8/16/07, Alf Eaton <listshubmed.org> wrote:
>
> On 16 Aug 2007, at 17:20, Alf Eaton wrote:
>
> > When searching a multiValued field, is it possible to know which of
> > the multiple fields the match was in?
> >
> > For example if I have an index of documents, each of which has
> > multiple image captions stored in separate fields, I'd like to be
> > able to link from the search results to the caption in the original
> > document.
> >
> > One possibility could be attaching metadata to a field, similar to
> > payloads for terms. At the moment all I can think of is adding
> > metadata inside the stored field and stripping that out when it's
> > indexed and displayed, but that's not ideal.
>
> Actually on reflection all this would need would be for the
> Highlighter to add a field to the response, saying which item of the
> multiValued field the match was in. Is that possible?

Could you perhaps index the captions as
#1 this is the first caption
#2 this is the second caption

And then just look for #n in the highlighted results?
For display, you could also strip out the #n in the captions.

-Yonik
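
The marker suggestion above can be sketched on the client side roughly
as follows. This is a minimal illustration, not anything Solr provides:
the helper names add_markers and strip_marker are hypothetical, and the
"#n " format (1-based, followed by one space) is an assumption taken
from the example captions.

```python
import re

# Assumed marker format: '#', a 1-based number, then a single space.
MARKER = re.compile(r"^#(\d+) ")

def add_markers(captions):
    """Prefix each caption with '#n ' before sending it to the index."""
    return [f"#{i} {text}" for i, text in enumerate(captions, start=1)]

def strip_marker(value):
    """Split a marked value into (number, caption).

    Returns (None, value) unchanged when no marker is present.
    """
    m = MARKER.match(value)
    if m is None:
        return None, value
    return int(m.group(1)), value[m.end():]

marked = add_markers(["this is the first caption",
                      "this is the second caption"])
```

For display, strip_marker(marked[1]) gives back the caption number and
the clean caption text.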

Re: Payloads for multiValued fields?
United States
2007-08-16 11:34:21
On 16 Aug 2007, at 17:34, Yonik Seeley wrote:

> On 8/16/07, Alf Eaton <listshubmed.org> wrote:
>>
>> On 16 Aug 2007, at 17:20, Alf Eaton wrote:
>>
>>> When searching a multiValued field, is it possible to know which of
>>> the multiple fields the match was in?
>>>
>>> For example if I have an index of documents, each of which has
>>> multiple image captions stored in separate fields, I'd like to be
>>> able to link from the search results to the caption in the original
>>> document.
>>>
>>> One possibility could be attaching metadata to a field, similar to
>>> payloads for terms. At the moment all I can think of is adding
>>> metadata inside the stored field and stripping that out when it's
>>> indexed and displayed, but that's not ideal.
>>
>> Actually on reflection all this would need would be for the
>> Highlighter to add a field to the response, saying which item of the
>> multiValued field the match was in. Is that possible?
>
> Could you perhaps index the captions as
> #1 this is the first caption
> #2 this is the second caption
>
> And then just look for #n in the highlighted results?
> For display, you could also strip out the #n in the captions.

I think that would probably work, yes - '#1' wouldn't be indexed so
wouldn't affect the search results.

Thanks,
alf.

Re: Payloads for multiValued fields?
United States
2007-10-24 06:12:06
Yonik Seeley wrote:
> On 8/16/07, Alf Eaton <listshubmed.org> wrote:
>> On 16 Aug 2007, at 17:20, Alf Eaton wrote:
>>
>>> When searching a multiValued field, is it possible to know which of
>>> the multiple fields the match was in?
>>>
>>> For example if I have an index of documents, each of which has
>>> multiple image captions stored in separate fields, I'd like to be
>>> able to link from the search results to the caption in the original
>>> document.
>>>
>>> One possibility could be attaching metadata to a field, similar to
>>> payloads for terms. At the moment all I can think of is adding
>>> metadata inside the stored field and stripping that out when it's
>>> indexed and displayed, but that's not ideal.
>> Actually on reflection all this would need would be for the
>> Highlighter to add a field to the response, saying which item of the
>> multiValued field the match was in. Is that possible?
>
> Could you perhaps index the captions as
> #1 this is the first caption
> #2 this is the second caption
>
> And then just look for #n in the highlighted results?
> For display, you could also strip out the #n in the captions.
> 

This was working ok for a while, but there's a problem: the highlighter
doesn't return the whole caption - just the highlighted part - so
sometimes the #n at the start of the caption field doesn't get returned
and isn't available. Any other ideas? Perhaps there's a way for the
response to say which fields of each document were matched?

alf


Re: Payloads for multiValued fields?
2007-10-24 07:22:47
On 10/24/07, Alf Eaton <listshubmed.org> wrote:
> Yonik Seeley wrote:
> > Could you perhaps index the captions as
> > #1 this is the first caption
> > #2 this is the second caption
> >
> > And then just look for #n in the highlighted results?
> > For display, you could also strip out the #n in the captions.
> >
>
> This was working ok for a while, but there's a problem: the highlighter
> doesn't return the whole caption - just the highlighted part - so
> sometimes the #n at the start of the caption field doesn't get returned
> and isn't available. Any other ideas? Perhaps there's a way for the
> response to say which fields of each document were matched?

Perhaps try hl.fragsize=0

http://wiki.apache.org/solr/HighlightingParameters

-Yonik
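
As a rough sketch of what such a request might look like, the snippet
below builds a select URL with highlighting switched on and
hl.fragsize=0, which asks the highlighter to use the whole field value
rather than a fragment. The host, the /select path, the query and the
field name "caption" are assumptions for illustration only.

```python
from urllib.parse import urlencode

def highlight_query(base_url, query, field):
    """Build a Solr select URL that highlights whole values of `field`."""
    params = {
        "q": query,
        "hl": "true",        # turn highlighting on
        "hl.fl": field,      # field(s) to highlight
        "hl.fragsize": 0,    # 0 = no fragmenting, use the whole value
    }
    return f"{base_url}/select?{urlencode(params)}"

# Hypothetical local instance and field name.
url = highlight_query("http://localhost:8983/solr", "caption:sunset", "caption")
```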

Re: Payloads for multiValued fields?
United States
2007-10-24 09:10:14
Yonik Seeley wrote:
> On 10/24/07, Alf Eaton <listshubmed.org> wrote:
>> Yonik Seeley wrote:
>>> Could you perhaps index the captions as
>>> #1 this is the first caption
>>> #2 this is the second caption
>>>
>>> And then just look for #n in the highlighted results?
>>> For display, you could also strip out the #n in the captions.
>>>
>> This was working ok for a while, but there's a problem: the highlighter
>> doesn't return the whole caption - just the highlighted part - so
>> sometimes the #n at the start of the caption field doesn't get returned
>> and isn't available. Any other ideas? Perhaps there's a way for the
>> response to say which fields of each document were matched?
>
> Perhaps try hl.fragsize=0
>
> http://wiki.apache.org/solr/HighlightingParameters

Yes, I was just trying that this morning and it's an improvement, though
not ideal if the field contains a lot of text (in other words it's still
a suboptimal workaround).

I do think it might be useful for the response to contain an element
saying which fields were matched by the query, including which
sub-sections of a multi-valued field were matched.

alf

Re: Payloads for multiValued fields?
Canada
2007-10-24 13:36:00
On 24-Oct-07, at 7:10 AM, Alf Eaton wrote:

>
> Yes, I was just trying that this morning and it's an improvement,
> though not ideal if the field contains a lot of text (in other words
> it's still a suboptimal workaround).
>
> I do think it might be useful for the response to contain an element
> saying which fields were matched by the query, including which
> sub-sections of a multi-valued field were matched.

This isn't readily-accessible information.  Text search engines work
by storing a list of documents and occurrence frequency for each
document _per term_.  At that point, the information about the
structure of the document is not available.

It is computable given sufficient effort, but certainly not something
Solr should provide by default.

Have you considered storing each section as a separate Solr Document?

-Mike
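
To illustrate the point about per-term storage, here is a toy inverted
index. It is a deliberate simplification and not Lucene's actual index
format: whitespace tokenisation, the build_postings name and the sample
documents are all assumptions. Each term maps to the documents it
occurs in and a frequency, and nothing records which value of a
multi-valued field a term came from.

```python
from collections import Counter, defaultdict

def build_postings(docs):
    """Map term -> {doc_id: term frequency}.

    `docs` maps a document id to the list of values of one
    multi-valued field; the values are flattened while counting.
    """
    postings = defaultdict(dict)
    for doc_id, values in docs.items():
        counts = Counter(
            term for value in values for term in value.lower().split()
        )
        for term, freq in counts.items():
            postings[term][doc_id] = freq
    return dict(postings)

docs = {
    "doc1": ["a red sunset", "a sunset over water"],  # two captions
    "doc2": ["red car"],
}
postings = build_postings(docs)
```

Looking up postings["sunset"] tells you that doc1 matched twice, but
not that one occurrence was in the first caption and one in the second.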

Re: Payloads for multiValued fields?
United Kingdom
2007-10-24 14:39:30
Mike Klaas wrote:
> On 24-Oct-07, at 7:10 AM, Alf Eaton wrote:
>> Yes, I was just trying that this morning and it's an improvement, though
>> not ideal if the field contains a lot of text (in other words it's still
>> a suboptimal workaround).
>>
>> I do think it might be useful for the response to contain an element
>> saying which fields were matched by the query, including which
>> sub-sections of a multi-valued field were matched.
>
> This isn't readily-accessible information.  Text search engines work by
> storing a list of documents and occurrence frequency for each document
> _per term_.  At that point, the information about the structure of the
> document is not available.

The highlighting engine seems to know which fields were matched by the
query though - enough to be able to use hl.requireFieldMatch to only
return snippets from matched fields. The highlighter seems to have a
small problem with snippets reaching across multivalued fields, but if
that was sorted out then in theory the highlighter should be able to
tell you which field, and which of the multiple values, was matched, no?

> Have you considered storing each section as a separate Solr Document?

I have considered this - in theory it would be easy enough to create a
separate index just for these items, but it adds an extra lump of
complexity to the search engine that I'd rather avoid. The workaround of
adding a marked-up value to the indexed field, setting hl.fragsize to 0
and parsing out the marked-up value from the highlighted fragment should
be good enough for now.

alf
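
The parsing step of that workaround might look something like the
sketch below. It assumes the highlighting section of the response has
been decoded into nested dicts (document id, then field name, then a
list of snippets), that each snippet is a whole field value beginning
with a "#n " marker, and that the field is called "caption"; the
function name matched_caption_numbers is hypothetical.

```python
import re

# Assumed marker at the very start of each highlighted value.
MARKER = re.compile(r"^#(\d+)\s")

def matched_caption_numbers(highlighting, doc_id, field):
    """Return the caption numbers found in a document's snippets."""
    snippets = highlighting.get(doc_id, {}).get(field, [])
    numbers = []
    for snippet in snippets:
        m = MARKER.match(snippet)
        if m:  # snippets without a leading marker are skipped
            numbers.append(int(m.group(1)))
    return numbers

# Hand-written sample in the assumed shape, not real Solr output.
highlighting = {
    "doc1": {"caption": ["#2 this is the <em>second</em> caption"]},
}
```

A snippet that has lost its leading marker simply yields no number,
which is the failure mode described earlier in the thread when
fragments are shorter than the whole value.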
