Thread: Payloads for multiValued fields?
|
|
| Payloads for multiValued fields? |
  United States |
2007-08-16 11:20:49 |
When searching a multiValued field, is it possible to know which of
the multiple fields the match was in?

For example if I have an index of documents, each of which has
multiple image captions stored in separate fields, I'd like to be
able to link from the search results to the caption in the original
document.

One possibility could be attaching metadata to a field, similar to
payloads for terms. At the moment all I can think of is adding
metadata inside the stored field and stripping that out when it's
indexed and displayed, but that's not ideal.
alf.
|
|
| Re: Payloads for multiValued fields? |
  United States |
2007-08-16 11:28:30 |
On 16 Aug 2007, at 17:20, Alf Eaton wrote:
> When searching a multiValued field, is it possible to know which of
> the multiple fields the match was in?
>
> For example if I have an index of documents, each of which has
> multiple image captions stored in separate fields, I'd like to be
> able to link from the search results to the caption in the original
> document.
>
> One possibility could be attaching metadata to a field, similar to
> payloads for terms. At the moment all I can think of is adding
> metadata inside the stored field and stripping that out when it's
> indexed and displayed, but that's not ideal.

Actually on reflection all this would need would be for the
Highlighter to add a field to the response, saying which item of the
multiValued field the match was in. Is that possible?
alf.
|
|
| Re: Payloads for multiValued fields? |

|
2007-08-16 11:34:17 |
On 8/16/07, Alf Eaton <lists hubmed.org> wrote:
>
> On 16 Aug 2007, at 17:20, Alf Eaton wrote:
>
> > When searching a multiValued field, is it possible to know which of
> > the multiple fields the match was in?
> >
> > For example if I have an index of documents, each of which has
> > multiple image captions stored in separate fields, I'd like to be
> > able to link from the search results to the caption in the original
> > document.
> >
> > One possibility could be attaching metadata to a field, similar to
> > payloads for terms. At the moment all I can think of is adding
> > metadata inside the stored field and stripping that out when it's
> > indexed and displayed, but that's not ideal.
>
> Actually on reflection all this would need would be for the
> Highlighter to add a field to the response, saying which item of the
> multiValued field the match was in. Is that possible?

Could you perhaps index the captions as
#1 this is the first caption
#2 this is the second caption

And then just look for #n in the highlighted results?
For display, you could also strip out the #n in the captions.
-Yonik
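
The marker idea suggested above can be sketched roughly as follows. This is only an illustration in Python, not code from the thread: the helper names `add_markers` and `split_marker`, and the exact "#n " prefix format, are assumptions.

```python
import re

# Assumed marker format: "#<n> " at the very start of each caption value.
MARKER = re.compile(r"^#(\d+)\s+")

def add_markers(captions):
    """Prefix each caption with its 1-based position before indexing."""
    return [f"#{i} {text}" for i, text in enumerate(captions, start=1)]

def split_marker(value):
    """Return (position, caption text); position is None if no marker is present."""
    match = MARKER.match(value)
    if match is None:
        return None, value
    return int(match.group(1)), value[match.end():]

# Values as they would be sent to the index (hypothetical captions).
marked = add_markers(["this is the first caption", "this is the second caption"])
```

`split_marker` covers both uses mentioned in the message: the number identifies which caption matched, and the remaining text is what would be displayed.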
|
|
| Re: Payloads for multiValued fields? |
  United States |
2007-08-16 11:34:21 |
On 16 Aug 2007, at 17:34, Yonik Seeley wrote:
> On 8/16/07, Alf Eaton <lists hubmed.org> wrote:
>>
>> On 16 Aug 2007, at 17:20, Alf Eaton wrote:
>>
>>> When searching a multiValued field, is it possible to know which of
>>> the multiple fields the match was in?
>>>
>>> For example if I have an index of documents, each of which has
>>> multiple image captions stored in separate fields, I'd like to be
>>> able to link from the search results to the caption in the original
>>> document.
>>>
>>> One possibility could be attaching metadata to a field, similar to
>>> payloads for terms. At the moment all I can think of is adding
>>> metadata inside the stored field and stripping that out when it's
>>> indexed and displayed, but that's not ideal.
>>
>> Actually on reflection all this would need would be for the
>> Highlighter to add a field to the response, saying which item of the
>> multiValued field the match was in. Is that possible?
>
> Could you perhaps index the captions as
> #1 this is the first caption
> #2 this is the second caption
>
> And then just look for #n in the highlighted results?
> For display, you could also strip out the #n in the captions.

I think that would probably work, yes - '#1' wouldn't be indexed so
wouldn't affect the search results.
Thanks,
alf.
|
|
| Re: Payloads for multiValued fields? |
  United States |
2007-10-24 06:12:06 |
Yonik Seeley wrote:
> On 8/16/07, Alf Eaton <lists hubmed.org> wrote:
>> On 16 Aug 2007, at 17:20, Alf Eaton wrote:
>>
>>> When searching a multiValued field, is it possible to know which of
>>> the multiple fields the match was in?
>>>
>>> For example if I have an index of documents, each of which has
>>> multiple image captions stored in separate fields, I'd like to be
>>> able to link from the search results to the caption in the original
>>> document.
>>>
>>> One possibility could be attaching metadata to a field, similar to
>>> payloads for terms. At the moment all I can think of is adding
>>> metadata inside the stored field and stripping that out when it's
>>> indexed and displayed, but that's not ideal.
>> Actually on reflection all this would need would be for the
>> Highlighter to add a field to the response, saying which item of the
>> multiValued field the match was in. Is that possible?
>
> Could you perhaps index the captions as
> #1 this is the first caption
> #2 this is the second caption
>
> And then just look for #n in the highlighted results?
> For display, you could also strip out the #n in the captions.
>

This was working ok for a while, but there's a problem: the highlighter
doesn't return the whole caption - just the highlighted part - so
sometimes the #n at the start of the caption field doesn't get returned
and isn't available. Any other ideas? Perhaps there's a way for the
response to say which fields of each document were matched?
alf
|
|
| Re: Payloads for multiValued fields? |

|
2007-10-24 07:22:47 |
On 10/24/07, Alf Eaton <lists hubmed.org> wrote:
> Yonik Seeley wrote:
> > Could you perhaps index the captions as
> > #1 this is the first caption
> > #2 this is the second caption
> >
> > And then just look for #n in the highlighted results?
> > For display, you could also strip out the #n in the captions.
> >
>
> This was working ok for a while, but there's a problem: the highlighter
> doesn't return the whole caption - just the highlighted part - so
> sometimes the #n at the start of the caption field doesn't get returned
> and isn't available. Any other ideas? Perhaps there's a way for the
> response to say which fields of each document were matched?

Perhaps try hl.fragsize=0

http://wiki.apache.org/solr/HighlightingParameters
-Yonik
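
For illustration, a request using `hl.fragsize=0` might be built as in the sketch below. The host, handler path, field name `caption` and the query string are placeholders rather than values from the thread; `hl`, `hl.fl` and `hl.fragsize` are the standard Solr highlighting parameters.

```python
from urllib.parse import urlencode

# Assumed endpoint: adjust the host and path for your own Solr installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

def highlight_query_url(query, field="caption"):
    """Build a select URL that asks the highlighter for whole field values."""
    params = {
        "q": query,
        "hl": "true",        # turn highlighting on
        "hl.fl": field,      # field(s) to highlight
        "hl.fragsize": 0,    # 0 = no fragmenting, use the whole field value
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)

url = highlight_query_url("caption:mitochondria")
```

With fragmenting switched off, a marker at the start of a caption should always be part of the returned snippet, at the cost of returning the full value even for long fields.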
|
|
| Re: Payloads for multiValued fields? |
  United States |
2007-10-24 09:10:14 |
Yonik Seeley wrote:
> On 10/24/07, Alf Eaton <lists hubmed.org> wrote:
>> Yonik Seeley wrote:
>>> Could you perhaps index the captions as
>>> #1 this is the first caption
>>> #2 this is the second caption
>>>
>>> And then just look for #n in the highlighted results?
>>> For display, you could also strip out the #n in the captions.
>>>
>> This was working ok for a while, but there's a problem: the highlighter
>> doesn't return the whole caption - just the highlighted part - so
>> sometimes the #n at the start of the caption field doesn't get returned
>> and isn't available. Any other ideas? Perhaps there's a way for the
>> response to say which fields of each document were matched?
>
> Perhaps try hl.fragsize=0
>
> http://wiki.apache.org/solr/HighlightingParameters

Yes, I was just trying that this morning and it's an improvement, though
not ideal if the field contains a lot of text (in other words it's still
a suboptimal workaround).

I do think it might be useful for the response to contain an element
saying which fields were matched by the query, including which
sub-sections of a multi-valued field were matched.
alf
|
|
| Re: Payloads for multiValued fields? |
  Canada |
2007-10-24 13:36:00 |
On 24-Oct-07, at 7:10 AM, Alf Eaton wrote:
>
> Yes, I was just trying that this morning and it's an improvement,
> though not ideal if the field contains a lot of text (in other
> words it's still a suboptimal workaround).
>
> I do think it might be useful for the response to contain an element
> saying which fields were matched by the query, including which
> sub-sections of a multi-valued field were matched.

This isn't readily accessible information. Text search engines work
by storing a list of documents and occurrence frequency for each
document _per term_. At that point, the information about the
structure of the document is not available.

It is computable given sufficient effort, but certainly not something
Solr should provide by default.

Have you considered storing each section as a separate Solr Document?
-Mike
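
A toy version of the per-term structure described above may make the point clearer. This is a deliberately simplified sketch, not Lucene's actual index format; the function name and sample documents are made up for the example.

```python
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: frequency}, flattening multi-valued fields."""
    index = defaultdict(dict)
    for doc_id, captions in docs.items():
        # All values of the field are counted together, so the boundary
        # between individual captions is not recorded anywhere.
        counts = Counter(term for caption in captions
                         for term in caption.lower().split())
        for term, freq in counts.items():
            index[term][doc_id] = freq
    return index

# Hypothetical documents, each with a multi-valued caption field.
docs = {
    "doc1": ["a red car", "a blue car"],
    "doc2": ["a red house"],
}
index = build_inverted_index(docs)
```

Looking up `index["red"]` tells you that doc1 and doc2 contain the term and how often, but nothing in the postings says which of doc1's two captions it came from.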
|
|
| Re: Payloads for multiValued fields? |
  United Kingdom |
2007-10-24 14:39:30 |
Mike Klaas wrote:
> On 24-Oct-07, at 7:10 AM, Alf Eaton wrote:
>> Yes, I was just trying that this morning and it's an improvement, though
>> not ideal if the field contains a lot of text (in other words it's still
>> a suboptimal workaround).
>>
>> I do think it might be useful for the response to contain an element
>> saying which fields were matched by the query, including which
>> sub-sections of a multi-valued field were matched.
>
> This isn't readily-accessible information. Text search engines work by
> storing a list of documents and occurrence frequency for each document
> _per term_. At that point, the information about the structure of the
> document is not available.

The highlighting engine seems to know which fields were matched by the
query though - enough to be able to use hl.requireFieldMatch to only
return snippets from matched fields. The highlighter seems to have a
small problem with snippets reaching across multivalued fields, but if
that was sorted out then in theory the highlighter should be able to
tell you which field, and which of the multiple values, was matched, no?

> Have you considered storing each section as a separate Solr Document?

I have considered this - in theory it would be easy enough to create a
separate index just for these items, but it adds an extra lump of
complexity to the search engine that I'd rather avoid. The workaround of
adding a marked-up value to the indexed field, setting hl.fragsize to 0
and parsing out the marked-up value from the highlighted fragment should
be good enough for now.
alf
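
The parsing step of this workaround could look something like the sketch below. It assumes the "#n " marker format from earlier in the thread, a field named `caption`, and a dict shaped like the `highlighting` section of a Solr JSON response (document id, then field name, then a list of snippets); the sample data is invented.

```python
import re

# Assumed marker format: "#<n> " at the start of each indexed caption.
MARKER = re.compile(r"^#(\d+)\s+")

def matched_captions(highlighting, doc_id, field="caption"):
    """Return (caption position, snippet without marker) for each snippet."""
    results = []
    for snippet in highlighting.get(doc_id, {}).get(field, []):
        match = MARKER.match(snippet)
        if match:
            results.append((int(match.group(1)), snippet[match.end():]))
    return results

# Hypothetical highlighting section for one document.
sample = {"doc1": {"caption": ["#2 this is the <em>second</em> caption"]}}
hits = matched_captions(sample, "doc1")
```

Snippets that do not start with a marker are skipped, which is the case hl.fragsize=0 is meant to prevent.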
|
|