Created: (SOLR-236) Field collapsing
2007-05-11 17:14:16
Field collapsing
----------------

                 Key: SOLR-236
                 URL: https://issues.apache.org/jira/browse/SOLR-236
             Project: Solr
          Issue Type: New Feature
          Components: search
    Affects Versions: 1.2
            Reporter: Emmanuel Keller


This patch includes a new feature called "Field
collapsing".

"Used in order to collapse a group of results with
similar value for a given field to a single entry in the
result set. Site collapsing is a special case of this, where
all results for a given web site is collapsed into one or
two entries in the result set, typically with an associated
"more documents from this site" link. See also
Duplicate detection."
http://www.fastsearch.com/glossary.aspx?m=48&amid=299

The implementation adds three new query parameters (SolrParams):
- "collapse": set to true to enable collapsing.
- "collapse.field": the field used to group results.
- "collapse.max": how many consecutive results are allowed
  before collapsing.
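
As a rough illustration of how a client might pass these three parameters, here is a minimal Java sketch that builds a select URL. The base URL `http://localhost:8983/solr/select/` and the helper name `buildCollapseUrl` are assumptions for illustration only; the parameter names come from the description above. It needs Java 10+ for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class CollapseUrl {
    // Hypothetical helper: appends the field-collapsing parameters described in
    // the patch to a Solr select URL. LinkedHashMap keeps the parameter order stable.
    static String buildCollapseUrl(String baseUrl, String query, String field, int max) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("collapse", "true");                     // enable collapsing
        params.put("collapse.field", field);                // field used to group results
        params.put("collapse.max", Integer.toString(max));  // consecutive results allowed before collapsing

        StringBuilder sb = new StringBuilder(baseUrl).append('?');
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) sb.append('&');
            first = false;
            // URL-encode keys and values ('*', '.', '-' and '_' are left unchanged).
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildCollapseUrl("http://localhost:8983/solr/select/", "*", "manu", 1));
    }
}
```

With `q=*`, `collapse.field=manu` and `collapse.max=1` this prints the same request that Ryan McKinley tries against the example docs further down the thread.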

TODO (in progress):
- More documentation (on source code)
- Test cases


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue
online.


Updated: (SOLR-236) Field collapsing
2007-05-11 17:16:15
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Attachment: collapse_field.patch

Field Collapsing



Updated: (SOLR-236) Field collapsing
2007-05-11 17:50:15
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Attachment: collapse_field.patch

Replacing HashDocSet with BitDocSet for hasMoreResult, for
better performance.
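
The change swaps a hash-based set for a bitset. A minimal Java analogy, using `java.util.BitSet` and `HashSet<Integer>` as stand-ins for Solr's `BitDocSet` and `HashDocSet` (not the Solr classes themselves), shows the idea: with document ids as bit positions, marking and testing a document is an array index plus a bit mask, with no boxing or hashing. The class and method names are hypothetical.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

public class HasMoreResultFlags {
    // Stand-in for a HashDocSet-style structure: each doc id is boxed and hashed.
    static Set<Integer> markWithHashSet(int[] collapsedDocIds) {
        Set<Integer> set = new HashSet<>();
        for (int id : collapsedDocIds) set.add(id);
        return set;
    }

    // Stand-in for a BitDocSet-style structure: one bit per doc id in [0, maxDoc).
    static BitSet markWithBitSet(int[] collapsedDocIds, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int id : collapsedDocIds) bits.set(id);
        return bits;
    }

    public static void main(String[] args) {
        int[] collapsed = {3, 5, 9};
        Set<Integer> hashed = markWithHashSet(collapsed);
        BitSet bits = markWithBitSet(collapsed, 16);
        // Both structures answer the same membership question for every doc id.
        for (int doc = 0; doc < 16; doc++) {
            if (hashed.contains(doc) != bits.get(doc)) {
                throw new IllegalStateException("mismatch at " + doc);
            }
        }
        System.out.println(bits); // {3, 5, 9}
    }
}
```

A bitset sized for `maxDoc` uses about `maxDoc / 8` bytes no matter how few documents are flagged, so it tends to pay off when lookups are frequent and the flagged set is not tiny.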



Commented: (SOLR-236) Field collapsing
2007-05-12 20:59:15
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495334 ]

Ryan McKinley commented on SOLR-236:
------------------------------------

This looks good. Someone with better Lucene chops should
look at the IndexSearcher getDocListAndSet part...

A few comments/questions about the interface:

If you apply all the example docs and hit:
http://localhost:8983/solr/select/?q=*&collapse=true

you get a 500 error. We should use
params.required().get("collapse.field") to give a nicer error.

With:
http://localhost:8983/solr/select/?q=*&collapse=true&collapse.field=manu&collapse.max=1

the collapse info at the bottom says:

<lst name="collapse_counts">
 <int name="has_more_results">3</int>
 <int name="has_more_results">5</int>
 <int name="has_more_results">9</int>
</lst>

What does that mean? How would you use it? How does it
relate to the <result> docs?


Commented: (SOLR-236) Field collapsing
2007-05-13 06:03:15
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495356 ]

Emmanuel Keller commented on SOLR-236:
--------------------------------------

My turn to miss something ;)
You are right, we have to use
params.required().get("collapse.field"). 
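
The agreed fix is to treat a missing `collapse.field` as a client error instead of a server error. In Solr, `params.required()` returns a `RequiredSolrParams` view whose `get` throws a bad-request `SolrException` when the parameter is absent. The Java sketch below only mimics that behaviour with a plain `Map` so it runs on its own; the `requiredParam` helper and `BadRequestException` class are hypothetical names, not Solr API (Java 9+ for `Map.of`).

```java
import java.util.Map;

public class RequiredParams {
    // Hypothetical exception standing in for a "400 Bad Request" style error.
    static class BadRequestException extends RuntimeException {
        BadRequestException(String message) { super(message); }
    }

    // Hypothetical helper mimicking params.required().get(name):
    // return the value, or fail with a descriptive client error if it is missing.
    static String requiredParam(Map<String, String> params, String name) {
        String value = params.get(name);
        if (value == null) {
            throw new BadRequestException("Missing required parameter: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        // Same situation as ?q=*&collapse=true with no collapse.field.
        Map<String, String> params = Map.of("q", "*", "collapse", "true");
        try {
            requiredParam(params, "collapse.field");
        } catch (BadRequestException e) {
            System.out.println(e.getMessage()); // Missing required parameter: collapse.field
        }
    }
}
```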

About the collapse info:
<int name="has_more_results">3</int>
means that the third document of the result has been collapsed,
and that some consecutive results having the same field value
have been removed.
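
One way to read this: walk the ranked results in order, keep at most `collapse.max` consecutive documents sharing a field value, drop the rest of that run, and record the position (within the collapsed result) of the last kept document of each shortened run. The Java sketch below implements that reading of adjacent collapsing. The class and method names, the input as a list of field values in score order, and the 1-based positions (following "the third document") are all assumptions, not the patch's code. It uses a record, so it needs Java 16+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class AdjacentCollapse {
    // kept: indices (into the input) of documents that survive collapsing.
    // hasMoreResults: 1-based positions in the kept list after which
    // consecutive documents with the same value were removed.
    record Collapsed(List<Integer> kept, List<Integer> hasMoreResults) {}

    // fieldValues: the collapse.field value of each ranked document, in score order.
    // Documents whose value is null are treated as equal to each other.
    static Collapsed collapseAdjacent(List<String> fieldValues, int max) {
        List<Integer> kept = new ArrayList<>();
        List<Integer> hasMore = new ArrayList<>();
        String previous = null;
        int run = 0; // length of the current run of equal values
        for (int i = 0; i < fieldValues.size(); i++) {
            String value = fieldValues.get(i);
            run = (i > 0 && Objects.equals(value, previous)) ? run + 1 : 1;
            previous = value;
            if (run <= max) {
                kept.add(i);
            } else if (run == max + 1) {
                // First removed document of this run: flag the last kept position once.
                hasMore.add(kept.size());
            }
            // Documents beyond max + 1 in the same run are dropped without a new flag.
        }
        return new Collapsed(kept, hasMore);
    }

    public static void main(String[] args) {
        List<String> manu = List.of("a", "b", "c", "c", "c", "d", "a", "a");
        Collapsed c = collapseAdjacent(manu, 1);
        System.out.println(c.kept());           // [0, 1, 2, 5, 6]
        System.out.println(c.hasMoreResults()); // [3, 5]
    }
}
```

Under this reading, `has_more_results` = 3 marks the third kept document ("c") as the head of a run whose remaining members were removed.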



Commented: (SOLR-236) Field collapsing
2007-05-13 09:47:15
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495368 ]

Yonik Seeley commented on SOLR-236:
-----------------------------------

A good starting point might be here:
http://www.nabble.com/result-grouping--tf2910425.html#a8131895



Commented: (SOLR-236) Field collapsing
2007-05-13 09:47:15
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495367 ]

Yonik Seeley commented on SOLR-236:
-----------------------------------

Thanks for looking into this, Emmanuel.
It appears as if this only collapses adjacent documents,
correct?

We should really try to get everyone on the same page...
hash out the exact semantics of "collapsing", and
the most useful interface. An efficient implementation can
follow.

A good starting point might be here:



Commented: (SOLR-236) Field collapsing
2007-05-13 10:53:15
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495376 ]

Emmanuel Keller commented on SOLR-236:
--------------------------------------

Yonik,

You are right, only adjacent documents are collapsed.
I work on a large index (2,000,000 documents) that grows every
day. The first goal was to group results while preserving score
ranking and achieving good performance. This
"light" implementation meets our needs.
I am currently working on a second implementation that takes
care of the semantics.

P.S.: Congratulations on this great application.



Updated: (SOLR-236) Field collapsing
2007-05-13 16:10:15
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Attachment: field_collapsing.patch

This release conforms more closely to the semantics of
"field collapsing".

Parameters are:

collapse=true                    // enable collapsing
collapse.field=[field]           // indexed field used for collapsing
collapse.max=[integer]           // start collapsing after n documents
collapse.type=[normal|adjacent]  // default value is "normal"

- "adjacent" collapses only consecutive documents.
- "normal" collapses all documents having an equal collapsing
  field value.
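
To make the difference between the two types concrete, here is a Java sketch that applies both rules to the same ranked list. It assumes that `collapse.max=n` keeps the first n documents of each group ("normal") or of each consecutive run ("adjacent") and drops the rest while preserving score order; the class and method names are hypothetical, and this is not the patch's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class CollapseTypes {
    // "normal": keep at most max docs per field value across the whole ranked list.
    static List<Integer> collapseNormal(List<String> fieldValues, int max) {
        Map<String, Integer> seen = new HashMap<>();
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < fieldValues.size(); i++) {
            // Count how many docs with this value have been seen so far (including this one).
            int count = seen.merge(fieldValues.get(i), 1, Integer::sum);
            if (count <= max) kept.add(i);
        }
        return kept;
    }

    // "adjacent": only consecutive docs with an equal value count toward max.
    static List<Integer> collapseAdjacent(List<String> fieldValues, int max) {
        List<Integer> kept = new ArrayList<>();
        int run = 0;
        for (int i = 0; i < fieldValues.size(); i++) {
            boolean sameAsPrevious = i > 0
                    && Objects.equals(fieldValues.get(i), fieldValues.get(i - 1));
            run = sameAsPrevious ? run + 1 : 1;
            if (run <= max) kept.add(i);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sites = List.of("a", "a", "b", "a", "c", "b");
        System.out.println(collapseNormal(sites, 1));   // [0, 2, 4]
        System.out.println(collapseAdjacent(sites, 1)); // [0, 2, 3, 4, 5]
    }
}
```

With `collapse.max=1`, "normal" leaves one document per site, while "adjacent" lets site "a" reappear once its run has been interrupted by "b".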



Updated: (SOLR-236) Field collapsing
2007-05-14 04:28:16
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Attachment: field_collapsing.patch

Corrects a bug in the previous version when a value
greater than 1 is used for the collapse.max parameter.


