|
|
| Created: (SOLR-236) Field collapsing |
  United States |
2007-05-11 17:14:16 |
Field collapsing
----------------
Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.2
Reporter: Emmanuel Keller
This patch includes a new feature called "Field
collapsing".
"Used in order to collapse a group of results with
similar value for a given field to a single entry in the
result set. Site collapsing is a special case of this, where
all results for a given web site is collapsed into one or
two entries in the result set, typically with an associated
"more documents from this site" link. See also
Duplicate detection."
http://www.fastsearch.com/glossary.aspx?m=48&amid=299
The implementation adds 3 new query parameters (SolrParams):
"collapse" set to true to enable collapsing.
"collapse.field" to choose the field used to group results.
"collapse.max" to select how many consecutive results are
allowed before collapsing.
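To illustrate what the three parameters describe, here is a minimal Python sketch of collapsing consecutive results. It is not the code from the patch (which is Java and works on Solr's DocList); the function name collapse_adjacent and the dict-based documents are assumptions made for the example.

```python
def collapse_adjacent(docs, field, max_run):
    """Keep at most max_run consecutive docs that share the same field value.

    docs is a ranked list of dicts; ranking order is preserved.
    Hypothetical helper, not part of Solr.
    """
    kept = []
    run_value = None
    run_len = 0  # length of the current run of equal values
    for doc in docs:
        value = doc.get(field)
        if run_len > 0 and value == run_value:
            run_len += 1
        else:
            # A new run starts whenever the field value changes.
            run_value = value
            run_len = 1
        if run_len <= max_run:
            kept.append(doc)
    return kept


ranked = [
    {"id": 1, "manu": "a"},
    {"id": 2, "manu": "a"},
    {"id": 3, "manu": "b"},
    {"id": 4, "manu": "a"},
    {"id": 5, "manu": "a"},
    {"id": 6, "manu": "a"},
]
print([d["id"] for d in collapse_adjacent(ranked, "manu", 1)])  # [1, 3, 4]
```

Note that document 4 survives even though its value was already seen: only an unbroken run of equal values is collapsed in this sketch.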
TODO (in progress):
- More documentation (on source code)
- Test cases
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue
online.
|
|
| Updated: (SOLR-236) Field collapsing |
  United States |
2007-05-11 17:16:15 |
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Emmanuel Keller updated SOLR-236:
---------------------------------
Attachment: collapse_field.patch
Field Collapsing
|
|
| Updated: (SOLR-236) Field collapsing |
  United States |
2007-05-11 17:50:15 |
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Emmanuel Keller updated SOLR-236:
---------------------------------
Attachment: collapse_field.patch
Replacing HashDocSet with BitDocSet for hasMoreResult, for
better performance.
|
|
| Commented: (SOLR-236) Field collapsing |
  United States |
2007-05-12 20:59:15 |
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495334 ]
Ryan McKinley commented on SOLR-236:
------------------------------------
This looks good. Someone with better Lucene chops should
look at the IndexSearcher getDocListAndSet part...
A few comments/questions about the interface:
If you apply all the example docs and hit:
http://localhost:8983/solr/select/?q=*:*&collapse=true
you get a 500. We should use
params.required().get("collapse.field") to get a nicer
error.
With:
http://localhost:8983/solr/select/?q=*:*&collapse=true&collapse.field=manu&collapse.max=1
the collapse info at the bottom says:
<lst name="collapse_counts">
<int name="has_more_results">3</int>
<int name="has_more_results">5</int>
<int name="has_more_results">9</int>
</lst>
What does that mean? How would you use it? How does it
relate to the <result> docs?
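The point about the 500 error can be sketched outside Solr. The Python below mimics the idea behind params.required().get(...): fail early with a client error that names the missing parameter. The names MissingParameterError, get_required and read_collapse_params are hypothetical, and the default of 1 for collapse.max is an assumption made for this example, not something the patch states.

```python
class MissingParameterError(ValueError):
    """Signals a client error (HTTP 400) rather than a server error (500)."""
    status = 400


def get_required(params, name):
    # Return the parameter value, or raise an error that names the parameter.
    value = params.get(name)
    if value is None or value == "":
        raise MissingParameterError(f"Missing required parameter: {name}")
    return value


def read_collapse_params(params):
    # Collapsing is only configured when collapse=true was requested.
    if params.get("collapse", "false").lower() != "true":
        return None
    return {
        "field": get_required(params, "collapse.field"),
        # Default of 1 is an assumption for this sketch.
        "max": int(params.get("collapse.max", "1")),
    }
```

With this shape, a request carrying collapse=true but no collapse.field produces a message that tells the caller which parameter to add, instead of an unexplained failure deeper in the search code.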
|
|
| Commented: (SOLR-236) Field collapsing |
  United States |
2007-05-13 06:03:15 |
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495356 ]
Emmanuel Keller commented on SOLR-236:
--------------------------------------
My turn to miss something ;)
You are right, we have to use
params.required().get("collapse.field").
About the collapse info:
<int name="has_more_results">3</int>
means that the third doc in the result list has been
collapsed, and that some consecutive results with the same
field value have been removed.
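As a rough illustration of how a client might read this block, the Python sketch below parses the collapse_counts fragment quoted earlier in the thread and collects the has_more_results values. The function name collapsed_markers is hypothetical, and the sketch only extracts the numbers; how to map them onto the <result> docs is exactly the open question in this thread.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the response quoted earlier in the thread.
SAMPLE = """<lst name="collapse_counts">
<int name="has_more_results">3</int>
<int name="has_more_results">5</int>
<int name="has_more_results">9</int>
</lst>"""


def collapsed_markers(xml_text):
    """Return the has_more_results values in document order."""
    root = ET.fromstring(xml_text)
    return [
        int(element.text)
        for element in root.iter("int")
        if element.get("name") == "has_more_results"
    ]


print(collapsed_markers(SAMPLE))  # [3, 5, 9]
```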
|
|
| Commented: (SOLR-236) Field collapsing |
  United States |
2007-05-13 09:47:15 |
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495368 ]
Yonik Seeley commented on SOLR-236:
-----------------------------------
A good starting point might be here:
http://www.nabble.com/result-grouping--tf2910425.html#a8131895
|
|
| Commented: (SOLR-236) Field collapsing |
  United States |
2007-05-13 09:47:15 |
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495367 ]
Yonik Seeley commented on SOLR-236:
-----------------------------------
Thanks for looking into this Emmanuel.
It appears as if this only collapses adjacent documents,
correct?
We should really try to get everyone on the same page...
hash out the exact semantics of "collapsing", and
the most useful interface. An efficient implementation can
follow.
A good starting point might be here:
http://www.nabble.com/result-grouping--tf2910425.html#a8131895
|
|
| Commented: (SOLR-236) Field collapsing |
  United States |
2007-05-13 10:53:15 |
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495376 ]
Emmanuel Keller commented on SOLR-236:
--------------------------------------
Yonik,
You are right, only adjacent documents are collapsed.
I work on a large index (2,000,000 documents) that grows
every day. The first goal was to group results while
preserving score ranking and achieving good performance.
This "light" implementation meets our needs.
I am currently working on a second implementation that takes
care of the semantics.
P.S.: Congratulations on this great application.
|
|
| Updated: (SOLR-236) Field collapsing |
  United States |
2007-05-13 16:10:15 |
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Emmanuel Keller updated SOLR-236:
---------------------------------
Attachment: field_collapsing.patch
This release conforms more closely to the semantics of
"field collapsing".
Parameters are:
collapse=true // enable collapsing
collapse.field=[field] // indexed field used for collapsing
collapse.max=[integer] // start collapsing after n documents
collapse.type=[normal|adjacent] // default value is "normal"
- "adjacent" collapses only consecutive documents.
- "normal" collapses all documents having an equal
collapsing field.
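The difference between the two collapse.type values can be sketched in a few lines of Python. This is an illustration of the semantics as described above, not the patch's Java implementation; the function name collapse and the dict-based documents are assumptions, and field values are assumed to be hashable and present on every document.

```python
def collapse(docs, field, max_docs, mode="normal"):
    """Keep at most max_docs docs per field value.

    mode="normal":   the limit applies across the whole ranked list.
    mode="adjacent": the limit applies only within a run of consecutive docs.
    Hypothetical helper for illustration only.
    """
    if mode not in ("normal", "adjacent"):
        raise ValueError(f"unknown collapse.type: {mode}")
    kept = []
    totals = {}      # normal mode: docs seen so far per field value
    previous = None  # adjacent mode: field value of the preceding doc
    run = 0          # adjacent mode: length of the current run
    for index, doc in enumerate(docs):
        value = doc[field]
        if mode == "normal":
            totals[value] = totals.get(value, 0) + 1
            count = totals[value]
        else:
            run = run + 1 if index > 0 and value == previous else 1
            previous = value
            count = run
        if count <= max_docs:
            kept.append(doc)
    return kept


ranked = [
    {"id": 1, "manu": "a"},
    {"id": 2, "manu": "a"},
    {"id": 3, "manu": "b"},
    {"id": 4, "manu": "a"},
    {"id": 5, "manu": "a"},
]
print([d["id"] for d in collapse(ranked, "manu", 1, "normal")])    # [1, 3]
print([d["id"] for d in collapse(ranked, "manu", 1, "adjacent")])  # [1, 3, 4]
```

In "adjacent" mode document 4 reappears because document 3 interrupted the run; in "normal" mode it is dropped because the value "a" has already reached its limit.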
|
|
| Updated: (SOLR-236) Field collapsing |
  United States |
2007-05-14 04:28:16 |
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Emmanuel Keller updated SOLR-236:
---------------------------------
Attachment: field_collapsing.patch
Corrects a bug in the previous version when using a value
greater than 1 for the collapse.max parameter.
|
|