Thread: Group by in Lucene ?




Group by in Lucene ?
user name
2007-11-04 23:57:27
Hi.

I have a situation where I'm searching among some 100K feeds and only
want one result per site in return. I have developed a really simple
method of grouping which just scrolls through the result set (hit set)
until maxNum docs, each from a different site, have been collected.
Since I don't want to reinvent the wheel, I want to know if Lucene has
something like this built in. I will also be using Solr soon, and then
my own home-cooked recipe will not work, so I really need a standard
way of doing this.
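
The grouping loop described above can be sketched in plain Java roughly
as follows. This is only an illustration under assumptions: Hit,
onePerSite and the site field are hypothetical names, not Lucene or
Solr API, and the hits are assumed to arrive already sorted by
relevance.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SiteGrouping {

    /** Hypothetical stand-in for a search hit: a document id plus its site. */
    public static final class Hit {
        public final String docId;
        public final String site;

        public Hit(String docId, String site) {
            this.docId = docId;
            this.site = site;
        }
    }

    /**
     * Walks the ranked hits in order and keeps the first hit seen for each
     * site, stopping once maxNum hits have been collected.
     */
    public static List<Hit> onePerSite(List<Hit> rankedHits, int maxNum) {
        List<Hit> result = new ArrayList<>();
        Set<String> seenSites = new HashSet<>();
        for (Hit hit : rankedHits) {
            if (result.size() >= maxNum) {
                break; // enough unique sites collected
            }
            // Set.add returns false when the site was already present,
            // so later hits from the same site are skipped.
            if (seenSites.add(hit.site)) {
                result.add(hit);
            }
        }
        return result;
    }
}
```

The drawback of this approach is that the loop may have to read far
down the hit list when a few sites dominate the top results.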

I know Nutch has something like it called dedupField, which by default
is set to 2.

Anyone?


Kindly

//Marcus

-- 
Marcus Herou Solution Architect & Core Java developer
Tailsweep AB
+46702561312
marcus.heroutailsweep.com
http://www.tailsweep.com

Re: Group by in Lucene ?
user name
2007-11-05 06:01:23
Solr has an issue outstanding right now that implements something that
may be close to what you want. They are calling it Field Collapsing.
See https://issues.apache.org/jira/browse/SOLR-236

-Grant


--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Boot Camp Training:
ApacheCon Atlanta, Nov. 12, 2007.  Sign up now!  http://www.apachecon.com


Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance

http://wiki.apache.org/lucene-java/LuceneFAQ



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribelucene.apache.org
For additional commands, e-mail: java-user-helplucene.apache.org


Re: Group by in Lucene ?
user name
2007-11-05 06:49:09
Thanks. They seem to have gotten really far in the dev cycle on this.
It seems like it will land in Solr 1.3.

However, I would really like this feature to be developed for core
Lucene. How do I start that process?
"Develop it yourself," you would say. I'm serious, though: isn't it a
really cool and useful feature?

Kindly

//Marcus


Re: Group by in Lucene ?
user name
2007-11-05 15:03:08
We're always open to well-thought-out and tested patches. See the Wiki
for info on contributing.

-Grant




Re: Group by in Lucene ?
user name
2007-11-06 02:19:47
Cool.

I'll do that, since this is an area I can spend time on.

Kindly

//Marcus
