
Thread: Re: get doc/query similarity




Re: get doc/query similarity
United States
2008-04-16 09:43:45
> From: Marvin Humphrey <marvinrectangular.com>
> 
> Neat.  Not that this is what you're doing, but I can imagine something
> like this being used as a supervisory tool for people who get paid for
> generating content when the primary criteria is volume rather than
> quality.  Copy-and-paste documents with minor variations would appear
> tightly grouped in vector space.

Bingo. There are many situations where authors generate content under
incentives that reward volume, or some other metric, over quality.
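The "tightly grouped in vector space" idea above can be illustrated with a minimal sketch. It assumes raw word counts as the vector for each document and cosine similarity as the closeness measure, and it flags pairs above a threshold as likely copy-and-paste variants. The function names (`term_vector`, `cosine`, `near_duplicates`) and the 0.9 default threshold are hypothetical illustrations, not part of KinoSearch.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased word counts: a crude stand-in for an analyzer's token stream."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def near_duplicates(docs, threshold=0.9):
    """Return (i, j, score) for every pair of docs scoring at or above threshold."""
    vectors = [term_vector(d) for d in docs]
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            score = cosine(vectors[i], vectors[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


docs = [
    "Buy cheap widgets today, the best widgets in town.",
    "Buy cheap widgets today, the best widgets in the city.",
    "Search engines rank documents by relevance to a query.",
]
print(near_duplicates(docs, threshold=0.8))  # only the two widget ads pair up
```

The all-pairs loop is quadratic, which is fine for spot checks. For a large corpus you would probably want to compare only documents that share rare terms.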

> Please let us know how it goes.

Will do, and thanks for your time and advice. I still think
it'd be nice if KS exposed such a similarity computation via
the API; it'd be much more efficient that way.



_______________________________________________
KinoSearch mailing list
KinoSearchrectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch


Re: get doc/query similarity
United States
2008-04-16 13:04:20
On Apr 16, 2008, at 7:43 AM, jack_tanneryahoo.com wrote:

> Will do, and thanks for your time and advice. I still think it'd be
> nice if KS exposed such a similarity computation via the API; it'd
> be much more efficient that way.

I agree, and I would have liked to discuss that.  Had you not been
constrained by having to use the maint branch, I might have steered
things in that direction.

A lot of the best work on KS, both high-level design and low-level
code, has arisen from collaborations between me and someone who has
an itch to scratch.  I'm always on the lookout for such potential
partners.

In your case, though, my impression was that you were quite
knowledgeable, but that your project did not need the devel branch
badly enough to guarantee sustained momentum over the course of what
would likely be a drawn-out design discussion.

Exposing similarity measures would be superficially easy: all the
relevant material is in KinoSearch::Search::Similarity.  However, the
actual APIs for interfacing with the math in Similarity are internal
and not set up to be used the way you described.  The bigger problems
were how to get at "an indexed document", how to list its terms, and
so on, outside the context of the existing search API.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

