On Feb 23, 2007, at 10:13 AM, Gal Nitzan wrote:
> Hi,
>
> Since I ran into SOLR project the other day I was wondering that
> maybe SOLR could be Nutch's SE...

We've been using it since I realized we could not get the Lucene
queries we wanted out of the Nutch OpenSearch SE. Solr also integrates
a lot better with back end development, as it outputs JSON, XML and
Python. It's even more valuable with Sami's SolrIndexer patch that he
has on his blog -- Nutch indexing now goes straight to Solr. It's very
fast and so far robust -- I've lazily crawled 600K pages on a single
CPU (with stored content!) in the past few days after integrating
Sami's stuff, with no obvious problems yet.

From what I can tell, you lose some of the advanced Nutch scoring
features. We get the document boost (Sami, I owe you a patch) but
that's about it. The Nutch SE is also a great "Google in a Box" setup
for people that want that. For that reason I am not so sure Solr
should replace Nutch's SE. Solr is more useful for people that want to
do something programmatically or run queries more complex than the
"AND" you get with Nutch. There's no search front end in Solr other
than the admin interface.
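
A minimal sketch of the programmatic side, assuming a Solr instance at
localhost:8983 and index fields named title and content (both are
assumptions, as are the helper names build_select_url and
extract_docs). The /select handler and the q, rows and wt parameters
are standard Solr; everything else is illustrative only.

```python
import json
from urllib.parse import urlencode


def build_select_url(base_url, query, rows=10, wt="json"):
    """Build a URL for Solr's /select handler.

    `query` is passed through as Lucene query syntax, so boolean
    operators, field scoping and phrases are all available.
    `wt` picks the response writer (json, xml, python, ...).
    """
    params = {"q": query, "rows": rows, "wt": wt}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def extract_docs(raw_json):
    """Pull the list of matching documents out of a wt=json response."""
    return json.loads(raw_json)["response"]["docs"]


# Hypothetical host and field names; adjust to the actual schema.
url = build_select_url(
    "http://localhost:8983/solr",
    'title:nutch OR (content:solr AND NOT content:"open search")',
    rows=5,
)
print(url)
```

Fetching that URL (for example with urllib.request.urlopen) and
passing the body to extract_docs gives plain Python dicts, with no
HTML front end involved.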

Would love to see Sami's patch in trunk as an indexing plugin, though.

-Brian