Thread: Re: lucene with hadoop but without nutch, looking for documentation




France
2007-07-19 02:40:16
I'm well aware of the two possibilities you're proposing, but I don't think either would fit with the existing software of the company I'm working in. I guess I'll have to crawl through Nutch's guts to find what I'm looking for and extract it. Once I've managed this, I'll try to write the tutorial that I'm missing today.
> Nutch is intended to handle large collections.  The simplest way to get hold
> of large collections is to simply search the web.
>
> But Nutch is not just a web search engine.  It also provides distributed
> creation of indexes and distributed search, which is the motivation of my
> comment about it being the networked version of Lucene.
>
> So, while I agree with your statement that Nutch was "especially designed to
> deal with web documents", I would strongly disagree that this is a
> limitation.  For one thing, if you actually have gobs of documents, you
> probably will have to store them in a networked form somehow.  That
> networked form is probably pretty easy to make accessible via HTTP, and that
> makes a web-oriented search engine like Nutch just what you need.
>
> Another way to say this is that if you need a general purpose
> networked/distributed search engine and you have a web-oriented distributed
> search engine, you can either adapt the search engine to not be web
> oriented, or you can adapt your collection to be web-oriented.
>
>
> On 7/18/07 8:32 AM, "Samuel LEMOINE" <samuel.lemoinelingway.com> wrote:
>
>   
>> You quote Nutch as being "the networked version of Lucene", but from
>> what I've seen it's more specific than that, especially designed to deal
>> with web documents... Am I wrong in assuming this?

