Where are you from, Sergio?
Sergio Morales wrote:
>
> Hi Payo,
>
> You need to add the right plugins to your Nutch configuration file. Here is
> an extract from my installation:
>
> NUTCH_HOME\conf\nutch-site.xml:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <configuration>
> <property>
> <name>plugin.includes</name>
> <value>nutch-extensionpoints|ontology|protocol-ftp|protocol-httpclient|urlfilter-regex|parse-(text|html|pdf|rtf|msword|js|mspowerpoint|msexcel|oo|rss)|index-(basic|more)|query-(basic|site|url|more)|summary-lucene|scoring-opic</value>
> </property>
> ...
>
> Using the above configuration, I am able to index text, HTML, PDF, Excel,
> etc.
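
A quick way to see which plugin ids the plugin.includes value quoted above would enable is to test them against it as a regular expression. This is only a sketch: it assumes Nutch matches the pattern against the whole plugin id, and `is_plugin_included` is a hypothetical helper, not part of Nutch. The id `parse-xml` is used purely as an illustration of something the pattern does not cover.

```python
import re

# Value copied from the plugin.includes property quoted above.
PLUGIN_INCLUDES = (
    "nutch-extensionpoints|ontology|protocol-ftp|protocol-httpclient|"
    "urlfilter-regex|parse-(text|html|pdf|rtf|msword|js|mspowerpoint|"
    "msexcel|oo|rss)|index-(basic|more)|query-(basic|site|url|more)|"
    "summary-lucene|scoring-opic"
)


def is_plugin_included(plugin_id: str, includes: str = PLUGIN_INCLUDES) -> bool:
    """Hypothetical helper: True if the whole plugin id matches the pattern.

    Assumption: the pattern must match the full id, not just a prefix.
    """
    return re.fullmatch(includes, plugin_id) is not None


# Illustrative ids only; "parse-xml" is a made-up id for the XML case.
for plugin_id in ("parse-pdf", "parse-msexcel", "parse-xml", "protocol-http"):
    print(plugin_id, is_plugin_included(plugin_id))
```

Under that assumption, an id is enabled only if it appears in one of the alternatives, so a parser for a new format would have to be added to the `parse-(...)` group before it is picked up.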
>
> Not sure about XML; I think there is already an enhancement request for
> this in JIRA.
>
> I hope this helps,
>
> Sergio
>
> ----- Original Message ----
> From: payo <payo22 yahoo.com>
> To: nutch-user lucene.apache.org
> Sent: Friday, 19 October, 2007 4:16:20 PM
> Subject: Re: Indexing documents
>
>
>
>
> Goethe wrote:
>>
>>
>>
>> payo wrote:
>>>
>>> Hi
>>>
>>> my questions are
>>>
>>> 1.- Can Nutch index PDF, HTML and XML documents?
>>>
>>> 2.- Can Nutch index remote documents?
>>>
>>> thanks
>>>
>>
>> Yes to both questions, and for the first question Nutch already comes
>> with the plugins necessary to index those file types.
>>
>>
>
> Where can I obtain information on this?
>
> --
> View this message in context:
> http://www.nabble.com/Indexing-documents-tf4653264.html#a13295436
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>
>
>
--
View this message in context: http://www.nabble.com/Indexing-documents-tf4653264.html#a13302250
Sent from the Nutch - User mailing list archive at Nabble.com.