Thread: Re: Indexing documents
2007-10-19 14:04:29
Hi Payo,

You need to add the right plugin to your nutch configuration
file. Here is an extraction from my installation:

NUTCH_HOME\conf\nutch-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
 <property>
   <name>plugin.includes</name>
   <value>nutch-extensionpoints|ontology|protocol-ftp|protocol-httpclient|urlfilter-regex|parse-(text|html|pdf|rtf|msword|js|mspowerpoint|msexcel|oo|rss)|index-(basic|more)|query-(basic|site|url|more)|summary-lucene|scoring-opic</value>
 </property>
...

Using the above configuration, I am able to index text,
HTML, PDF, Excel, etc.
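
If you want to check which plugins that value lets through, here is a minimal Python sketch. It assumes Nutch treats plugin.includes as a regular expression that must match the whole plugin id; the helper name is_enabled and the id parse-xml are only illustrations, not part of Nutch.

```python
import re

# The plugin.includes value from the nutch-site.xml above, joined onto one line.
PLUGIN_INCLUDES = (
    r"nutch-extensionpoints|ontology|protocol-ftp|protocol-httpclient|"
    r"urlfilter-regex|"
    r"parse-(text|html|pdf|rtf|msword|js|mspowerpoint|msexcel|oo|rss)|"
    r"index-(basic|more)|query-(basic|site|url|more)|"
    r"summary-lucene|scoring-opic"
)


def is_enabled(plugin_id: str, includes: str = PLUGIN_INCLUDES) -> bool:
    """Hypothetical helper: True if the whole plugin id matches the pattern.

    Assumption: the entire id has to match, not just a prefix of it.
    """
    return re.fullmatch(includes, plugin_id) is not None


if __name__ == "__main__":
    # "parse-xml" is a made-up id, used only to show a non-matching case.
    for plugin_id in ("parse-pdf", "parse-msexcel", "parse-xml", "protocol-http"):
        state = "enabled" if is_enabled(plugin_id) else "not enabled"
        print(f"{plugin_id}: {state}")
```

A plugin id that the pattern does not cover is simply not loaded, so a missing parser usually means its name is absent from the parse-(...) group.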

I am not sure about XML; I think there is already an enhancement
request for this in JIRA.

I hope this helps,

Sergio

----- Original Message ----
From: payo <payo22yahoo.com>
To: nutch-user@lucene.apache.org
Sent: Friday, 19 October, 2007 4:16:20 PM
Subject: Re: Indexing documents




Goethe wrote:
> 
> 
> 
> payo wrote:
>> 
>> Hi
>> 
>> My questions are:
>> 
>> 1.- Can Nutch index PDF, HTML and XML documents?
>> 
>> 2.- Can Nutch index remote documents?
>> 
>> thanks
>> 
> 
> Yes to both questions, and for the first question Nutch already comes with
> the plugins necessary to index those file types.
> 
> 

Where can I obtain information on this?

-- 
View this message in context: http://www.nabble.com/Indexing-documents-tf4653264.html#a13295436
Sent from the Nutch - User mailing list archive at Nabble.com.


     