
Thread: How to add data into segment with my own plugin ?




2007-02-23 10:22:31
I'm writing a plugin to parse specific data from web pages, and
afterwards I will export the dumped segments into a specific SQL
database. So I don't think I will use the indexing/Lucene part.

I built my plugin following the wiki tutorial. It works well for the
parsing part (I can see it in the log); I'm just adding some content
to the ParseData object. But I don't see how to add my parsed data to
a segment.

As I understand it, a segment holds these kinds of data for each page
(correct me if I'm wrong):
CrawlDatum
Content
ParseData
ParseText

For example, I would like to add some information to "ParseData", but
I can't figure out how.

Thanks for your help; I'm quite new to Nutch.
-- 
View this message in context:
http://www.nabble.com/How-to-add-data-into-segment-with-my-own-plugin---tf3279715.html#a9121761
Sent from the Nutch - Dev mailing list archive at Nabble.com.


RE: How to add data into segment with my own plugin ?
2007-02-26 10:37:44
The kind of information I want to add is dozens of "page feature
flags": not much data, but many small pieces, such as
"flag_content=xhtml; flag_page-size=xx; flag_page-depth=xx;
list_picture_files=...".
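
As a rough sketch of how flags like these could be carried as plain
String key/value pairs, the snippet below uses a java.util.Map as a
stand-in for a String-only metadata container. The class name PageFlags
and its helper methods are hypothetical and are not Nutch API, the flag
values are made up, and the comma-joined list encoding is only one
possible convention.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a Map<String, String> stands in for a String-only
// metadata container. Nothing in this class is Nutch API.
class PageFlags {

    // Store an int flag as its decimal string form.
    public static void putInt(Map<String, String> meta, String key, int value) {
        meta.put(key, Integer.toString(value));
    }

    // Read an int flag back; throws if the key is missing or not a number.
    public static int getInt(Map<String, String> meta, String key) {
        return Integer.parseInt(meta.get(key));
    }

    // Store a list as one comma-joined string.
    // Assumption: no item contains a comma.
    public static void putList(Map<String, String> meta, String key, List<String> items) {
        meta.put(key, String.join(",", items));
    }

    // Split the comma-joined string back into a list (empty if absent).
    public static List<String> getList(Map<String, String> meta, String key) {
        String raw = meta.get(key);
        if (raw == null || raw.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(raw.split(",")));
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("flag_content", "xhtml");
        putInt(meta, "flag_page-size", 2048);   // made-up value
        putInt(meta, "flag_page-depth", 3);     // made-up value
        putList(meta, "list_picture_files", Arrays.asList("a.png", "b.jpg"));
        System.out.println(meta);
    }
}
```

Keeping every value a String means the same pairs could later be copied
into whatever metadata object the parse step exposes, at the cost of
parsing numbers and lists again on the way out.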

I saw that the fields of the parse object look quite rigid, and I
would have to make a lot of modifications to add my data to it:
( public ParseData(ParseStatus status, String title, Outlink[] outlinks, Metadata contentMeta) )
So I think it's better to put this data in the index using Metadata
objects.

But if I add 20 to 40 small flags as Metadata in the index:
- Does it hurt the performance of the Nutch index-building process?
  (I won't use these flags for specific queries, only for a global
  export of everything.)
- Is it easier to export (link) Metadata from the index to a SQL
  database than to export segment data?
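
For the export side, wherever the flags end up being stored, one
possible shape is to flatten each page's key/value pairs into rows and
write them through a JDBC batch. The sketch below is only an
illustration: the class FlagExport, the table page_flags(url, name,
value) and the sample URL are hypothetical, and main() only prints the
rows because no database is assumed.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical export helper; not part of Nutch.
class FlagExport {

    // Hypothetical table layout: one row per (page, flag).
    static final String INSERT_SQL =
        "INSERT INTO page_flags (url, name, value) VALUES (?, ?, ?)";

    // Flatten one page's flags into (url, name, value) rows.
    public static List<String[]> toRows(String url, Map<String, String> flags) {
        List<String[]> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : flags.entrySet()) {
            rows.add(new String[] { url, e.getKey(), e.getValue() });
        }
        return rows;
    }

    // Write the rows in one JDBC batch; the caller supplies the connection.
    public static void writeBatch(Connection conn, List<String[]> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String[] r : rows) {
                ps.setString(1, r[0]);
                ps.setString(2, r[1]);
                ps.setString(3, r[2]);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }

    public static void main(String[] args) {
        Map<String, String> flags = new LinkedHashMap<>();
        flags.put("flag_content", "xhtml");
        flags.put("flag_page-depth", "3");   // made-up value
        // No database here: just show the rows that would be inserted.
        for (String[] r : toRows("http://example.com/", flags)) {
            System.out.println(String.join(" | ", r));
        }
    }
}
```

A name/value table like this avoids changing the schema each time a new
flag is added, which may matter with 20 to 40 flags per page.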



Jeremy Huylebroeck wrote:
> 
> 
> As far as I know, you can do this.
> 
> You can either add things in the Metadata objects, but it is limited
> to String values.
> 
> Or you can extend the Parse object, have a different OutputFormat for
> it that would read/write your information from the segments.
> 
> The fetcher/parser would have to be modified slightly, but nothing
> hard to do.
> We did something around those lines, and it works perfectly in Nutch
> 0.8.
> 
> 
> 
> Any other way?
> 
> 
> 

-- 
View this message in context:
http://www.nabble.com/How-to-add-data-into-segment-with-my-own-plugin---tf3279715.html#a9162123
Sent from the Nutch - Dev mailing list archive at Nabble.com.

