
Thread: adding dmoz meta data to index.




adding dmoz meta data to index.
user name
2007-11-06 13:29:55
Hi All,

I need to add dmoz meta-data to my index. I see some people have
commented about it, but I didn't find a solution. Can someone read the
steps below and give me some hints or pointers? This is the code that
I added:

1) Injector.java: datum.setCategory("dmoz-cat");

2) CrawlDatum.java: added a new private field 'category', along with
set and get methods for it.

3) BasicIndexingFilter.java:
doc.add(new Field("category", datum.getCategory(), Field.Store.YES, Field.Index.UN_TOKENIZED));

However, the code breaks at the third step (when I run index), saying
that category is null.

Another approach I was considering is to add the category to the
metadata in CrawlDatum instead. In that case, do I have to modify the
readFields() method on CrawlDatum?

Thanks in advance.



-- 
View this message in context: http://www.nabble.com/adding-dmoz-meta-data-to-index.-tf4760430.html#a13614050
Sent from the Nutch - Dev mailing list archive at Nabble.com.


Re: adding dmoz meta data to index.
user name
2007-11-07 08:10:34
Hello,

I'm implementing something similar at the moment. I'm feeding Nutch a
URL list in which each URL is annotated with an ID. This ID must go
into the Lucene index so that I can set up a 1:many relation between a
database and the crawled pages.

I've added the custom data to the metadata field of the datum. See
InjectMapper:

// add myID to the crawlDatum as metaData
MapWritable meta = new MapWritable();
meta.put(new Text("myID"), new Text(myID));
datum.setMetaData(meta);
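
A note on why the metadata route helps, as a rough sketch rather than actual Nutch code: a CrawlDatum travels between jobs in serialized form, so only what its write() method emits and its readFields() method restores is still there at indexing time. As far as I understand it, the metadata map is part of that serialized form, while a newly added plain field is not unless both methods are extended to handle it. The classes below (SketchDatum, DatumRoundTrip) are hypothetical stand-ins built on java.io only; they are not the real CrawlDatum and make no claim about its exact wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CrawlDatum; NOT the real Nutch class.
class SketchDatum {
    String category;                                   // plain new field, as in step 2 of the question
    final Map<String, String> meta = new HashMap<>();  // stand-in for the metadata map

    // Only what is written here survives a trip between jobs.
    void write(DataOutputStream out) throws IOException {
        out.writeInt(meta.size());
        for (Map.Entry<String, String> e : meta.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
        // 'category' is never written, so it cannot be restored later.
    }

    // Restores exactly what write() emitted, in the same order.
    void readFields(DataInputStream in) throws IOException {
        meta.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            String key = in.readUTF();
            String value = in.readUTF();
            meta.put(key, value);
        }
        // 'category' is never read, so it stays null on the copy.
    }
}

class DatumRoundTrip {
    // Serialize a datum to bytes and read it back into a fresh object.
    static SketchDatum roundTrip(SketchDatum original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            original.write(out);
            out.flush();
            SketchDatum copy = new SketchDatum();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SketchDatum datum = new SketchDatum();
        datum.category = "dmoz-cat";             // set on the field only
        datum.meta.put("category", "dmoz-cat");  // set in the metadata map

        SketchDatum copy = roundTrip(datum);
        System.out.println("field after round trip:    " + copy.category);
        System.out.println("metadata after round trip: " + copy.meta.get("category"));
    }
}
```

In this sketch the plain field comes back as null while the metadata entry survives, which matches the "category is null" symptom from the first message. Keeping a dedicated field would mean adding it to both write() and readFields(), in the same order.
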

Now the ID is saved in the CrawlDatum object. On the indexing side
I've written a new plugin, index-id, but it's simply a modified
index-basic ;) The essence is:

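MapWritable meta = datum.getMetaData();

// meta.get() returns null when no "myID" entry was injected for this URL
Text idText = (Text) meta.get(new Text("myID"));
String id = (idText == null) ? "" : idText.toString();

// test the content of the string, not its object identity
if (id.length() > 0) {
	Field myid = new Field("myid", id, Field.Store.YES, Field.Index.UN_TOKENIZED);
	myid.setBoost(5.0f);
	doc.add(myid);
	LOG.info("The following ID was added to the index: " + id);
}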
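
A side note on emptiness checks in a filter like this one: in Java, comparing Strings with != or == compares object references, not contents, and a String produced at runtime (for example by a toString() call) is generally a different object from the literal "". The small sketch below uses a hypothetical helper, hasValue, to show the difference; it is an illustration only and not part of the plugin.

```java
class EmptyCheck {
    // Content-based test: true only for a non-null, non-empty string.
    static boolean hasValue(String s) {
        return s != null && s.length() > 0;
    }

    public static void main(String[] args) {
        // An empty string created at runtime, standing in for toString() output.
        String built = new String("");

        // Reference comparison: the two objects differ, so this prints true
        // even though the string is empty.
        System.out.println(built != "");

        // Content comparison: prints false, because the string is empty.
        System.out.println(hasValue(built));
    }
}
```
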

So, that's where I stand at the moment. Now I have to build a custom
query interface so that I can search in my MySQL database and enrich
the results with my crawled sites.

Maybe we can join forces. Feel free to contact me.

Greetings,
	Sebastian Steinmetz

