cha wrote:
> First of all thanks for your reply.
>
You're welcome.
> I'm really confused, pardon me.
> I don't know whether I need to put the given code in a new class in the
> nutch directory.
> Do I have to import other classes or packages? Is there anything else I
> need to take care of?
>
I suggest you download Eclipse, then set up the project by following the
tutorial on the Nutch wiki about running Nutch in Eclipse. Then create a
new class, for example in the org.apache.nutch.tools package, and paste
the previously mentioned code into it.
// Here fs is an instance of FileSystem, and seqFile is a Path to the crawldb.
MapFile.Reader reader = new MapFile.Reader(fs, seqFile, conf);

Then, in the loop, change the line

out.println(key);

to

out.println("<url><loc>" + key + "</loc></url>");
> I have tried creating a new separate class in the nutch directory, but it
> gives lots of errors related to packages/classes not found. I am still
> trying to figure out what's wrong there.
>
> Secondly, how will I be able to read the URLs from the crawldb once the
> class is running? I have no idea how to figure it out.
>
> How can I fit the output of my URLs into some XML format, i.e.
> <url>
> <loc>http://www.example.com/</loc>
> </url>
> <url>
> <loc>http://www.example1.com/</loc>
> </url>
> ...........
> So can you please elaborate on how I should do this?
>
> Thanks a lot for your time.
>
Well, there is nothing more I can do short of writing the code myself : )
You can first try to become more familiar with Java programming if need be.
Good luck
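
For the XML step described above (wrapping each key in
<url><loc>...</loc></url>), a minimal sketch might look like the following.
The class name SitemapEntries and the helper escapeXml are hypothetical and
not part of Nutch or Hadoop. Escaping special characters is an addition that
the original one-liner does not do, and the keys here come from a hard-coded
list rather than from a MapFile.Reader over the crawldb.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Nutch or Hadoop.
class SitemapEntries {

    // Escape the five characters that are special in XML text,
    // so URLs containing '&' still produce well-formed output.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wrap one key in the <url><loc>...</loc></url> form from this thread.
    // The key is taken as Object so that any type with a sensible toString() works.
    static String toUrlEntry(Object key) {
        return "<url><loc>" + escapeXml(key.toString()) + "</loc></url>";
    }

    public static void main(String[] args) {
        // Assumption: stand-in keys; in the real class these would be the
        // keys read in the loop over the crawldb.
        List<String> keys = Arrays.asList(
                "http://www.example.com/",
                "http://www.example1.com/?a=1&b=2");
        for (String key : keys) {
            System.out.println(toUrlEntry(key));
        }
    }
}
```

In the loop over the crawldb, the out.println call would then become
out.println(SitemapEntries.toUrlEntry(key));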
> Cheers,
> Cha
>
> Enis Soztutar wrote:
>
>> cha wrote:
>>
>>> Thanks Enis,
>>>
>>> I am getting some idea from that.
>>> Can you tell me in which class I should implement that?
>>> I don't have hadoop installed on my box.
>>>
>>>
>>>
>> Just make a new class in nutch and put the code there : ) As long as
>> you have the hadoop jar in your classpath, you do not need to check out
>> the hadoop codebase.
>>
>>
>>
>>
>
>