
Thread: Re-crawling Problem




user name
2007-06-26 10:37:08
Hi all,
I'm having some trouble trying to crawl and recrawl my local
filesystem. I'm using the script posted at
http://wiki.apache.org/nutch/IntranetRecrawl


My filesystem is laid out like this:

../
../first/
../first/file1.pdf
../first/second/
../first/second/file2.pdf
../first/second/third/
../first/second/third/file2.pdf
../first/second/third/fourth/
../first/second/third/fourth/file4.pdf
../first/second/third/fourth/fifth/
../first/second/third/fourth/fifth/file5.pdf
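
To make it easier to compare the layout with what each round reaches, here is a minimal Python sketch that tabulates how many directory levels each file sits below the crawl root. It is only a diagnostic aid: the names `PATHS`, `directory_depth` and `files_by_depth` are made up for this example, the paths are copied from the listing above, and it does not model how Nutch itself counts depth.

```python
from pathlib import PurePosixPath

# Paths copied from the listing above, relative to the crawl root ("../").
PATHS = [
    "first/file1.pdf",
    "first/second/file2.pdf",
    "first/second/third/file2.pdf",
    "first/second/third/fourth/file4.pdf",
    "first/second/third/fourth/fifth/file5.pdf",
]


def directory_depth(path: str) -> int:
    """Return how many directories lie between the crawl root and the file."""
    # parts includes the file name itself, so subtract one.
    return len(PurePosixPath(path).parts) - 1


def files_by_depth(paths):
    """Group file paths by their directory depth below the root."""
    grouped = {}
    for p in paths:
        grouped.setdefault(directory_depth(p), []).append(p)
    return grouped


if __name__ == "__main__":
    for depth, files in sorted(files_by_depth(PATHS).items()):
        for f in files:
            print(depth, f)
```

Comparing this table with the URLs reported as fetched and indexed after each round may help show at which level the files stop being picked up.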


On the first crawl "round" everything seems fine: it stops at the
"first" directory (depth 1).
On the first recrawl (depth 3) it stops at the "third" directory and all
the files seem to be indexed correctly.
On the second recrawl (again depth 3) it reaches the "fifth" directory,
but none of the files are indexed.

Any ideas?
Thanks,
Luca
