Thread: Speed of reading local files




user name
2006-09-18 17:07:54
Hi

I have changed the protocol-http plugin so that Nutch reads
already-crawled pages from the local file system instead of from the
Internet. (I tried to use the FILE:// protocol, but it seemed to me
that the interconnection information among pages was lost.) I have
made it work, but it is very slow: executing the "fetch" command took
10 minutes on 400 pages, on a 4-CPU box with 4 threads. I am wondering
whether this is normal, because it is equal to roughly 400 hours per
box to read 1 million pages, which is more than 15 days.
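
As a quick check of that estimate, here is a minimal Python sketch of the
extrapolation. The inputs are the figures from the run described above;
the variable names are illustrative only, and the calculation assumes the
per-page rate stays constant as the crawl grows.

```python
# Extrapolate the observed fetch rate to a larger crawl.
# Inputs come from the run described in this post; names are illustrative.
pages_fetched = 400        # pages processed by the "fetch" command
elapsed_minutes = 10       # wall-clock time for that run
target_pages = 1_000_000   # size of the crawl being estimated

# Assumes the rate scales linearly with the number of pages.
seconds_per_page = elapsed_minutes * 60 / pages_fetched
total_hours = target_pages * seconds_per_page / 3600
total_days = total_hours / 24

print(f"{seconds_per_page:.2f} s/page")
print(f"{total_hours:.0f} hours, about {total_days:.1f} days")
```

This works out to 1.5 seconds per page, or about 417 hours (a little over
17 days) for 1 million pages, in line with the rough figure above.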

Any suggestion will be appreciated.

Zhen