
Thread: Really big indexing and timeouts?




Really big indexing and timeouts?
2007-07-30 22:39:50
Is anybody doing really big indexing jobs on Nutch and Hadoop, say 50M
or more, and seeing indexer jobs time out?

Dennis

Re: Really big indexing and timeouts?
2007-07-31 09:38:18
Hi Dennis,

On 7/31/07, Dennis Kubes <kubesapache.org> wrote:
> Is anybody doing really big indexing jobs on Nutch and Hadoop, say 50M
> or more and seeing indexer timeout jobs?

I think we did a ~30M URL indexing run and didn't run into any problems.

Did you get a task timeout? (Could it be related to a slowish indexing
filter like language-identifier?)

>
> Dennis
>


-- 
Doğacan Güney
Re: Really big indexing and timeouts?
2007-07-31 12:07:52
Actually, I am starting to think it is related to hard disks beginning
to fail. We have some machines that have double or triple the load with
the exact same number of tasks. One thing I am seeing is that hard
disks don't just fail (OK, some do); most actually just slow down
when they are starting to break down.
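
For anyone wanting to check for the same pattern, here is a rough sketch of
one way to flag machines whose load is well above the rest of the cluster
while running the same number of tasks. The function name, the 2x threshold,
and the sample load figures are all made up for illustration; none of them
come from Nutch or Hadoop, and a high load alone does not prove a disk is
failing.

```python
from statistics import median

def flag_overloaded_hosts(load_by_host, factor=2.0):
    """Return hosts whose load is at least `factor` times the cluster median.

    `load_by_host` maps a host name to its load average. The threshold
    `factor` is an assumption; tune it for your own cluster.
    """
    if not load_by_host:
        return []
    baseline = median(load_by_host.values())
    if baseline <= 0:
        # No meaningful baseline to compare against.
        return []
    return sorted(
        host for host, load in load_by_host.items()
        if load >= factor * baseline
    )

# Hypothetical 1-minute load averages, invented for this example.
sample = {
    "node01": 2.1,
    "node02": 1.9,
    "node03": 6.3,
    "node04": 2.0,
    "node05": 4.4,
}
suspects = flag_overloaded_hosts(sample)
print(suspects)
```

Hosts that show up here repeatedly would be the first ones to check for slow
or failing disks, for example by looking at their SMART data or I/O wait.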

Dennis Kubes

Doğacan Güney wrote:
> Hi Dennis,
> 
> On 7/31/07, Dennis Kubes <kubesapache.org> wrote:
>> Is anybody doing really big indexing jobs on Nutch and Hadoop, say 50M
>> or more and seeing indexer timeout jobs?
> 
> I think we did a ~30M url indexing and didn't run into any problems.
> 
> Did you get a task timeout? (can it be related to a slowish indexing
> filter like language-identifier?)
> 
>> Dennis
>>
> 
> 
