
Thread: RE: Question on running simultaneous jobs




RE: Question on running simultaneous jobs
user name
2008-01-10 16:27:02
being paged out is sad - but the worst case is still no
worse than killing the job (where all the data has to be
*recomputed* back into memory on restart - not just swapped
in from disk)
 
the best and average cases are likely way better.
 
(disk capacity seems no issue at all - but perhaps we are
blessed to be in this state).

________________________________

From: Doug Cutting [mailto:cuttingapache.org]
Sent: Thu 1/10/2008 2:24 PM
To: hadoop-userlucene.apache.org
Subject: Re: Question on running simultaneous jobs



Joydeep Sen Sarma wrote:
> can we suspend jobs (just unix suspend) instead of killing them?

We could, but they'd still consume RAM and disk.  The RAM might
eventually get paged out, but relying on that is probably a bad idea.
So, this could work for tasks that don't use much memory and whose
intermediate data is small, but that's frequently not the case.

Doug
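For illustration, "just unix suspend" corresponds to the POSIX job-control signals: SIGSTOP pauses a process and SIGCONT lets it continue, and the paused process keeps all of its memory and files while it waits, which is the RAM and disk cost described above. A minimal Python sketch, assuming a POSIX system; the sleeping child process is a hypothetical stand-in for a task, and this is not how Hadoop's tasktracker actually manages tasks:

```python
import os
import signal
import subprocess
import sys


def suspend(pid: int) -> None:
    """Pause a process (like `kill -STOP`); its memory stays allocated."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid: int) -> None:
    """Let a stopped process continue (like `kill -CONT`) from where it paused."""
    os.kill(pid, signal.SIGCONT)


def demo() -> bool:
    # Hypothetical stand-in for a task: a child process that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        suspend(child.pid)
        # WUNTRACED makes waitpid also report children that have been stopped.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)
        resume(child.pid)
        return stopped
    finally:
        # SIGKILL terminates the child even if it is still stopped.
        child.kill()
        child.wait()


if __name__ == "__main__":
    print("child was stopped:", demo())
```

Nothing in this sketch frees the child's memory while it is stopped; at best the OS pages it out, which is the behavior the thread goes on to debate.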


Re: Question on running simultaneous jobs
user name
United States
2008-01-10 16:39:41
Joydeep Sen Sarma wrote:
> being paged out is sad - but the worst case is still no
> worse than killing the job (where all the data has to be
> *recomputed* back into memory on restart - not just swapped
> in from disk)

In my experience, once a large process is paged out it is almost always
faster to restart it than to wait for it to get paged back in with
random disk accesses.  If there were a way to explicitly write out a
process's working set, and then restore it later, using sequential disk
accesses, that might be effective.  Virtualization systems support such
operations, so perhaps tasktrackers should start a Xen instance per task?

Doug
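The idea of writing a working set out and reading it back with sequential disk accesses can be illustrated at a much smaller scale with application-level checkpointing: the task serializes its own state in one sequential write and reloads it in one sequential read. This is a swapped-in technique, not the per-task Xen save/restore proposed above; `checkpoint`, `restore`, the file name and the state layout are hypothetical names chosen for this sketch.

```python
import pickle
import tempfile
from pathlib import Path


def checkpoint(state: dict, path: Path) -> None:
    # Write the whole state as one sequential stream, rather than letting the
    # OS evict pages piecemeal and fault them back in with random reads.
    with open(path, "wb") as f:
        pickle.dump(state, f, protocol=pickle.HIGHEST_PROTOCOL)


def restore(path: Path) -> dict:
    # Read the state back with one sequential pass over the file.
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Hypothetical task state: a progress counter plus a 1 MiB in-memory buffer.
    state = {"records_done": 12345, "buffer": bytes(1 << 20)}
    with tempfile.TemporaryDirectory() as d:
        ckpt = Path(d) / "task.ckpt"
        checkpoint(state, ckpt)
        print(restore(ckpt) == state)  # True
```

The trade-off is that the task must know exactly what its state is; a VM-level save captures the whole memory image without the task's cooperation.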
