Thread: RE: Question on running simultaneous jobs




RE: Question on running simultaneous jobs
2008-01-09 17:22:55
> that can run(per job) at any given time.  
 
not possible afaik - but i will be happy to hear otherwise.
 
priorities are a good substitute though. there's no point
needlessly restricting concurrency if there's nothing else
to run. if there is something else more important to run -
then in most cases, assigning a higher priority to that
other thing would make the right thing happen.
 
except with long running tasks (usually reducers) that
cannot be preempted. (Hadoop does not seem to use OS process
priorities at all. I wonder if process priorities can be
used as a substitute for pre-emption.)
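To make the trade-off above concrete, here is a minimal sketch in plain Java. It is not Hadoop code and does not use any Hadoop API; the class PrioritySlotSketch, its Task type, and the discrete time steps are all hypothetical, invented only for illustration. Free slots go to the highest-priority waiting task, but a task that is already running keeps its slot until it finishes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical toy model, not Hadoop's scheduler: priority ordering
// for waiting tasks, no preemption of running tasks.
final class PrioritySlotSketch {
    static final class Task {
        final String job;
        final int priority;  // larger value = more important
        final int arrival;   // time step at which the task is submitted
        final int duration;  // time steps the task occupies a slot

        Task(String job, int priority, int arrival, int duration) {
            this.job = job;
            this.priority = priority;
            this.arrival = arrival;
            this.duration = duration;
        }
    }

    /** Returns "job@startTime" entries in the order tasks were started. Assumes slots >= 1. */
    static List<String> simulate(List<Task> tasks, int slots) {
        // Highest priority first; ties broken by arrival time, then job name.
        PriorityQueue<Task> waiting = new PriorityQueue<>(
                Comparator.comparingInt((Task t) -> -t.priority)
                        .thenComparingInt(t -> t.arrival)
                        .thenComparing(t -> t.job));
        List<Task> byArrival = new ArrayList<>(tasks);
        byArrival.sort(Comparator.comparingInt(t -> t.arrival));

        int[] busyUntil = new int[slots]; // time at which each slot becomes free
        List<String> log = new ArrayList<>();
        int next = 0;
        for (int now = 0; log.size() < tasks.size(); now++) {
            // Move newly submitted tasks into the waiting queue.
            while (next < byArrival.size() && byArrival.get(next).arrival <= now) {
                waiting.add(byArrival.get(next++));
            }
            // Hand each free slot to the most important waiting task.
            // Busy slots are left alone: running tasks are never preempted.
            for (int s = 0; s < slots && !waiting.isEmpty(); s++) {
                if (busyUntil[s] <= now) {
                    Task t = waiting.poll();
                    busyUntil[s] = now + t.duration;
                    log.add(t.job + "@" + now);
                }
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<Task> tasks = Arrays.asList(
                new Task("long-a", 1, 0, 5),
                new Task("long-b", 1, 0, 7),
                new Task("urgent", 10, 1, 1),
                new Task("batch", 1, 1, 1));
        System.out.println(simulate(tasks, 2));
    }
}
```

In this toy run the high-priority task is started ahead of the low-priority one submitted at the same time, but it still has to wait for one of the long tasks to finish, which is the limitation described above.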
 
HOD is another solution that you might want to look into -
my understanding is that with HOD you can restrict the number
of machines used by a job.
 
________________________________

From: Xavier Stevens [mailto:Xavier.Stevensfox.com]
Sent: Wed 1/9/2008 2:57 PM
To: hadoop-userlucene.apache.org
Subject: RE: Question on running simultaneous jobs



This doesn't work to solve this issue because it sets the total number
of map/reduce tasks. When setting the total number of map tasks I get an
ArrayOutOfBoundsException within Hadoop; I believe because of the input
dataset size (around 90 million lines).

I think it is important to make a distinction between setting the total
number of map/reduce tasks and the number that can run (per job) at any
given time.  I would like only to restrict the latter, while allowing
Hadoop to divide the data into chunks as it sees fit.
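
As an illustration of that distinction, here is a minimal sketch in plain Java. It does not use Hadoop at all; the class ConcurrencyCapSketch and its run method are hypothetical names, and a fixed-size thread pool stands in for the per-job limit being asked for. The total number of tasks is chosen independently of how many are allowed to run at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only: totalTasks plays the role of the number
// of chunks the data is divided into, maxConcurrent the per-job cap.
final class ConcurrencyCapSketch {
    /** Returns {tasksCompleted, peakTasksRunningAtOnce}. */
    static int[] run(int totalTasks, int maxConcurrent) {
        // The pool size is the cap: at most maxConcurrent tasks execute at a time,
        // while the remaining tasks wait in the pool's queue.
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();

        for (int i = 0; i < totalTasks; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // remember the highest concurrency seen
                try {
                    Thread.sleep(2); // stand-in for real work
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { completed.get(), peak.get() };
    }

    public static void main(String[] args) {
        int[] result = run(40, 4);
        System.out.println("completed=" + result[0] + " peak=" + result[1]);
    }
}
```

Here all 40 tasks are completed, but no more than 4 are ever in flight, which is the behavior wanted from the cluster for a single job.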


-----Original Message-----
From: Ted Dunning [mailto:tdunningveoh.com]
Sent: Wednesday, January 09, 2008 1:50 PM
To: hadoop-userlucene.apache.org
Subject: Re: Question on running simultaneous jobs


You may need to upgrade, but 15.1 does just fine with multiple jobs in
the cluster.  Use conf.setNumMapTasks(int) and
conf.setNumReduceTasks(int).


On 1/9/08 11:25 AM, "Xavier Stevens" <Xavier.Stevensfox.com> wrote:

> Does Hadoop support running simultaneous jobs?  If so, what parameters
> do I need to set in my job configuration?  We basically want to give a
> job that takes a really long time, half of the total resources of the
> cluster so other jobs don't queue up behind it.
>
> I am using Hadoop 0.14.2 currently.  I tried setting
> mapred.tasktracker.tasks.maximum to be half of the maximum specified
> in mapred-default.xml.  This shows the change in the web
> administration page for the job, but it has no effect on the actual
> numbers of tasks running.
>
> Thanks,
>
> Xavier
>





Re: Question on running simultaneous jobs
2008-01-09 18:58:05
I will add to the discussion that the ability to have multiple tasks of
equal priority all making progress simultaneously is important in
academic environments. There are a number of undergraduate programs
which are starting to use Hadoop in code labs for students.

Multiple students should be able to submit jobs and if one student's
poorly-written task is grinding up a lot of cycles on a shared cluster,
other students still need to be able to test their code in the meantime;
ideally, they would not need to enter a lengthy job queue. ... I'd say
that this actually applies to development clusters in general, where
individual task performance is less important than the ability of
multiple developers to test code concurrently.

- Aaron



Joydeep Sen Sarma wrote:
>> that can run(per job) at any given time.  
>  
> not possible afaik - but i will be happy to hear otherwise.
>
> priorities are a good substitute though. there's no point
> needlessly restricting concurrency if there's nothing else
> to run. if there is something else more important to run -
> then in most cases, assigning a higher priority to that
> other thing would make the right thing happen.
>
> except with long running tasks (usually reducers) that
> cannot be preempted. (Hadoop does not seem to use OS process
> priorities at all. I wonder if process priorities can be
> used as a substitute for pre-emption.)
>
> HOD is another solution that you might want to look into -
> my understanding is that with HOD you can restrict the number
> of machines used by a job.
>  
> ________________________________
> 
> From: Xavier Stevens [mailto:Xavier.Stevensfox.com]
> Sent: Wed 1/9/2008 2:57 PM
> To: hadoop-userlucene.apache.org
> Subject: RE: Question on running simultaneous jobs
> 
> 
> 
> This doesn't work to solve this issue because it sets the total number
> of map/reduce tasks. When setting the total number of map tasks I get an
> ArrayOutOfBoundsException within Hadoop; I believe because of the input
> dataset size (around 90 million lines).
>
> I think it is important to make a distinction between setting the total
> number of map/reduce tasks and the number that can run (per job) at any
> given time.  I would like only to restrict the latter, while allowing
> Hadoop to divide the data into chunks as it sees fit.
> 
> 
> -----Original Message-----
> From: Ted Dunning [mailto:tdunningveoh.com]
> Sent: Wednesday, January 09, 2008 1:50 PM
> To: hadoop-userlucene.apache.org
> Subject: Re: Question on running simultaneous jobs
> 
> 
> You may need to upgrade, but 15.1 does just fine with multiple jobs in
> the cluster.  Use conf.setNumMapTasks(int) and
> conf.setNumReduceTasks(int).
> 
> 
> On 1/9/08 11:25 AM, "Xavier Stevens" <Xavier.Stevensfox.com> wrote:
>
>> Does Hadoop support running simultaneous jobs?  If so, what parameters
>> do I need to set in my job configuration?  We basically want to give a
>> job that takes a really long time, half of the total resources of the
>> cluster so other jobs don't queue up behind it.
>>
>> I am using Hadoop 0.14.2 currently.  I tried setting
>> mapred.tasktracker.tasks.maximum to be half of the maximum specified
>> in mapred-default.xml.  This shows the change in the web
>> administration page for the job, but it has no effect on the actual
>> numbers of tasks running.
>>
>> Thanks,
>>
>> Xavier
>>
