I will add to the discussion that the ability to have multiple tasks of
equal priority all making progress simultaneously is important in
academic environments. There are a number of undergraduate programs
which are starting to use Hadoop in code labs for students.

Multiple students should be able to submit jobs and if one student's
poorly-written task is grinding up a lot of cycles on a shared cluster,
other students still need to be able to test their code in the meantime;
ideally, they would not need to enter a lengthy job queue. ... I'd say
that this actually applies to development clusters in general, where
individual task performance is less important than the ability of
multiple developers to test code concurrently.
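
To make the queueing point concrete, here is a toy simulation in plain
Java. It is not Hadoop code and does not claim to reflect how Hadoop
actually assigns tasks; the class SlotSharingSketch and both methods are
made up for illustration. It assumes a fixed number of task slots, that
every task takes exactly one time step, and that each job has at least
one task. It compares strict submission-order scheduling with handing
out slots to jobs in round-robin fashion.

```java
import java.util.Arrays;

public class SlotSharingSketch {

    // Strict submission order: a later job gets slots only after every
    // earlier job has no tasks left to start.
    // Returns the time step at which each job's last task completes.
    static int[] fifo(int[] tasksPerJob, int slots) {
        if (slots <= 0) throw new IllegalArgumentException("slots must be > 0");
        int[] remaining = tasksPerJob.clone();
        int[] finish = new int[remaining.length];
        int step = 0;
        int job = 0; // index of the oldest unfinished job
        while (job < remaining.length) {
            step++;
            int free = slots;
            while (free > 0 && job < remaining.length) {
                int run = Math.min(free, remaining[job]);
                remaining[job] -= run;
                free -= run;
                if (remaining[job] == 0) {
                    finish[job] = step;
                    job++;
                }
            }
        }
        return finish;
    }

    // Equal-priority sharing: free slots are handed to unfinished jobs
    // one at a time, cycling through the jobs.
    static int[] roundRobin(int[] tasksPerJob, int slots) {
        if (slots <= 0) throw new IllegalArgumentException("slots must be > 0");
        int[] remaining = tasksPerJob.clone();
        int[] finish = new int[remaining.length];
        int unfinished = 0;
        for (int r : remaining) {
            if (r > 0) unfinished++;
        }
        int step = 0;
        int next = 0; // job that receives the next free slot
        while (unfinished > 0) {
            step++;
            for (int s = 0; s < slots && unfinished > 0; s++) {
                // Skip jobs that have already finished.
                while (remaining[next] == 0) {
                    next = (next + 1) % remaining.length;
                }
                remaining[next]--;
                if (remaining[next] == 0) {
                    finish[next] = step;
                    unfinished--;
                }
                next = (next + 1) % remaining.length;
            }
        }
        return finish;
    }

    public static void main(String[] args) {
        // One large job submitted first, then two small test jobs.
        int[] jobs = {100, 4, 4};
        System.out.println("fifo:        " + Arrays.toString(fifo(jobs, 4)));
        System.out.println("round-robin: " + Arrays.toString(roundRobin(jobs, 4)));
    }
}
```

With one 100-task job ahead of two 4-task jobs on 4 slots, the small
jobs finish at steps 26 and 27 in submission order but at step 3 when
slots are shared, while the large job only slips from step 25 to step
27. That is the trade I have in mind for a teaching or development
cluster.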
- Aaron
Joydeep Sen Sarma wrote:
>> that can run(per job) at any given time.
>
> not possible afaik - but i will be happy to hear otherwise.
>
> priorities are a good substitute though. there's no point needlessly
> restricting concurrency if there's nothing else to run. if there is
> something else more important to run - then in most cases, assigning a
> higher priority to that other thing would make the right thing happen.
>
> except with long running tasks (usually reducers) that cannot be
> preempted. (Hadoop does not seem to use OS process priorities at all. I
> wonder if process priorities can be used as a substitute for
> pre-emption.)
>
> HOD is another solution that you might want to look into - my
> understanding is that with HOD u can restrict the number of machines
> used by a job.
>
> ________________________________
>
> From: Xavier Stevens [mailto:Xavier.Stevens fox.com]
> Sent: Wed 1/9/2008 2:57 PM
> To: hadoop-user lucene.apache.org
> Subject: RE: Question on running simultaneous jobs
>
>
>
> This doesn't work to solve this issue because it sets the total number
> of map/reduce tasks. When setting the total number of map tasks I get an
> ArrayOutOfBoundsException within Hadoop; I believe because of the input
> dataset size (around 90 million lines).
>
> I think it is important to make a distinction between setting total
> number of map/reduce tasks and the number that can run(per job) at any
> given time. I would like only to restrict the latter, while allowing
> Hadoop to divide the data into chunks as it sees fit.
>
>
> -----Original Message-----
> From: Ted Dunning [mailto:tdunning veoh.com]
> Sent: Wednesday, January 09, 2008 1:50 PM
> To: hadoop-user lucene.apache.org
> Subject: Re: Question on running simultaneous jobs
>
>
> You may need to upgrade, but 15.1 does just fine with multiple jobs in
> the cluster. Use conf.setNumMapTasks(int) and
> conf.setNumReduceTasks(int).
>
>
> On 1/9/08 11:25 AM, "Xavier Stevens" <Xavier.Stevens fox.com> wrote:
>
>> Does Hadoop support running simultaneous jobs? If so, what parameters
>> do I need to set in my job configuration? We basically want to give a
>> job that takes a really long time, half of the total resources of the
>> cluster so other jobs don't queue up behind it.
>>
>> I am using Hadoop 0.14.2 currently. I tried setting
>> mapred.tasktracker.tasks.maximum to be half of the maximum specified
>> in mapred-default.xml. This shows the change in the web
>> administration page for the job, but it has no effect on the actual
>> numbers of tasks running.
>>
>> Thanks,
>>
>> Xavier
>>