The value in hadoop-site.xml overrides any value set programmatically.
You can set the number of map and reduce tasks in mapred-default.xml instead of hadoop-site.xml; a value there serves as a default that can be overridden programmatically. However, mapred-default.xml is due to be eliminated in 0.16, and I am not sure what the recommended way is now.
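A minimal sketch of the precedence described above, assuming three layers: a mapred-default.xml value loses to a programmatic set(), which in turn loses to a hadoop-site.xml value. The class LayeredConf and its methods setDefault/set/setSite/get are hypothetical and only model the behaviour reported in this thread; this is not Hadoop's Configuration or JobConf implementation. The key name mapred.map.tasks is used for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the override order described in this thread,
// NOT Hadoop's real Configuration class:
//   mapred-default.xml  <  programmatic set()  <  hadoop-site.xml
class LayeredConf {
    private final Map<String, String> defaults = new HashMap<>();     // stands in for mapred-default.xml
    private final Map<String, String> programmatic = new HashMap<>(); // stands in for JobConf setters
    private final Map<String, String> site = new HashMap<>();         // stands in for hadoop-site.xml

    void setDefault(String key, String value) { defaults.put(key, value); }
    void set(String key, String value) { programmatic.put(key, value); }
    void setSite(String key, String value) { site.put(key, value); }

    // The highest-precedence layer that has the key wins:
    // site first, then programmatic, then default.
    String get(String key) {
        if (site.containsKey(key)) return site.get(key);
        if (programmatic.containsKey(key)) return programmatic.get(key);
        return defaults.get(key);
    }

    public static void main(String[] args) {
        // Value only in the default layer: the programmatic set() takes effect.
        LayeredConf a = new LayeredConf();
        a.setDefault("mapred.map.tasks", "2");
        a.set("mapred.map.tasks", "8");
        System.out.println(a.get("mapred.map.tasks")); // prints 8

        // Value in the site layer: the programmatic set() is silently ignored.
        LayeredConf b = new LayeredConf();
        b.setSite("mapred.map.tasks", "2");
        b.set("mapred.map.tasks", "8");
        System.out.println(b.get("mapred.map.tasks")); // prints 2
    }
}
```

Under this model, a job that calls set() only sees its value when the key is absent from the site layer, which matches the symptom described in the quoted message below.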
-Michael
On 11/30/07 12:00 AM, "Jason Venner" <jason attributor.com> wrote:
We have several 8-processor machines in our cluster, and for most of our mapper tasks we would like to spawn 8 per machine.
We have one mapper task that is extremely resource-intensive, and for it we can only spawn 1 per machine.
We also have multiple disk arms for our DFS, so we would like to run multiple reduce tasks on each machine.
We have had little luck changing these parameters by setting the numbers via JobConf:
    jobConf.setNumMapTasks(int n)
    jobConf.setNumReduceTasks(int n)
What we have ended up doing is reconfiguring the cluster by changing hadoop-site.xml between runs, which is awkward.
Have we just fumble-fingered it, or is there a way we are missing to set the concurrency of mappers and reducers on a per-job basis?