Thread: Reduce hangs
|
|
| Reduce hangs |
  United States |
2008-01-18 15:52:24 |
Hi,
If someone knows how to fix the problem described below, please help me out. Thanks!
I am testing Hadoop on a 2-node cluster and the "reduce" always hangs at some stage, even if I use different clusters. My OS is Debian Linux, kernel 2.6 (AMD Opteron w/ 4GB Mem). Hadoop version is 0.15.2. Java version is 1.5.0_01-b08.
I simply tried "./bin/hadoop jar hadoop-0.15.2-test.jar mrbench", and when the map stage finishes, the reduce stage hangs somewhere in the middle, sometimes at 0%. I also tried every other mapreduce program I could find in the example jar package, but they all hang.
The log file simply prints
2008-01-18 15:15:50,831 INFO org.apache.hadoop.mapred.TaskTracker: task_200801181424_0004_r_000000_0 0.0% reduce > copy >
2008-01-18 15:15:56,841 INFO org.apache.hadoop.mapred.TaskTracker: task_200801181424_0004_r_000000_0 0.0% reduce > copy >
2008-01-18 15:16:02,850 INFO org.apache.hadoop.mapred.TaskTracker: task_200801181424_0004_r_000000_0 0.0% reduce > copy >
forever.
The program does work if I start Hadoop on only a single node.
Below is my hadoop-site.xml configuration:
<configuration>
<property>
<name>fs.default.name</name>
<value>10.0.0.1:60000</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>10.0.0.1:60001</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/raid/hadoop/data</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/raid/hadoop/mapred</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/raid/hadoop/tmp</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1024m</value>
</property>
<property>
<name>mapred.tasktracker.tasks.maximum</name>
<value>4</value>
</property>
<!--
<property>
<name>mapred.map.tasks</name>
<value>7</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>3</value>
</property>
-->
<property>
<name>fs.inmemory.size.mb</name>
<value>200</value>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
<property>
<name>io.sort.factor</name>
<value>100</value>
</property>
<property>
<name>io.sort.mb</name>
<value>200</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
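The repeating "reduce > copy" progress lines quoted in this message can be spotted automatically. The following is only a rough sketch, not part of Hadoop: the helper name `stalled_tasks` and the 10-second threshold are assumptions made for illustration, and the regular expression is written against the three log lines quoted above rather than against any documented log format.

```python
import re
from datetime import datetime

# Matches TaskTracker progress lines shaped like the ones quoted in the post:
# "<timestamp> INFO org.apache.hadoop.mapred.TaskTracker: <task id> <pct>% reduce > copy >"
PROGRESS = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO "
    r"org\.apache\.hadoop\.mapred\.TaskTracker: (task_\S+) ([\d.]+)% reduce > copy"
)


def stalled_tasks(lines, min_seconds=10):
    """Return {task_id: seconds} for tasks whose copy progress has stayed
    at the same percentage for at least min_seconds (hypothetical helper)."""
    first_seen = {}  # task id -> (time the current percentage was first seen, percentage)
    stalled = {}
    for line in lines:
        m = PROGRESS.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        task, pct = m.group(2), float(m.group(3))
        start = first_seen.get(task)
        if start is None or start[1] != pct:
            # Progress moved (or first sighting): restart the clock for this task.
            first_seen[task] = (ts, pct)
            stalled.pop(task, None)
            continue
        elapsed = (ts - start[0]).total_seconds()
        if elapsed >= min_seconds:
            stalled[task] = elapsed
    return stalled


log = [
    "2008-01-18 15:15:50,831 INFO org.apache.hadoop.mapred.TaskTracker: task_200801181424_0004_r_000000_0 0.0% reduce > copy >",
    "2008-01-18 15:15:56,841 INFO org.apache.hadoop.mapred.TaskTracker: task_200801181424_0004_r_000000_0 0.0% reduce > copy >",
    "2008-01-18 15:16:02,850 INFO org.apache.hadoop.mapred.TaskTracker: task_200801181424_0004_r_000000_0 0.0% reduce > copy >",
]
print(stalled_tasks(log))
```

On the three quoted lines this reports the reduce task as stuck at 0.0% for 12 seconds. A real check would presumably read the TaskTracker log file and use a much longer threshold.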
|
|
| Re: Reduce hangs |

|
2008-01-18 15:59:17 |
I had the same problem. If I recall, the fix is to add the following to your hadoop-site.xml file:
<property>
<name>mapred.reduce.copy.backoff</name>
<value>5</value>
</property>
See HADOOP-1984
Miles
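Before restarting the daemons it may help to confirm that the property really ended up in the file. This is a minimal sketch, assuming the file is well-formed XML with a top-level configuration element as in the post; `read_properties` is a hypothetical helper, and the sample text is a shortened stand-in for the real hadoop-site.xml. It only checks the file contents, not whether a running TaskTracker has picked the value up.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Parse hadoop-site.xml style text into a {name: value} dict.
    Commented-out <property> blocks are skipped, because the default
    ElementTree parser discards XML comments."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Shortened stand-in for the real file, with the suggested property added.
sample = """<configuration>
<property>
<name>mapred.job.tracker</name>
<value>10.0.0.1:60001</value>
</property>
<!--
<property>
<name>mapred.reduce.tasks</name>
<value>3</value>
</property>
-->
<property>
<name>mapred.reduce.copy.backoff</name>
<value>5</value>
</property>
</configuration>"""

props = read_properties(sample)
print(props.get("mapred.reduce.copy.backoff"))
print("mapred.reduce.tasks" in props)
```

The first print shows the backoff value that was found; the second shows that the commented-out property is not treated as set.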
|
|
| Re: Reduce hangs |
  United States |
2008-01-18 16:13:03 |
Hi, Miles,
Thanks for your information. I applied this but the problem still exists. By the way, when this happens, the CPUs are idle and doing nothing.
Yunhong
On Fri, 18 Jan 2008, Miles Osborne wrote:
> I had the same problem. If I recall, the fix is to add the following to your hadoop-site.xml file:
>
> <property>
> <name>mapred.reduce.copy.backoff</name>
> <value>5</value>
> </property>
>
> See hadoop-1984
>
> Miles
|
|
| Re: Reduce hangs |

|
2008-01-18 16:16:18 |
I think it takes a while to actually work, so be patient!
Miles
On 18/01/2008, Yunhong Gu1 <ygu1 cs.uic.edu> wrote:
>
>
> Hi, Miles,
>
> Thanks for your information. I applied this but the problem still exists.
> By the way, when this happens, the CPUs are idle and doing nothing.
>
> Yunhong
|
|
| Re: Reduce hangs |
  United States |
2008-01-18 17:44:34 |
When this was happening to us, there was a block replication error and one node was in an endless loop trying to replicate a block to another node which would not accept it. In our case most of the cluster was idle, but a CPU on the machine trying to send the block was heavily used.
We were never able to isolate the cause, and it stopped happening for us when we upgraded to 0.15.1.
---
Attributor is hiring Hadoop Wranglers, contact if interested.
|
|
| Re: Reduce hangs |
  United States |
2008-01-18 18:09:24 |
I am using 0.15.2, and in my case, CPUs on both nodes are idle. It looks like the program is trapped in a synchronization deadlock or some waiting state from which it is never awakened.
Yunhong
On Fri, 18 Jan 2008, Jason Venner wrote:
> When this was happening to us, there was a block replication error and one node was in an endless loop trying to replicate a block to another node which would not accept it. In our case most of the cluster was idle but a cpu on the machine trying send the block was heavily used.
>
> We never were able to isolate the cause, and it stopped happening for us when we upgraded to 0.15.1
|
|
| Re: Reduce hangs |
  United States |
2008-01-18 18:10:36 |
The program "mrbench" takes 1 second on a single node, so I think waiting for 1 minute should be long enough. I also restarted Hadoop after I updated the config file.
Yunhong
On Fri, 18 Jan 2008, Miles Osborne wrote:
> I think it takes a while to actually work, so be patient!
>
> Miles
|
|
| Re: Reduce hangs |
  United States |
2008-01-18 19:56:36 |
Looks like we still have this unsolved mysterious problem:
http://issues.apache.org/jira/browse/HADOOP-1374
Could it be related to HADOOP-1246? Arun?
Thanks,
--Konstantin
|
|
| Re: Reduce hangs |
  United States |
2008-01-18 21:25:55 |
Yes, it looks like HADOOP-1374.
The program actually failed after a while:
gu ncdm-8:~/hadoop-0.15.2$ ./bin/hadoop jar hadoop-0.15.2-test.jar mrbench
MRBenchmark.0.0.2
08/01/18 18:53:08 INFO mapred.MRBench: creating control file: 1 numLines, ASCENDING sortOrder
08/01/18 18:53:08 INFO mapred.MRBench: created control file: /benchmarks/MRBench/mr_input/input_-450753747.txt
08/01/18 18:53:09 INFO mapred.MRBench: Running job 0: input=/benchmarks/MRBench/mr_input output=/benchmarks/MRBench/mr_output/output_1843693325
08/01/18 18:53:09 INFO mapred.FileInputFormat: Total input paths to process : 1
08/01/18 18:53:09 INFO mapred.JobClient: Running job: job_200801181852_0001
08/01/18 18:53:10 INFO mapred.JobClient: map 0% reduce 0%
08/01/18 18:53:17 INFO mapred.JobClient: map 100% reduce 0%
08/01/18 18:53:25 INFO mapred.JobClient: map 100% reduce 16%
08/01/18 19:08:27 INFO mapred.JobClient: Task Id : task_200801181852_0001_m_000001_0, Status : FAILED
Too many fetch-failures
08/01/18 19:08:27 WARN mapred.JobClient: Error reading task outputncdm15
08/01/18 19:08:27 WARN mapred.JobClient: Error reading task outputncdm15
08/01/18 19:08:34 INFO mapred.JobClient: map 100% reduce 100%
08/01/18 19:08:35 INFO mapred.JobClient: Job complete: job_200801181852_0001
08/01/18 19:08:35 INFO mapred.JobClient: Counters: 10
08/01/18 19:08:35 INFO mapred.JobClient: Job Counters
08/01/18 19:08:35 INFO mapred.JobClient: Launched map tasks=3
08/01/18 19:08:35 INFO mapred.JobClient: Launched reduce tasks=1
08/01/18 19:08:35 INFO mapred.JobClient: Data-local map tasks=2
08/01/18 19:08:35 INFO mapred.JobClient: Map-Reduce Framework
08/01/18 19:08:35 INFO mapred.JobClient: Map input records=1
08/01/18 19:08:35 INFO mapred.JobClient: Map output records=1
08/01/18 19:08:35 INFO mapred.JobClient: Map input bytes=2
08/01/18 19:08:35 INFO mapred.JobClient: Map output bytes=5
08/01/18 19:08:35 INFO mapred.JobClient: Reduce input groups=1
08/01/18 19:08:35 INFO mapred.JobClient: Reduce input records=1
08/01/18 19:08:35 INFO mapred.JobClient: Reduce output records=1
DataLines Maps Reduces AvgTime (milliseconds)
1 2 1 926333
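To put numbers on the output above, the timestamps can be subtracted directly. This is a small sketch using only values copied from the log; the format string assumes the JobClient timestamps are year/month/day, which fits the 2008-01-18 date of the thread, and `seconds_between` is a hypothetical helper.

```python
from datetime import datetime

# JobClient timestamps in the output look like "08/01/18 18:53:25" (assumed yy/mm/dd).
FMT = "%y/%m/%d %H:%M:%S"


def seconds_between(start, end):
    """Seconds elapsed between two JobClient timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# From "map 100% reduce 16%" to the "Too many fetch-failures" report.
stall = seconds_between("08/01/18 18:53:25", "08/01/18 19:08:27")
# From "Running job 0" to "Job complete".
total = seconds_between("08/01/18 18:53:09", "08/01/18 19:08:35")
# AvgTime reported by mrbench, converted from milliseconds.
reported = 926333 / 1000

print(stall, total, reported)
```

The reduce sat at 16% for 902 seconds, roughly 15 minutes, before the map task was marked failed. The whole job took about 926 seconds, which agrees with the reported AvgTime of 926333 ms and compares with about 1 second on a single node.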
On Fri, 18 Jan 2008, Konstantin Shvachko wrote:
> Looks like we still have this unsolved mysterious problem:
>
> http://issues.apache.org/jira/browse/HADOOP-1374
>
> Could it be related to HADOOP-1246? Arun?
>
> Thanks,
> --Konstantin
|
|
| RE: Reduce hangs |
  Hong Kong |
2008-01-19 00:02:15 |
Hi Yunhong,
As per the output it seems the job ran to successful completion (albeit with some failures)...
Devaraj
|
|
|
|