Thread: Reduce hangs

Reduce hangs
2008-01-18 15:52:24
Hi,

If someone knows how to fix the problem described below, please help me out. Thanks!

I am testing Hadoop on a 2-node cluster and the "reduce" always hangs at some stage, even if I use different clusters. My OS is Debian Linux kernel 2.6 (AMD Opteron w/ 4GB Mem). Hadoop version is 0.15.2. Java version is 1.5.0_01-b08.

I simply tried "./bin/hadoop jar hadoop-0.15.2-test.jar mrbench" and when the map stage finishes, the reduce stage hangs somewhere in the middle, sometimes at 0%. I also tried every other mapreduce program I could find in the example jar package, but they all hang.

The log file simply prints

2008-01-18 15:15:50,831 INFO org.apache.hadoop.mapred.TaskTracker: task_200801181424_0004_r_000000_0 0.0% reduce > copy >
2008-01-18 15:15:56,841 INFO org.apache.hadoop.mapred.TaskTracker: task_200801181424_0004_r_000000_0 0.0% reduce > copy >
2008-01-18 15:16:02,850 INFO org.apache.hadoop.mapred.TaskTracker: task_200801181424_0004_r_000000_0 0.0% reduce > copy >

forever.

The program does work if I start Hadoop on only a single node.
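As a rough illustration of how the repeated progress lines above could be spotted automatically, the sketch below scans TaskTracker log lines and reports tasks whose percentage has not moved. The function name `find_stalled_tasks`, the `min_seconds` threshold, and the regular expression are assumptions made for this example (the pattern is inferred only from the three lines quoted above); none of it is part of Hadoop.

```python
import re
from datetime import datetime

# Pattern inferred from the three log lines quoted in this message (an assumption,
# not an official Hadoop log format specification).
PROGRESS = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO "
    r"org\.apache\.hadoop\.mapred\.TaskTracker: "
    r"(?P<task>task_\S+) (?P<pct>\d+(?:\.\d+)?)% (?P<phase>.*)$"
)

SAMPLE = [
    "2008-01-18 15:15:50,831 INFO org.apache.hadoop.mapred.TaskTracker: "
    "task_200801181424_0004_r_000000_0 0.0% reduce > copy >",
    "2008-01-18 15:15:56,841 INFO org.apache.hadoop.mapred.TaskTracker: "
    "task_200801181424_0004_r_000000_0 0.0% reduce > copy >",
    "2008-01-18 15:16:02,850 INFO org.apache.hadoop.mapred.TaskTracker: "
    "task_200801181424_0004_r_000000_0 0.0% reduce > copy >",
]


def find_stalled_tasks(lines, min_seconds=10):
    """Return {task_id: idle_seconds} for tasks stuck at one percentage."""
    unchanged_since = {}  # task -> (pct, time that pct was first reported)
    last_report = {}      # task -> time of the most recent report
    for line in lines:
        m = PROGRESS.match(line.strip())
        if not m:
            continue  # ignore lines that are not progress reports
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        task, pct = m["task"], float(m["pct"])
        # Restart the idle clock whenever the reported percentage changes.
        if task not in unchanged_since or unchanged_since[task][0] != pct:
            unchanged_since[task] = (pct, ts)
        last_report[task] = ts
    stalled = {}
    for task, (_, since) in unchanged_since.items():
        idle = (last_report[task] - since).total_seconds()
        if idle >= min_seconds:
            stalled[task] = idle
    return stalled


if __name__ == "__main__":
    for task, idle in find_stalled_tasks(SAMPLE).items():
        print(f"{task}: no progress for {idle:.0f}s")
```

On the three quoted lines this reports the single reduce task as idle for 12 seconds; on a real log the threshold would need to be much larger to be meaningful.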

Below is my hadoop-site.xml configuration:

<configuration>

<property>
    <name>fs.default.name</name>
    <value>10.0.0.1:60000</value>
</property>

<property>
    <name>mapred.job.tracker</name>
    <value>10.0.0.1:60001</value>
</property>

<property>
    <name>dfs.data.dir</name>
    <value>/raid/hadoop/data</value>
</property>

<property>
    <name>mapred.local.dir</name>
    <value>/raid/hadoop/mapred</value>
</property>

<property>
   <name>hadoop.tmp.dir</name>
   <value>/raid/hadoop/tmp</value>
</property>

<property>
   <name>mapred.child.java.opts</name>
   <value>-Xmx1024m</value>
</property>

<property>
   <name>mapred.tasktracker.tasks.maximum</name>
   <value>4</value>
</property>

<!--
<property>
   <name>mapred.map.tasks</name>
   <value>7</value>
</property>

<property>
   <name>mapred.reduce.tasks</name>
   <value>3</value>
</property>
-->

<property>
   <name>fs.inmemory.size.mb</name>
   <value>200</value>
</property>

<property>
   <name>dfs.block.size</name>
   <value>134217728</value>
</property>

<property>
   <name>io.sort.factor</name>
   <value>100</value>
</property>

<property>
   <name>io.sort.mb</name>
   <value>200</value>
</property>

<property>
   <name>io.file.buffer.size</name>
   <value>131072</value>
</property>

</configuration>
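Since the hang only appears with two nodes, one low-effort check is to confirm that both nodes load the same settings. The following is a minimal sketch, not a Hadoop tool: `load_hadoop_site` and `diff_configs` are hypothetical helper names, and the two XML strings stand in for the hadoop-site.xml contents read from each node.

```python
import xml.etree.ElementTree as ET


def load_hadoop_site(xml_text):
    """Parse hadoop-site.xml text into a {name: value} dict.

    Properties inside XML comments are skipped, because the parser
    does not treat commented-out markup as elements.
    """
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def diff_configs(a, b):
    """Return {name: (value_in_a, value_in_b)} for properties that differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# Hypothetical stand-ins for the files on the two nodes.
NODE_A = """
<configuration>
<property><name>mapred.job.tracker</name><value>10.0.0.1:60001</value></property>
<!--
<property><name>mapred.map.tasks</name><value>7</value></property>
-->
<property><name>io.sort.mb</name><value>200</value></property>
</configuration>
"""
NODE_B = NODE_A.replace("<value>200</value>", "<value>100</value>")

if __name__ == "__main__":
    print(diff_configs(load_hadoop_site(NODE_A), load_hadoop_site(NODE_B)))
```

An empty result only says the two files agree; it does not rule out network or hostname differences between the nodes.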


Re: Reduce hangs
2008-01-18 15:59:17
I had the same problem.  If I recall, the fix is to add the following to your hadoop-site.xml file:

<property>
<name>mapred.reduce.copy.backoff</name>
<value>5</value>
</property>

See HADOOP-1984.

Miles


Re: Reduce hangs
2008-01-18 16:13:03
Hi, Miles,

Thanks for your information. I applied this but the problem still exists. By the way, when this happens, the CPUs are idle and doing nothing.

Yunhong

On Fri, 18 Jan 2008, Miles Osborne wrote:

> I had the same problem.  If I recall, the fix is to add the following to
> your hadoop-site.xml file:
>
> <property>
> <name>mapred.reduce.copy.backoff</name>
> <value>5</value>
> </property>
>
> See hadoop-1984
>
> Miles

Re: Reduce hangs
2008-01-18 16:16:18
I think it takes a while to actually work, so be patient!

Miles

On 18/01/2008, Yunhong Gu1 <ygu1cs.uic.edu> wrote:
>
>
> Hi, Miles,
>
> Thanks for your information. I applied this but the problem still exists.
> By the way, when this happens, the CPUs are idle and doing nothing.
>
> Yunhong
>
Re: Reduce hangs
2008-01-18 17:44:34
When this was happening to us, there was a block replication error and one node was in an endless loop trying to replicate a block to another node which would not accept it. In our case most of the cluster was idle but a CPU on the machine trying to send the block was heavily used.

We were never able to isolate the cause, and it stopped happening for us when we upgraded to 0.15.1.

---
Attributor is hiring Hadoop Wranglers, contact if interested.


Re: Reduce hangs
2008-01-18 18:09:24
I am using 0.15.2, and in my case, CPUs on both nodes are idle. It looks like the program is trapped in a synchronization deadlock or some waiting state from which it will never be woken.

Yunhong

On Fri, 18 Jan 2008, Jason Venner wrote:

> When this was happening to us, there was a block replication error and one
> node was in an endless loop trying to replicate a block to another node which
> would not accept it. In our case most of the cluster was idle but a cpu on
> the machine trying send the block was heavily used.
>
> We never were able to isolate the cause, and it stopped happening for us when
> we upgraded to 0.15.1

Re: Reduce hangs
2008-01-18 18:10:36
The program "mrbench" takes 1 second on a single node, so I think waiting for 1 minute should be long enough. And I also restarted Hadoop after I updated the config file.

Yunhong


On Fri, 18 Jan 2008, Miles Osborne wrote:

> I think it takes a while to actually work, so be patient!
>
> Miles

Re: Reduce hangs
2008-01-18 19:56:36
Looks like we still have this unsolved mysterious problem:

http://issues.apache.org/jira/browse/HADOOP-1374

Could it be related to HADOOP-1246? Arun?

Thanks,
--Konstantin


Re: Reduce hangs
2008-01-18 21:25:55

Yes, it looks like HADOOP-1374

The program actually failed after a while:


guncdm-8:~/hadoop-0.15.2$ ./bin/hadoop jar hadoop-0.15.2-test.jar mrbench
MRBenchmark.0.0.2
08/01/18 18:53:08 INFO mapred.MRBench: creating control file: 1 numLines, ASCENDING sortOrder
08/01/18 18:53:08 INFO mapred.MRBench: created control file: /benchmarks/MRBench/mr_input/input_-450753747.txt
08/01/18 18:53:09 INFO mapred.MRBench: Running job 0: input=/benchmarks/MRBench/mr_input output=/benchmarks/MRBench/mr_output/output_1843693325
08/01/18 18:53:09 INFO mapred.FileInputFormat: Total input paths to process : 1
08/01/18 18:53:09 INFO mapred.JobClient: Running job: job_200801181852_0001
08/01/18 18:53:10 INFO mapred.JobClient:  map 0% reduce 0%
08/01/18 18:53:17 INFO mapred.JobClient:  map 100% reduce 0%
08/01/18 18:53:25 INFO mapred.JobClient:  map 100% reduce 16%
08/01/18 19:08:27 INFO mapred.JobClient: Task Id : task_200801181852_0001_m_000001_0, Status : FAILED
Too many fetch-failures
08/01/18 19:08:27 WARN mapred.JobClient: Error reading task outputncdm15
08/01/18 19:08:27 WARN mapred.JobClient: Error reading task outputncdm15
08/01/18 19:08:34 INFO mapred.JobClient:  map 100% reduce 100%
08/01/18 19:08:35 INFO mapred.JobClient: Job complete: job_200801181852_0001
08/01/18 19:08:35 INFO mapred.JobClient: Counters: 10
08/01/18 19:08:35 INFO mapred.JobClient:   Job Counters
08/01/18 19:08:35 INFO mapred.JobClient:     Launched map tasks=3
08/01/18 19:08:35 INFO mapred.JobClient:     Launched reduce tasks=1
08/01/18 19:08:35 INFO mapred.JobClient:     Data-local map tasks=2
08/01/18 19:08:35 INFO mapred.JobClient:   Map-Reduce Framework
08/01/18 19:08:35 INFO mapred.JobClient:     Map input records=1
08/01/18 19:08:35 INFO mapred.JobClient:     Map output records=1
08/01/18 19:08:35 INFO mapred.JobClient:     Map input bytes=2
08/01/18 19:08:35 INFO mapred.JobClient:     Map output bytes=5
08/01/18 19:08:35 INFO mapred.JobClient:     Reduce input groups=1
08/01/18 19:08:35 INFO mapred.JobClient:     Reduce input records=1
08/01/18 19:08:35 INFO mapred.JobClient:     Reduce output records=1
DataLines       Maps    Reduces AvgTime (milliseconds)
1               2       1       926333
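To make the length of the stall in this output concrete, the sketch below parses the leading timestamps of a few of the JobClient lines and reports the largest gap between consecutive lines. The helper names `parse_ts`, `largest_gap`, and `total_elapsed` are made up for this illustration, and the `yy/mm/dd HH:MM:SS` layout is assumed from the lines quoted above.

```python
from datetime import datetime

# A subset of the JobClient lines quoted above.
JOB_LINES = [
    "08/01/18 18:53:09 INFO mapred.JobClient: Running job: job_200801181852_0001",
    "08/01/18 18:53:10 INFO mapred.JobClient:  map 0% reduce 0%",
    "08/01/18 18:53:17 INFO mapred.JobClient:  map 100% reduce 0%",
    "08/01/18 18:53:25 INFO mapred.JobClient:  map 100% reduce 16%",
    "08/01/18 19:08:27 INFO mapred.JobClient: Task Id : "
    "task_200801181852_0001_m_000001_0, Status : FAILED",
    "08/01/18 19:08:34 INFO mapred.JobClient:  map 100% reduce 100%",
    "08/01/18 19:08:35 INFO mapred.JobClient: Job complete: job_200801181852_0001",
]


def parse_ts(line):
    """Parse the 17-character 'yy/mm/dd HH:MM:SS' prefix of a log line."""
    return datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S")


def largest_gap(lines):
    """Return (seconds, line_before, line_after) for the longest silence."""
    stamped = [(parse_ts(line), line) for line in lines]
    pairs = zip(stamped, stamped[1:])
    (t0, before), (t1, after) = max(pairs, key=lambda p: p[1][0] - p[0][0])
    return (t1 - t0).total_seconds(), before, after


def total_elapsed(lines):
    """Seconds between the first and last line."""
    return (parse_ts(lines[-1]) - parse_ts(lines[0])).total_seconds()


if __name__ == "__main__":
    gap, before, after = largest_gap(JOB_LINES)
    print(f"longest silence: {gap:.0f}s")
    print(f"  after:  {before}")
    print(f"  before: {after}")
    print(f"job wall time: {total_elapsed(JOB_LINES):.0f}s")
```

For these lines the longest silence is 902 seconds (between "reduce 16%" and the fetch-failure report), and the 926 seconds from job start to completion agrees with the AvgTime of 926333 ms printed by MRBench.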



On Fri, 18 Jan 2008, Konstantin Shvachko wrote:

> Looks like we still have this unsolved mysterious problem:
>
> http://issues.apache.org/jira/browse/HADOOP-1374
>
> Could it be related to HADOOP-1246? Arun?
>
> Thanks,
> --Konstantin

RE: Reduce hangs
2008-01-19 00:02:15
Hi Yunhong,
As per the output it seems the job ran to successful completion (albeit with some failures)...
Devaraj 

> -----Original Message-----
> From: Yunhong Gu1 [mailto:ygu1cs.uic.edu] 
> Sent: Saturday, January 19, 2008 8:56 AM
> To: hadoop-userlucene.apache.org
> Subject: Re: Reduce hangs
> On Fri, 18 Jan 2008, Konstantin Shvachko wrote:
> 
> > Looks like we still have this unsolved mysterious problem:
> >
> > http://issues.apache.org/jira/browse/HADOOP-1374
> >
> > Could it be related to HADOOP-1246? Arun?
> >
> > Thanks,
> > --Konstantin
> >
> > Yunhong Gu1 wrote:
> >> 
> >> Hi,
> >> 
> >> If someone knows how to fix the problem described below, please help
> >> me out. Thanks!
> >> 
> >> I am testing Hadoop on a 2-node cluster and the "reduce" always hangs
> >> at some stage, even if I use different clusters. My OS is Debian
> >> Linux kernel 2.6 (AMD Opteron w/ 4GB Mem). Hadoop version is 0.15.2.
> >> Java version is 1.5.0_01-b08.
> >> 
> >> I simply tried "./bin/hadoop jar hadoop-0.15.2-test.jar mrbench" and
> >> when the map stage finishes, the reduce stage will hang somewhere in
> >> the middle, sometimes at 0%. I also tried any other mapreduce program
> >> I can find in the example jar package but they all hang.
> >> 
> >> The log file simply print
> >> 2008-01-18 15:15:50,831 INFO org.apache.hadoop.mapred.TaskTracker: task_200801181424_0004_r_000000_0 0.0% reduce > copy >
> >> 2008-01-18 15:15:56,841 INFO org.apache.hadoop.mapred.TaskTracker: task_200801181424_0004_r_000000_0 0.0% reduce > copy >
> >> 2008-01-18 15:16:02,850 INFO org.apache.hadoop.mapred.TaskTracker: task_200801181424_0004_r_000000_0 0.0% reduce > copy >
> >> 
> >> forever.
> >> 
> >> The program does work if I start Hadoop only on single node.
> >> 
> >> Below is my hadoop-site.xml configuration:
> >> 
> >> <configuration>
> >> 
> >> <property>
> >>    <name>fs.default.name</name>
> >>    <value>10.0.0.1:60000</value>
> >> </property>
> >> 
> >> <property>
> >>    <name>mapred.job.tracker</name>
> >>    <value>10.0.0.1:60001</value>
> >> </property>
> >> 
> >> <property>
> >>    <name>dfs.data.dir</name>
> >>    <value>/raid/hadoop/data</value>
> >> </property>
> >> 
> >> <property>
> >>    <name>mapred.local.dir</name>
> >>    <value>/raid/hadoop/mapred</value>
> >> </property>
> >> 
> >> <property>
> >>   <name>hadoop.tmp.dir</name>
> >>   <value>/raid/hadoop/tmp</value>
> >> </property>
> >> 
> >> <property>
> >>   <name>mapred.child.java.opts</name>
> >>   <value>-Xmx1024m</value>
> >> </property>
> >> 
> >> <property>
> >>   <name>mapred.tasktracker.tasks.maximum</name>
> >>   <value>4</value>
> >> </property>
> >> 
> >> <!--
> >> <property>
> >>   <name>mapred.map.tasks</name>
> >>   <value>7</value>
> >> </property>
> >> 
> >> <property>
> >>   <name>mapred.reduce.tasks</name>
> >>   <value>3</value>
> >> </property>
> >> -->
> >> 
> >> <property>
> >>   <name>fs.inmemory.size.mb</name>
> >>   <value>200</value>
> >> </property>
> >> 
> >> <property>
> >>   <name>dfs.block.size</name>
> >>   <value>134217728</value>
> >> </property>
> >> 
> >> <property>
> >>   <name>io.sort.factor</name>
> >>   <value>100</value>
> >> </property>
> >> 
> >> <property>
> >>   <name>io.sort.mb</name>
> >>   <value>200</value>
> >> </property>
> >> 
> >> <property>
> >>   <name>io.file.buffer.size</name>
> >>   <value>131072</value>
> >> </property>
> >> 
> >> </configuration>
> >> 
> >> 
> >
> 

