Thread: Re: Hadoop 'wordcount' program hanging in the Reduce phase.

China
2007-03-07 06:27:00
In my opinion, you should make the configuration files on the master and
slave nodes the same. That is, conf/slaves should be identical on every
node in your small cluster.
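
To see whether the two conf/slaves files actually differ, the host lists can be compared directly. The following is only a rough sketch: the helper names `read_hosts` and `compare_hosts` are made up for illustration, and the sample strings are the conf/slaves contents as pasted in the original message quoted below.

```python
def read_hosts(text):
    """Return the non-empty, non-comment lines of a conf/slaves-style file."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            hosts.append(line)
    return hosts


def compare_hosts(master_text, slave_text):
    """Return (entries only on the master, entries only on the slave)."""
    master = set(read_hosts(master_text))
    slave = set(read_hosts(slave_text))
    return sorted(master - slave), sorted(slave - master)


if __name__ == "__main__":
    # conf/slaves contents exactly as posted in this thread
    master_slaves = "localhost\nhadoop192.168.1.201\n"
    slave_slaves = "localhost\n"
    only_master, only_slave = compare_hosts(master_slaves, slave_slaves)
    print("only on master:", only_master)
    print("only on slave:", only_slave)
```

In practice the two strings would be read from each node's conf/slaves file rather than typed in.
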
-----Original Message-----
From: Gaurav Agarwal [mailto:gauravagarwal_4yahoo.com]
Sent: March 7, 2007 16:22
To: hadoop-userlucene.apache.org
Subject: Hadoop 'wordcount' program hanging in the Reduce phase.


Hi Everyone!
I am a new Hadoop user trying to set up a small cluster, but I am
facing some issues doing that.

I am trying to run the 'wordcount' example program that comes bundled
with Hadoop. I can run the program successfully on a single-node cluster
(that is, using only my local machine). But when I run the same program
on a cluster of two machines, it hangs in the 'reduce' phase.


Settings:

Master Node: 192.168.1.150 (dennis-laptop)
Slave Node: 192.168.1.201 (traal)

The user account on both Master and Slave is named: Hadoop

Password-less ssh login to the Slave from the Master is working.

JAVA_HOME is set appropriately in the hadoop-env.sh file on both Master
and Slave.

MASTER

1) conf/slaves
localhost
hadoop192.168.1.201

2) conf/master
localhost

3) conf/hadoop-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>192.168.1.150:50000</value>
    </property>

    <property>
        <name>mapred.job.tracker</name>
        <value>192.168.1.150:50001</value>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>

SLAVE

1) conf/slaves
localhost

2) conf/master
hadoop192.168.1.150

3) conf/hadoop-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>192.168.1.150:50000</value>
    </property>

    <property>
        <name>mapred.job.tracker</name>
        <value>192.168.1.150:50001</value>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>
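
Because the two hadoop-site.xml files were pasted by hand, one way to confirm that they really carry the same properties is to parse and compare them. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function names `load_properties` and `diff_properties` are hypothetical, and the sample document contains only values that appear in the configuration above.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse a hadoop-site.xml style document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def diff_properties(a, b):
    """Return {name: (value_in_a, value_in_b)} for every property that differs."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


if __name__ == "__main__":
    # Properties as posted for both nodes in this thread
    site = (
        "<configuration>"
        "<property><name>fs.default.name</name>"
        "<value>192.168.1.150:50000</value></property>"
        "<property><name>mapred.job.tracker</name>"
        "<value>192.168.1.150:50001</value></property>"
        "<property><name>dfs.replication</name><value>2</value></property>"
        "</configuration>"
    )
    master = load_properties(site)
    slave = load_properties(site)
    print(diff_properties(master, slave))
```

An empty result means the two files define the same properties with the same values, which is what the pasted configuration suggests for this cluster.
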


CONSOLE OUTPUT
bin/hadoop jar hadoop-*-examples.jar wordcount -m 10 -r 2 input output
07/03/06 23:17:17 INFO mapred.InputFormatBase: Total input paths to process : 1
07/03/06 23:17:18 INFO mapred.JobClient: Running job: job_0001
07/03/06 23:17:19 INFO mapred.JobClient:  map 0% reduce 0%
07/03/06 23:17:29 INFO mapred.JobClient:  map 20% reduce 0%
07/03/06 23:17:30 INFO mapred.JobClient:  map 40% reduce 0%
07/03/06 23:17:32 INFO mapred.JobClient:  map 80% reduce 0%
07/03/06 23:17:33 INFO mapred.JobClient:  map 100% reduce 0%
07/03/06 23:17:42 INFO mapred.JobClient:  map 100% reduce 3%
07/03/06 23:17:43 INFO mapred.JobClient:  map 100% reduce 5%
07/03/06 23:17:44 INFO mapred.JobClient:  map 100% reduce 8%
07/03/06 23:17:52 INFO mapred.JobClient:  map 100% reduce 10%
07/03/06 23:17:53 INFO mapred.JobClient:  map 100% reduce 13%
07/03/06 23:18:03 INFO mapred.JobClient:  map 100% reduce 16%


The only exception I can see from the log files is in the 'TaskTracker'
log file:

2007-03-06 23:17:32,214 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000000_0 Copying task_0001_m_000002_0 output from traal.
2007-03-06 23:17:32,221 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000000_0 Copying task_0001_m_000001_0 output from dennis-laptop.
2007-03-06 23:17:32,368 WARN org.apache.hadoop.mapred.TaskRunner: task_0001_r_000000_0 copy failed: task_0001_m_000002_0 from traal
2007-03-06 23:17:32,368 WARN org.apache.hadoop.mapred.TaskRunner: java.io.IOException: File /tmp/hadoop-hadoop/mapred/local/task_0001_r_000000_0/map_2.out-0 not created
    at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.copyOutput(ReduceTaskRunner.java:301)
    at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:262)

2007-03-06 23:17:32,369 WARN org.apache.hadoop.mapred.TaskRunner: task_0001_r_000000_0 adding host traal to penalty box, next contact in 99 seconds
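
To see at a glance which hosts the reduce task fails to copy map output from, the WARN lines can be pulled out of the TaskTracker log. This is a rough sketch only: it assumes each log entry sits on a single line (not wrapped as in the mail), the regular expressions are written against the exact wording of the entries above, and the name `summarize` is made up for illustration.

```python
import re

# Patterns match the wording of the TaskTracker entries shown in this thread.
COPY_FAILED = re.compile(r"(\S+) copy failed: (\S+) from (\S+)")
PENALTY = re.compile(
    r"adding host (\S+) to penalty box, next contact in (\d+) seconds"
)


def summarize(log_text):
    """Return (failures, penalties) found in a TaskTracker log.

    failures  -- list of (reduce_task, map_task, host) tuples
    penalties -- dict mapping host to its latest back-off time in seconds
    """
    failures = []
    penalties = {}
    for line in log_text.splitlines():
        match = COPY_FAILED.search(line)
        if match:
            failures.append(match.groups())
        match = PENALTY.search(line)
        if match:
            penalties[match.group(1)] = int(match.group(2))
    return failures, penalties


if __name__ == "__main__":
    sample = (
        "2007-03-06 23:17:32,368 WARN org.apache.hadoop.mapred.TaskRunner: "
        "task_0001_r_000000_0 copy failed: task_0001_m_000002_0 from traal\n"
        "2007-03-06 23:17:32,369 WARN org.apache.hadoop.mapred.TaskRunner: "
        "task_0001_r_000000_0 adding host traal to penalty box, "
        "next contact in 99 seconds\n"
    )
    failures, penalties = summarize(sample)
    print(failures)
    print(penalties)
```

This only summarizes what the log already says; it does not by itself explain why the copy from traal fails.
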

I am attaching the master log files just in case anyone
wants to check them.

Any help will be greatly appreciated! 

-gaurav

http://www.nabble.com/file/7013/hadoop-hadoop-tasktracker-dennis-laptop.log
http://www.nabble.com/file/7012/hadoop-hadoop-jobtracker-dennis-laptop.log
http://www.nabble.com/file/7011/hadoop-hadoop-namenode-dennis-laptop.log
http://www.nabble.com/file/7010/hadoop-hadoop-datanode-dennis-laptop.log
-- 
View this message in context:
http://www.nabble.com/Hadoop-%27wordcount%27-program-hanging-in-the-Reduce-phase.-tf3360661.html#a9348424
Sent from the Hadoop Users mailing list archive at Nabble.com.

Re: Re: Hadoop 'wordcount' program hanging in the Reduce phase.
United States
2007-03-07 15:22:06
Hi, I tried that.. same problem! thx

寮犺寕妫 wrote:
> 
> In my opinion, you should make the configuration files on the master and
> slave nodes the same. That is, conf/slaves should be identical on every
> node in your small cluster.
> -----閭欢鍘熶欢-----
> 鍙戜欢浜: Gaurav Agarwal [mailto:gauravagarwal_4yahoo.com] 
> 鍙戦佹椂闂: 2007骞3鏈7鏃 16:22
> 鏀朵欢浜: hadoop-userlucene.apache.org
> 涓婚: Hadoop 'wordcount' program hanging in the
Reduce phase.
> 
> 
> Hi Everyone!
> I am new user to Hadoop and trying to set up a small
cluster using Hadoop.
> but I am facing some issues doing that.
> 
> I am trying to run the Hadoop 'wordcount' example
program which come
> bundled
> with it. I am able to successfully run the program on a
single node
> cluster
> (that is using my local machine only). But, when I try
to run the same
> program on a cluster of two machines, the program hangs
in the 'reduce'
> phase.
> 
> 
> Settings:
> 
> Master Node: 192.168.1.150 (dennis-laptop)
> Slave Node: 192.168.1.201 (traal)
> 
> User Account on both Master and Slave is named :
Hadoop
> 
> Password-less ssh login to Slave from the Master is
working.
> 
> JAVA_HOME is set appropriately in the hadoop-env.sh
file on both
> Master/Slave.
> 
> MASTER
> 
> 1) conf/slaves
> localhost
> hadoop192.168.1.201
> 
> 2) conf/master
> localhost
> 
> 3) conf/hadoop-site.xml
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl"
href="configuration.xsl"?>
> 
> <!-- Put site-specific property overrides in this
file. -->
> 
> <configuration>
> <property>
>          <name>fs.default.name</name>
>         
<value>192.168.1.150:50000</value>
>     </property>
> 
>     <property>
>          <name>mapred.job.tracker</name>
>         
<value>192.168.1.150:50001</value>
>      </property>
>         
>     <property>
>          <name>dfs.replication</name>
>          <value>2</value>
>     </property>
> </configuration>
> 
> SLAVE
> 
> 1) conf/slaves
> localhost
> 
> 2) conf/master
> hadoop192.168.1.150
> 
> 3) conf/hadoop-site.xml
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl"
href="configuration.xsl"?>
> 
> <!-- Put site-specific property overrides in this
file. -->
> 
> <configuration>
> <property>
>          <name>fs.default.name</name>
>         
<value>192.168.1.150:50000</value>
>     </property>
> 
>     <property>
>          <name>mapred.job.tracker</name>
>         
<value>192.168.1.150:50001</value>
>      </property>
>         
>     <property>
>          <name>dfs.replication</name>
>          <value>2</value>
>     </property>
> </configuration>
> 
> 
> CONSOLE OUTPUT
> bin/hadoop jar hadoop-*-examples.jar wordcount -m 10 -r
2 input output
> 07/03/06 23:17:17 INFO mapred.InputFormatBase: Total
input paths to
> process
> : 1
> 07/03/06 23:17:18 INFO mapred.JobClient: Running job:
job_0001
> 07/03/06 23:17:19 INFO mapred.JobClient:  map 0% reduce
0%
> 07/03/06 23:17:29 INFO mapred.JobClient:  map 20%
reduce 0%
> 07/03/06 23:17:30 INFO mapred.JobClient:  map 40%
reduce 0%
> 07/03/06 23:17:32 INFO mapred.JobClient:  map 80%
reduce 0%
> 07/03/06 23:17:33 INFO mapred.JobClient:  map 100%
reduce 0%
> 07/03/06 23:17:42 INFO mapred.JobClient:  map 100%
reduce 3%
> 07/03/06 23:17:43 INFO mapred.JobClient:  map 100%
reduce 5%
> 07/03/06 23:17:44 INFO mapred.JobClient:  map 100%
reduce 8%
> 07/03/06 23:17:52 INFO mapred.JobClient:  map 100%
reduce 10%
> 07/03/06 23:17:53 INFO mapred.JobClient:  map 100%
reduce 13%
> 07/03/06 23:18:03 INFO mapred.JobClient:  map 100%
reduce 16%
> 
> 
> The only exception I can see from the log files is in
the 'TaskTracker'
> log
> file:
> 
> 2007-03-06 23:17:32,214 INFO
org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000000_0 Copying task_0001_m_000002_0
output from traal.
> 2007-03-06 23:17:32,221 INFO
org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000000_0 Copying task_0001_m_000001_0
output from
> dennis-laptop.
> 2007-03-06 23:17:32,368 WARN
org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000000_0 copy failed: task_0001_m_000002_0
from traal
> 2007-03-06 23:17:32,368 WARN
org.apache.hadoop.mapred.TaskRunner:
> java.io.IOException: File
>
/tmp/hadoop-hadoop/mapred/local/task_0001_r_000000_0/map_2.o
ut-0 not
> created
> at
>
org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.co
pyOutput(ReduceT
> askRunner.java:301)
> at
>
org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.ru
n(ReduceTaskRunn
> er.java:262)
> 
> 2007-03-06 23:17:32,369 WARN
org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000000_0 adding host traal to penalty box,
next contact in 99
> seconds
> 
> I am attaching the master log files just in case anyone
wants to check
> them.
> 
> Any help will be greatly appreciated! 
> 
> -gaurav
> 
> http://www.nabble.com/file/7013/hadoo
p-hadoop-tasktracker-dennis-laptop.log
> hadoop-hadoop-tasktracker-dennis-laptop.log
</br>
> http://www.nabble.com/file/7012/hadoop
-hadoop-jobtracker-dennis-laptop.log
> hadoop-hadoop-jobtracker-dennis-laptop.log </br>
> http://www.nabble.com/file/7011/hadoop-h
adoop-namenode-dennis-laptop.log
> hadoop-hadoop-namenode-dennis-laptop.log </br>
> http://www.nabble.com/file/7010/hadoop-h
adoop-datanode-dennis-laptop.log
> hadoop-hadoop-datanode-dennis-laptop.log 
> -- 
> View this message in context:
> http://www.nabble.com/Hadoop-%27word
count%27-program-hanging-in-the-Reduce-p
> hase.-tf3360661.html#a9348424
> Sent from the Hadoop Users mailing list archive at
Nabble.com.
> 
> 

-- 
View this message in context:
http://www.nabble.com/Hadoop-%27wordcount%27-program-hanging-in-the-Reduce-phase.-tf3360661.html#a9362461
Sent from the Hadoop Users mailing list archive at Nabble.com.

