Thread: Hadoop only processing the first 64 meg block of a 2 gig file




Hadoop only processing the first 64 meg block of a 2 gig file
2008-01-18 09:46:51
Hello,

I'm trying to get Hadoop to process a 2 gig file, but it seems to be
processing only the first block.  I'm running the exact Hadoop VMware
image that is available here:
http://dl.google.com/edutools/hadoop-vmware.zip
without any tweaks or modifications to it.  I think my file has been
properly loaded into HDFS (HDFS reports it as having 2270607035 bytes),
but when I run the example wordcount task it only seems to operate on
the first 64 meg chunk (Map input bytes is reported as 67239230 when
the job completes).  Is the image set up to only run the first block,
and if so, how do I change this so it runs over the whole file?  Any
help would be greatly appreciated.

Thanks,

--Matt

P.S.  Here are the commands I've actually run to verify that the file
is in HDFS and to run the wordcount example, along with their output:

 

hadoop dfs -ls /clickdir

Found 1 items
/clickdir/cf709.txt     <r 1>   2270607035

hadoop jar hadoop-examples.jar wordcount /clickdir /wordTEST3

08/01/18 00:18:59 INFO mapred.FileInputFormat: Total input paths to process : 1
08/01/18 00:19:00 INFO mapred.JobClient: Running job: job_0023
08/01/18 00:19:01 INFO mapred.JobClient:  map 0% reduce 0%
08/01/18 00:19:28 INFO mapred.JobClient:  map 2% reduce 0%
08/01/18 00:19:34 INFO mapred.JobClient:  map 3% reduce 0%
08/01/18 00:19:37 INFO mapred.JobClient:  map 5% reduce 0%
08/01/18 00:19:43 INFO mapred.JobClient:  map 6% reduce 1%
08/01/18 00:19:45 INFO mapred.JobClient:  map 9% reduce 1%
08/01/18 00:19:54 INFO mapred.JobClient:  map 12% reduce 2%
08/01/18 00:20:02 INFO mapred.JobClient:  map 15% reduce 3%
08/01/18 00:20:11 INFO mapred.JobClient:  map 18% reduce 4%
08/01/18 00:20:19 INFO mapred.JobClient:  map 21% reduce 4%
08/01/18 00:20:25 INFO mapred.JobClient:  map 21% reduce 6%
08/01/18 00:20:26 INFO mapred.JobClient:  map 24% reduce 6%
08/01/18 00:20:34 INFO mapred.JobClient:  map 27% reduce 7%
08/01/18 00:20:45 INFO mapred.JobClient:  map 27% reduce 8%
08/01/18 00:20:46 INFO mapred.JobClient:  map 30% reduce 8%
08/01/18 00:20:54 INFO mapred.JobClient:  map 33% reduce 8%
08/01/18 00:20:56 INFO mapred.JobClient:  map 33% reduce 9%
08/01/18 00:21:03 INFO mapred.JobClient:  map 36% reduce 10%
08/01/18 00:21:11 INFO mapred.JobClient:  map 39% reduce 11%
08/01/18 00:21:19 INFO mapred.JobClient:  map 41% reduce 12%
08/01/18 00:21:25 INFO mapred.JobClient:  map 44% reduce 13%
08/01/18 00:21:31 INFO mapred.JobClient:  map 47% reduce 13%
08/01/18 00:21:36 INFO mapred.JobClient:  map 50% reduce 14%
08/01/18 00:21:42 INFO mapred.JobClient:  map 53% reduce 16%
08/01/18 00:21:47 INFO mapred.JobClient:  map 56% reduce 16%
08/01/18 00:21:52 INFO mapred.JobClient:  map 59% reduce 17%
08/01/18 00:21:56 INFO mapred.JobClient:  map 62% reduce 18%
08/01/18 00:22:01 INFO mapred.JobClient:  map 65% reduce 19%
08/01/18 00:22:06 INFO mapred.JobClient:  map 68% reduce 20%
08/01/18 00:22:11 INFO mapred.JobClient:  map 71% reduce 20%
08/01/18 00:22:15 INFO mapred.JobClient:  map 74% reduce 22%
08/01/18 00:22:20 INFO mapred.JobClient:  map 77% reduce 24%
08/01/18 00:22:25 INFO mapred.JobClient:  map 80% reduce 24%
08/01/18 00:22:30 INFO mapred.JobClient:  map 83% reduce 25%
08/01/18 00:22:35 INFO mapred.JobClient:  map 86% reduce 27%
08/01/18 00:22:40 INFO mapred.JobClient:  map 89% reduce 28%
08/01/18 00:22:45 INFO mapred.JobClient:  map 89% reduce 29%
08/01/18 00:22:46 INFO mapred.JobClient:  map 91% reduce 29%
08/01/18 00:22:51 INFO mapred.JobClient:  map 94% reduce 30%
08/01/18 00:22:56 INFO mapred.JobClient:  map 97% reduce 30%
08/01/18 00:23:06 INFO mapred.JobClient:  map 98% reduce 32%
08/01/18 00:25:06 INFO mapred.JobClient:  map 99% reduce 32%
08/01/18 00:26:16 INFO mapred.JobClient:  map 100% reduce 32%
08/01/18 00:27:08 INFO mapred.JobClient:  map 100% reduce 66%
08/01/18 00:27:16 INFO mapred.JobClient:  map 100% reduce 71%
08/01/18 00:27:27 INFO mapred.JobClient:  map 100% reduce 77%
08/01/18 00:27:28 INFO mapred.JobClient:  map 100% reduce 78%
08/01/18 00:27:37 INFO mapred.JobClient:  map 100% reduce 100%
08/01/18 00:27:38 INFO mapred.JobClient: Job complete: job_0023
08/01/18 00:27:38 INFO mapred.JobClient: Counters: 11
08/01/18 00:27:38 INFO mapred.JobClient:   org.apache.hadoop.examples.WordCount$Counter
08/01/18 00:27:38 INFO mapred.JobClient:     WORDS=13050362
08/01/18 00:27:38 INFO mapred.JobClient:     VALUES=13976767
08/01/18 00:27:38 INFO mapred.JobClient:   Map-Reduce Framework
08/01/18 00:27:38 INFO mapred.JobClient:     Map input records=277434
08/01/18 00:27:38 INFO mapred.JobClient:     Map output records=13050362
08/01/18 00:27:38 INFO mapred.JobClient:     Map input bytes=67239230
08/01/18 00:27:38 INFO mapred.JobClient:     Map output bytes=118620427
08/01/18 00:27:38 INFO mapred.JobClient:     Combine input records=13050362
08/01/18 00:27:38 INFO mapred.JobClient:     Combine output records=926405
08/01/18 00:27:38 INFO mapred.JobClient:     Reduce input groups=709097
08/01/18 00:27:38 INFO mapred.JobClient:     Reduce input records=926405
08/01/18 00:27:38 INFO mapred.JobClient:     Reduce output records=709097
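
As a rough cross-check of the counters in this job output, the Python
sketch below pulls the name=value pairs out of JobClient lines and
compares Map input bytes with the file size from the directory listing.
The parse_counters helper and its regular expression are illustrative
assumptions, not part of Hadoop; the two sample lines are copied from
the output in this message.

```python
import re

# Two counter lines copied from the job output, plus the file size
# reported by `hadoop dfs -ls /clickdir`.
LOG = """\
08/01/18 00:27:38 INFO mapred.JobClient:     Map input records=277434
08/01/18 00:27:38 INFO mapred.JobClient:     Map input bytes=67239230
"""
FILE_SIZE_BYTES = 2270607035


def parse_counters(log_text):
    """Return {counter name: int value} for each 'name=value' JobClient line."""
    counters = {}
    for line in log_text.splitlines():
        # Hypothetical pattern: text after "JobClient:" up to "=", then digits.
        match = re.search(r"JobClient:\s+(.+?)=(\d+)\s*$", line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


counters = parse_counters(LOG)
fraction_read = counters["Map input bytes"] / FILE_SIZE_BYTES
print(f"Map input bytes covers {fraction_read:.2%} of the file")
```

A fraction this far below 100% is consistent with the suspicion that
only a small part of the file reached the map phase.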

Re: Hadoop only processing the first 64 meg block of a 2 gig file
2008-01-18 10:23:06
Go into the web interface and look at the file.

See if you can see all of the blocks.


On 1/18/08 7:46 AM, "Matt Herndon" <mherndonintwine.com> wrote:

> Hello,
>
> I'm trying to get Hadoop to process a 2 gig file but it seems to only be
> processing the first block.  I'm running the exact Hadoop vmware image
> that is available here http://dl.google.com/edutools/hadoop-vmware.zip
> without any tweaks or modifications to it.  I think my file has been
> properly loaded into HDFS (hdfs reports it as having 2270607035 bytes)
> but when I run the example wordcount task it only seems to operate on
> the first 64 meg chunk (Map input bytes is reported as 67239230 when the
> job completes).  Is the image setup to only run the first block, and if
> so how to I change this so it runs over the whole file?  Any help would
> be greatly appreciated.


RE: Hadoop only processing the first 64 meg block of a 2 gig file
2008-01-18 10:37:49
Yep, I can see all 34 blocks and view chunks of actual data from each
using the web interface (quite a nifty tool).  Any other suggestions?

--Matt
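
For reference, the 34 blocks seen in the web interface agree with the
reported file size if the block size is 64 MiB (an assumption here; the
thread only says "64 meg"). A minimal Python sketch of that arithmetic,
using only the two byte counts quoted in the thread:

```python
FILE_SIZE_BYTES = 2270607035          # from `hadoop dfs -ls /clickdir`
MAP_INPUT_BYTES = 67239230            # "Map input bytes" job counter
BLOCK_SIZE_BYTES = 64 * 1024 * 1024   # assumed block size: 64 MiB

# Ceiling division: the last block is allowed to be partially filled.
expected_blocks = -(-FILE_SIZE_BYTES // BLOCK_SIZE_BYTES)

# How many blocks' worth of data the map phase actually consumed.
blocks_read = MAP_INPUT_BYTES / BLOCK_SIZE_BYTES

print("expected blocks:", expected_blocks)
print("blocks' worth of map input: %.3f" % blocks_read)
```

Under that assumption the file spans 34 blocks, while the map phase
read only slightly more than one block's worth of input.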

-----Original Message-----
From: Ted Dunning [mailto:tdunningveoh.com] 
Sent: Friday, January 18, 2008 11:23 AM
To: hadoop-userlucene.apache.org
Subject: Re: Hadoop only processing the first 64 meg block of a 2 gig file


Go into the web interface and look at the file.

See if you can see all of the blocks.



