Yep, I can see all 34 blocks and view chunks of actual data from each
using the web interface (quite a nifty tool). Any other suggestions?
--Matt
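
For what it's worth, the numbers in this thread can be sanity-checked with a little arithmetic. The sketch below assumes the HDFS block size is exactly 64 MiB (the thread only says "64 meg", so that value is an assumption), and the function name expected_blocks is made up for illustration. It computes how many blocks a file of the reported size should occupy and how much of the file the "Map input bytes" counter accounts for.

```python
# Sanity check on the sizes reported in this thread.
# BLOCK_SIZE is an assumption: the thread says "64 meg", taken here as 64 MiB.
BLOCK_SIZE = 64 * 1024 * 1024
FILE_SIZE = 2270607035        # bytes reported by `hadoop dfs -ls /clickdir`
MAP_INPUT_BYTES = 67239230    # "Map input bytes" counter from the job output


def expected_blocks(file_size, block_size=BLOCK_SIZE):
    """Number of blocks needed to hold file_size bytes (ceiling division)."""
    return (file_size + block_size - 1) // block_size


if __name__ == "__main__":
    print("expected blocks:", expected_blocks(FILE_SIZE))
    print("bytes beyond one block:", MAP_INPUT_BYTES - BLOCK_SIZE)
    print("share of file read by maps: {:.1%}".format(MAP_INPUT_BYTES / FILE_SIZE))
```

Under that assumption the block count agrees with the 34 blocks visible in the web interface, and the map input is only slightly larger than a single block, which fits the symptom of one block being processed rather than the whole file.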
-----Original Message-----
From: Ted Dunning [mailto:tdunning veoh.com]
Sent: Friday, January 18, 2008 11:23 AM
To: hadoop-user lucene.apache.org
Subject: Re: Hadoop only processing the first 64 meg block of a 2 gig file
Go into the web interface and look at the file.
See if you can see all of the blocks.
On 1/18/08 7:46 AM, "Matt Herndon" <mherndon intwine.com> wrote:
> Hello,
>
> I'm trying to get Hadoop to process a 2 gig file but it seems to only be
> processing the first block. I'm running the exact Hadoop vmware image
> that is available here http://dl.google.com/edutools/hadoop-vmware.zip
> without any tweaks or modifications to it. I think my file has been
> properly loaded into HDFS (hdfs reports it as having 2270607035 bytes)
> but when I run the example wordcount task it only seems to operate on
> the first 64 meg chunk (Map input bytes is reported as 67239230 when the
> job completes). Is the image set up to only run the first block, and if
> so how do I change this so it runs over the whole file? Any help would
> be greatly appreciated.
>
>
>
> Thanks,
>
>
>
> --Matt
>
>
>
> P.S. Here are the commands I've actually run to verify that the file is
> in the hdfs and to run the wordcount example along with their output:
>
>
>
> hadoop dfs -ls /clickdir
>
> Found 1 items
>
> /clickdir/cf709.txt <r 1> 2270607035
>
>
>
> hadoop jar hadoop-examples.jar wordcount /clickdir /wordTEST3
>
> 08/01/18 00:18:59 INFO mapred.FileInputFormat: Total input paths to process : 1
> 08/01/18 00:19:00 INFO mapred.JobClient: Running job: job_0023
>
> 08/01/18 00:19:01 INFO mapred.JobClient: map 0% reduce 0%
> 08/01/18 00:19:28 INFO mapred.JobClient: map 2% reduce 0%
> 08/01/18 00:19:34 INFO mapred.JobClient: map 3% reduce 0%
> 08/01/18 00:19:37 INFO mapred.JobClient: map 5% reduce 0%
> 08/01/18 00:19:43 INFO mapred.JobClient: map 6% reduce 1%
> 08/01/18 00:19:45 INFO mapred.JobClient: map 9% reduce 1%
> 08/01/18 00:19:54 INFO mapred.JobClient: map 12% reduce 2%
> 08/01/18 00:20:02 INFO mapred.JobClient: map 15% reduce 3%
> 08/01/18 00:20:11 INFO mapred.JobClient: map 18% reduce 4%
> 08/01/18 00:20:19 INFO mapred.JobClient: map 21% reduce 4%
> 08/01/18 00:20:25 INFO mapred.JobClient: map 21% reduce 6%
> 08/01/18 00:20:26 INFO mapred.JobClient: map 24% reduce 6%
> 08/01/18 00:20:34 INFO mapred.JobClient: map 27% reduce 7%
> 08/01/18 00:20:45 INFO mapred.JobClient: map 27% reduce 8%
> 08/01/18 00:20:46 INFO mapred.JobClient: map 30% reduce 8%
> 08/01/18 00:20:54 INFO mapred.JobClient: map 33% reduce 8%
> 08/01/18 00:20:56 INFO mapred.JobClient: map 33% reduce 9%
> 08/01/18 00:21:03 INFO mapred.JobClient: map 36% reduce 10%
> 08/01/18 00:21:11 INFO mapred.JobClient: map 39% reduce 11%
> 08/01/18 00:21:19 INFO mapred.JobClient: map 41% reduce 12%
> 08/01/18 00:21:25 INFO mapred.JobClient: map 44% reduce 13%
> 08/01/18 00:21:31 INFO mapred.JobClient: map 47% reduce 13%
> 08/01/18 00:21:36 INFO mapred.JobClient: map 50% reduce 14%
> 08/01/18 00:21:42 INFO mapred.JobClient: map 53% reduce 16%
> 08/01/18 00:21:47 INFO mapred.JobClient: map 56% reduce 16%
> 08/01/18 00:21:52 INFO mapred.JobClient: map 59% reduce 17%
> 08/01/18 00:21:56 INFO mapred.JobClient: map 62% reduce 18%
> 08/01/18 00:22:01 INFO mapred.JobClient: map 65% reduce 19%
> 08/01/18 00:22:06 INFO mapred.JobClient: map 68% reduce 20%
> 08/01/18 00:22:11 INFO mapred.JobClient: map 71% reduce 20%
> 08/01/18 00:22:15 INFO mapred.JobClient: map 74% reduce 22%
> 08/01/18 00:22:20 INFO mapred.JobClient: map 77% reduce 24%
> 08/01/18 00:22:25 INFO mapred.JobClient: map 80% reduce 24%
> 08/01/18 00:22:30 INFO mapred.JobClient: map 83% reduce 25%
> 08/01/18 00:22:35 INFO mapred.JobClient: map 86% reduce 27%
> 08/01/18 00:22:40 INFO mapred.JobClient: map 89% reduce 28%
> 08/01/18 00:22:45 INFO mapred.JobClient: map 89% reduce 29%
> 08/01/18 00:22:46 INFO mapred.JobClient: map 91% reduce 29%
> 08/01/18 00:22:51 INFO mapred.JobClient: map 94% reduce 30%
> 08/01/18 00:22:56 INFO mapred.JobClient: map 97% reduce 30%
> 08/01/18 00:23:06 INFO mapred.JobClient: map 98% reduce 32%
> 08/01/18 00:25:06 INFO mapred.JobClient: map 99% reduce 32%
> 08/01/18 00:26:16 INFO mapred.JobClient: map 100% reduce 32%
> 08/01/18 00:27:08 INFO mapred.JobClient: map 100% reduce 66%
> 08/01/18 00:27:16 INFO mapred.JobClient: map 100% reduce 71%
> 08/01/18 00:27:27 INFO mapred.JobClient: map 100% reduce 77%
> 08/01/18 00:27:28 INFO mapred.JobClient: map 100% reduce 78%
> 08/01/18 00:27:37 INFO mapred.JobClient: map 100% reduce 100%
> 08/01/18 00:27:38 INFO mapred.JobClient: Job complete: job_0023
>
> 08/01/18 00:27:38 INFO mapred.JobClient: Counters: 11
> 08/01/18 00:27:38 INFO mapred.JobClient: org.apache.hadoop.examples.WordCount$Counter
> 08/01/18 00:27:38 INFO mapred.JobClient: WORDS=13050362
> 08/01/18 00:27:38 INFO mapred.JobClient: VALUES=13976767
> 08/01/18 00:27:38 INFO mapred.JobClient: Map-Reduce Framework
> 08/01/18 00:27:38 INFO mapred.JobClient: Map input records=277434
> 08/01/18 00:27:38 INFO mapred.JobClient: Map output records=13050362
> 08/01/18 00:27:38 INFO mapred.JobClient: Map input bytes=67239230
> 08/01/18 00:27:38 INFO mapred.JobClient: Map output bytes=118620427
> 08/01/18 00:27:38 INFO mapred.JobClient: Combine input records=13050362
> 08/01/18 00:27:38 INFO mapred.JobClient: Combine output records=926405
> 08/01/18 00:27:38 INFO mapred.JobClient: Reduce input groups=709097
> 08/01/18 00:27:38 INFO mapred.JobClient: Reduce input records=926405
> 08/01/18 00:27:38 INFO mapred.JobClient: Reduce output records=709097
>