List Info

Thread: Re: µªÎ`: HBase PerformanceEvaluation failing




Re: µªÎ`: HBase PerformanceEvaluation failing
country flaguser name
United States
2007-11-24 21:56:02
Can you clear the logs, clean out the file system and run
the test again?

The namenode logs should tell an interesting story.


On 11/24/07 6:12 PM, "Kareem Dana"
<kareem.danagmail.com> wrote:

> I ran hadoop fsck and sure enough the DFS was
corrupted. It seems that
> the PerformanceEvaluation test is corrupting it. Before
I run the
> test, I ran fsck and the DFS was reported as HEALTHY.
Once the PE
> fails, the DFS is reported as corrupt. I tried to
simplify my setup
> and run the PE again. My new config is as follows:
> 
> hadoop07 - DFS Master, Mapred master, hbase master
> hadoop09-10 - 2 hbase region servers
> hadoop11-12 - 2 datanodes, task trackers
> 
> mapred.map.tasks = 2
> mapred.reduce.tasks = 1
> dfs.replication = 1
> 
> I ran the distributed PE in that configuration and it
still failed
> with similar errors. The output of the hadoop fsck for
this run was:
> 
> ..........
> 
/tmp/hadoop-kcd/hbase/hregion_.META.,,1/info/mapfiles/643488
1831082231493/data>
:
> MISSING 1 blocks of total size 0 B.
> ......................................
>
/tmp/hadoop-kcd/hbase/hregion_TestTable,11566878,12270926815
44002579/info/mapf
> iles/5263238643231358600/data:
> MISSING 1 blocks of total size 0 B.
> ....
>
/tmp/hadoop-kcd/hbase/hregion_TestTable,12612310,16520624110
16999689/info/mapf
> iles/2024298319068625138/data:
> MISSING 1 blocks of total size 0 B.
> ....
>
/tmp/hadoop-kcd/hbase/hregion_TestTable,12612310,16520624110
16999689/info/mapf
> iles/5071453667327337040/data:
> MISSING 1 blocks of total size 0 B.
> .........
>
/tmp/hadoop-kcd/hbase/hregion_TestTable,13932,47381927475213
22482/info/mapfile
> s/4400784113695734765/data:
> MISSING 1 blocks of total size 0 B.
> ...................................
>
............................................................
............
>
/tmp/hadoop-kcd/hbase/log_172.16.6.56_-1823376333333123807_6
0020/hlog.dat.027:
> MISSING 1 blocks of total size 0 B.
> .Status: CORRUPT
>  Total size:    1890454330 B
>  Total blocks:  180 (avg. block size 10502524 B)
>  Total dirs:    190
>  Total files:   173
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       0 (0.0 %)
>  Target replication factor:     1
>  Real replication factor:       1.0
> 
> 
> The filesystem under path '/' is CORRUPT
> 
> 
> On Nov 24, 2007 6:21 PM, Ted Dunning <tdunningveoh.com> wrote:
>> 
>> I think that stack was suggesting an HDFS fsck, not
a disk level fsck.
>> 
>> Try [hadoop fsck /]
>> 
>> 
>> 
>> 
>> 
>> On 11/24/07 4:09 PM, "Kareem Dana"
<kareem.danagmail.com> wrote:
>> 
>>> I do not have root access on the xen cluster
I'm using. I will ask the
>>> admin to make sure the disk is working
properly. Regarding the
>>> mismatch versions though, are you suggesting
that different region
>>> servers might be running different versions of
hbase/hadoop? They are
>>> all running the same code from the same shared
storage. There isn't
>>> even another version of hadoop anywhere for the
other nodes to run. I
>>> think I'll try dropping my cluster down to 2
nodes and working back
>>> up... maybe I can pin point a specific problem
node. Thanks for taking
>>> a look at my logs.
>>> 
>>> On Nov 24, 2007 5:49 PM, stack <stackduboce.net> wrote:
>>>> I took a quick look Kareem.   As with the
last time, hbase keeps having
>>>> trouble w/ the hdfs.  Things start out fine
around 16:00 then go bad
>>>> because can't write reliably to the hdfs --
a variety of reasons.  You
>>>> then seem to restart the cluster around
17:37 or so and things seem to
>>>> go along fine for a while until 19:05 when
again, all regionservers
>>>> report trouble writing the hdfs.  Have you
run an fsck?
>> 
>> 


Re: µªÎ`: HBase PerformanceEvaluation failing
user name
2007-11-24 23:13:19
I cleaned the logs, cleaned and reformatted the DFS. Started
up DFS,
mapred, and hbase.
Ran fsck just to make sure:
kcdhadoop07:~/hadoop$ bin/hadoop fsck /
.....Status: HEALTHY
 Total size:    450 B
 Total blocks:  3 (avg. block size 150 B)
 Total dirs:    16
 Total files:   5
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Target replication factor:     1
 Real replication factor:       1.0


The filesystem under path '/' is HEALTHY

Started the PE Test:
kcdhadoop07:~/hadoop$ bin/hadoop jar
/usr/research/home/kcd/hadoop/build/contrib/hbase/hadoop-0.1
5.0-dev-hbase-test.jar
sequentialWrite 2
07/11/24 23:14:03 INFO hbase.PerformanceEvaluation: Table
name:
TestTable, families: {info:={name: info, max versions: 3,
compression:
NONE, in memory: false, max length: 2147483647, bloom
filter: none}}
created
07/11/24 23:14:04 INFO mapred.FileInputFormat: Total input
paths to process : 1
07/11/24 23:14:06 INFO mapred.JobClient: Running job:
job_200711242311_0001
07/11/24 23:14:07 INFO mapred.JobClient:  map 0% reduce 0%
07/11/24 23:20:53 INFO mapred.JobClient:  map 5% reduce 0%
07/11/24 23:21:03 INFO mapred.JobClient:  map 5% reduce 1%
07/11/24 23:22:35 INFO mapred.JobClient:  map 10% reduce 1%
07/11/24 23:22:46 INFO mapred.JobClient:  map 10% reduce 3%
07/11/24 23:24:10 INFO mapred.JobClient:  map 14% reduce 3%
07/11/24 23:24:22 INFO mapred.JobClient:  map 19% reduce 3%
07/11/24 23:24:34 INFO mapred.JobClient:  map 19% reduce 5%
07/11/24 23:29:31 INFO mapred.JobClient:  map 24% reduce 5%
07/11/24 23:29:42 INFO mapred.JobClient:  map 24% reduce 6%
07/11/24 23:31:37 INFO mapred.JobClient:  map 29% reduce 6%
07/11/24 23:31:45 INFO mapred.JobClient:  map 30% reduce 6%
07/11/24 23:31:56 INFO mapred.JobClient:  map 30% reduce
10%
07/11/24 23:32:12 INFO mapred.JobClient:  map 35% reduce
10%
07/11/24 23:32:18 INFO mapred.JobClient:  map 35% reduce
11%
07/11/24 23:37:33 INFO mapred.JobClient:  map 40% reduce
11%
07/11/24 23:37:39 INFO mapred.JobClient:  map 40% reduce
13%
07/11/24 23:37:45 INFO mapred.JobClient:  map 45% reduce
13%
07/11/24 23:37:54 INFO mapred.JobClient:  map 45% reduce
15%
07/11/24 23:42:02 INFO mapred.JobClient:  map 50% reduce
15%
07/11/24 23:52:14 INFO mapred.JobClient:  map 100% reduce
100%
java.io.IOException: Job failed!
        at
org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:831
)
        at
org.apache.hadoop.hbase.PerformanceEvaluation.runNIsMoreThan
One(PerformanceEvaluation.java:248)
        at
org.apache.hadoop.hbase.PerformanceEvaluation.runTest(Perfor
manceEvaluation.java:588)
        at
org.apache.hadoop.hbase.PerformanceEvaluation.doCommandLine(
PerformanceEvaluation.java:680)
        at
org.apache.hadoop.hbase.PerformanceEvaluation.main(Performan
ceEvaluation.java:699)
        at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
ssorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
thodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at
org.apache.hadoop.util.RunJar.main(RunJar.java:155)


Checked DFS status again:
kcdhadoop07:~/hadoop$ bin/hadoop fsck /
.....
/tmp/hadoop-kcd/hbase/hregion_-ROOT-,,0/info/mapfiles/356202
0000090370572/data:
MISSING 1 blocks of total size 0 B.
.......
/tmp/hadoop-kcd/hbase/hregion_.META.,,1/info/mapfiles/434460
773960343228/data:
MISSING 1 blocks of total size 0 B.
....................................
/tmp/hadoop-kcd/hbase/hregion_TestTable,1152738,800744514559
5019431/info/mapfiles/5636229081894866081/data:
MISSING 1 blocks of total size 0 B.
....................................................
....................................................
/tmp/hadoop-kcd/hbase/hregion_TestTable,8426559,750141718782
5603851/info/mapfiles/6826621922173844792/data:
MISSING 1 blocks of total size 0 B.
...........
/tmp/hadoop-kcd/hbase/log_172.16.6.56_-1556193815149242895_6
0020/hlog.dat.025:
MISSING 1 blocks of total size 0 B.
.Status: CORRUPT
 Total size:    1823865158 B
 Total blocks:  174 (avg. block size 10481983 B)
 Total dirs:    180
 Total files:   164
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Target replication factor:     1
 Real replication factor:       1.0


The filesystem under path '/' is CORRUPT

All my logs are uploaded at http://www.
cs.duke.edu/~kcd/hadoop-logs2/

The namenode log looks fine until the very end when it has a
few of
these errors:
2007-11-24 23:43:00,798 INFO
org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.allocateBlock:
/tmp/hadoop-kcd/hbase/hregion_.META.,,1/info/mapfiles/914606
8477937578655/data.
blk_-6879410450276751752
2007-11-24 23:43:00,798 WARN
org.apache.hadoop.fs.FSNamesystem: Not
able to place enough replicas, still in need of 1
2007-11-24 23:43:00,808 INFO org.apache.hadoop.ipc.Server:
IPC Server
handler 1 on 54310, call
addBlock(/tmp/hadoop-kcd/hbase/hregion_.META.,,1/info/mapfil
es/9146068477937578655/data,
DFSClient_-1049549396) from 172.16.6.56:56970: error:
java.io.IOException: File
/tmp/hadoop-kcd/hbase/hregion_.META.,,1/info/mapfiles/914606
8477937578655/data
could only be replicated to 0 nodes, instead of 1
java.io.IOException: File
/tmp/hadoop-kcd/hbase/hregion_.META.,,1/info/mapfiles/914606
8477937578655/data
could only be replicated to 0 nodes, instead of 1
	at
org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSName
system.java:1003)
	at
org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:293)
	at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown
Source)
	at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
thodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
	at
org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

The hbase-region servers seem to be having a lot of problems
though
according to their logs. Not quite sure what could be the
cause.

- Kareem

On Nov 24, 2007 9:56 PM, Ted Dunning <tdunningveoh.com> wrote:
>
> Can you clear the logs, clean out the file system and
run the test again?
>
> The namenode logs should tell an interesting story.
>
>
>
> On 11/24/07 6:12 PM, "Kareem Dana"
<kareem.danagmail.com> wrote:
>
> > I ran hadoop fsck and sure enough the DFS was
corrupted. It seems that
> > the PerformanceEvaluation test is corrupting it.
Before I run the
> > test, I ran fsck and the DFS was reported as
HEALTHY. Once the PE
> > fails, the DFS is reported as corrupt. I tried to
simplify my setup
> > and run the PE again. My new config is as
follows:
> >
> > hadoop07 - DFS Master, Mapred master, hbase
master
> > hadoop09-10 - 2 hbase region servers
> > hadoop11-12 - 2 datanodes, task trackers
> >
> > mapred.map.tasks = 2
> > mapred.reduce.tasks = 1
> > dfs.replication = 1
> >
> > I ran the distributed PE in that configuration and
it still failed
> > with similar errors. The output of the hadoop fsck
for this run was:
> >
> > ..........
> >
>
/tmp/hadoop-kcd/hbase/hregion_.META.,,1/info/mapfiles/643488
1831082231493/data>
> :
> > MISSING 1 blocks of total size 0 B.
> > ......................................
> >
/tmp/hadoop-kcd/hbase/hregion_TestTable,11566878,12270926815
44002579/info/mapf
> > iles/5263238643231358600/data:
> > MISSING 1 blocks of total size 0 B.
> > ....
> >
/tmp/hadoop-kcd/hbase/hregion_TestTable,12612310,16520624110
16999689/info/mapf
> > iles/2024298319068625138/data:
> > MISSING 1 blocks of total size 0 B.
> > ....
> >
/tmp/hadoop-kcd/hbase/hregion_TestTable,12612310,16520624110
16999689/info/mapf
> > iles/5071453667327337040/data:
> > MISSING 1 blocks of total size 0 B.
> > .........
> >
/tmp/hadoop-kcd/hbase/hregion_TestTable,13932,47381927475213
22482/info/mapfile
> > s/4400784113695734765/data:
> > MISSING 1 blocks of total size 0 B.
> > ...................................
> >
............................................................
............
> >
/tmp/hadoop-kcd/hbase/log_172.16.6.56_-1823376333333123807_6
0020/hlog.dat.027:
> > MISSING 1 blocks of total size 0 B.
> > .Status: CORRUPT
> >  Total size:    1890454330 B
> >  Total blocks:  180 (avg. block size 10502524 B)
> >  Total dirs:    190
> >  Total files:   173
> >  Over-replicated blocks:        0 (0.0 %)
> >  Under-replicated blocks:       0 (0.0 %)
> >  Target replication factor:     1
> >  Real replication factor:       1.0
> >
> >
> > The filesystem under path '/' is CORRUPT
> >
> >
> > On Nov 24, 2007 6:21 PM, Ted Dunning
<tdunningveoh.com> wrote:
> >>
> >> I think that stack was suggesting an HDFS
fsck, not a disk level fsck.
> >>
> >> Try [hadoop fsck /]
> >>
> >>
> >>
> >>
> >>
> >> On 11/24/07 4:09 PM, "Kareem Dana"
<kareem.danagmail.com> wrote:
> >>
> >>> I do not have root access on the xen
cluster I'm using. I will ask the
> >>> admin to make sure the disk is working
properly. Regarding the
> >>> mismatch versions though, are you
suggesting that different region
> >>> servers might be running different
versions of hbase/hadoop? They are
> >>> all running the same code from the same
shared storage. There isn't
> >>> even another version of hadoop anywhere
for the other nodes to run. I
> >>> think I'll try dropping my cluster down to
2 nodes and working back
> >>> up... maybe I can pin point a specific
problem node. Thanks for taking
> >>> a look at my logs.
> >>>
> >>> On Nov 24, 2007 5:49 PM, stack
<stackduboce.net> wrote:
> >>>> I took a quick look Kareem.   As with
the last time, hbase keeps having
> >>>> trouble w/ the hdfs.  Things start out
fine around 16:00 then go bad
> >>>> because can't write reliably to the
hdfs -- a variety of reasons.  You
> >>>> then seem to restart the cluster
around 17:37 or so and things seem to
> >>>> go along fine for a while until 19:05
when again, all regionservers
> >>>> report trouble writing the hdfs.  Have
you run an fsck?
> >>
> >>
>
>

[1-2]

about | contact  Other archives ( Real Estate discussion Medical topics )