Thread: RE: secondary namenode errors




RE: secondary namenode errors
2007-08-24 21:20:59
I wish I had read the bug more carefully - thought that the issue was
fixed in 0.13.1.

Of course not, the issue persists. Meanwhile - half the files are
corrupted after the upgrade (followed the upgrade wiki, tried to restore
to backed up metadata and old version - to no avail).

Sigh - have a nice weekend everyone,

Joydeep

-----Original Message-----
From: Koji Noguchi [mailto:knoguchiyahoo-inc.com] 
Sent: Friday, August 24, 2007 8:29 AM
To: hadoop-userlucene.apache.org
Subject: Re: secondary namenode errors

Joydeep,

I think you're hitting this bug.
http://issues.apache.org/jira/browse/HADOOP-1076

In any case, as Raghu suggested, please use 0.13.1 and not 0.13.

Koji




Raghu Angadi wrote:
> Joydeep Sen Sarma wrote:
>> Thanks for replying.
>>
>> Can you please clarify - is it the case that the secondary namenode
>> stuff only works in 0.13.1? and what's the connection with
>> replication factor?
>>
>> We lost the file system completely once, trying to make sure we can
>> avoid it the next time.
>
> I am not sure if the problem you reported still exists in 0.13.1. You
> might still have the problem and you can ask again. But you should
> move to 0.13.1 since it has some critical fixes. See release notes for
> 0.13.1 or HADOOP-1603. You should always upgrade to the latest minor
> release version when moving to the next major version.
>
> Raghu.
>
>> Joydeep
>>
>> -----Original Message-----
>> From: Raghu Angadi [mailto:rangadiyahoo-inc.com]
>> Sent: Thursday, August 23, 2007 9:44 PM
>> To: hadoop-userlucene.apache.org
>> Subject: Re: secondary namenode errors
>>
>>
>> On a related note, please don't use 0.13.0, use the latest released
>> version for 0.13 (I think it is 0.13.1). If the secondary namenode
>> actually works, then it will result in all the replications being set
>> to 1.
>>
>> Raghu.
>>
>> Joydeep Sen Sarma wrote:
>>> Hi folks,


Re: secondary namenode errors
2007-08-24 21:42:33
Damn.

I hope you have a nice weekend (anyway).


On 8/24/07 7:20 PM, "Joydeep Sen Sarma" <jssarmafacebook.com> wrote:

> Sigh - have a nice weekend everyone,


RE: secondary namenode errors
2007-08-25 01:24:39
Just in case someone's curious.

 

Stop and restart dfs with 0.13.1:

 

- master name node says:

 

2007-08-24 18:31:27,318 INFO org.apache.hadoop.dfs.NameNode: Namenode up at: hadoop001.sf2p.facebook.com/10.16.159.101:9000
2007-08-24 18:31:28,560 WARN org.apache.hadoop.dfs.StateChange: DIR* FSDirectory.unprotectedDelete: failed to remove /tmp/pu3 because it does not exist
2007-08-24 18:31:28,571 WARN org.apache.hadoop.dfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /user/facebook/chatter/rawcounts/2007-08-04/_task_0001_r_000044_0/part-00044 to /user/facebook/chatter/rawcounts/2007-08-04/part-00044 because destination exists
2007-08-24 18:31:28,571 WARN org.apache.hadoop.dfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /user/facebook/chatter/rawcounts/2007-08-04/_task_0001_r_000044_0/.part-00044.crc to /user/facebook/chatter/rawcounts/2007-08-04/.part-00044.crc because destination exists
2007-08-24 18:31:28,572 WARN org.apache.hadoop.dfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /user/facebook/chatter/rawcounts/2007-08-04/_task_0001_r_000040_0/part-00040 to /user/facebook/chatter/rawcounts/2007-08-04/part-00040 because destination exists
2007-08-24 18:31:28,572 WARN org.apache.hadoop.dfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /user/facebook/chatter/rawcounts/2007-08-04/_task_0001_r_000040_0/.part-00040.crc to /user/facebook/chatter/rawcounts/2007-08-04/.part-00040.crc because destination exists
2007-08-24 18:31:28,573 WARN org.apache.hadoop.dfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /user/facebook/chatter/rawcounts/2007-08-04/_task_0001_r_000052_0/part-00052 to /user/facebook/chatter/rawcounts/2007-08-04/part-00052 because destination exists

...

 

there's a serious blast of these (replaying edit log?). In any case -
after this is done - it enters safemode - presume the fs is corrupted by
then. At the exact same time - the datanodes are busy deleting blocks!:

 

2007-08-24 18:31:33,243 INFO org.apache.hadoop.dfs.DataNode: Starting DataNode in: FSDataset{dirpath='/var/hadoop/tmp/dfs/data/current'}
2007-08-24 18:31:33,243 INFO org.apache.hadoop.dfs.DataNode: using BLOCKREPORT_INTERVAL of 3588023msec
2007-08-24 18:31:34,252 INFO org.apache.hadoop.dfs.DataNode: Deleting block blk_-9223045762536565560 file /var/hadoop/tmp/dfs/data/current/subdir14/subdir18/blk_-9223045762536565560
2007-08-24 18:31:34,269 INFO org.apache.hadoop.dfs.DataNode: Deleting block blk_-9214178286744587840 file /var/hadoop/tmp/dfs/data/current/subdir14/subdir12/blk_-9214178286744587840
2007-08-24 18:31:34,370 INFO org.apache.hadoop.dfs.DataNode: Deleting block blk_-9213127144044535407 file /var/hadoop/tmp/dfs/data/current/subdir14/subdir20/blk_-9213127144044535407
2007-08-24 18:31:34,386 INFO org.apache.hadoop.dfs.DataNode: Deleting block blk_-9211625398030978419 file /var/hadoop/tmp/dfs/data/current/subdir14/subdir26/blk_-9211625398030978419
2007-08-24 18:31:34,418 INFO org.apache.hadoop.dfs.DataNode: Deleting block blk_-9189558923884323865 file /var/hadoop/tmp/dfs/data/current/subdir14/subdir24/blk_-9189558923884323865
2007-08-24 18:31:34,419 INFO org.apache.hadoop.dfs.DataNode: Deleting block blk_-9115468136273900585 file /var/hadoop/tmp/dfs/data/current/subdir10/blk_-9115468136273900585

 

 

ouch - I guess those are all the blocks that fsck is now reporting
missing. Known bug? Operator error? (well - I did do a clean shutdown
..)
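
One way to check the guess above (that the blocks the datanodes deleted are
the same ones fsck reports missing) is to pull the block IDs out of the
DataNode log and intersect them with the missing list. The following is only
a rough sketch, not anything from Hadoop itself: the function names are made
up for illustration, the pattern assumes each log entry sits on one line in
the "Deleting block blk_<id> file <path>" form shown above, and the list of
missing block IDs is assumed to have been collected by hand from the fsck
report, since its exact output format is not shown in this thread.

```python
import re

# Matches DataNode entries such as:
#   ... INFO org.apache.hadoop.dfs.DataNode: Deleting block blk_<id> file <path>
# (assumes the entry has not been wrapped across several lines)
DELETE_RE = re.compile(r"DataNode: Deleting block (blk_-?\d+) file (\S+)")


def deleted_blocks(log_lines):
    """Return {block_id: on-disk path} for every 'Deleting block' entry."""
    found = {}
    for line in log_lines:
        match = DELETE_RE.search(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


def deleted_and_missing(log_lines, missing_ids):
    """Sorted block IDs that were deleted by a datanode AND are reported missing.

    missing_ids is a hypothetical, hand-collected list of block IDs
    taken from the fsck report.
    """
    return sorted(set(deleted_blocks(log_lines)) & set(missing_ids))


# Two entries copied from the DataNode log above.
sample = [
    "2007-08-24 18:31:33,243 INFO org.apache.hadoop.dfs.DataNode: "
    "using BLOCKREPORT_INTERVAL of 3588023msec",
    "2007-08-24 18:31:34,252 INFO org.apache.hadoop.dfs.DataNode: "
    "Deleting block blk_-9223045762536565560 file "
    "/var/hadoop/tmp/dfs/data/current/subdir14/subdir18/blk_-9223045762536565560",
]

if __name__ == "__main__":
    for block_id, path in deleted_blocks(sample).items():
        print(block_id, path)
```

If most of the missing IDs turn up in the intersection, that would support
the idea that the deletions at startup caused the missing blocks. It would
not by itself say whether a bug or operator error triggered the deletions.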

 

 

