Thread: Created: (HADOOP-1664) Hadoop DFS upgrade procedure




Created: (HADOOP-1664) Hadoop DFS upgrade procedure
2007-07-30 09:54:53
Hadoop DFS upgrade procedure
----------------------------

                 Key: HADOOP-1664
                 URL: https://issues.apache.org/jira/browse/HADOOP-1664
             Project: Hadoop
          Issue Type: Improvement
          Components: dfs
    Affects Versions: 0.14.0
            Reporter: Christian Kunz


When upgrading from a July-9 to a July-25 nightly release,
we were able to upgrade successfully on a single-node
cluster, but failed on a 10-node and a 200-node cluster.
As we are not sure whether we made a mistake, I am filing
this as an improvement. But going forward it is imperative
that there be a safe and well-documented procedure to
upgrade DFS without loss of data, including a rollback
procedure and a list of operational procedures that are
irreversibly destructive (hopefully an empty list).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue
online.


Commented: (HADOOP-1664) Hadoop DFS upgrade procedure
2007-07-30 11:29:53
    [ https://issues.apache.org/jira/browse/HADOOP-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516431 ]

Raghu Angadi commented on HADOOP-1664:
--------------------------------------

I am writing an admin guide for upgrading to Hadoop 0.14 and
will post it in a couple of days. If you have any logs,
please add them here. The upgrade and rollback procedure is
the same as before.


Commented: (HADOOP-1664) Hadoop DFS upgrade procedure
2007-07-30 14:15:53
    [ https://issues.apache.org/jira/browse/HADOOP-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516488 ]

Christian Kunz commented on HADOOP-1664:
----------------------------------------

Datanode servers were apparently successful in upgrading:
...
2007-07-26 10:35:34,973 INFO org.apache.hadoop.dfs.DataNode:
   Distributed upgrade for DataNode version -6 to current LV -7 is initialized.
2007-07-26 10:35:34,974 INFO org.apache.hadoop.dfs.Storage: Upgrading storage directory <hadoop-dir>/dfs/data.
   old LV = -5; old CTime = 1183153812398.
   new LV = -7; new CTime = 1185471333047
2007-07-26 10:36:58,098 INFO org.apache.hadoop.dfs.Storage: Upgrade of /<hadoop-dir>/dfs/data is complete.
2007-07-26 10:36:58,587 INFO org.apache.hadoop.dfs.DataNode: Opened server at 50010
...

but the namenode server still reported 0% upgrade progress long after that:

2007-07-26 10:43:04,818 INFO org.apache.hadoop.dfs.BlockCrcUpgradeNamenode: Upgrade still running.
                                 Avg completion on Datanodes: 0.00% with 0 errors.

Even after 40 minutes the reported status had not changed
and the namenode was still in safe mode. When I tried to
force it to leave safe mode, it refused:

hadoop dfsadmin -safemode leave
safemode: org.apache.hadoop.dfs.SafeModeException: Distributed upgrade is in progress. Name node is in safe mode.
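
For anyone trying to spot the same symptom in their own logs, here is a minimal Python sketch that scans a namenode log for the "Avg completion on Datanodes" status reports quoted above and flags an upgrade whose reported percentage has not moved over a given window. It is not part of Hadoop. The function names, the 30-minute default window, and the assumption that each status message is unwrapped in the real log file (with the timestamp on the line at or before the percentage) are illustrative choices based only on the single sample above.

```python
import re
from datetime import datetime, timedelta

# Matches the status text seen in the sample above. The exact layout of the
# real log is an assumption drawn from that one sample.
STATUS_RE = re.compile(
    r"Avg completion on\s+Datanodes:\s*(?P<pct>\d+(?:\.\d+)?)%"
    r"\s+with\s+(?P<errors>\d+)\s+errors"
)
# Timestamp prefix such as "2007-07-26 10:43:04,818".
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}")


def parse_status(log_text):
    """Return a list of (timestamp, percent, errors), one per status report."""
    reports = []
    last_ts = None
    for line in log_text.splitlines():
        ts = TIMESTAMP_RE.match(line)
        if ts:
            # Remember the most recent timestamp; the percentage may sit on
            # an indented continuation line that has no timestamp of its own.
            last_ts = datetime.strptime(ts.group(1), "%Y-%m-%d %H:%M:%S")
        m = STATUS_RE.search(line)
        if m and last_ts is not None:
            reports.append((last_ts, float(m.group("pct")), int(m.group("errors"))))
    return reports


def is_stalled(reports, window=timedelta(minutes=30)):
    """True if at least `window` elapsed between the first and last report
    without the reported completion percentage increasing."""
    if len(reports) < 2:
        return False
    first, last = reports[0], reports[-1]
    return last[0] - first[0] >= window and last[1] <= first[1]
```

Calling `is_stalled(parse_status(open(path).read()))` on a namenode log (where `path` is whatever file your installation writes to) would return True for a run like the one described here, where the report stayed at 0.00% for 40 minutes.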




Commented: (HADOOP-1664) Hadoop DFS upgrade procedure
2007-07-30 14:32:53
    [ https://issues.apache.org/jira/browse/HADOOP-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516493 ]

Raghu Angadi commented on HADOOP-1664:
--------------------------------------

If possible, I would like to look at the full logs of a
datanode and the namenode. There is also a new dfsadmin
command, 'upgradeProgress'.
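
As a rough illustration of how that command could be driven from a script, the sketch below builds and runs a `hadoop dfsadmin -upgradeProgress` invocation. The `status`, `details` and `force` arguments are the ones listed in the Hadoop commands manual for later releases; whether the nightly build discussed here accepts exactly the same arguments is an assumption, as are the helper names and the `hadoop` executable being on the PATH.

```python
import subprocess

# Sub-options as documented in the Hadoop commands manual; assumed to apply
# to the build discussed in this thread.
VALID_MODES = ("status", "details", "force")


def upgrade_progress_argv(mode="status", hadoop_bin="hadoop"):
    """Build the argument vector for `hadoop dfsadmin -upgradeProgress <mode>`."""
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of %s, got %r" % (VALID_MODES, mode))
    return [hadoop_bin, "dfsadmin", "-upgradeProgress", mode]


def upgrade_progress(mode="status", hadoop_bin="hadoop"):
    """Run the command and return its combined stdout and stderr as text."""
    result = subprocess.run(
        upgrade_progress_argv(mode, hadoop_bin),
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit is reported through the output text
    )
    return result.stdout + result.stderr
```

Note that `force` asks the namenode to proceed with the upgrade, so a script would normally call only `status` or `details` unless an operator has decided otherwise.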


Commented: (HADOOP-1664) Hadoop DFS upgrade procedure
2007-07-30 16:01:58
    [ https://issues.apache.org/jira/browse/HADOOP-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516525 ]

Christian Kunz commented on HADOOP-1664:
----------------------------------------

I will send you the location offline. BTW: I tried the
dfsadmin command 'upgradeProgress' during that time; it
reported 0.0% progress.


Updated: (HADOOP-1664) Hadoop DFS upgrade procedure
2007-07-31 14:45:53
     [ https://issues.apache.org/jira/browse/HADOOP-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-1664:
---------------------------------

    Attachment: datanode.log.txt

The namenode log looks fine. It starts the CRC upgrade and
waits for the datanodes to start the same upgrade and join.
But for some reason, the datanodes don't start the CRC
upgrade. I am not sure what was going on. If you are ever
able to reproduce this, please let me know.

I am attaching the relevant part of one datanode's log.



Resolved: (HADOOP-1664) Hadoop DFS upgrade procedure
2007-10-23 15:59:51
     [ https://issues.apache.org/jira/browse/HADOOP-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi resolved HADOOP-1664.
----------------------------------

    Resolution: Cannot Reproduce

We have never seen this behavior again. 0.14.x has gone
through many upgrades, large and small. 

