Created: (HADOOP-2049) distcp does not fail if source directory has files with missing blocks
2007-10-12 21:21:50
distcp does not fail if source directory has files with missing blocks
----------------------------------------------------------------------

                 Key: HADOOP-2049
                 URL: https://issues.apache.org/jira/browse/HADOOP-2049
             Project: Hadoop
          Issue Type: Bug
          Components: util
    Affects Versions: 0.15.0
         Environment: Nightly build: Oct 11, 2007.
            Reporter: Murtaza A. Basrai
            Priority: Critical


I copied a directory using distcp (to another directory on
the same file system).

There were 9 data blocks missing in the files in the source
directory, which caused distcp to print messages like the
following:

...
07/10/13 00:09:16 INFO mapred.JobClient:  map 1% reduce 0%
07/10/13 00:09:16 INFO mapred.JobClient: Task Id : task_200710120717_0081_m_000020_0, Status : FAILED
java.io.IOException: Could not obtain block: blk_6787282547149034655 file=/srcdir/file1
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1136)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:988)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1094)
        at java.io.DataInputStream.read(DataInputStream.java:83)
        at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.copy(CopyFiles.java:289)
        at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.map(CopyFiles.java:348)
        at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.map(CopyFiles.java:216)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1753)
...

The corresponding tasks failed, but the retries were
successful (all files with missing blocks in the source
directory were copied as empty files in the target
directory).

I think that distcp should fail if it cannot successfully
copy all the files (at least when no command-line options
are given).

This is critical for us as we intend to use distcp to copy
databases from one dfs to another, and if silent failures
can happen then we would have to monitor each distcp
manually to ensure that it succeeded.
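
The manual check described above could be scripted along the following lines. This is only a minimal sketch, not part of distcp: it assumes both directory trees are reachable as ordinary local paths (a stand-in for the DFS, where the listings would have to come from the file system's own tools), and the names list_sizes and find_bad_copies are hypothetical. It flags every source file whose copy is missing or has a different length, which would catch files copied as empty.

```python
import os


def list_sizes(root):
    """Map each file's path (relative to root) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes


def find_bad_copies(src_root, dst_root):
    """Return (relative_path, src_size, dst_size) for every source file
    whose copy is missing (dst_size is None) or differs in length."""
    src = list_sizes(src_root)
    dst = list_sizes(dst_root)
    bad = []
    for path in sorted(src):
        if dst.get(path) != src[path]:
            bad.append((path, src[path], dst.get(path)))
    return bad
```

A non-empty result from find_bad_copies("/srcdir", "/dstdir") would mean the copy should not be trusted. A length comparison is a cheap check only; it would not detect a copy that has the right length but the wrong contents.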

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue
online.


Resolved: (HADOOP-2049) distcp does not fail if source directory has files with missing blocks
2007-10-24 13:18:51
     [ https://issues.apache.org/jira/browse/HADOOP-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-2049.
-----------------------------------

       Resolution: Duplicate
    Fix Version/s: 0.15.0
         Assignee: Chris Douglas

This was fixed as part of HADOOP-2048.



