distcp -pugp does not work when copying to a local file system
--------------------------------------------------------------

Key: HADOOP-6016
URL: https://issues.apache.org/jira/browse/HADOOP-6016
Project: Hadoop Core
Issue Type: Bug
Components: tools/distcp
Affects Versions: 0.18.3
Reporter: Aaron Kimball


To achieve rsync-like behavior between a local directory and an HDFS instance, a pseudo-distributed MapReduce cluster was started and connected to a fully distributed HDFS instance. An initial distcp from HDFS down to the local filesystem succeeded. The following day, another distcp was run with:

$ bin/hadoop distcp -pugp -update hdfs://nn:7276/data/raw file:///data/raw
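For reference, the -p option takes a string of attribute letters. The distcp usage text of this era lists r (replication), b (block size), u (user), g (group) and p (permission), and a bare -p means all of them. So -pugp asks distcp to preserve user, group and permission; the second p is redundant. The sketch below is a hypothetical re-implementation of that letter parsing, for illustration only. PreserveFlags and Attr are invented names, not DistCp's classes.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical illustration: PreserveFlags and Attr are invented names,
// not DistCp's own classes. It only mirrors the documented -p[rbugp] letters.
final class PreserveFlags {
    enum Attr { REPLICATION, BLOCK_SIZE, USER, GROUP, PERMISSION }

    // Parses an option such as "-pugp"; a bare "-p" selects every attribute.
    static Set<Attr> parse(String opt) {
        if (!opt.startsWith("-p")) {
            throw new IllegalArgumentException("not a -p option: " + opt);
        }
        String letters = opt.substring(2);
        if (letters.isEmpty()) {
            return EnumSet.allOf(Attr.class);
        }
        Set<Attr> attrs = EnumSet.noneOf(Attr.class);
        for (char c : letters.toCharArray()) {
            switch (c) {
                case 'r': attrs.add(Attr.REPLICATION); break;
                case 'b': attrs.add(Attr.BLOCK_SIZE); break;
                case 'u': attrs.add(Attr.USER); break;
                case 'g': attrs.add(Attr.GROUP); break;
                case 'p': attrs.add(Attr.PERMISSION); break; // repeats are harmless in a set
                default:
                    throw new IllegalArgumentException("unknown -p letter: " + c);
            }
        }
        return attrs;
    }

    public static void main(String[] args) {
        // Prints [USER, GROUP, PERMISSION]: EnumSet iterates in declaration order.
        System.out.println(parse("-pugp"));
    }
}
```

If the resulting set contains USER or GROUP, distcp must change the owner or group of every destination file. That is the step that matters when the target is a file:/// path.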

It failed; its output is below:

09/06/07 13:14:51 INFO tools.DistCp: srcPaths=[hdfs://nn:7276/data/raw]
09/06/07 13:14:51 INFO tools.DistCp: destPath=file:/data/raw
09/06/07 13:14:55 INFO tools.DistCp: srcCount=10955
09/06/07 13:14:56 INFO mapred.JobClient: Running job: job_200906071310_0001
09/06/07 13:14:57 INFO mapred.JobClient: map 0% reduce 0%
09/06/07 13:15:24 INFO mapred.JobClient: map 1% reduce 0%
09/06/07 13:17:34 INFO mapred.JobClient: map 2% reduce 0%
09/06/07 13:20:04 INFO mapred.JobClient: map 3% reduce 0%
09/06/07 13:20:49 INFO mapred.JobClient: map 4% reduce 0%
09/06/07 13:21:44 INFO mapred.JobClient: map 5% reduce 0%
09/06/07 13:22:33 INFO mapred.JobClient: map 6% reduce 0%
09/06/07 13:25:14 INFO mapred.JobClient: map 7% reduce 0%
09/06/07 13:27:14 INFO mapred.JobClient: map 8% reduce 0%
09/06/07 13:33:34 INFO mapred.JobClient: map 9% reduce 0%
09/06/07 13:37:30 INFO mapred.JobClient: map 10% reduce 0%
09/06/07 13:40:05 INFO mapred.JobClient: map 11% reduce 0%
09/06/07 13:44:55 INFO mapred.JobClient: map 12% reduce 0%
09/06/07 13:48:55 INFO mapred.JobClient: map 13% reduce 0%
09/06/07 13:54:41 INFO mapred.JobClient: map 14% reduce 0%
09/06/07 13:58:30 INFO mapred.JobClient: map 15% reduce 0%
09/06/07 14:00:46 INFO mapred.JobClient: map 16% reduce 0%
09/06/07 14:01:36 INFO mapred.JobClient: map 17% reduce 0%
09/06/07 14:04:12 INFO mapred.JobClient: map 13% reduce 0%
09/06/07 14:04:12 INFO mapred.JobClient: Task Id : attempt_200906071310_0001_m_000006_0, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 264 Failed: 39
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:542)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)

09/06/07 14:04:19 INFO mapred.JobClient: Task Id : attempt_200906071310_0001_m_000006_1, Status : FAILED
java.io.FileNotFoundException: File does not exist: hdfs://nn:7276/tmp/hadoop/mapred/system/distcp_m8n2e/_distcp_src_files
at org.apache.hadoop.dfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:412)
at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:684)
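at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1420)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1415)
at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
at org.apache.hadoop.tools.DistCp$CopyInputFormat.getRecordReader(DistCp.java:272)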
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)

(several more tasks fail for the same reason with FileNotFoundException)

With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1113)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:619)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:768)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:788)


This distcp update operation does succeed without -pugp.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Ravi Gummadi (JIRA) at Jun 12, 2009 at 5:12 am
    [ https://issues.apache.org/jira/browse/HADOOP-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718715#action_12718715 ]

    Ravi Gummadi commented on HADOOP-6016:
    --------------------------------------

    Do you see the issue consistently (each time you run a job)?

    Most probably, the issue is not related to the -pugp option.

    It looks like this issue is caused by the HDFS issue HADOOP-4681.
  • Aaron Kimball (JIRA) at Jun 15, 2009 at 9:38 pm
    [ https://issues.apache.org/jira/browse/HADOOP-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719793#action_12719793 ]

    Aaron Kimball commented on HADOOP-6016:
    ---------------------------------------

    Running {{distcp -pugp}} always failed; without the option, it succeeded immediately. Looking through the code, I think what is really necessary is just better error handling. {{-pg}} will preserve group membership for files; this has the effect of copying the file to the local filesystem (via the {{file:///}} URI) and then attempting to {{chgrp supergroup}} the file. Of course, there is no "supergroup" locally, so this fails and throws an IOException. That in turn causes a finally-block to delete {{_distcp_src_files}} and everything else crashes down from there. (Moreover, as Hadoop isn't running as root, the {{chown}} would have failed even if the destination group/user did exist.)

    These actions should fail as that's what the security model dictates. So this ticket is really just documenting a need for clearer documentation of what's going wrong, so it's obvious to not attempt {{-pug}} in the future.
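The failure chain described in the comment above can be modeled in a few lines. This is a hypothetical, self-contained simulation based on the assumptions in that comment; it is not DistCp's actual code. Here chgrp() always fails, standing in for the missing local "supergroup". A finally-block also deletes the shared file list even when the attempt fails. As a result, only the first attempt reports the real cause (an IOException). Every retry sees only the missing _distcp_src_files and reports FileNotFoundException, which matches the pattern in the log.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the cascade described in HADOOP-6016; every name here
// is invented for illustration and none of it is taken from DistCp.
final class CascadeSketch {
    // Stand-in for "chgrp supergroup" on a local file system with no such group.
    static void chgrp(Path file, String group) throws IOException {
        throw new IOException("chgrp " + group + " " + file + ": no such group");
    }

    // One task attempt: read the shared source list, copy, then preserve the group.
    static void runAttempt(Path srcList, Path dest) throws IOException {
        if (!Files.exists(srcList)) {
            throw new FileNotFoundException("File does not exist: " + srcList);
        }
        try {
            Files.write(dest, Files.readAllBytes(srcList)); // the copy itself succeeds
            chgrp(dest, "supergroup");                     // the -pg step throws
        } finally {
            // Cleanup runs on failure too, removing state that retries still need.
            Files.deleteIfExists(srcList);
        }
    }

    // Runs several attempts against one shared list and records how each ended.
    static List<String> simulate(int attempts) {
        List<String> outcomes = new ArrayList<>();
        try {
            Path dir = Files.createTempDirectory("distcp_sketch");
            Path srcList = Files.write(dir.resolve("_distcp_src_files"),
                    "placeholder source list\n".getBytes(StandardCharsets.UTF_8));
            for (int i = 0; i < attempts; i++) {
                try {
                    runAttempt(srcList, dir.resolve("copy_" + i));
                    outcomes.add("OK");
                } catch (IOException e) {
                    outcomes.add(e.getClass().getSimpleName());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return outcomes;
    }

    public static void main(String[] args) {
        // Prints [IOException, FileNotFoundException, FileNotFoundException]
        System.out.println(simulate(3));
    }
}
```

The model shows why the job-level output points at the wrong cause. Only the first attempt's exception relates to the group change; the later FileNotFoundExceptions are side effects of the cleanup.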



  • Tsz Wo (Nicholas), SZE (JIRA) at Jun 15, 2009 at 10:08 pm
    [ https://issues.apache.org/jira/browse/HADOOP-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-6016:
    -------------------------------------------

    Summary: distcp -pugp error message is not clear when chgrp fail. (was: distcp -pugp does not work when copying to a local file system)
    ... So this ticket is really just documenting a need for clearer documentation of what's going wrong, so it's obvious to not attempt -pug in the future.
    Good observation! Edited the title to reflect this.
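In line with the retitled issue, a clearer error could name the group, the destination file and the likely cause at the point where the group is preserved. The sketch below is hypothetical. It uses only the JDK's java.nio POSIX attribute API on a local path, not Hadoop's FileSystem API, and the class name and message wording are invented. UserPrincipalNotFoundException covers a group that does not exist locally, which is the "supergroup" case. A FileSystemException from setGroup covers a group that exists but that the unprivileged user is not permitted to assign.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.GroupPrincipal;
import java.nio.file.attribute.PosixFileAttributeView;
import java.nio.file.attribute.UserPrincipalNotFoundException;

// Hypothetical helper, not part of DistCp: preserves a group on a local file
// and turns the low-level failure into a message that says what went wrong.
final class ClearChgrpError {
    static void preserveGroup(Path dest, String group) throws IOException {
        PosixFileAttributeView view =
                Files.getFileAttributeView(dest, PosixFileAttributeView.class);
        if (view == null) {
            throw new IOException("Cannot preserve group on " + dest
                    + ": destination has no POSIX attributes");
        }
        try {
            GroupPrincipal gp = dest.getFileSystem()
                    .getUserPrincipalLookupService()
                    .lookupPrincipalByGroupName(group);
            view.setGroup(gp);
        } catch (UserPrincipalNotFoundException e) {
            // The group (e.g. HDFS's "supergroup") has no local counterpart.
            throw new IOException("Cannot preserve group '" + group + "' on " + dest
                    + ": the group does not exist on the destination file system;"
                    + " drop 'g' from the -p option", e);
        } catch (FileSystemException e) {
            // The group exists, but a non-root user may not assign it.
            throw new IOException("Cannot preserve group '" + group + "' on " + dest
                    + ": not permitted for the current user (" + e.getReason() + ")", e);
        }
    }

    // Tries to preserve `group` on a fresh temp file; returns the error message, or null on success.
    static String demo(String group) {
        try {
            Path f = Files.createTempFile("distcp_dest", ".tmp");
            try {
                preserveGroup(f, group);
                return null;
            } catch (IOException e) {
                return e.getMessage();
            } finally {
                Files.deleteIfExists(f);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("supergroup"));
    }
}
```

The design choice is to catch the two distinct failure modes separately, so that the message states the actual reason instead of surfacing later as an unrelated FileNotFoundException.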

Discussion Overview
group: common-dev @
categories: hadoop
posted: Jun 11, '09 at 11:49p
active: Jun 15, '09 at 10:08p
posts: 4
users: 1
website: hadoop.apache.org...
irc: #hadoop
