DistCp should support an option for deleting non-existing files.
----------------------------------------------------------------

Key: HADOOP-3939
URL: https://issues.apache.org/jira/browse/HADOOP-3939
Project: Hadoop Core
Issue Type: New Feature
Components: tools/distcp
Reporter: Tsz Wo (Nicholas), SZE


One use case of DistCp is to sync two directories. Currently, DistCp has an -update option for overwriting dst files if src is different from dst. However, this is not enough for a sync. If some files exist in dst but not in src, there is no easy way to delete them. We should add a new option, say -delete, so that DistCp will delete the files in dst that do not exist in src.
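
As an illustration of the sync semantics requested above, here is a minimal sketch. It is not the DistCp implementation: it takes the relative paths found under src and dst and returns the ones a -delete option would remove. The class name `SyncDeleteSketch`, the method `pathsToDelete`, and the use of plain strings for paths are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: which dst entries have no counterpart in src?
final class SyncDeleteSketch {

  /** Returns the relative paths present in dst but absent from src, in sorted order. */
  static List<String> pathsToDelete(Set<String> srcRelPaths, Set<String> dstRelPaths) {
    TreeSet<String> extra = new TreeSet<>(dstRelPaths); // copy, so the inputs are not modified
    extra.removeAll(srcRelPaths);                       // keep only dst-only entries
    return new ArrayList<>(extra);
  }

  public static void main(String[] args) {
    Set<String> src = new TreeSet<>(Arrays.asList("part-00000"));
    Set<String> dst = new TreeSet<>(Arrays.asList("part-00000", "part-00001", "part-00002"));
    System.out.println(pathsToDelete(src, dst)); // prints [part-00001, part-00002]
  }
}
```

Files that exist on both sides are left to -update or -overwrite; only the dst-only entries are candidates for deletion.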

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Koji Noguchi (JIRA) at Aug 12, 2008 at 9:16 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621981#action_12621981 ]

    Koji Noguchi commented on HADOOP-3939:
    --------------------------------------

    I can see users mis-using this feature and deleting some of their important files.
    Can we use Trash if it's enabled ?

  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 12, 2008 at 10:08 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622002#action_12622002 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-3939:
    ------------------------------------------------
    > Can we use Trash if it's enabled ?
    +1. I think this is a good idea. It can be done by reusing the code in FsShell.
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 25, 2008 at 10:22 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3939:
    -------------------------------------------

    Attachment: 3939_20080825.patch

    3939_20080825.patch: first version. Needs some tests.
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 26, 2008 at 3:08 am
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3939:
    -------------------------------------------

    Attachment: 3939_20080825b.patch

    3939_20080825b.patch: fixed some bugs.
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 26, 2008 at 7:47 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3939:
    -------------------------------------------

    Attachment: 3939_20080826.patch

    3939_20080826.patch: added a test.
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 26, 2008 at 8:25 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3939:
    -------------------------------------------

    Assignee: Tsz Wo (Nicholas), SZE
    Release Note: Added a new option -delete to DistCp so that files/directories that exist in dst but not in src will be deleted. It uses FsShell to do the delete, so it will use the trash if the trash is enabled.
    Status: Patch Available (was: Open)

    Passed test-patch and all tests locally. Submitting ...
  • Hadoop QA (JIRA) at Aug 27, 2008 at 7:56 am
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625997#action_12625997 ]

    Hadoop QA commented on HADOOP-3939:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12388941/3939_20080826.patch
    against trunk revision 689363.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 4 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    -1 contrib tests. The patch failed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3117/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3117/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3117/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3117/console

    This message is automatically generated.
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 27, 2008 at 3:56 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626168#action_12626168 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-3939:
    ------------------------------------------------

    3939_20080826.patch only changed DistCp and fixed a bug in FileStatus.hashCode(). The failed unit tests are not related.
  • Chris Douglas (JIRA) at Aug 28, 2008 at 9:08 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626739#action_12626739 ]

    Chris Douglas commented on HADOOP-3939:
    ---------------------------------------

    * Would it make sense to require either -update or -overwrite if -delete is specified? Without either of these options, the semantics are a little confusing. For example:
    ** In this case, the destination doesn't exist. Everything that isn't the source is deleted, which seems reasonable.
    {noformat}
    $ bin/hadoop fs -ls a b
    Found 2 items
    -rw-r--r-- 1 someuser somegroup 92934 2008-08-11 21:42 /user/someuser/a/part-00000
    Found 4 items
    -rw-r--r-- 1 someuser somegroup 105177784 2008-08-28 11:46 /user/someuser/b/part-00000
    -rw-r--r-- 1 someuser somegroup 105177884 2008-08-28 11:46 /user/someuser/b/part-00001
    -rw-r--r-- 1 someuser somegroup 105177754 2008-08-28 11:46 /user/someuser/b/part-00002
    $ bin/hadoop distcp -delete hdfs://host:8020/user/someuser/a hdfs://host:8020/user/someuser/b
    08/08/28 11:51:18 INFO tools.DistCp: srcPaths=[hdfs://host:8020/user/someuser/a]
    08/08/28 11:51:18 INFO tools.DistCp: destPath=hdfs://host:8020/user/someuser/b
    Deleted hdfs://host/user/someuser/b/part-00000
    Deleted hdfs://host/user/someuser/b/part-00001
    Deleted hdfs://host/user/someuser/b/part-00002
    [snip]
    $ bin/hadoop fs -ls a b
    Found 2 items
    -rw-r--r-- 1 someuser somegroup 92934 2008-08-11 21:42 /user/someuser/a/part-00000
    Found 2 items
    drwxr-xr-x - someuser somegroup 0 2008-08-28 11:51 /user/someuser/b/a
    {noformat}
    ** Here, the destination does exist, but it is deleted anyway, as though -overwrite were specified.
    {noformat}
    $ bin/hadoop fs -lsr a b
    -rw-r--r-- 1 someuser somegroup 92934 2008-08-11 21:42 /user/someuser/a/part-00000
    -rw-r--r-- 1 someuser somegroup 105177784 2008-08-28 11:51 /user/someuser/b/part-00000
    -rw-r--r-- 1 someuser somegroup 105177884 2008-08-28 11:51 /user/someuser/b/part-00001
    -rw-r--r-- 1 someuser somegroup 105177754 2008-08-28 11:51 /user/someuser/b/part-00002
    drwxr-xr-x - someuser somegroup 0 2008-08-28 13:34 /user/someuser/b/a
    -rw-r--r-- 1 someuser somegroup 105177784 2008-08-28 13:34 /user/someuser/b/a/part-00000
    $ bin/hadoop distcp -delete hdfs://host:8020/user/someuser/a hdfs://host:8020/user/someuser/b
    08/08/28 13:35:14 INFO tools.DistCp: srcPaths=[hdfs://host:8020/user/someuser/a]
    08/08/28 13:35:14 INFO tools.DistCp: destPath=hdfs://host:8020/user/someuser/b
    Deleted hdfs://host:8020/user/someuser/b/part-00000
    Deleted hdfs://host:8020/user/someuser/b/part-00001
    Deleted hdfs://host:8020/user/someuser/b/part-00002
    Deleted hdfs://host:8020/user/someuser/b/a
    [snip]
    $ bin/hadoop fs -lsr a b
    -rw-r--r-- 1 someuser somegroup 92934 2008-08-11 21:42 /user/someuser/a/part-00000
    drwxr-xr-x - someuser somegroup 0 2008-08-28 13:35 /user/someuser/b/a
    -rw-r--r-- 1 someuser somegroup 92934 2008-08-28 13:35 /user/someuser/b/a/part-00000
    {noformat}

    Adding this dependency would also help prevent casual errors and potentially serious mistakes if the Trash is disabled.
    * It might help to always add a message about FsShell failing, and set the cause rather than:
    {noformat}
    + } catch(Exception e) {
    + throw e instanceof IOException? (IOException)e: new IOException(e);
    + }
    {noformat}
    * When -delete is specified, the client is doing a lot of work to recursively list the destination, then to delete individual files there. In the future it might make sense to leave it to the maps to delete entries, since the source list is sorted. The client (or a reduce) would have to do some work on the boundaries, but it should scale well. The current patch is clearer given distcp's current organization, though.
    * The fix to FileStatus makes sense, but when is the Path null?
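
One way to read the suggestion about the catch block above is sketched here. It is an illustration only, not the code in the patch: `DeleteFailureSketch`, `runDelete`, and the `Callable` that stands in for the FsShell invocation are hypothetical names, and the message text is invented for the example.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch: always attach a message and the original cause.
final class DeleteFailureSketch {

  /**
   * Runs a delete command (a stand-in for the FsShell call) and returns its exit code.
   * Any failure is rethrown as an IOException that names the path and keeps the cause.
   */
  static int runDelete(Callable<Integer> shellCommand, String path) throws IOException {
    try {
      return shellCommand.call();
    } catch (Exception e) {
      // Wrap every failure, including IOExceptions, so the message is never lost.
      throw new IOException("FsShell failed to delete " + path, e);
    }
  }

  public static void main(String[] args) {
    try {
      runDelete(() -> { throw new IllegalStateException("simulated failure"); },
          "/user/someuser/b/part-00001");
    } catch (IOException e) {
      System.out.println(e.getMessage() + " (cause: " + e.getCause() + ")");
    }
  }
}
```

Compared with rethrowing an `IOException` unchanged, this form always tells the caller which path failed and that the failure came from the delete step, while the original exception stays available through `getCause()`.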
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 29, 2008 at 12:08 am
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626798#action_12626798 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-3939:
    ------------------------------------------------
    > Would it make sense to require either -update or -overwrite if -delete is specified?
    We should enforce that.
    > The fix to FileStatus makes sense, but when is the Path null?
    I hit this when creating a FileStatus with the default constructor and then putting it in some data structure (I forgot which one). The current implementation does not need this operation, so I will revert this change.
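
The enforcement agreed on above (-delete is only accepted together with -update or -overwrite) could be checked along the following lines. This is a sketch under assumptions, not the option parsing in DistCp: `DeleteOptionCheckSketch`, `checkDeleteOption`, and the error message are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: reject -delete unless -update or -overwrite is also given.
final class DeleteOptionCheckSketch {

  /** Throws IllegalArgumentException if -delete appears without -update or -overwrite. */
  static void checkDeleteOption(List<String> args) {
    boolean delete = args.contains("-delete");
    boolean updateOrOverwrite = args.contains("-update") || args.contains("-overwrite");
    if (delete && !updateOrOverwrite) {
      throw new IllegalArgumentException(
          "-delete must be specified with -update or -overwrite");
    }
  }

  public static void main(String[] args) {
    // Accepted: -delete is combined with -update.
    checkDeleteOption(Arrays.asList("-update", "-delete", "src", "dst"));
    try {
      // Rejected: -delete on its own.
      checkDeleteOption(Arrays.asList("-delete", "src", "dst"));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing before any listing or deletion starts means a mistyped command cannot remove anything in dst.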
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 29, 2008 at 12:20 am
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3939:
    -------------------------------------------

    Attachment: 3939_20080828.patch

    3939_20080828.patch: incorporated all comments from Chris.
  • Chris Douglas (JIRA) at Aug 29, 2008 at 12:56 am
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-3939:
    ----------------------------------

    Fix Version/s: 0.19.0
    Hadoop Flags: [Reviewed]

    +1
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 29, 2008 at 1:50 am
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3939:
    -------------------------------------------

    Status: Open (was: Patch Available)
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 29, 2008 at 1:52 am
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3939:
    -------------------------------------------

    Status: Patch Available (was: Open)

    submit again.
  • Hadoop QA (JIRA) at Aug 29, 2008 at 6:43 am
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626864#action_12626864 ]

    Hadoop QA commented on HADOOP-3939:
    -----------------------------------

    +1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12389133/3939_20080828.patch
    against trunk revision 690096.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 4 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3141/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3141/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3141/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3141/console

    This message is automatically generated.
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 29, 2008 at 6:14 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3939:
    -------------------------------------------

    Attachment: 3939_20080829.patch

    3939_20080829.patch: fixed a bug for path checking.
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 29, 2008 at 10:40 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3939:
    -------------------------------------------

    Attachment: 3939_20080829b.patch

    3939_20080829b.patch: updated the new unit test.
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 29, 2008 at 11:04 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627143#action_12627143 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-3939:
    ------------------------------------------------

    Tested locally. 3939_20080829b.patch is ready to be committed.
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 29, 2008 at 11:18 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3939:
    -------------------------------------------

    Status: Open (was: Patch Available)
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 29, 2008 at 11:18 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3939:
    -------------------------------------------

    Status: Patch Available (was: Open)
  • Hadoop QA (JIRA) at Sep 1, 2008 at 7:11 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627518#action_12627518 ]

    Hadoop QA commented on HADOOP-3939:
    -----------------------------------

    +1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12389205/3939_20080829b.patch
    against trunk revision 690641.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 4 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3150/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3150/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3150/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3150/console

    This message is automatically generated.
  • Chris Douglas (JIRA) at Sep 1, 2008 at 8:47 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-3939:
    ----------------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    I just committed this. Thanks, Nicholas.
  • Hudson (JIRA) at Sep 2, 2008 at 1:04 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627653#action_12627653 ]

    Hudson commented on HADOOP-3939:
    --------------------------------

    Integrated in Hadoop-trunk #590 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/590/])
  • Hudson (JIRA) at Oct 3, 2008 at 2:33 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636648#action_12636648 ]

    Hudson commented on HADOOP-3939:
    --------------------------------

    Integrated in Hadoop-trunk #622 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/622/])

  • Tsz Wo (Nicholas), SZE (JIRA) at Oct 8, 2008 at 4:44 am
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3939:
    -------------------------------------------

    Release Note: Added a new option -delete to DistCp so that files/directories that exist in dst but not in src will be deleted. It uses FsShell to perform the delete, so it will use trash if trash is enabled. (was: Added a new optopm -delete to DistCp so that if the files/directories exist in dst but not in src will be deleted. It uses FsShell to do delete, so that it will use trash if the trash is enable.)
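
The sync behavior described in the release note can be illustrated with a small sketch. This is not DistCp's code: it only models the selection rule (entries present in dst but absent from src) on plain relative paths, and the function name `paths_to_delete` is hypothetical. It assumes that when a whole directory is missing from src, only the directory itself needs to be listed, since removing it covers its children.

```python
from pathlib import PurePosixPath


def paths_to_delete(src_paths, dst_paths):
    """Return dst-relative paths that are absent from src (topmost entries only)."""
    src = {PurePosixPath(p) for p in src_paths}
    # Sorting places a directory before anything beneath it.
    extra = sorted(
        PurePosixPath(p) for p in dst_paths if PurePosixPath(p) not in src
    )
    doomed = []
    for path in extra:
        # Skip entries already covered by a parent that will be deleted.
        if not any(parent in doomed for parent in path.parents):
            doomed.append(path)
    return [str(p) for p in doomed]


if __name__ == "__main__":
    src = ["a.txt", "dir", "dir/b.txt"]
    dst = ["a.txt", "dir", "dir/b.txt", "old", "old/c.txt", "stale.txt"]
    print(paths_to_delete(src, dst))
```

In the feature itself, the release note says the deletion goes through FsShell, so removed entries land in trash when trash is enabled; the sketch covers only the choice of what to delete.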
  • Tsz Wo (Nicholas), SZE (JIRA) at Jan 13, 2009 at 12:04 am
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3939:
    -------------------------------------------

    Attachment: 3939_20080829b_0.18+3873_20080811b_0.18.patch

    3939_20080829b_0.18+3873_20080811b_0.18.patch: for 0.18. It also includes HADOOP-3873. This patch won't be committed.

    DistCp should support an option for deleting non-existing files.
    ----------------------------------------------------------------

    Key: HADOOP-3939
    URL: https://issues.apache.org/jira/browse/HADOOP-3939
    Project: Hadoop Core
    Issue Type: New Feature
    Components: tools/distcp
    Reporter: Tsz Wo (Nicholas), SZE
    Assignee: Tsz Wo (Nicholas), SZE
    Fix For: 0.19.0

    Attachments: 3939_20080825.patch, 3939_20080825b.patch, 3939_20080826.patch, 3939_20080828.patch, 3939_20080829.patch, 3939_20080829b.patch, 3939_20080829b_0.18+3873_20080811b_0.18.patch


    One use case of DistCp is to sync two directories. Currently, DistCp has an -update option for overwriting dst files if src is different from dst. However, it is not enough for sync. If there are some files in dst but not exist in src, there is no easy way to delete them. We should add a new option, say -delete, so that DistCp will delete the non-existing in dst.

Discussion Overview
group: common-dev @
categories: hadoop
posted: Aug 12, '08 at 8:00p
active: Jan 13, '09 at 12:04a
posts: 27
users: 1
website: hadoop.apache.org...
irc: #hadoop

1 user in discussion

Tsz Wo (Nicholas), SZE (JIRA): 27 posts
