DistCp should have an option for limiting the number of files/bytes being copied
--------------------------------------------------------------------------------

Key: HADOOP-3873
URL: https://issues.apache.org/jira/browse/HADOOP-3873
Project: Hadoop Core
Issue Type: New Feature
Components: tools/distcp
Reporter: Tsz Wo (Nicholas), SZE


A single DistCp command may copy a huge number of files/bytes. In such a case, DistCp will run for a long time and there is no way to stop it gracefully. It would be good if DistCp had an option to limit the number of files/bytes being copied. Once the limit is reached, DistCp will terminate and return success. All files copied are guaranteed to be good, and no file is left partially copied.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

  • Doug Cutting (JIRA) at Jul 30, 2008 at 9:16 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618510#action_12618510 ]

    Doug Cutting commented on HADOOP-3873:
    --------------------------------------

    This sounds rather ad-hoc. What is the use case?

    In most cases, the total size to be copied can be determined up front, before the copying begins, no?

    What might be better is a mechanism to stop a DistCp job. E.g., one could provide a "stop" file name. When this is non-null, copying will stop as soon as the named file exists. Might that meet the need here?
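The "stop" file idea can be illustrated with a minimal sketch. This is not DistCp code: `copy_until_stopped`, `fake_copy`, and the `STOP` file name are hypothetical names chosen for the example, and a local temporary directory stands in for the file system.

```python
import os
import tempfile


def copy_until_stopped(files, copy_one, stop_file=None):
    """Copy files in order; stop before the next file once stop_file exists.

    Hypothetical helper: when stop_file is None the check is disabled.
    """
    copied = []
    for name in files:
        if stop_file is not None and os.path.exists(stop_file):
            break  # an operator created the stop file; finish cleanly
        copy_one(name)
        copied.append(name)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    stop = os.path.join(tmp, "STOP")

    def fake_copy(name):
        # Simulate an operator creating the stop file while "b" is being copied.
        if name == "b":
            open(stop, "w").close()

    result = copy_until_stopped(["a", "b", "c"], fake_copy, stop)
```

Because the check happens between files, a file that has started copying always finishes, so no partially copied file is left behind.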
  • Tsz Wo (Nicholas), SZE (JIRA) at Jul 30, 2008 at 10:34 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618535#action_12618535 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-3873:
    ------------------------------------------------
    > This sounds rather ad-hoc. What is the use case?

    One use case is backing up a number of directories, say /user1/data, /user2/data, /user3/data, etc., during off-peak hours every day. Each of these directories may contain a large number of files/bytes. If we simply run distcp, it cannot finish copying everything within a single day.

    Also, since DistCp currently copies files sequentially, files in /user1/data will be copied first. The other users will be unhappy.

    If distcp supported a limit option, we could do something like
    distcp /user1/data limit 100GB, 1000000 files
    distcp /user2/data limit 100GB, 1000000 files
    ...

    These commands will be executed every day. Suppose /user1/data contains 5 files as follows:

    /user1/data/file1 50GB
    /user1/data/file2 50GB
    /user1/data/file3 50GB
    /user1/data/file4 50GB
    /user1/data/file5 50GB

    Then distcp will copy file1 and file2 on the first day. On the second day, since file1 and file2 already exist at the destination, distcp will copy file3 and file4. User1 can expect it to take 3 days to finish copying all files.
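The three-day schedule in this example can be checked with a small simulation. It is only a sketch under stated assumptions: `select_files` is a hypothetical helper, files are taken in listing order, and a file is included only if the running total stays at or below the limit (which matches two 50GB files fitting under a 100GB limit).

```python
GB = 1024 ** 3


def select_files(source, already_copied, size_limit, file_limit):
    """Pick files in listing order until the next one would exceed a limit."""
    selected, total = [], 0
    for name, size in source:
        if name in already_copied:
            continue  # already at the destination from an earlier run
        if len(selected) + 1 > file_limit or total + size > size_limit:
            break
        selected.append(name)
        total += size
    return selected


source = [(f"/user1/data/file{i}", 50 * GB) for i in range(1, 6)]
copied = set()
days = []
while len(copied) < len(source):
    batch = select_files(source, copied, 100 * GB, 1_000_000)
    if not batch:
        break  # a single remaining file is larger than the limit
    copied.update(batch)
    days.append(batch)

for day, batch in enumerate(days, start=1):
    print(day, batch)
```

With these inputs the loop produces three batches: file1 and file2, then file3 and file4, then file5.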
  • Tsz Wo (Nicholas), SZE (JIRA) at Jul 30, 2008 at 10:38 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618539#action_12618539 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-3873:
    ------------------------------------------------
    > In most cases, the total size to be copied can be determined up front, before the copying begins, no?

    Yes, you are right that we can pre-compute the list of files to be copied and impose whatever constraints we want. The new option automates that pre-computation. DistCp currently computes a list of files before copying. I am planning to change the computation so that the list satisfies the file/size limit constraints.

    > What might be better is a mechanism to stop a DistCp job. E.g., one could provide a "stop" file name. When this is non-null, copying will stop as soon as the named file exists. Might that meet the need here?

    This is a good way to stop a DistCp job gracefully. Let me see whether it could solve the backup use case described above.
  • Doug Cutting (JIRA) at Jul 30, 2008 at 11:01 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618547#action_12618547 ]

    Doug Cutting commented on HADOOP-3873:
    --------------------------------------

    Okay, sounds like a reasonable use case.

    Your initial description sounded like you intended to count the files copied as the job runs, and terminate it when it crosses a limit. That would be tricky, and is perhaps not what you meant anyway. Rather, all we need to do to implement this is to count bytes and files as files are listed in the client before the job is created. If that's all you mean, then +1, this seems like a fine feature.

    The implementation would be much cleaner if listStatus accepted a StatusFilter. Then the filter can count bytes and files and stop returning new files once its limit is exceeded. The existing code would hardly change.
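A counting filter of the kind described here could look roughly like the following. This is a language-neutral sketch in Python, not Hadoop's API: `FileStatus`, `LimitFilter`, and `list_status` are hypothetical stand-ins, and the choice to reject every later file after the first rejection is an assumption about what "stop returning new files" means.

```python
from dataclasses import dataclass


@dataclass
class FileStatus:
    """Hypothetical stand-in for a file listing entry."""
    path: str
    length: int


class LimitFilter:
    """Accepts entries until a file-count or byte limit would be exceeded."""

    def __init__(self, file_limit, size_limit):
        self.file_limit = file_limit
        self.size_limit = size_limit
        self.files = 0
        self.bytes = 0
        self.exhausted = False

    def accept(self, status):
        if self.exhausted:
            return False
        if (self.files + 1 > self.file_limit
                or self.bytes + status.length > self.size_limit):
            self.exhausted = True  # assumption: nothing is accepted after this
            return False
        self.files += 1
        self.bytes += status.length
        return True


def list_status(entries, status_filter):
    """Hypothetical listing call that applies the filter to each entry."""
    return [s for s in entries if status_filter.accept(s)]


entries = [FileStatus("a", 60), FileStatus("b", 50), FileStatus("c", 10)]
accepted = list_status(entries, LimitFilter(file_limit=10, size_limit=100))
```

The counting happens while the client lists files, before any job is created, so the copy itself never needs to be interrupted.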
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 8, 2008 at 6:30 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621001#action_12621001 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-3873:
    ------------------------------------------------

    How about we add two new options "-filelimit n" and "-sizelimit n" to distcp?
  • Doug Cutting (JIRA) at Aug 8, 2008 at 7:40 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621026#action_12621026 ]

    Doug Cutting commented on HADOOP-3873:
    --------------------------------------
    > How about we add two new options "-filelimit n" and "-sizelimit n" to distcp?

    Sounds fine to me.
  • Raghu Angadi (JIRA) at Aug 8, 2008 at 7:58 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621032#action_12621032 ]

    Raghu Angadi commented on HADOOP-3873:
    --------------------------------------

    This is a useful feature. Hopefully documentation clearly defines what users can expect.

    One question: extending the example in the 2nd comment above, what happens if /user1/data/file1 is deleted on the source before the second day? Will it be deleted on the destination? If yes, maybe an option like "-sync" would make this clearer to the user (of course "-sizelimit" etc. still apply).

    In the long term, once we can preserve modification times and other metadata while copying, it might be better to add an "rsync" mode to distcp.

  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 8, 2008 at 8:38 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621046#action_12621046 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-3873:
    ------------------------------------------------
    > One question: extending the example in the 2nd comment above, what happens if /user1/data/file1 is deleted on the source before the second day? Will it be deleted on the destination? If yes, maybe an option like "-sync" would make this clearer to the user (of course "-sizelimit" etc. still apply).

    +1 "-sync" is probably our next step.
  • Raghu Angadi (JIRA) at Aug 8, 2008 at 8:52 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621053#action_12621053 ]

    Raghu Angadi commented on HADOOP-3873:
    --------------------------------------

    Is that a "yes" for deleting the file? Thanks.
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 8, 2008 at 8:58 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621058#action_12621058 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-3873:
    ------------------------------------------------

    I don't plan to implement "-sync" or file deletion in this issue. So the answer to your question is: "no, the file will remain in the destination." I will file another issue for that. Sorry for not being clear.
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 9, 2008 at 1:17 am
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3873:
    -------------------------------------------

    Attachment: 3873_20080808b.patch

    3873_20080808b.patch: this is a first patch supporting the new "-filelimit" and "-sizelimit" options. The shell messages still need rewriting and new tests still need to be added.
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 11, 2008 at 10:45 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3873:
    -------------------------------------------

    Attachment: 3873_20080811b.patch

    3873_20080811b.patch: this is a complete patch

    - -filelimit <n> and -sizelimit <n> support symbolic representation. For example:
      1230k = 1230 * 1024 = 1259520
      891g = 891 * 1024^3 = 956703965184

    - Compares file sizes during setup

    - Rewrote shell messages

    - Added a few tests
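The symbolic representation for limit values can be illustrated with a short parser. This is a sketch, not the code from the patch: `parse_limit` is a hypothetical name, and only the `k` and `g` suffixes appear in the thread; `m` and `t` are included here as an assumption that the scheme extends by powers of 1024.

```python
# Binary multipliers; "k" and "g" come from the thread, "m" and "t" are assumed.
SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_limit(text):
    """Convert a value such as '1230k' or '891g' into a plain integer."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)  # no suffix: the value is used as-is


print(parse_limit("1230k"))  # 1259520
print(parse_limit("891g"))   # 956703965184
```

The two printed values match the worked examples given in the patch description.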
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 12, 2008 at 12:11 am
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3873:
    -------------------------------------------

    Assignee: Tsz Wo (Nicholas), SZE
    Status: Patch Available (was: Open)

    Passed all tests locally; trying Hudson.
  • Hadoop QA (JIRA) at Aug 13, 2008 at 9:53 am
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622148#action_12622148 ]

    Hadoop QA commented on HADOOP-3873:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12388005/3873_20080811b.patch
    against trunk revision 685425.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 9 new or modified tests.

    -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    -1 contrib tests. The patch failed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3053/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3053/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3053/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3053/console

    This message is automatically generated.
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 13, 2008 at 6:25 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622308#action_12622308 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-3873:
    ------------------------------------------------

    The javadoc warnings have nothing to do with the patch. Before I did "svn update" today, there were no javadoc warnings. See HADOOP-3949.

    The failed tests are TestMapRed and TestMiniMRDFSSort, but they also fail on trunk (they did not fail before I did "svn update"). See HADOOP-3950.
  • Chris Douglas (JIRA) at Aug 13, 2008 at 11:37 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-3873:
    ----------------------------------

    Resolution: Fixed
    Fix Version/s: 0.19.0
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

    +1 looks good.

    I just committed this. Thanks, Nicholas
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 13, 2008 at 11:41 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3873:
    -------------------------------------------

    Release Note: Added two new options -filelimit <n> and -sizelimit <n> to DistCp for limiting the total number of files and the total size in bytes, respectively.
  • Hudson (JIRA) at Aug 22, 2008 at 12:38 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624730#action_12624730 ]

    Hudson commented on HADOOP-3873:
    --------------------------------

    Integrated in Hadoop-trunk #581 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/])
  • Hudson (JIRA) at Oct 3, 2008 at 2:33 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636647#action_12636647 ]

    Hudson commented on HADOOP-3873:
    --------------------------------

    Integrated in Hadoop-trunk #622 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/622/])

  • Tsz Wo (Nicholas), SZE (JIRA) at Jan 9, 2009 at 10:00 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3873:
    -------------------------------------------

    Attachment: 3873_20080811b_0.18.patch

    3873_20080811b_0.18.patch: for 0.18 (this won't be committed.)
