Add handling of s3 to CopyFile tool
-----------------------------------

Key: HADOOP-862
URL: https://issues.apache.org/jira/browse/HADOOP-862
Project: Hadoop
Issue Type: Improvement
Components: util
Affects Versions: 0.10.0
Reporter: stack@archive.org
Priority: Minor


CopyFile is a useful tool for doing bulk copies. It doesn't have handling for the recently added s3 filesystem.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

  • stack@archive.org (JIRA) at Jan 6, 2007 at 2:10 am
    [ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack@archive.org updated HADOOP-862:
    -------------------------------------

    Attachment: copyfiles-s3.diff
  • stack@archive.org (JIRA) at Jan 6, 2007 at 2:12 am
    [ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462676 ]

    stack@archive.org commented on HADOOP-862:
    ------------------------------------------

    Attached is a first cut at adding s3 handling to CopyFiles.

    Here's the list of changes:

    + Allow hdfs or dfs URI schemes (Used to be dfs only).
    + Changed the usage message so filesystem is generic URI (rather than namenode:port | local).
    + getFileSysName was removed. Use FileSystem.get with the fs URI instead.
    + getMapCount: Moved duplicated code for figuring number of maps here.
    + toURI: Added. Have (duplicated) tests of URIness go via here instead.
    + CopyFilesReducer: Removed two instances. Does nothing.
    + Added testing of URIness to members of file-of-source URIs.
    + Minor javadoc and formatting changes.

    It's lightly tested.
  • stack@archive.org (JIRA) at Jan 9, 2007 at 5:57 am
    [ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack@archive.org updated HADOOP-862:
    -------------------------------------

    Attachment: copyfiles-s3-2.diff
  • stack@archive.org (JIRA) at Jan 9, 2007 at 6:09 am
    [ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463198 ]

    stack@archive.org commented on HADOOP-862:
    ------------------------------------------

    Updated patch.

    + Renamed DFSCopyFilesMapper as FSCopyFilesMapper
    + If no scheme, use 'default' (the value of 'fs.default.name' in hadoop-site.xml).
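
    As a rough illustration of the "no scheme means default filesystem" rule, here is a minimal sketch using only java.net.URI. It is not code from the patch: the class name DefaultSchemeSketch, the helper qualify, and the namenode address are hypothetical, and the real tool reads the default from 'fs.default.name' rather than taking it as an argument.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class DefaultSchemeSketch {

    // Hypothetical helper (not from the patch): if the argument carries no
    // scheme, borrow scheme and authority from the default filesystem URI.
    static URI qualify(String pathOrUri, URI defaultFs) throws URISyntaxException {
        URI uri = new URI(pathOrUri);
        if (uri.getScheme() != null) {
            return uri; // already names its filesystem, e.g. s3:// or hdfs://
        }
        // Only absolute paths are handled here; this constructor rejects a
        // relative path once a scheme is supplied.
        return new URI(defaultFs.getScheme(), defaultFs.getAuthority(),
                       uri.getPath(), null, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Assumed stand-in for the value of 'fs.default.name'.
        URI defaultFs = new URI("hdfs://namenode.example.org:9000");
        System.out.println(qualify("/user/stack/outputs/segments", defaultFs));
        System.out.println(qualify("s3://BUCKET/segments-bkup", defaultFs));
    }
}
```

    With hdfs as the assumed default, a bare path is resolved against hdfs, while an argument that already has a scheme is returned unchanged.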

    I ran more extensive tests, copying from hdfs to s3 and back again, and from http into s3 and hdfs (distcp is a nice tool). For example, here is the output from copying a small nutch segment from hdfs to s3 (in the example below, hdfs was set as the fs.default.name filesystem):

    stack@debord:~/checkouts/hadoop$ ./bin/hadoop fs -lsr outputs/segments
    /user/stack/outputs/segments/20070108213341-test <dir>
    /user/stack/outputs/segments/20070108213341-test/crawl_fetch <dir>
    /user/stack/outputs/segments/20070108213341-test/crawl_fetch/part-00000 <dir>
    /user/stack/outputs/segments/20070108213341-test/crawl_fetch/part-00000/data <r 1> 1187
    /user/stack/outputs/segments/20070108213341-test/crawl_fetch/part-00000/index <r 1> 234
    /user/stack/outputs/segments/20070108213341-test/crawl_parse <dir>
    /user/stack/outputs/segments/20070108213341-test/crawl_parse/part-00000 <r 1> 9010
    /user/stack/outputs/segments/20070108213341-test/parse_data <dir>
    /user/stack/outputs/segments/20070108213341-test/parse_data/part-00000 <dir>
    /user/stack/outputs/segments/20070108213341-test/parse_data/part-00000/data <r 1> 4630
    /user/stack/outputs/segments/20070108213341-test/parse_data/part-00000/index <r 1> 234
    /user/stack/outputs/segments/20070108213341-test/parse_text <dir>
    /user/stack/outputs/segments/20070108213341-test/parse_text/part-00000 <dir>
    /user/stack/outputs/segments/20070108213341-test/parse_text/part-00000/data <r 1> 6180
    /user/stack/outputs/segments/20070108213341-test/parse_text/part-00000/index <r 1> 234

    Here's the copy to an s3 directory named segments-bkup:

    % ./bin/hadoop distcp /user/stack/outputs/segments s3://KEY:SECRET@BUCKET/segments-bkup

    Here's the listing of the s3 content:

    stack@debord:~/checkouts/hadoop$ ./bin/hadoop fs -fs s3://KEY:SECRET@BUCKET/segments-bkup -lsr /segments-bkup/
    /segments-bkup/20070108213341-test <dir>
    /segments-bkup/20070108213341-test/crawl_fetch <dir>
    /segments-bkup/20070108213341-test/crawl_fetch/part-00000 <dir>
    /segments-bkup/20070108213341-test/crawl_fetch/part-00000/data <r 1> 1187
    /segments-bkup/20070108213341-test/crawl_fetch/part-00000/index <r 1> 234
    /segments-bkup/20070108213341-test/crawl_parse <dir>
    /segments-bkup/20070108213341-test/crawl_parse/part-00000 <r 1> 9010
    /segments-bkup/20070108213341-test/parse_data <dir>
    /segments-bkup/20070108213341-test/parse_data/part-00000 <dir>
    /segments-bkup/20070108213341-test/parse_data/part-00000/data <r 1> 4630
    /segments-bkup/20070108213341-test/parse_data/part-00000/index <r 1> 234
    /segments-bkup/20070108213341-test/parse_text <dir>
    /segments-bkup/20070108213341-test/parse_text/part-00000 <dir>
    /segments-bkup/20070108213341-test/parse_text/part-00000/data <r 1> 6180
    /segments-bkup/20070108213341-test/parse_text/part-00000/index <r 1> 234
  • Tom White (JIRA) at Jan 31, 2007 at 9:57 pm
    [ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469210 ]

    Tom White commented on HADOOP-862:
    ----------------------------------

    I just tried using this patch, and I managed to copy some local files to the S3 file system without trouble.

    Looking at the code, I noticed that the -fs option doesn't seem to be used any longer, so it can be dropped. Other than that, it looks fine to me.
  • stack@archive.org (JIRA) at Feb 2, 2007 at 6:17 am
    [ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack@archive.org updated HADOOP-862:
    -------------------------------------

    Attachment: copyfiles-s3-3.diff

    Fix usage string (suggested in Tom White's review).
  • stack@archive.org (JIRA) at Feb 2, 2007 at 6:20 am
    [ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack@archive.org updated HADOOP-862:
    -------------------------------------

    Fix Version/s: 0.11.0
    Affects Version/s: 0.10.1 (was: 0.10.0)
    Status: Patch Available (was: Open)

    Marking issue with 'patch available'.
  • stack@archive.org (JIRA) at Feb 2, 2007 at 6:22 am
    [ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469681 ]

    stack@archive.org commented on HADOOP-862:
    ------------------------------------------

    Thanks for the review, Tom.
  • Hadoop QA (JIRA) at Feb 2, 2007 at 7:28 am
    [ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469685 ]

    Hadoop QA commented on HADOOP-862:
    ----------------------------------

    -1, because 3 attempts failed to build and test the latest attachment (http://issues.apache.org/jira/secure/attachment/12350196/copyfiles-s3-3.diff) against trunk revision r502402. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.
  • stack@archive.org (JIRA) at Feb 2, 2007 at 6:10 pm
    [ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack@archive.org updated HADOOP-862:
    -------------------------------------

    Attachment: copyfiles-s3-4.diff

    New patch to fix broken unit test. Removes 'dfs' scheme. Only 'hdfs' allowed from here on out.
  • stack@archive.org (JIRA) at Feb 2, 2007 at 7:06 pm
    [ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469836 ]

    stack@archive.org commented on HADOOP-862:
    ------------------------------------------

    Mr 'Hadoop QA', do I have to do anything special to re-trigger your auto-application and test of version 4 of the patch? Thanks.
  • Doug Cutting (JIRA) at Feb 2, 2007 at 7:38 pm
    [ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469841 ]

    Doug Cutting commented on HADOOP-862:
    -------------------------------------
    > Mr 'Hadoop QA' [ ... ]

    Please, call him "Nigel".
  • Nigel Daley at Feb 2, 2007 at 7:42 pm
    org.apache.hadoop.mapred.TestMiniMRLocalFS hung the process. I'm
    restarting now...

  • Hadoop QA (JIRA) at Feb 2, 2007 at 8:05 pm
    [ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469853 ]

    Hadoop QA commented on HADOOP-862:
    ----------------------------------

    +1, because http://issues.apache.org/jira/secure/attachment/12350237/copyfiles-s3-4.diff applied and successfully tested against trunk revision r502694.
  • Doug Cutting (JIRA) at Feb 2, 2007 at 8:24 pm
    [ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Doug Cutting updated HADOOP-862:
    --------------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    I just committed this. Thanks, Michael!

Discussion Overview
group: common-dev @
categories: hadoop
posted: Jan 6, '07 at 2:07a
active: Feb 2, '07 at 8:24p
posts: 16
users: 2
website: hadoop.apache.org...
irc: #hadoop
