[ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463198 ]
stack@archive.org commented on HADOOP-862:
------------------------------------------
Updated patch.
+ Renamed DFSCopyFilesMapper to FSCopyFilesMapper
+ If a path has no scheme, use the 'default' filesystem (the value of 'fs.default.name' in hadoop-site.xml).
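The scheme rule can be pictured with a small sketch. It assumes a hypothetical helper, `filesystemFor`, that takes a path argument plus the fs.default.name value, and uses `java.net.URI` only to detect whether a scheme is present. It is an illustration of the rule, not the code in the patch.

```java
import java.net.URI;

// Hypothetical sketch, not code from the patch: decide which filesystem a
// distcp path argument names. A path with a scheme (s3://, http://, ...)
// names its own filesystem; a path without one falls back to the default,
// i.e. whatever fs.default.name is set to in hadoop-site.xml.
class SchemeDefaulting {

    // 'defaultFs' stands in for the fs.default.name value (an assumption).
    static String filesystemFor(String pathArg, String defaultFs) {
        String scheme = URI.create(pathArg).getScheme();
        return (scheme == null) ? defaultFs : scheme;
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs"; // hypothetical default value
        System.out.println(filesystemFor("/user/stack/outputs/segments", defaultFs)); // hdfs
        System.out.println(filesystemFor("s3://KEY:SECRET@BUCKET/segments-bkup", defaultFs)); // s3
    }
}
```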
I ran more extensive tests: copying from hdfs to s3 and back again, and copying from http into both s3 and hdfs (distcp is a nice tool). For example, here is the output from copying a small Nutch segment from hdfs to s3 (in the runs below, hdfs was set as the fs.default.name filesystem). First, the source listing:
stack@debord:~/checkouts/hadoop$ ./bin/hadoop fs -lsr outputs/segments
/user/stack/outputs/segments/20070108213341-test <dir>
/user/stack/outputs/segments/20070108213341-test/crawl_fetch <dir>
/user/stack/outputs/segments/20070108213341-test/crawl_fetch/part-00000 <dir>
/user/stack/outputs/segments/20070108213341-test/crawl_fetch/part-00000/data <r 1> 1187
/user/stack/outputs/segments/20070108213341-test/crawl_fetch/part-00000/index <r 1> 234
/user/stack/outputs/segments/20070108213341-test/crawl_parse <dir>
/user/stack/outputs/segments/20070108213341-test/crawl_parse/part-00000 <r 1> 9010
/user/stack/outputs/segments/20070108213341-test/parse_data <dir>
/user/stack/outputs/segments/20070108213341-test/parse_data/part-00000 <dir>
/user/stack/outputs/segments/20070108213341-test/parse_data/part-00000/data <r 1> 4630
/user/stack/outputs/segments/20070108213341-test/parse_data/part-00000/index <r 1> 234
/user/stack/outputs/segments/20070108213341-test/parse_text <dir>
/user/stack/outputs/segments/20070108213341-test/parse_text/part-00000 <dir>
/user/stack/outputs/segments/20070108213341-test/parse_text/part-00000/data <r 1> 6180
/user/stack/outputs/segments/20070108213341-test/parse_text/part-00000/index <r 1> 234
Here's the copy to an s3 directory named segments-bkup:
% ./bin/hadoop distcp /user/stack/outputs/segments s3://KEY:SECRET@BUCKET/segments-bkup
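The destination carries the S3 credentials and bucket inside the URI itself. As a rough illustration, this is how `java.net.URI` splits that string; whether Hadoop's S3 filesystem reads the parts exactly this way is an assumption here. `KEY`, `SECRET` and `BUCKET` are the placeholders from the command above, and `parts` is a hypothetical helper.

```java
import java.net.URI;

// Sketch: split the distcp destination from the command above into its parts
// using java.net.URI. KEY, SECRET and BUCKET are placeholders, not real values.
class S3UriParts {

    // Hypothetical helper: returns {scheme, userInfo, host, path}.
    static String[] parts(String uriString) {
        URI u = URI.create(uriString);
        return new String[] { u.getScheme(), u.getUserInfo(), u.getHost(), u.getPath() };
    }

    public static void main(String[] args) {
        String[] p = parts("s3://KEY:SECRET@BUCKET/segments-bkup");
        System.out.println("scheme   = " + p[0]); // s3
        System.out.println("userInfo = " + p[1]); // KEY:SECRET
        System.out.println("host     = " + p[2]); // BUCKET
        System.out.println("path     = " + p[3]); // /segments-bkup
    }
}
```

Because the secret sits in the authority part of the URI, a secret containing a character such as '/' would end the authority early, and the parts above would come out wrong.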
Here's a listing of the s3 content:
stack@debord:~/checkouts/hadoop$ ./bin/hadoop fs -fs s3://KEY:SECRET@BUCKET/segments-bkup -lsr /segments-bkup/
/segments-bkup/20070108213341-test <dir>
/segments-bkup/20070108213341-test/crawl_fetch <dir>
/segments-bkup/20070108213341-test/crawl_fetch/part-00000 <dir>
/segments-bkup/20070108213341-test/crawl_fetch/part-00000/data <r 1> 1187
/segments-bkup/20070108213341-test/crawl_fetch/part-00000/index <r 1> 234
/segments-bkup/20070108213341-test/crawl_parse <dir>
/segments-bkup/20070108213341-test/crawl_parse/part-00000 <r 1> 9010
/segments-bkup/20070108213341-test/parse_data <dir>
/segments-bkup/20070108213341-test/parse_data/part-00000 <dir>
/segments-bkup/20070108213341-test/parse_data/part-00000/data <r 1> 4630
/segments-bkup/20070108213341-test/parse_data/part-00000/index <r 1> 234
/segments-bkup/20070108213341-test/parse_text <dir>
/segments-bkup/20070108213341-test/parse_text/part-00000 <dir>
/segments-bkup/20070108213341-test/parse_text/part-00000/data <r 1> 6180
/segments-bkup/20070108213341-test/parse_text/part-00000/index <r 1> 234
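The two listings can be checked against each other mechanically: strip each side's root (/user/stack/outputs/segments and /segments-bkup) and compare the remaining paths and sizes. A minimal sketch, using a hypothetical `ListingCompare.index` helper that parses the `-lsr` lines in the layout shown above (that layout, and paths without spaces, are assumptions); `main` uses only two lines from each listing for brevity.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: compare a source 'fs -lsr' listing with the listing of its copy.
// Assumes the line layout shown above: "<path> <dir>" for directories and
// "<path> <r N> <size>" for files, with no spaces inside paths.
// This is plain text processing, not a Hadoop API.
class ListingCompare {

    // Maps each path under 'root' (made relative to it) to "dir" or its size.
    static Map<String, String> index(List<String> lines, String root) {
        Map<String, String> entries = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2 || !fields[0].startsWith(root)) {
                continue; // skip blank lines and entries outside the root
            }
            String relative = fields[0].substring(root.length());
            String kind = fields[1].equals("<dir>") ? "dir" : fields[fields.length - 1];
            entries.put(relative, kind);
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
            "/user/stack/outputs/segments/20070108213341-test <dir>",
            "/user/stack/outputs/segments/20070108213341-test/crawl_parse/part-00000 <r 1> 9010");
        List<String> copy = List.of(
            "/segments-bkup/20070108213341-test <dir>",
            "/segments-bkup/20070108213341-test/crawl_parse/part-00000 <r 1> 9010");
        boolean same = index(source, "/user/stack/outputs/segments")
            .equals(index(copy, "/segments-bkup"));
        System.out.println(same ? "listings match" : "listings differ"); // listings match
    }
}
```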