Hello, List!

I have several files in HDFS in a single directory that I create throughout
the day. At the end of the day, I want to merge them together into one file.
How do you guys do this?

It seems this would do it:
hadoop fs -getmerge /hdfs/directory/allsource* mergefile ;
cat mergefile | hadoop fs -put - /hdfs/directory/merged ; rm mergefile ;
hadoop fs -rm /hdfs/directory/allsource*

But I wonder if there's a command that can avoid writing to the local
filesystem then re-writing back into HDFS. I'm looking for an HDFS
equivalent to this Unix script:
cat /some/dir/allsource* > /some/dir/merged ; rm /some/dir/allsource*

--
Tim Ellis
Data Architect, Riot Games


  • Joey Echeverria at Jul 22, 2011 at 5:57 pm
    You could do it with streaming and a single reducer:

    bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar \
      -Dmapred.reduce.tasks=1 -reducer cat \
      -input /hdfs/directory/allsource* -output mergefile -verbose

    -Joey

    --
    Joseph Echeverria
    Cloudera, Inc.
    443.305.9434
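
Another way to avoid the local temporary file is to pipe "hadoop fs -cat" straight into "hadoop fs -put -", which reads from standard input. The data still passes through the client machine, but nothing is written to its disk. The sketch below is only an illustration of that idea: the function name merge_hdfs_files and the destination path in the usage line are hypothetical, and it assumes the hadoop command is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): merge the HDFS files matching
# a glob into a single HDFS file without a local temporary file.
# Assumes "hadoop" is on the PATH.
merge_hdfs_files() {
    if [ "$#" -ne 2 ]; then
        echo "usage: merge_hdfs_files '<hdfs-glob>' <hdfs-destination>" >&2
        return 2
    fi
    src_glob=$1   # quoted when passed on, so hadoop expands the glob itself
    dest=$2       # should not match src_glob

    # Concatenate the sources and stream the result back into HDFS.
    # The pipeline's exit status is that of "-put"; a failure of "-cat"
    # is not detected here, so check the merged file before deleting
    # anything.
    hadoop fs -cat "$src_glob" | hadoop fs -put - "$dest"
}
```

Usage would look like: merge_hdfs_files '/hdfs/directory/allsource*' /hdfs/directory/merged. Once the merged file has been checked, the sources can be removed with hadoop fs -rm '/hdfs/directory/allsource*'. Deletion is left as a separate step on purpose, so a partial merge cannot cost you the originals.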

Discussion Overview
group: hdfs-user @
categories: hadoop
posted: Jul 22, '11 at 5:26p
active: Jul 22, '11 at 5:57p
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop