Hi,

Can anyone guide me on merging my reducer output files into a single file in
HDFS?

I read about getmerge, but its destination has to be a local file, and copying
the result back into HDFS again is costly.
The other way is to use cat, but again the final result ends up on the local
disk, which I don't want.

So does anyone have an idea how to do this?

Pankil


  • Owen O'Malley at Jul 8, 2009 at 10:42 pm

    On Jul 8, 2009, at 3:13 PM, Pankil Doshi wrote:

    > Can anyone guide me to merge my output files from reducer to single file in HDFS.

    The usual approach is to leave them as separate files. Often the need
    to merge them into a single file is removed by using a total sort
    order. Basically, that ensures that all of the keys in reduce-0 are
    less than the keys in reduce-1, and so on. There is a library class that
    helps with this: org.apache.hadoop.mapred.lib.TotalOrderPartitioner.

    -- Owen
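A minimal plain-Java sketch of the total-order idea above, with no Hadoop dependency so it runs on its own. The class name TotalOrderSketch, the hard-coded split points, and the sample keys are hypothetical. In Hadoop the split points come from a partition file (typically produced by sampling the input with InputSampler); here they are passed in directly. With n-1 sorted split points there are n partitions, and a binary search sends each key to the partition whose range contains it, so every key in partition i sorts before every key in partition i+1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch of total-order partitioning (illustrative, not Hadoop's API). */
public class TotalOrderSketch {

    // Assumed to be sorted ascending; n-1 split points give n partitions.
    private final String[] splitPoints;

    public TotalOrderSketch(String[] splitPoints) {
        this.splitPoints = splitPoints.clone();
    }

    /** Returns the partition (reducer) index for a key. */
    public int getPartition(String key) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // A key equal to split point i goes to partition i+1.
        // Otherwise binarySearch returns -(insertionPoint) - 1, and the
        // insertion point is exactly the partition index.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    /** Groups keys by partition and sorts each group, as each reducer would see them. */
    public List<List<String>> partitionAndSort(List<String> keys) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i <= splitPoints.length; i++) {
            parts.add(new ArrayList<>());
        }
        for (String k : keys) {
            parts.get(getPartition(k)).add(k);
        }
        for (List<String> p : parts) {
            p.sort(null); // natural (lexicographic) ordering within a partition
        }
        return parts;
    }

    public static void main(String[] args) {
        TotalOrderSketch p = new TotalOrderSketch(new String[] {"g", "p"});
        List<String> keys = Arrays.asList(
                "zebra", "apple", "mango", "grape", "kiwi", "banana", "pear", "quince");
        List<List<String>> parts = p.partitionAndSort(keys);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("part-0000" + i + ": " + parts.get(i));
        }
    }
}
```

Running main prints:

    part-00000: [apple, banana]
    part-00001: [grape, kiwi, mango]
    part-00002: [pear, quince, zebra]

Reading the part files in order already yields the keys fully sorted, which is why a total order usually removes the need for a separate merge step.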
  • Ted Dunning at Jul 8, 2009 at 11:55 pm

    On Wed, Jul 8, 2009 at 3:38 PM, Owen O'Malley wrote:
    > On Jul 8, 2009, at 3:13 PM, Pankil Doshi wrote:
    >> Can anyone guide me to merge my output files from reducer to single file in HDFS.
    > The usual approach is to leave them as separate files.

    Also, the need to merge often arises from a need to import the data into an
    external database. That doesn't sound like your need because you already
    know and have rejected dfs -cat.

    It may help to think of the containing directory as the actual file and the
    files inside that directory as no more interesting than the inodes and
    blocks that make up a normal Unix file.
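To make the "directory as the file" view concrete, here is a hypothetical plain-Java sketch that chains a job output directory's part-* files, in name order, into a single InputStream, so a consumer reads them as one logical file without anything being copied or rewritten. It uses the local filesystem through java.nio only so that it is self-contained; against HDFS the same idea would use Hadoop's FileSystem API (listStatus to enumerate the directory, open to read each file) instead of Files. The class name DirectoryAsFile and the sample contents are illustrative assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Reads a job output directory's part files, in name order, as one logical stream. */
public class DirectoryAsFile {

    /** Opens every part-* entry under dir in lexicographic order and chains them. */
    public static InputStream open(Path dir) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, "part-*")) {
            for (Path p : ds) {
                parts.add(p);
            }
        }
        // Zero-padded names (part-00000, part-00001, ...) sort in reducer order.
        Collections.sort(parts);
        // For brevity all streams are opened up front;
        // SequenceInputStream.close() closes any that remain.
        List<InputStream> streams = new ArrayList<>();
        for (Path p : parts) {
            streams.add(Files.newInputStream(p));
        }
        return new SequenceInputStream(Collections.enumeration(streams));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job-output");
        // Written out of order on purpose; open() still reads part-00000 first.
        Files.write(dir.resolve("part-00001"), "cherry\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("part-00000"), "apple\nbanana\n".getBytes(StandardCharsets.UTF_8));
        Files.createDirectory(dir.resolve("_logs")); // not a part file; the glob skips it
        try (InputStream in = open(dir)) {
            System.out.print(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

main prints apple, banana and cherry on separate lines. Combined with a total-order partitioner, the chained stream is globally sorted, so the directory behaves like one sorted file.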
  • Jason hadoop at Jul 9, 2009 at 5:54 am
    The example code from Pro Hadoop includes a sample MapReduce job that uses a
    map-side join to merge the files into a single output.
    It is part of the Chapter 9 examples.

    --
    Pro Hadoop, a book to guide you from beginner to hadoop mastery,
    http://www.amazon.com/dp/1430219424?tag=jewlerymall
    www.prohadoopbook.com a community for Hadoop Professionals
  • Pankil Doshi at Jul 9, 2009 at 3:34 pm
    Thanks a lot, Jason. My copy of that book is on the way, so I will soon be
    able to use it.

    Pankil

Discussion Overview
group: common-user @
categories: hadoop
posted: Jul 8, '09 at 10:13p
active: Jul 9, '09 at 3:34p
posts: 5
users: 4
website: hadoop.apache.org...
irc: #hadoop
