Hi,
I want to sort my records (each consisting of a string, an int, and a float) using Hadoop.

One way I have found is to set the number of reducers to 1, but then all the
records go to a single reducer and the sort is not parallelized. Can anyone
point me to a better way to sort using Hadoop?

Thanks,
Tenaali

  • Edward J. Yoon at Sep 20, 2008 at 5:54 pm
    I would recommend running map/reduce twice.

    /Edward
    On Sat, Sep 13, 2008 at 5:58 AM, Tenaali Ram wrote:
    Hi,
    I want to sort my records ( consisting of string, int, float) using Hadoop.

    One way I have found is to set number of reducers = 1, but this would mean
    all the records go to 1 reducer and it won't be optimized. Can anyone point
    me to some better way to do sorting using Hadoop ?

    Thanks,
    Tenaali


    --
    Best regards, Edward J. Yoon
    edwardyoon@apache.org
    http://blog.udanax.org
  • Lohit at Sep 20, 2008 at 6:13 pm
    Since this is sorting, does it help to run map/reduce twice? The number of output bytes should be the same as the number of input bytes.
    To do total order sorting, your partition function has to split the keyspace, in order, into roughly equal ranges, one per reducer, so that every key sent to one reducer sorts before every key sent to the next.
    For an example of how this is done, look at TeraSort: http://svn.apache.org/repos/asf/hadoop/core/trunk/src/examples/org/apache/hadoop/examples/terasort/TeraSort.java
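
    The partitioning idea above can be sketched without any Hadoop code. The
    following is a minimal in-memory simulation in Python, not TeraSort's
    actual implementation: the function names (choose_split_points, partition,
    total_order_sort) and the sampling strategy are assumptions made up for
    illustration. It samples the keys, picks split points, routes each record
    to the "reducer" that owns its range, and sorts each range independently.

```python
import bisect
import random


def choose_split_points(sample_keys, num_reducers):
    """Pick up to num_reducers - 1 range boundaries from a sample of keys."""
    ordered = sorted(sample_keys)
    if not ordered:
        return []
    # Evenly spaced quantiles of the sample approximate equal-sized ranges.
    return [ordered[len(ordered) * i // num_reducers]
            for i in range(1, num_reducers)]


def partition(key, split_points):
    """Return the index of the reducer whose key range contains `key`."""
    # bisect_right sends a key equal to a split point to the higher range.
    return bisect.bisect_right(split_points, key)


def total_order_sort(records, num_reducers, sample_size=100, seed=0):
    """Simulate a total-order sort: range-partition, then sort each part."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    split_points = choose_split_points(sample, num_reducers)

    # "Shuffle": route every record to the reducer that owns its key range.
    reducers = [[] for _ in range(num_reducers)]
    for record in records:
        reducers[partition(record, split_points)].append(record)

    # Each "reducer" sorts only its own share of the data.
    outputs = [sorted(part) for part in reducers]
    return split_points, outputs


if __name__ == "__main__":
    rng = random.Random(42)
    # Records shaped like the original question: (string, int, float).
    data = [(rng.choice("abcdef"), rng.randint(0, 99), rng.random())
            for _ in range(1000)]
    splits, outputs = total_order_sort(data, num_reducers=4)
    # Concatenating the reducer outputs in order yields a globally sorted list.
    concatenated = [rec for part in outputs for rec in part]
    assert concatenated == sorted(data)
```

    Because every key in range i sorts before every key in range i+1, reading
    the reducer outputs in order gives the fully sorted result without a
    final merge. How evenly the work is spread depends on how well the sample
    reflects the real key distribution.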

    Thanks,
    Lohit



    ----- Original Message ----
    From: Edward J. Yoon <edwardyoon@apache.org>
    To: core-user@hadoop.apache.org
    Sent: Saturday, September 20, 2008 10:53:40 AM
    Subject: Re: Tips on sorting using Hadoop

    I would recommend that run map/reduce twice.

    /Edward
  • Owen O'Malley at Sep 20, 2008 at 9:22 pm

    On Sat, Sep 20, 2008 at 11:12 AM, lohit wrote:

    To do total order sorting, you have to make your partition function split
    the keyspace equally in order among the number of reducers.

    A library to do this was checked in yesterday. See HADOOP-3019.

    -- Owen
  • Bz at Sep 25, 2008 at 3:01 am
    Hi,

    Is there a way to do this with streaming?

    I've noticed there is a "-partitioner" option for streaming. Does that mean
    I have to write a Java partitioner class to perform total order sorting?

    Thanks,
    Joseph



    --
    Screenshots, http://flickr.com/photos/bizkit
    Blog, http://bz.d22.cc
    張至(bizkit)
