Hi,

Can I sort the output of a reducer by value instead of by key?
Also, can I specify that the output be sorted in decreasing order?

Mapper output:
<aWord, 1>

Reducer gets:
<aWord, (1,1,...)>

and outputs:
<aWord, count>

e.g.
abc 10
xyz 100

I want the output sorted by value, in decreasing order:
xyz 100
abc 10

Any suggestions?

Thanks,
Taran

  • Tarandeep Singh at Feb 22, 2008 at 1:12 am

    I set the output format to Text, converted the count to text, and wrote
    it as the key with aWord as the value. I expected the output to be
    sorted by count, but it was not. Could anyone explain why?

    Reducer output:
    <000001, abc>
    <000005, xyz>
    <000002, pqr>

    thanks,
    Taran

  • Ted Dunning at Feb 22, 2008 at 1:35 am
    Use another job step to get the sort done.
  • Tarandeep Singh at Feb 22, 2008 at 1:38 am

    On Thu, Feb 21, 2008 at 5:34 PM, Ted Dunning wrote:
    Use another job step to get the sort done.

    But isn't the output of the reduce step sorted?
    Also, can I specify that the sort be done in reverse order?
  • Ted Dunning at Feb 22, 2008 at 1:47 am
    It may be sorted within the output for a single reducer and, indeed, you
    can even guarantee that it is sorted, but *only* by the reduce key. The
    order in which values appear will not be deterministic.

    To sort by value, you need to run another MR job with the count from the
    first step as the key and the old reducer's output key as the value. You
    will only need an identity mapper. If you use both the count and the key
    as the new key and have an empty value, then you can do a two-level sort
    in one step.

    Hadoop isn't magic. If you want something sorted according to a new
    ordering, *something* will have to do the work.
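
The two-job approach described above can be sketched in plain Python. This is only a simulation under stated assumptions, not Hadoop API code: `run_job`, the mapper and reducer names, and the sample input are all hypothetical, and a single reducer is assumed. In Hadoop itself the descending order would typically come from a custom key comparator (for example one registered with `JobConf.setOutputKeyComparatorClass`); here it is imitated by a sort key that negates the count.

```python
from collections import defaultdict

def run_job(records, mapper, reducer, sort_key=None):
    """Simulate one MapReduce job with a single reducer (not Hadoop code)."""
    # Map phase: each input record may emit any number of (key, value) pairs.
    mapped = [pair for record in records for pair in mapper(record)]
    # Shuffle phase: group values by key, then order the keys.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    ordered_keys = sorted(groups, key=sort_key)
    # Reduce phase: called once per key, in sorted key order.
    return [out for key in ordered_keys for out in reducer(key, groups[key])]

# Job 1: ordinary word count, keyed by word.
def count_mapper(line):
    for word in line.split():
        yield word, 1

def count_reducer(word, ones):
    yield word, sum(ones)

# Job 2: the new key is (count, word) and the value is empty, so the
# shuffle sort performs a two-level sort in one step.
def rekey_mapper(record):
    word, count = record
    yield (count, word), None

def emit_reducer(key, values):
    count, word = key
    for _ in values:
        yield word, count

def count_desc_then_word(key):
    # Stand-in for a custom comparator: descending count, then ascending word.
    count, word = key
    return (-count, word)

lines = ["xyz abc xyz", "pqr xyz abc"]  # made-up sample input
counts = run_job(lines, count_mapper, count_reducer)
ranked = run_job(counts, rekey_mapper, emit_reducer, sort_key=count_desc_then_word)
print(ranked)
```

The second job's mapper re-keys each record here; if the first job already writes the count as its output key, an identity mapper is enough, as the post says.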

  • Owen O'Malley at Feb 22, 2008 at 4:42 am

    On Feb 21, 2008, at 5:47 PM, Ted Dunning wrote:
    It may be sorted within the output for a single reducer and, indeed, you
    can even guarantee that it is sorted but *only* by the reduce key. The
    order that values appear will not be deterministic.

    Actually, there is a better answer for this. If you put both the
    primary and secondary key into the key, you can use
    JobConf.setOutputValueGroupingComparator to set a comparator that
    only compares the primary key. Reduce will be called once per
    primary key, but all of the values will be sorted by the secondary key.

    See http://tinyurl.com/32gld4

    -- Owen
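
The secondary-sort idea in this post can be illustrated with a small plain-Python simulation. It is a sketch under assumptions, not Hadoop code: `shuffle_with_secondary_sort` and the sample records are hypothetical. Sorting on the full composite key plays the role of the normal key sort, and grouping on the primary part alone plays the role of the grouping comparator.

```python
from itertools import groupby

def shuffle_with_secondary_sort(pairs):
    """pairs: iterable of ((primary, secondary), value) map outputs."""
    # The sort uses the full composite key: primary first, then secondary.
    ordered = sorted(pairs, key=lambda kv: kv[0])
    # The grouping step compares only the primary key, so each group
    # corresponds to one reduce call whose values arrive in secondary order.
    for primary, group in groupby(ordered, key=lambda kv: kv[0][0]):
        yield primary, [(key[1], value) for key, value in group]

# Made-up map output: (primary key, secondary key) -> value.
map_output = [
    (("user2", 30), "c"),
    (("user1", 20), "b"),
    (("user1", 10), "a"),
    (("user2", 5), "d"),
]
calls = list(shuffle_with_secondary_sort(map_output))
for primary, values in calls:
    print(primary, values)
```

Each entry in `calls` stands for one reduce call: one per primary key, with its values ordered by the secondary key.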
  • Ted Dunning at Feb 22, 2008 at 7:01 am
    But this only guarantees that the results will be sorted within each
    reducer's input. Thus, this won't result in getting the results sorted
    by the reducer's output value.

  • Owen O'Malley at Feb 22, 2008 at 1:48 pm

    On Feb 21, 2008, at 11:01 PM, Ted Dunning wrote:
    But this only guarantees that the results will be sorted within each
    reducers input. Thus, this won't result in getting the results sorted
    by the reducers output value.

    I thought the question was how to get the values sorted within a call
    to reduce. Of course if you are trying to sort the reduce output on a
    key other than the key that was used coming out of the map, you do
    need another job.

    -- Owen
  • Tarandeep Singh at Feb 22, 2008 at 4:35 pm

    On Fri, Feb 22, 2008 at 5:46 AM, Owen O'Malley wrote:
    I thought the question was how to get the values sorted within a call
    to reduce. Of course if you are trying to sort the reduce output on a
    key other than the key that was used coming out of the map, you do
    need another job.

    Yes, I need to sort the output coming out of reduce, so the
    solution is to run another MR job.

    Thanks guys for your replies. They were very useful.

    -Taran
  • Doug Cutting at Feb 22, 2008 at 6:48 pm

    Tarandeep Singh wrote:
    but isn't the output of reduce step sorted?

    No, the input of reduce is sorted by key. The output of reduce is
    generally produced as the input arrives, so is generally also sorted by
    key, but reducers can output whatever they like.

    Doug
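
A small plain-Python simulation may make this point concrete, and it also shows why writing the count as the reducer's output key did not sort the output earlier in the thread. It is a sketch under assumptions, not Hadoop code: `reduce_phase`, `swapped_count_reducer`, and the sample data are hypothetical, and a single reducer is assumed.

```python
from collections import defaultdict

def reduce_phase(map_output, reducer):
    """Simulate the reduce side of one job with a single reducer."""
    groups = defaultdict(list)
    for key, value in map_output:
        groups[key].append(value)
    output = []
    for key in sorted(groups):  # the *input* to reduce arrives sorted by key
        # Whatever the reducer emits is appended as-is; nothing re-sorts it.
        output.extend(reducer(key, groups[key]))
    return output

def swapped_count_reducer(word, ones):
    # Emit the zero-padded count as the output key and the word as the value.
    yield f"{sum(ones):06d}", word

# Made-up map output: abc appears 5 times, pqr once, xyz twice.
map_output = [("xyz", 1)] * 2 + [("abc", 1)] * 5 + [("pqr", 1)]
output = reduce_phase(map_output, swapped_count_reducer)
print(output)
```

The records come out in the order of the map output key (the word), not the count, because sorting happens before reduce is called, not after.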

Discussion Overview
group: common-user
categories: hadoop
posted: Feb 21, '08 at 11:47p
active: Feb 22, '08 at 6:48p
posts: 10
users: 4
website: hadoop.apache.org...
irc: #hadoop
