Hi all,

I have recently started using Hadoop streaming. From the documentation, I
understand that by default, the part of each mapper output line up to the
first tab becomes the key and the rest of the line is the value. I wanted to
know: is there a shuffle (sort) phase between the mapper and the reducer? More
specifically, would it be correct to assume that output from all mappers
with the same key will go to the same reducer?

Thanks,
Nipun
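
For readers following along, the default key/value split described in the question can be sketched in a few lines of Python. This is only an illustration of the rule (everything up to the first tab is the key, the rest is the value), not Hadoop's own code; the function name `split_key_value` is made up for this example.

```python
def split_key_value(line, separator="\t"):
    """Split one mapper output line into (key, value) at the first separator."""
    # Drop the trailing newline that a line read from stdin would carry.
    line = line.rstrip("\n")
    # str.partition splits at the FIRST occurrence only, so any later tabs
    # stay inside the value. With no separator at all, the whole line becomes
    # the key and the value is empty.
    key, _, value = line.partition(separator)
    return key, value


pair = split_key_value("apple\t1\textra\n")
```

Here `pair` is `("apple", "1\textra")`: only the first tab separates the key from the value.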

  • Aaron Kimball at Aug 25, 2009 at 12:44 am
    Yes. It works just like Java-based MapReduce in that regard.
    - Aaron
  • Nipun Saggar at Aug 25, 2009 at 5:11 am
    Does that mean that, if the same key is emitted more than once from a
    mapper, the key-value pairs for that key are not guaranteed to go to the
    same reducer?

    -Nipun
  • Amogh Vasekar at Aug 25, 2009 at 5:21 am
    Hadoop makes sure that every <k,v> pair with the same key ends up in the same reducer and is consumed in a single reduce instance.

    -----Original Message-----
    From: Nipun Saggar
    Sent: Tuesday, August 25, 2009 10:41 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Hadoop streaming: How is data distributed from mappers to reducers?

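
To make the answers in this thread concrete, here is a rough Python simulation of what happens between mappers and reducers: each pair is routed to a reducer chosen from its key alone, each reducer's input is sorted by key, and the reducer then walks over runs of identical keys. It is a sketch under stated assumptions, not Hadoop's implementation. The names `partition`, `shuffle`, and `reduce_counts` are hypothetical, and `zlib.crc32` is only a deterministic stand-in for whatever hash the real partitioner uses. In a streaming job the reducer script would read the sorted lines from stdin and, as far as I understand, has to notice key boundaries itself, which is what `groupby` does here.

```python
import zlib
from itertools import groupby


def partition(key, num_reducers):
    """Pick a reducer index for a key (stand-in for a hash partitioner)."""
    # Assumption: crc32 is used only because it is deterministic across
    # processes; it is not the hash function Hadoop itself uses.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_outputs, num_reducers):
    """Route every (key, value) pair to one reducer, then sort by key."""
    reducer_inputs = [[] for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for key, value in pairs:
            # The target depends on the key only, so it does not matter
            # which mapper emitted the pair or how often.
            reducer_inputs[partition(key, num_reducers)].append((key, value))
    for pairs in reducer_inputs:
        pairs.sort(key=lambda kv: kv[0])
    return reducer_inputs


def reduce_counts(sorted_pairs):
    """Sum the values of each run of identical keys in sorted input."""
    return [
        (key, sum(int(value) for _, value in group))
        for key, group in groupby(sorted_pairs, key=lambda kv: kv[0])
    ]


# Two hypothetical mappers; both emit "apple", one of them twice.
mapper_outputs = [
    [("apple", "1"), ("pear", "1"), ("apple", "1")],
    [("apple", "1"), ("fig", "1")],
]
reducer_inputs = shuffle(mapper_outputs, num_reducers=3)
results = [reduce_counts(pairs) for pairs in reducer_inputs]
```

In this simulation all three `"apple"` pairs land in the same reducer's input and are summed in a single group, whichever mapper produced them.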

Discussion Overview
group: common-user @
category: hadoop
posted: Aug 23, '09 at 12:10p
active: Aug 25, '09 at 5:21a
posts: 4
users: 3
website: hadoop.apache.org...
irc: #hadoop
