Hi all,

I'd like to know: does the map task push its output to the reduce task, or
does the reduce task pull it from the map task? Which way does Hadoop
actually work?

Thank you very much.


Jeff Zhang


  • Prabhu Hari Dhanapal at Oct 27, 2009 at 1:47 am
    Well, I'm not sure, but I think it is a pull. Physically, the mappers and
    the reducers run on the same nodes, so if the mappers had to push, it
    could happen that all nodes are busy mapping and no reducers are running
    to accept the output. Maybe for this reason the reducers might not start
    reducing anything at all until all of the mapper tasks have finished.

    There is also the sort/shuffle layer between mapping and reducing. It
    clearly demarcates the phases, which seems to suggest a pull rather than
    a push.

    You might think of this as a performance bottleneck, but in practice it
    does not seem to be one.

    By the way, wait for an expert to answer; I'm a beginner too!


    --
    Hari
  • Dave bayer at Oct 27, 2009 at 3:08 am

    On Oct 26, 2009, at 6:05 PM, Jeff Zhang wrote:

    I'd like to know does the map task push map output to reduce task or
    reduce task pull it from map task ? Which way is real in hadoop ?

    In 0.19, it appears to be a pull. Look at the run() method in
    mapred/org/apache/hadoop/mapred/ReduceTask.java. I don't know what the
    equivalent would be in the mapreduce package in 0.20.x.

    dave bayer
  • Jothi Padmanabhan at Oct 27, 2009 at 5:17 am

    Dave bayer wrote:

    Don't know what the equivalent would be in the mapreduce package
    in 0.20.x.

    The framework code that fetches map outputs is the same for both the
    mapred and the mapreduce based reducers.
  • Amogh Vasekar at Oct 27, 2009 at 5:02 am
    Hi,
    Each reduce task looks at the map tasks for the partition it requires
    and pulls it (the number of parallel copies is controlled by
    mapred.reduce.parallel.copies). As the partitions are taken in by the
    reduce task, it performs a merge sort; this forms your sort and shuffle
    (S&S) phase. Typically your mappers and reducers are O(n) while S&S is
    O(n log n), so if the amount of intermediate data is huge you will see a
    relative drop in performance.

    Amogh
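
To make the pull model described above concrete, here is a toy sketch in plain Java. It is not Hadoop code: the class PullShuffleSketch, the methods runMap and pullAndMerge, and the constant PARALLEL_COPIES are all hypothetical names made up for illustration, and in-memory lists stand in for the copy between nodes. It assumes a simple hash partitioner. The only point it tries to show is that the map side leaves its sorted, partitioned output in place, and each reducer fetches its own partition with a bounded number of parallel copiers before merging.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model of a pull-based shuffle. Not Hadoop code; all names are hypothetical.
final class PullShuffleSketch {

    // Stands in for mapred.reduce.parallel.copies (value picked for the demo).
    static final int PARALLEL_COPIES = 2;

    // Map side: split keys into one sorted segment per reducer and keep them locally.
    static Map<Integer, List<String>> runMap(List<String> keys, int numReducers) {
        Map<Integer, List<String>> segments = new HashMap<>();
        for (String key : keys) {
            // Assumed hash partitioner: non-negative hash modulo reducer count.
            int partition = (key.hashCode() & Integer.MAX_VALUE) % numReducers;
            segments.computeIfAbsent(partition, p -> new ArrayList<>()).add(key);
        }
        for (List<String> segment : segments.values()) {
            Collections.sort(segment);
        }
        return segments;
    }

    // Reduce side: pull this reducer's partition from every map output, then merge.
    static List<String> pullAndMerge(List<Map<Integer, List<String>>> mapOutputs,
                                     int partition) {
        ExecutorService copiers = Executors.newFixedThreadPool(PARALLEL_COPIES);
        List<List<String>> fetched = new ArrayList<>();
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (Map<Integer, List<String>> output : mapOutputs) {
                // The reducer asks for its partition; the map side never pushes.
                Callable<List<String>> copy =
                    () -> output.getOrDefault(partition, Collections.<String>emptyList());
                pending.add(copiers.submit(copy));
            }
            for (Future<List<String>> f : pending) {
                fetched.add(f.get());
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("copy failed", e);
        } finally {
            copiers.shutdown();
        }

        // k-way merge of the already sorted segments; heap entries are {segment, offset}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> fetched.get(a[0]).get(a[1]).compareTo(fetched.get(b[0]).get(b[1])));
        for (int s = 0; s < fetched.size(); s++) {
            if (!fetched.get(s).isEmpty()) {
                heap.add(new int[] {s, 0});
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            List<String> segment = fetched.get(head[0]);
            merged.add(segment.get(head[1]));
            if (head[1] + 1 < segment.size()) {
                heap.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Map<Integer, List<String>>> mapOutputs = new ArrayList<>();
        mapOutputs.add(runMap(Arrays.asList("pear", "apple", "fig"), 2));
        mapOutputs.add(runMap(Arrays.asList("kiwi", "banana", "apple"), 2));
        for (int r = 0; r < 2; r++) {
            System.out.println("reducer " + r + " -> " + pullAndMerge(mapOutputs, r));
        }
    }
}
```

Running main prints one sorted key list per reducer. For the real thing, the run() method of ReduceTask.java mentioned earlier in the thread is the place to read.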



Discussion Overview
group: common-user @
category: hadoop
posted: Oct 27, '09 at 1:05a
active: Oct 27, '09 at 5:17a
posts: 5
users: 5
website: hadoop.apache.org...
irc: #hadoop
