Hi, everyone.
In class org.apache.hadoop.mapred.JobInProgress, there is a public method,
scheduleReduces(), which returns true if "finishedMapTasks >=
completedMapsForReduceSlowstart".
The scheduler can then schedule a new reduce task for a given
taskTracker.

But as far as I know, a reduce cannot start until the maps are 100% complete.
Can anyone explain this? Thanks a lot.
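
For reference, the condition quoted in the question can be sketched roughly as below. This is a simplified illustration, not the actual JobInProgress code: the class name SlowstartSketch, the slowstartFraction parameter, and the way the threshold is derived from it are assumptions made for the example. Only the comparison "finishedMapTasks >= completedMapsForReduceSlowstart" comes from the post.

```java
// Hypothetical stand-in for the scheduling check; not the real JobInProgress.
class SlowstartSketch {

    // Assumption: the threshold is a fraction of the job's map tasks,
    // rounded up to a whole number of maps.
    static int completedMapsForReduceSlowstart(int numMapTasks, float slowstartFraction) {
        return (int) Math.ceil(slowstartFraction * numMapTasks);
    }

    // The comparison quoted in the question: reduces become schedulable
    // once enough maps have finished, not necessarily all of them.
    static boolean scheduleReduces(int finishedMapTasks, int completedMapsForReduceSlowstart) {
        return finishedMapTasks >= completedMapsForReduceSlowstart;
    }
}
```

With 8 map tasks and a fraction of 0.25, the threshold in this sketch is 2, so scheduleReduces(1, 2) is false and scheduleReduces(2, 2) is true, even though 6 maps are still running.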


  • Ahmad Humayun at Aug 2, 2009 at 12:18 pm
    Even though reduces are scheduled early, they only transfer data from the
    maps until all maps have completed. A reduce begins its processing only
    once all maps are completed.

    Please correct me if I'm wrong.


    regards,



    --
    Ahmad Humayun
    Research Associate
    Computer Science Dpt., LUMS
    http://suraj.lums.edu.pk/~ahmadh
    +92 321 4457315
  • Arun C Murthy at Aug 3, 2009 at 3:57 am
    That check ensures sufficient #maps are completed before any of the
    reduces for the job are started.

    The reduces 'shuffle' map outputs from completed maps, but don't get
    into the 'reduce' phase until all map-outputs are copied over.

    Arun

    PS: Moving this to mapreduce-dev@
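
The distinction Arun describes, shuffling outputs from completed maps versus entering the 'reduce' phase, can be sketched as a small state model. This is a minimal illustration under stated assumptions: ReducePhaseSketch, its Phase enum, and onMapOutputCopied are hypothetical names, and the real reduce task has more detailed phases than the two shown here.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of one reduce task; not Hadoop's actual ReduceTask.
class ReducePhaseSketch {

    enum Phase { SHUFFLE, REDUCE }

    private final int totalMaps;
    // IDs of maps whose output this reduce has already copied.
    private final Set<Integer> copiedMapOutputs = new HashSet<>();

    ReducePhaseSketch(int totalMaps) {
        this.totalMaps = totalMaps;
    }

    // Called once the output of a completed map has been fetched.
    void onMapOutputCopied(int mapId) {
        copiedMapOutputs.add(mapId);
    }

    // The task stays in SHUFFLE until every map's output is copied over.
    Phase currentPhase() {
        return copiedMapOutputs.size() == totalMaps ? Phase.REDUCE : Phase.SHUFFLE;
    }
}
```

A reduce started early therefore occupies its slot in the SHUFFLE state, copying outputs as maps finish, and switches to REDUCE only after the last map output arrives.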
  • Amogh Vasekar at Aug 3, 2009 at 4:54 am
    And the combiner runs while fetching the outputs, right?

    -----Original Message-----
    From: Arun C Murthy
    Sent: Monday, August 03, 2009 9:27 AM
    To: mapreduce-dev@hadoop.apache.org
    Cc: common-dev@hadoop.apache.org
    Subject: Re: why reduce task can be scheduled before map tasks are 100% completed?


Discussion Overview
group: common-dev @
category: hadoop
posted: Aug 2, '09 at 12:02p
active: Aug 3, '09 at 4:54a
posts: 4
users: 4
website: hadoop.apache.org...
irc: #hadoop
