FAQ
Hi, all

recently, I was hit by a question, "how is a hadoop job divided into 2
phases?",

In textbooks, we are told that the mapreduce jobs are divided into 2 phases,
map and reduce, and for reduce, we further divided it into 3 stages,
shuffle, sort, and reduce, but in hadoop codes, I never think about
this question, I didn't see any variable members in JobInProgress class
to indicate this information,

and according to my understanding on the source code of hadoop, the reduce
tasks are unnecessarily started until all mappers are finished, in
constract, we can see the reduce tasks are in shuffle stage while there are
mappers which are still in running,
So how can I indicate the phase which the job is belonging to?

Thanks
--
Nan Zhu
School of Electronic, Information and Electrical Engineering,229
Shanghai Jiao Tong University
800,Dongchuan Road,Shanghai,China
E-Mail: zhunansjtu@gmail.com

Search Discussions

  • He Chen at Sep 19, 2011 at 3:43 am
    Hi Nan

    I have the same question for a while. In some research papers, people like
    to make the reduce stage to be slow start. In this way, the map stage and
    reduce stage are easy to differentiate. You can use the number of remaining
    unallocated map tasks to detect in which stage your job is.

    To let the reduce stage overlap with the map stage, it blurs the boundary
    between two stages. I think it may decreases the execution time of the whole
    job (I am not sure whether this is the main reason that people allow "fast
    start" happen or not).

    However, "fast start" has its side-effect. It is hard to get a global view
    of the map stage's output, and then the reduce stage's balance and data
    locality are not easy to be solved.

    Chen
    Research Assistant of Holland Computing Center
    PhD student of CSE Department
    University of Nebraska-Lincoln

    On Sun, Sep 18, 2011 at 9:24 PM, Nan Zhu wrote:

    Hi, all

    recently, I was hit by a question, "how is a hadoop job divided into 2
    phases?",

    In textbooks, we are told that the mapreduce jobs are divided into 2
    phases,
    map and reduce, and for reduce, we further divided it into 3 stages,
    shuffle, sort, and reduce, but in hadoop codes, I never think about
    this question, I didn't see any variable members in JobInProgress class
    to indicate this information,

    and according to my understanding on the source code of hadoop, the reduce
    tasks are unnecessarily started until all mappers are finished, in
    constract, we can see the reduce tasks are in shuffle stage while there are
    mappers which are still in running,
    So how can I indicate the phase which the job is belonging to?

    Thanks
    --
    Nan Zhu
    School of Electronic, Information and Electrical Engineering,229
    Shanghai Jiao Tong University
    800,Dongchuan Road,Shanghai,China
    E-Mail: zhunansjtu@gmail.com
  • Arun C Murthy at Sep 19, 2011 at 4:18 am
    Nan,

    The 'phase' is implicitly understood by the 'progress' (value) made by the map/reduce tasks (see o.a.h.mapred.TaskStatus.Phase).

    For e.g.
    Reduce:
    0-33% -> Shuffle
    34-66% -> Sort (actually, just 'merge', there is no sort in the reduce since all map-outputs are sorted)
    67-100% -> Reduce

    With 0.23 onwards the Map has phases too:
    0-90% -> Map
    91-100% -> Final Sort/merge

    Now,about starting reduces early - this is done to ensure shuffle can proceed for completed maps while rest of the maps run, there-by pipelining shuffle and map completion. There is a 'reduce slowstart' feature to control this - by default, reduces aren't started until 5% of maps are complete. Users can set this higher.

    Arun
    On Sep 18, 2011, at 7:24 PM, Nan Zhu wrote:

    Hi, all

    recently, I was hit by a question, "how is a hadoop job divided into 2
    phases?",

    In textbooks, we are told that the mapreduce jobs are divided into 2 phases,
    map and reduce, and for reduce, we further divided it into 3 stages,
    shuffle, sort, and reduce, but in hadoop codes, I never think about
    this question, I didn't see any variable members in JobInProgress class
    to indicate this information,

    and according to my understanding on the source code of hadoop, the reduce
    tasks are unnecessarily started until all mappers are finished, in
    constract, we can see the reduce tasks are in shuffle stage while there are
    mappers which are still in running,
    So how can I indicate the phase which the job is belonging to?

    Thanks
    --
    Nan Zhu
    School of Electronic, Information and Electrical Engineering,229
    Shanghai Jiao Tong University
    800,Dongchuan Road,Shanghai,China
    E-Mail: zhunansjtu@gmail.com
  • Kai Voigt at Sep 19, 2011 at 4:23 am
    Hi,

    this 0-33-66-100% phases are really confusing to beginners. We see that in our training classes. The output should be more verbose, such as breaking down the phases into seperate progress numbers.

    Does that make sense?

    Am 19.09.2011 um 06:17 schrieb Arun C Murthy:
    Nan,

    The 'phase' is implicitly understood by the 'progress' (value) made by the map/reduce tasks (see o.a.h.mapred.TaskStatus.Phase).

    For e.g.
    Reduce:
    0-33% -> Shuffle
    34-66% -> Sort (actually, just 'merge', there is no sort in the reduce since all map-outputs are sorted)
    67-100% -> Reduce

    With 0.23 onwards the Map has phases too:
    0-90% -> Map
    91-100% -> Final Sort/merge

    Now,about starting reduces early - this is done to ensure shuffle can proceed for completed maps while rest of the maps run, there-by pipelining shuffle and map completion. There is a 'reduce slowstart' feature to control this - by default, reduces aren't started until 5% of maps are complete. Users can set this higher.

    Arun
    On Sep 18, 2011, at 7:24 PM, Nan Zhu wrote:

    Hi, all

    recently, I was hit by a question, "how is a hadoop job divided into 2
    phases?",

    In textbooks, we are told that the mapreduce jobs are divided into 2 phases,
    map and reduce, and for reduce, we further divided it into 3 stages,
    shuffle, sort, and reduce, but in hadoop codes, I never think about
    this question, I didn't see any variable members in JobInProgress class
    to indicate this information,

    and according to my understanding on the source code of hadoop, the reduce
    tasks are unnecessarily started until all mappers are finished, in
    constract, we can see the reduce tasks are in shuffle stage while there are
    mappers which are still in running,
    So how can I indicate the phase which the job is belonging to?

    Thanks
    --
    Nan Zhu
    School of Electronic, Information and Electrical Engineering,229
    Shanghai Jiao Tong University
    800,Dongchuan Road,Shanghai,China
    E-Mail: zhunansjtu@gmail.com
    --
    Kai Voigt
    k@123.org
  • Arun C Murthy at Sep 19, 2011 at 4:27 am
    Agreed.

    At least, I believe the new web-ui for MRv2 is (or will be soon) more verbose about this.
    On Sep 18, 2011, at 9:23 PM, Kai Voigt wrote:

    Hi,

    this 0-33-66-100% phases are really confusing to beginners. We see that in our training classes. The output should be more verbose, such as breaking down the phases into seperate progress numbers.

    Does that make sense?

    Am 19.09.2011 um 06:17 schrieb Arun C Murthy:
    Nan,

    The 'phase' is implicitly understood by the 'progress' (value) made by the map/reduce tasks (see o.a.h.mapred.TaskStatus.Phase).

    For e.g.
    Reduce:
    0-33% -> Shuffle
    34-66% -> Sort (actually, just 'merge', there is no sort in the reduce since all map-outputs are sorted)
    67-100% -> Reduce

    With 0.23 onwards the Map has phases too:
    0-90% -> Map
    91-100% -> Final Sort/merge

    Now,about starting reduces early - this is done to ensure shuffle can proceed for completed maps while rest of the maps run, there-by pipelining shuffle and map completion. There is a 'reduce slowstart' feature to control this - by default, reduces aren't started until 5% of maps are complete. Users can set this higher.

    Arun
    On Sep 18, 2011, at 7:24 PM, Nan Zhu wrote:

    Hi, all

    recently, I was hit by a question, "how is a hadoop job divided into 2
    phases?",

    In textbooks, we are told that the mapreduce jobs are divided into 2 phases,
    map and reduce, and for reduce, we further divided it into 3 stages,
    shuffle, sort, and reduce, but in hadoop codes, I never think about
    this question, I didn't see any variable members in JobInProgress class
    to indicate this information,

    and according to my understanding on the source code of hadoop, the reduce
    tasks are unnecessarily started until all mappers are finished, in
    constract, we can see the reduce tasks are in shuffle stage while there are
    mappers which are still in running,
    So how can I indicate the phase which the job is belonging to?

    Thanks
    --
    Nan Zhu
    School of Electronic, Information and Electrical Engineering,229
    Shanghai Jiao Tong University
    800,Dongchuan Road,Shanghai,China
    E-Mail: zhunansjtu@gmail.com
    --
    Kai Voigt
    k@123.org


  • GOEKE, MATTHEW (AG/1000) at Sep 19, 2011 at 7:20 pm
    Was the command line output really ever intended to be *that* verbose? I can see how it would be useful but considering how easy it is to interact with the web-ui I can't see why much effort should be put into enhancing it. Even if you didn't want to see all of the details the web-ui has to offer it doesn't take long to learn how to skim it and get 10x more accurate reading on your job progress.

    Matt

    -----Original Message-----
    From: Arun C Murthy
    Sent: Sunday, September 18, 2011 11:27 PM
    To: common-user@hadoop.apache.org
    Subject: Re: phases of Hadoop Jobs

    Agreed.

    At least, I believe the new web-ui for MRv2 is (or will be soon) more verbose about this.
    On Sep 18, 2011, at 9:23 PM, Kai Voigt wrote:

    Hi,

    this 0-33-66-100% phases are really confusing to beginners. We see that in our training classes. The output should be more verbose, such as breaking down the phases into seperate progress numbers.

    Does that make sense?

    Am 19.09.2011 um 06:17 schrieb Arun C Murthy:
    Nan,

    The 'phase' is implicitly understood by the 'progress' (value) made by the map/reduce tasks (see o.a.h.mapred.TaskStatus.Phase).

    For e.g.
    Reduce:
    0-33% -> Shuffle
    34-66% -> Sort (actually, just 'merge', there is no sort in the reduce since all map-outputs are sorted)
    67-100% -> Reduce

    With 0.23 onwards the Map has phases too:
    0-90% -> Map
    91-100% -> Final Sort/merge

    Now,about starting reduces early - this is done to ensure shuffle can proceed for completed maps while rest of the maps run, there-by pipelining shuffle and map completion. There is a 'reduce slowstart' feature to control this - by default, reduces aren't started until 5% of maps are complete. Users can set this higher.

    Arun
    On Sep 18, 2011, at 7:24 PM, Nan Zhu wrote:

    Hi, all

    recently, I was hit by a question, "how is a hadoop job divided into 2
    phases?",

    In textbooks, we are told that the mapreduce jobs are divided into 2 phases,
    map and reduce, and for reduce, we further divided it into 3 stages,
    shuffle, sort, and reduce, but in hadoop codes, I never think about
    this question, I didn't see any variable members in JobInProgress class
    to indicate this information,

    and according to my understanding on the source code of hadoop, the reduce
    tasks are unnecessarily started until all mappers are finished, in
    constract, we can see the reduce tasks are in shuffle stage while there are
    mappers which are still in running,
    So how can I indicate the phase which the job is belonging to?

    Thanks
    --
    Nan Zhu
    School of Electronic, Information and Electrical Engineering,229
    Shanghai Jiao Tong University
    800,Dongchuan Road,Shanghai,China
    E-Mail: zhunansjtu@gmail.com
    --
    Kai Voigt
    k@123.org


    This e-mail message may contain privileged and/or confidential information, and is intended to be received only by persons entitled
    to receive such information. If you have received this e-mail in error, please notify the sender immediately. Please delete it and
    all attachments from any servers, hard drives or any other media. Other use of this e-mail by you is strictly prohibited.

    All e-mails and attachments sent and received are subject to monitoring, reading and archival by Monsanto, including its
    subsidiaries. The recipient of this e-mail is solely responsible for checking for the presence of "Viruses" or other "Malware".
    Monsanto, along with its subsidiaries, accepts no liability for any damage caused by any such code transmitted by or accompanying
    this e-mail or any attachment.


    The information contained in this email may be subject to the export control laws and regulations of the United States, potentially
    including but not limited to the Export Administration Regulations (EAR) and sanctions regulations issued by the U.S. Department of
    Treasury, Office of Foreign Asset Controls (OFAC). As a recipient of this information you are obligated to comply with all
    applicable U.S. export laws and regulations.
  • Nan Zhu at Sep 19, 2011 at 5:01 am
    Hi, Arun ,

    Thanks!

    As you explained, in the hadoop, we cannot explicitly divide job as two
    phase, map and reduce, but only for reduce task, we can judge which stage
    it's in, (shuffle, sort, reduce) (with 0.23 , we can also do it with
    mappers, )

    right?

    Nan
    On Mon, Sep 19, 2011 at 12:17 PM, Arun C Murthy wrote:

    Nan,

    The 'phase' is implicitly understood by the 'progress' (value) made by the
    map/reduce tasks (see o.a.h.mapred.TaskStatus.Phase).

    For e.g.
    Reduce:
    0-33% -> Shuffle
    34-66% -> Sort (actually, just 'merge', there is no sort in the reduce
    since all map-outputs are sorted)
    67-100% -> Reduce

    With 0.23 onwards the Map has phases too:
    0-90% -> Map
    91-100% -> Final Sort/merge

    Now,about starting reduces early - this is done to ensure shuffle can
    proceed for completed maps while rest of the maps run, there-by pipelining
    shuffle and map completion. There is a 'reduce slowstart' feature to control
    this - by default, reduces aren't started until 5% of maps are complete.
    Users can set this higher.

    Arun
    On Sep 18, 2011, at 7:24 PM, Nan Zhu wrote:

    Hi, all

    recently, I was hit by a question, "how is a hadoop job divided into 2
    phases?",

    In textbooks, we are told that the mapreduce jobs are divided into 2 phases,
    map and reduce, and for reduce, we further divided it into 3 stages,
    shuffle, sort, and reduce, but in hadoop codes, I never think about
    this question, I didn't see any variable members in JobInProgress class
    to indicate this information,

    and according to my understanding on the source code of hadoop, the reduce
    tasks are unnecessarily started until all mappers are finished, in
    constract, we can see the reduce tasks are in shuffle stage while there are
    mappers which are still in running,
    So how can I indicate the phase which the job is belonging to?

    Thanks
    --
    Nan Zhu
    School of Electronic, Information and Electrical Engineering,229
    Shanghai Jiao Tong University
    800,Dongchuan Road,Shanghai,China
    E-Mail: zhunansjtu@gmail.com

    --
    Nan Zhu
    School of Electronic, Information and Electrical Engineering,229
    Shanghai Jiao Tong University
    800,Dongchuan Road,Shanghai,China
    E-Mail: zhunansjtu@gmail.com
  • He Chen at Sep 19, 2011 at 5:30 am
    Hi Arun

    I have a question. Do you know what is the reason that hadoop allows the map
    and the reduce stage overlap? Or anyone knows about it. Thank you in
    advance.

    Chen
    On Sun, Sep 18, 2011 at 11:17 PM, Arun C Murthy wrote:

    Nan,

    The 'phase' is implicitly understood by the 'progress' (value) made by the
    map/reduce tasks (see o.a.h.mapred.TaskStatus.Phase).

    For e.g.
    Reduce:
    0-33% -> Shuffle
    34-66% -> Sort (actually, just 'merge', there is no sort in the reduce
    since all map-outputs are sorted)
    67-100% -> Reduce

    With 0.23 onwards the Map has phases too:
    0-90% -> Map
    91-100% -> Final Sort/merge

    Now,about starting reduces early - this is done to ensure shuffle can
    proceed for completed maps while rest of the maps run, there-by pipelining
    shuffle and map completion. There is a 'reduce slowstart' feature to control
    this - by default, reduces aren't started until 5% of maps are complete.
    Users can set this higher.

    Arun
    On Sep 18, 2011, at 7:24 PM, Nan Zhu wrote:

    Hi, all

    recently, I was hit by a question, "how is a hadoop job divided into 2
    phases?",

    In textbooks, we are told that the mapreduce jobs are divided into 2 phases,
    map and reduce, and for reduce, we further divided it into 3 stages,
    shuffle, sort, and reduce, but in hadoop codes, I never think about
    this question, I didn't see any variable members in JobInProgress class
    to indicate this information,

    and according to my understanding on the source code of hadoop, the reduce
    tasks are unnecessarily started until all mappers are finished, in
    constract, we can see the reduce tasks are in shuffle stage while there are
    mappers which are still in running,
    So how can I indicate the phase which the job is belonging to?

    Thanks
    --
    Nan Zhu
    School of Electronic, Information and Electrical Engineering,229
    Shanghai Jiao Tong University
    800,Dongchuan Road,Shanghai,China
    E-Mail: zhunansjtu@gmail.com
  • Kai Voigt at Sep 19, 2011 at 5:37 am
    Hi Chen,

    the times when nodes running instances of the map and reduce nodes overlap. But map() and reduce() execution will not.

    reduce nodes will start copying data from map nodes, that's the shuffle phase. And the map nodes are still running during that copy phase. My observation had been that if the map phase progresses from 0 to 100%, it matches with the reduce phase progress from 0-33%. For example, if you map progress shows 60%, reduce might show 20%.

    But the reduce() will not start until all the map() code has processed the entire input. So you will never see the reduce progress higher than 66% when map progress didn't reach 100%.

    If you see map phase reaching 100%, but reduce phase not making any higher number than 66%, it means your reduce() code is broken or slow because it doesn't produce any output. An infinitive loop is a common mistake.

    Kai

    Am 19.09.2011 um 07:29 schrieb He Chen:
    Hi Arun

    I have a question. Do you know what is the reason that hadoop allows the map
    and the reduce stage overlap? Or anyone knows about it. Thank you in
    advance.

    Chen
    On Sun, Sep 18, 2011 at 11:17 PM, Arun C Murthy wrote:

    Nan,

    The 'phase' is implicitly understood by the 'progress' (value) made by the
    map/reduce tasks (see o.a.h.mapred.TaskStatus.Phase).

    For e.g.
    Reduce:
    0-33% -> Shuffle
    34-66% -> Sort (actually, just 'merge', there is no sort in the reduce
    since all map-outputs are sorted)
    67-100% -> Reduce

    With 0.23 onwards the Map has phases too:
    0-90% -> Map
    91-100% -> Final Sort/merge

    Now,about starting reduces early - this is done to ensure shuffle can
    proceed for completed maps while rest of the maps run, there-by pipelining
    shuffle and map completion. There is a 'reduce slowstart' feature to control
    this - by default, reduces aren't started until 5% of maps are complete.
    Users can set this higher.

    Arun
    On Sep 18, 2011, at 7:24 PM, Nan Zhu wrote:

    Hi, all

    recently, I was hit by a question, "how is a hadoop job divided into 2
    phases?",

    In textbooks, we are told that the mapreduce jobs are divided into 2 phases,
    map and reduce, and for reduce, we further divided it into 3 stages,
    shuffle, sort, and reduce, but in hadoop codes, I never think about
    this question, I didn't see any variable members in JobInProgress class
    to indicate this information,

    and according to my understanding on the source code of hadoop, the reduce
    tasks are unnecessarily started until all mappers are finished, in
    constract, we can see the reduce tasks are in shuffle stage while there are
    mappers which are still in running,
    So how can I indicate the phase which the job is belonging to?

    Thanks
    --
    Nan Zhu
    School of Electronic, Information and Electrical Engineering,229
    Shanghai Jiao Tong University
    800,Dongchuan Road,Shanghai,China
    E-Mail: zhunansjtu@gmail.com
    --
    Kai Voigt
    k@123.org
  • He Chen at Sep 19, 2011 at 6:20 am
    Hi Kai

    Thank you for the reply.

    The reduce() will not start because the shuffle phase does not finish. And
    the shuffle phase will not finish untill alll mapper end.

    I am curious about the design purpose about overlapping the map and reduce
    stage. Was this only for saving shuffling time? Or there are some other
    reasons.

    Best wishes!

    Chen
    On Mon, Sep 19, 2011 at 12:36 AM, Kai Voigt wrote:

    Hi Chen,

    the times when nodes running instances of the map and reduce nodes overlap.
    But map() and reduce() execution will not.

    reduce nodes will start copying data from map nodes, that's the shuffle
    phase. And the map nodes are still running during that copy phase. My
    observation had been that if the map phase progresses from 0 to 100%, it
    matches with the reduce phase progress from 0-33%. For example, if you map
    progress shows 60%, reduce might show 20%.

    But the reduce() will not start until all the map() code has processed the
    entire input. So you will never see the reduce progress higher than 66% when
    map progress didn't reach 100%.

    If you see map phase reaching 100%, but reduce phase not making any higher
    number than 66%, it means your reduce() code is broken or slow because it
    doesn't produce any output. An infinitive loop is a common mistake.

    Kai

    Am 19.09.2011 um 07:29 schrieb He Chen:
    Hi Arun

    I have a question. Do you know what is the reason that hadoop allows the map
    and the reduce stage overlap? Or anyone knows about it. Thank you in
    advance.

    Chen
    On Sun, Sep 18, 2011 at 11:17 PM, Arun C Murthy wrote:

    Nan,

    The 'phase' is implicitly understood by the 'progress' (value) made by
    the
    map/reduce tasks (see o.a.h.mapred.TaskStatus.Phase).

    For e.g.
    Reduce:
    0-33% -> Shuffle
    34-66% -> Sort (actually, just 'merge', there is no sort in the reduce
    since all map-outputs are sorted)
    67-100% -> Reduce

    With 0.23 onwards the Map has phases too:
    0-90% -> Map
    91-100% -> Final Sort/merge

    Now,about starting reduces early - this is done to ensure shuffle can
    proceed for completed maps while rest of the maps run, there-by
    pipelining
    shuffle and map completion. There is a 'reduce slowstart' feature to
    control
    this - by default, reduces aren't started until 5% of maps are complete.
    Users can set this higher.

    Arun
    On Sep 18, 2011, at 7:24 PM, Nan Zhu wrote:

    Hi, all

    recently, I was hit by a question, "how is a hadoop job divided into 2
    phases?",

    In textbooks, we are told that the mapreduce jobs are divided into 2 phases,
    map and reduce, and for reduce, we further divided it into 3 stages,
    shuffle, sort, and reduce, but in hadoop codes, I never think about
    this question, I didn't see any variable members in JobInProgress class
    to indicate this information,

    and according to my understanding on the source code of hadoop, the reduce
    tasks are unnecessarily started until all mappers are finished, in
    constract, we can see the reduce tasks are in shuffle stage while there are
    mappers which are still in running,
    So how can I indicate the phase which the job is belonging to?

    Thanks
    --
    Nan Zhu
    School of Electronic, Information and Electrical Engineering,229
    Shanghai Jiao Tong University
    800,Dongchuan Road,Shanghai,China
    E-Mail: zhunansjtu@gmail.com
    --
    Kai Voigt
    k@123.org



  • He Chen at Sep 19, 2011 at 6:25 am
    Or we can just seperate shuffle from reduce stage and integrate it to the
    map stage
    . Then we can clearly differentiate the map stage(before shuffle finish) and
    (after shuffle finish)the reduce stage.

    On Mon, Sep 19, 2011 at 1:20 AM, He Chen wrote:

    Hi Kai

    Thank you for the reply.

    The reduce() will not start because the shuffle phase does not finish. And
    the shuffle phase will not finish untill alll mapper end.

    I am curious about the design purpose about overlapping the map and reduce
    stage. Was this only for saving shuffling time? Or there are some other
    reasons.

    Best wishes!

    Chen
    On Mon, Sep 19, 2011 at 12:36 AM, Kai Voigt wrote:

    Hi Chen,

    the times when nodes running instances of the map and reduce nodes
    overlap. But map() and reduce() execution will not.

    reduce nodes will start copying data from map nodes, that's the shuffle
    phase. And the map nodes are still running during that copy phase. My
    observation had been that if the map phase progresses from 0 to 100%, it
    matches with the reduce phase progress from 0-33%. For example, if you map
    progress shows 60%, reduce might show 20%.

    But the reduce() will not start until all the map() code has processed the
    entire input. So you will never see the reduce progress higher than 66% when
    map progress didn't reach 100%.

    If you see map phase reaching 100%, but reduce phase not making any higher
    number than 66%, it means your reduce() code is broken or slow because it
    doesn't produce any output. An infinitive loop is a common mistake.

    Kai

    Am 19.09.2011 um 07:29 schrieb He Chen:
    Hi Arun

    I have a question. Do you know what is the reason that hadoop allows the map
    and the reduce stage overlap? Or anyone knows about it. Thank you in
    advance.

    Chen

    On Sun, Sep 18, 2011 at 11:17 PM, Arun C Murthy <acm@hortonworks.com>
    wrote:
    Nan,

    The 'phase' is implicitly understood by the 'progress' (value) made by
    the
    map/reduce tasks (see o.a.h.mapred.TaskStatus.Phase).

    For e.g.
    Reduce:
    0-33% -> Shuffle
    34-66% -> Sort (actually, just 'merge', there is no sort in the reduce
    since all map-outputs are sorted)
    67-100% -> Reduce

    With 0.23 onwards the Map has phases too:
    0-90% -> Map
    91-100% -> Final Sort/merge

    Now,about starting reduces early - this is done to ensure shuffle can
    proceed for completed maps while rest of the maps run, there-by
    pipelining
    shuffle and map completion. There is a 'reduce slowstart' feature to
    control
    this - by default, reduces aren't started until 5% of maps are
    complete.
    Users can set this higher.

    Arun
    On Sep 18, 2011, at 7:24 PM, Nan Zhu wrote:

    Hi, all

    recently, I was hit by a question, "how is a hadoop job divided into 2
    phases?",

    In textbooks, we are told that the mapreduce jobs are divided into 2 phases,
    map and reduce, and for reduce, we further divided it into 3 stages,
    shuffle, sort, and reduce, but in hadoop codes, I never think about
    this question, I didn't see any variable members in JobInProgress
    class
    to indicate this information,

    and according to my understanding on the source code of hadoop, the reduce
    tasks are unnecessarily started until all mappers are finished, in
    constract, we can see the reduce tasks are in shuffle stage while
    there
    are
    mappers which are still in running,
    So how can I indicate the phase which the job is belonging to?

    Thanks
    --
    Nan Zhu
    School of Electronic, Information and Electrical Engineering,229
    Shanghai Jiao Tong University
    800,Dongchuan Road,Shanghai,China
    E-Mail: zhunansjtu@gmail.com
    --
    Kai Voigt
    k@123.org



  • Kai Voigt at Sep 19, 2011 at 6:33 am
    Hi Chen,

    yes, it saves time to move map() output to the nodes where they will be needed for the reduce() input. After map() has processed the first blocks, it makes sense to copy that output to the reduce nodes. Imagine a very large map() output. If shuffle&copy would be postponed after all map nodes are done, we'd wait. So those things happen in parallel.

    Consider it like Unix pipes where you start processing the output of the first command as the input of the next command.

    $ command1 | command2

    as opposed to store the output first and then process it.

    $ command1 > file ; $command2 < file

    Kai

    Am 19.09.2011 um 08:20 schrieb He Chen:
    Hi Kai

    Thank you for the reply.

    The reduce() will not start because the shuffle phase does not finish. And
    the shuffle phase will not finish untill alll mapper end.

    I am curious about the design purpose about overlapping the map and reduce
    stage. Was this only for saving shuffling time? Or there are some other
    reasons.

    Best wishes!

    Chen
    On Mon, Sep 19, 2011 at 12:36 AM, Kai Voigt wrote:

    Hi Chen,

    the times when nodes running instances of the map and reduce nodes overlap.
    But map() and reduce() execution will not.

    reduce nodes will start copying data from map nodes, that's the shuffle
    phase. And the map nodes are still running during that copy phase. My
    observation had been that if the map phase progresses from 0 to 100%, it
    matches with the reduce phase progress from 0-33%. For example, if you map
    progress shows 60%, reduce might show 20%.

    But the reduce() will not start until all the map() code has processed the
    entire input. So you will never see the reduce progress higher than 66% when
    map progress didn't reach 100%.

    If you see map phase reaching 100%, but reduce phase not making any higher
    number than 66%, it means your reduce() code is broken or slow because it
    doesn't produce any output. An infinitive loop is a common mistake.

    Kai

    Am 19.09.2011 um 07:29 schrieb He Chen:
    Hi Arun

    I have a question. Do you know what is the reason that hadoop allows the map
    and the reduce stage overlap? Or anyone knows about it. Thank you in
    advance.

    Chen

    On Sun, Sep 18, 2011 at 11:17 PM, Arun C Murthy <acm@hortonworks.com>
    wrote:
    Nan,

    The 'phase' is implicitly understood by the 'progress' (value) made by
    the
    map/reduce tasks (see o.a.h.mapred.TaskStatus.Phase).

    For e.g.
    Reduce:
    0-33% -> Shuffle
    34-66% -> Sort (actually, just 'merge', there is no sort in the reduce
    since all map-outputs are sorted)
    67-100% -> Reduce

    With 0.23 onwards the Map has phases too:
    0-90% -> Map
    91-100% -> Final Sort/merge

    Now,about starting reduces early - this is done to ensure shuffle can
    proceed for completed maps while rest of the maps run, there-by
    pipelining
    shuffle and map completion. There is a 'reduce slowstart' feature to
    control
    this - by default, reduces aren't started until 5% of maps are complete.
    Users can set this higher.

    Arun
    On Sep 18, 2011, at 7:24 PM, Nan Zhu wrote:

    Hi, all

    recently, I was hit by a question, "how is a hadoop job divided into 2
    phases?",

    In textbooks, we are told that the mapreduce jobs are divided into 2 phases,
    map and reduce, and for reduce, we further divided it into 3 stages,
    shuffle, sort, and reduce, but in hadoop codes, I never think about
    this question, I didn't see any variable members in JobInProgress class
    to indicate this information,

    and according to my understanding on the source code of hadoop, the reduce
    tasks are unnecessarily started until all mappers are finished, in
    constract, we can see the reduce tasks are in shuffle stage while there are
    mappers which are still in running,
    So how can I indicate the phase which the job is belonging to?

    Thanks
    --
    Nan Zhu
    School of Electronic, Information and Electrical Engineering,229
    Shanghai Jiao Tong University
    800,Dongchuan Road,Shanghai,China
    E-Mail: zhunansjtu@gmail.com
    --
    Kai Voigt
    k@123.org



    --
    Kai Voigt
    k@123.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupcommon-user @
categorieshadoop
postedSep 19, '11 at 2:24a
activeSep 19, '11 at 7:20p
posts12
users5
websitehadoop.apache.org...
irc#hadoop

People

Translate

site design / logo © 2022 Grokbase