FAQ
Daniel Leffel at Jun 17, 2008 at 4:19 am

Why not just combine them? How do I do that?

The rationale is that our tasks are well balanced in load but unbalanced
in timing. I've found that limiting the total number of threads is the
safest way to avoid overloading the DFS daemon. To date, I've done that
just through intelligent scheduling of jobs to stagger maps and reduces,
but have I missed a setting that simply limits the total number of tasks?


  • Amar Kamat at Jun 17, 2008 at 4:48 am

    Daniel Leffel wrote:
    Why not just combine them? How do I do that?

    Consider a cluster of n nodes configured to run just one task per
    node, and let there be (n-1) reducers. Assume the map phase is
    complete and the reducers are shuffling, so (n-1) nodes are occupied
    by reducers. Now suppose the one node without a reducer is lost. The
    maps that ran on that node must be re-run, since the reducers are
    still waiting on their output, but every slot is held by a reducer
    that cannot finish, so no slot is free. The job is stuck. To avoid
    such deadlocks, there are separate map and reduce task slots.
    Amar
  • Taeho Kang at Jun 17, 2008 at 5:00 am
    Set "mapred.tasktracker.tasks.maximum"
    and each node will be able to process N number of tasks - map or/and reduce.

    Please note that once you set "mapred.tasktracker.tasks.maximum",
    "mapred.tasktracker.map.tasks.maximum" and
    "mapred.tasktracker.reduce.tasks.maximum" setting will not take effect.



  • Amareshwari Sriramadasu at Jun 17, 2008 at 5:09 am

    Taeho Kang wrote:
    Set "mapred.tasktracker.tasks.maximum" and each node will be able to
    process up to N tasks, whether map or reduce. [...]

    This is valid only up to release 0.16.*, because the property
    "mapred.tasktracker.tasks.maximum" was removed in 0.17.
    From 0.17 onwards, "mapred.tasktracker.map.tasks.maximum" and
    "mapred.tasktracker.reduce.tasks.maximum" should be used.

Discussion Overview
group: common-user
categories: hadoop
posted: Jun 17, '08 at 4:19a
active: Jun 17, '08 at 5:09a
posts: 4
users: 4
website: hadoop.apache.org...
irc: #hadoop
