FAQ
Hi all,

I find that when the sub-nodes' hardware configurations differ, some
nodes are strong (more CPUs and more memory) while others are weak
(fewer CPUs and less memory). When I run a job, the tasks are
distributed almost evenly across all the sub-nodes. This makes the weak
nodes very slow, and many tasks on the weak nodes are killed. I am sure
this slows down the whole job, because many tasks (more than 10) end up
being processed twice.

Question: How can I configure Hadoop to distribute fewer tasks to the
weak nodes and more tasks to the strong nodes?



I configured the strong nodes with:

    mapred.tasktracker.map.tasks.maximum = 75
    mapred.map.tasks = 60
    mapred.tasktracker.reduce.tasks.maximum = 18
    mapred.reduce.tasks = 15

and the weak nodes with:

    mapred.tasktracker.map.tasks.maximum = 60
    mapred.map.tasks = 45
    mapred.tasktracker.reduce.tasks.maximum = 15
    mapred.reduce.tasks = 12
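For reference, the per-tasktracker maximums are set in each node's own
hadoop-site.xml. A minimal sketch for one of the weak nodes, using the
values above (property names as in Hadoop 0.x):

```xml
<!-- hadoop-site.xml on a weak node (sketch; values from the message above) -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>60</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>15</value>
  </property>
</configuration>
```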



I have 4 nodes in total: one for the namenode and jobtracker, and the
other three as sub-nodes.



Thanks.



Guibin Zhang


  • Amar Kamat at Feb 28, 2008 at 2:40 pm
    This can easily be done through HoD, since it requires separate
    configuration files for each tasktracker (i.e., node). As of now I don't
    think this can be done in Hadoop itself. Anyway, I've never seen such
    high values for max tasks. :)
    Amar.
    On Wed, 27 Feb 2008, Zhang,
    Guibin wrote:
  • Owen O'Malley at Feb 28, 2008 at 3:35 pm

    On Feb 27, 2008, at 7:29 PM, Zhang, Guibin wrote:

    Question: How can I configure Hadoop to distribute fewer
    tasks to weak nodes and more tasks to strong nodes?
    mapred.tasktracker.map.tasks.maximum is the number of tasks to run
    simultaneously on that tasktracker; 75 is almost certainly too high.
    mapred.map.tasks is only relevant on the submitting node, because
    that is where the job planning takes place. More reasonable values
    (depending on your hardware) are:

    strong:
    mapred.tasktracker.map.tasks.maximum = 8
    mapred.tasktracker.reduce.tasks.maximum = 4

    weak:
    mapred.tasktracker.map.tasks.maximum = 4
    mapred.tasktracker.reduce.tasks.maximum = 2

    client:
    mapred.map.tasks = nodes * average(mapred.tasktracker.map.tasks.maximum)
    mapred.reduce.tasks = 95% * nodes * average(mapred.tasktracker.reduce.tasks.maximum)

    -- Owen
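Owen's client-side sizing rule can be worked through numerically. A quick
sketch in Python, assuming (for illustration) a cluster like the one in this
thread with 3 workers: one strong node and two weak nodes, using Owen's
suggested per-node maximums:

```python
# Owen's sizing rule, sketched for an assumed cluster of 3 workers:
# one strong (map max 8, reduce max 4) and two weak (map max 4, reduce max 2).
map_maxima = [8, 4, 4]       # mapred.tasktracker.map.tasks.maximum per node
reduce_maxima = [4, 2, 2]    # mapred.tasktracker.reduce.tasks.maximum per node
nodes = len(map_maxima)

avg_map = sum(map_maxima) / nodes
avg_reduce = sum(reduce_maxima) / nodes

# Client-side job settings:
#   mapred.map.tasks    = nodes * average(map maximum)
#   mapred.reduce.tasks = 95% * nodes * average(reduce maximum)
mapred_map_tasks = round(nodes * avg_map)             # 16
mapred_reduce_tasks = int(0.95 * nodes * avg_reduce)  # 7

print(mapred_map_tasks, mapred_reduce_tasks)  # 16 7
```

The 95% factor on reduces leaves a little slack so a straggling or failed
reduce can be rescheduled without waiting for a free slot.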

Discussion Overview
group: common-user @ hadoop
posted: Feb 28, '08 at 3:31a
active: Feb 28, '08 at 3:35p
posts: 3
users: 3
website: hadoop.apache.org...
irc: #hadoop
