Hi,
I don't really understand the meaning of these sentences in "The Definitive Guide" (page 155):

Tasktrackers have a fixed number of slots for map tasks and for reduce tasks: for example,
a tasktracker may be able to run two map tasks and two reduce tasks simultaneously.
(The precise number depends on the number of cores and the amount of
memory on the tasktracker; see “Memory” on page 254.)

Does that mean the number of slots is fixed and the number of maps run simultaneously is set by the user?
If so, how do the number of cores and the amount of memory determine the number of slots?
Thanks!




===================================================
Regards!
Tan Jun
谭军

  • Harsh J at Dec 12, 2011 at 3:53 am
    Hi Tan,
    On 12-Dec-2011, at 8:48 AM, Tan Jun wrote:

    Does that mean the number of slots is fixed and the number of maps run simultaneously is set by the user?

    Not by the user, but by the administrator. Each tasktracker is configured with an upper limit on task slots, say 8 map slots and 4 reduce slots on a 12-core machine. This is not auto-configured (unless you use cluster setup and configuration tools that determine it for you [0]); it has to be set when configuring the Hadoop daemons.

    The book means that you need to set these limits yourself, based on the memory and CPU configuration of your machines. By default, a tasktracker is limited to 2 map slots and 2 reduce slots.

    See http://wiki.apache.org/hadoop/LimitingTaskSlotUsage

    [0] - http://www.cloudera.com/products-services/tools/ is one.
  • Tan Jun at Dec 12, 2011 at 4:53 am
    Hi Harsh,
    Now I understand that the number of maps and reduces run simultaneously is set by the administrator in mapred-site.xml, with a default value of 2.
    But I still don't get the point about the number of slots.
    My understanding so far is that
    the number of slots is decided by the hardware, so the administrator cannot change it.
    Is that right?




    Tan Jun

    From: Harsh J
    Date: 2011-12-12 12:22
    To: mapreduce-user
    Subject: Re: About slots of tasktracker and munber of map taskers
  • Harsh J at Dec 12, 2011 at 5:04 am
    Tan,

    As an admin, I can choose to configure even 100 slots
    on a 4-core node, if I feel like burning the box. There is no hardware
    auto-detection, and the slot limit is controlled entirely by the
    mapred-site.xml for that TaskTracker.

    The book is merely telling you to set these maximum slot
    settings based on what you know about the hardware of each node.
    TaskTrackers do nothing of that sort on their own.

    Some CPU/memory considerations are taken into account by a
    variety of non-default schedulers in the JobTracker, but the slot limits
    per tasktracker are controlled entirely by configuration.
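
As a rough illustration of the point above, a per-node slot limit in mapred-site.xml might look like the fragment below. It assumes the Hadoop 1.x (MRv1) property names mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, and the values 8 and 4 simply reuse the 12-core example from earlier in the thread; they are not a recommendation.

```xml
<!-- mapred-site.xml on one TaskTracker node (sketch; values are illustrative) -->
<configuration>
  <!-- Maximum number of map tasks this TaskTracker runs at once (map slots) -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <!-- Maximum number of reduce tasks this TaskTracker runs at once (reduce slots) -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>
```

Since these are daemon-side settings, the TaskTracker most likely needs a restart before a changed value takes effect.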



    --
    Harsh J
  • Tan Jun at Dec 12, 2011 at 5:30 am
    Harsh,
    Sorry for my poor English.
    I have one more question.
    As an administrator, I can set the maximum number of maps/reduces that run on a datanode,
    so what do I set the number of slots for?
    What is the difference between these settings?
    In my opinion, the number of slots depends on hardware, while the number of maps/reduces depends on software.
    Assume that only one job is running, for example the PI computation benchmark.
    Thanks!




    Tan Jun

  • Bejoy Ks at Dec 12, 2011 at 12:04 pm
    Hi Tan,
    Adding on to Harsh's response.

    *Map Reduce Slots*
    Slots are the maximum number of map and reduce tasks that can run
    concurrently on each node, and therefore on your cluster. Say you have a
    10-node cluster (10 data nodes): each node is assigned a specific number
    of map and reduce tasks it can handle concurrently. The number need not be
    the same for all nodes; it can vary with each node's hardware capacity.
    The admin assigns these values considering the hardware (CPU, memory, ...)
    of each node, so that the box can handle the resource requirements
    gracefully. If you set these values too high (assign too many slots), you
    are asking the box to run more simultaneous tasks than it can handle. That
    results in memory swapping, OOM errors, CPU starvation and so on, and you
    end up with an inefficient cluster that encounters a large number of task
    failures. Assuming all machines have the same capacity, if one machine has
    8 map slots and 2 reduce slots, then the total capacity of your cluster is
    8*10=80 map tasks and 2*10=20 reduce tasks. This means your cluster can
    run only 80 map tasks and 20 reduce tasks at a time, so the total number
    of map slots is 80 and the total number of reduce slots is 20 for your
    cluster.

    *Map Reduce Tasks*
    These are the actual tasks spawned by your map reduce jobs. Say I fire
    two jobs on the cluster above, one after the other. The first job spawns
    60 mappers and the second spawns 40 mappers. As soon as the first job
    starts, 60 of the 80 map slots are occupied, leaving 20 free slots in the
    cluster. When I trigger my second job it has 40 map tasks but only 20
    slots are available, so 20 map tasks are started and the remaining 20
    wait in the queue. Once slots become free, these tasks can execute.

    In short, the map reduce slots are set by the admin, based on hardware,
    on a per-node basis. They are not set at the individual task level, and
    the developer need not worry about these parameters at the job level. The
    map reduce developer develops the application; based on the input splits
    and Input Formats, the job fires map and reduce tasks. The number of tasks
    varies with your inputs and jobs. These tasks are executed on the cluster
    based on the availability of slots (assigned by the admin) and on factors
    like data/rack locality.

    Coming to your question:
    As an administrator, I can set the max number of maps/reduces run on a
    datanode,
    then what do I set the number of slots for?

    The maximum number of maps/reduces that can run on a datanode at the same
    time is exactly what is called the map reduce slots for that data node.


    Hope this clarifies.

    Regards
    Bejoy.K.S
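
The arithmetic in the 10-node example above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Cluster` class and `schedule_maps` function are hypothetical names made up for illustration, all nodes are assumed identical, and jobs are served strictly in submission order, ignoring data/rack locality and the non-default schedulers mentioned earlier in the thread.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Toy model of a cluster whose nodes all have the same slot limits."""
    nodes: int
    map_slots_per_node: int
    reduce_slots_per_node: int

    @property
    def map_capacity(self) -> int:
        # Total map slots = per-node map slots * number of nodes
        return self.nodes * self.map_slots_per_node

    @property
    def reduce_capacity(self) -> int:
        # Total reduce slots = per-node reduce slots * number of nodes
        return self.nodes * self.reduce_slots_per_node


def schedule_maps(capacity: int, job_map_counts: list[int]) -> list[tuple[int, int]]:
    """Hand out free map slots to jobs in submission order.

    Returns one (running, queued) pair per job. Simplified: no locality,
    no fairness, and no slots are released while the jobs are submitted.
    """
    free = capacity
    result = []
    for wanted in job_map_counts:
        running = min(wanted, free)   # a job gets at most the free slots
        free -= running
        result.append((running, wanted - running))
    return result


cluster = Cluster(nodes=10, map_slots_per_node=8, reduce_slots_per_node=2)
print(cluster.map_capacity, cluster.reduce_capacity)
print(schedule_maps(cluster.map_capacity, [60, 40]))
```

With the numbers from the post, the first job gets all 60 of its map tasks running, while the second job gets 20 running and 20 queued until slots free up.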


