Hi.



I have a cluster where each node can run up to 8 map tasks (one task
per core). We have now realized that we need to run another type of job that
has much larger memory requirements, which will only allow up to 4 tasks to
run on each node. Is it possible to somehow specify that each map process of
that new job "occupies" two map slots, so that at most 4 such maps will be
launched?



Another option is, of course, to change map.tasks.maximum to 4 and rewrite
all the old tasks to run a couple of threads each, but if there's a better
solution I'd highly appreciate any advice!
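
To make the request concrete, here is a minimal sketch of the slot arithmetic being asked for. It is plain Python, not Hadoop code, and it does not describe how any particular scheduler is implemented. The 8 map slots per node come from the post above; the per-slot and per-task memory figures, and the function names, are hypothetical values chosen only for illustration.

```python
import math


def slots_per_task(task_memory_mb: int, slot_memory_mb: int) -> int:
    """Return how many whole slots one task must occupy to cover its memory."""
    if task_memory_mb <= 0 or slot_memory_mb <= 0:
        raise ValueError("memory sizes must be positive")
    # Round up: a task that needs 1.5 slots' worth of memory still takes 2 slots.
    return math.ceil(task_memory_mb / slot_memory_mb)


def max_concurrent_tasks(node_slots: int, slots_needed: int) -> int:
    """Return how many such tasks fit on one node at the same time."""
    return node_slots // slots_needed


NODE_MAP_SLOTS = 8         # from the post: one map slot per core
SLOT_MEMORY_MB = 2048      # assumption: memory budgeted per slot
BIG_TASK_MEMORY_MB = 4096  # assumption: memory needed by one large map task

needed = slots_per_task(BIG_TASK_MEMORY_MB, SLOT_MEMORY_MB)
print(needed, max_concurrent_tasks(NODE_MAP_SLOTS, needed))
```

With these assumed numbers each large map task counts as two slots, so at most four of them run per node, while ordinary one-slot tasks can still use all eight slots.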


  • Arun C Murthy at Apr 11, 2010 at 12:07 am

    On Apr 10, 2010, at 4:02 PM, Dmitry Pushkarev wrote:
    I have a cluster where each node can run up to 8 map tasks (one task
    per core). We have now realized that we need to run another type of job
    that has much larger memory requirements, which will only allow up to 4
    tasks to run on each node. Is it possible to somehow specify that each
    map process of that new job "occupies" two map slots, so that at most 4
    such maps will be launched?

    Which MR scheduler are you running?

    The CapacityScheduler
    (http://hadoop.apache.org/common/docs/r0.20.0/capacity_scheduler.html)
    has exactly the feature you are looking for; it's called 'High RAM
    jobs'. I'm not sure whether the FairScheduler has this feature; I'll
    let someone more knowledgeable comment on the FS.

    Unfortunately, this feature in the CS is available only in
    trunk/hadoop-0.21, which hasn't been released yet.

    We at Yahoo! run a version of hadoop-0.20 that includes a backport of
    this feature in the CS:
    http://github.com/yahoo/hadoop-common/commits/yahoo-hadoop-0.20.9-stable

    Arun
  • Dmitry Pushkarev at Apr 11, 2010 at 1:54 am
    I'll try using the Yahoo! version of 0.20.9, thanks.

    Right now I'm still on 0.19.0. What is the expected date of the 0.21
    release?

    -----Original Message-----
    From: Arun C Murthy
    Sent: Saturday, April 10, 2010 5:07 PM
    To: common-user@hadoop.apache.org
    Subject: Re: Resource allocation for map tasks
