Hi everyone :)
There's something I'm probably doing wrong but I can't seem to figure out
what.
I have two Hadoop programs running one after the other.
This is done because they don't have the same needs in terms of processor and
memory, so by separating them I can optimize each task better.
The fact is, for the first job I need mapred.tasktracker.map.tasks.maximum
set to 12 on every node.
For the second task, I need it to be set to 20.
So by default I set it to 12, and in the second job's code, I set this:

// Attempt (from the second job's code) to raise the per-node map slot limit:
Configuration hadoopConfiguration = new Configuration();
hadoopConfiguration.setInt("mapred.tasktracker.map.tasks.maximum", 20);

But when running the job, instead of having 20 tasks on each node as
expected, I have 12...
Any ideas, please?

Thank you.
Pierre.


--
http://www.neko-consulting.com
Ego sum quis ego servo
"Je suis ce que je protège"
"I am what I protect"


  • Amareshwari Sri Ramadasu at Jun 30, 2010 at 10:10 am
    Hi Pierre,

    "mapred.tasktracker.map.tasks.maximum" is a cluster-level configuration and cannot be set per job. It is loaded only when the TaskTracker is brought up.

    Thanks
    Amareshwari
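
    For reference, a minimal sketch of where this limit actually lives: it is
    set in mapred-site.xml on each tasktracker node and read once at
    tasktracker startup, so changing it means restarting the tasktrackers.
    The value below is illustrative.

        <!-- mapred-site.xml on every tasktracker node (illustrative value) -->
        <property>
          <name>mapred.tasktracker.map.tasks.maximum</name>
          <value>12</value> <!-- max map tasks this node runs concurrently -->
        </property>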

  • Pierre ANCELOT at Jun 30, 2010 at 10:28 am
    Hi,
    Okay, so if I set 20 by default, could I maybe limit the number of
    concurrent maps per node instead?
    job.setNumReduceTasks exists, but I see no equivalent for maps, though I
    think there was a setNumMapTasks before...
    Was it removed? Why?
    Any idea about how to achieve this?

    Thank you.

  • Dmitry Pushkarev at Jun 30, 2010 at 11:00 am
    Dear Hadoop users,

    I'm in the process of building a new cluster for our lab, and I'm trying to
    run SGE simultaneously with Hadoop. The idea is that each node would
    function as a datanode at all times but, depending on the situation, a
    fraction of the nodes will run SGE instead of plain Hadoop. SGE jobs will
    not have access to HDFS or the local filesystem (except for /tmp) and will
    run off an external NAS; they aren't supposed to be IO-bound.

    I'm trying to figure out the best way to set up this resource sharing. One
    way would be to shut down the tasktrackers on reserved nodes and add them
    to the SGE pool. Another is to run the tasktrackers as SGE jobs, with each
    tasktracker shutting down after some idle time.

    Has anyone tried something like this? I'd appreciate any advice.

    Thanks.
  • Ted Yu at Jun 30, 2010 at 11:59 am
    The number of map tasks is determined by the InputSplits: one map task is
    created per split.
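
    (A minimal sketch of this point, assuming the 0.20-era new API used
    elsewhere in the thread; the property name and values are illustrative.
    Since one map task is created per input split, a job can influence its
    total map count through the split-size settings, even though per-node
    concurrency stays fixed.)

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.mapreduce.Job;

        public class SplitSizeExample {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Fewer, larger splits => fewer map tasks overall.
                // (0.20-era property name; later releases renamed it to
                // mapreduce.input.fileinputformat.split.minsize.)
                conf.setLong("mapred.min.split.size", 256L * 1024 * 1024); // 256 MB
                Job job = new Job(conf, "second-job");
                // ... set mapper, input/output paths, then job.waitForCompletion(true)
            }
        }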
  • Pierre ANCELOT at Jun 30, 2010 at 12:10 pm
    Sure, but not the number of tasks running concurrently on a node at the same
    time.


  • Yu Li at Jun 30, 2010 at 1:57 pm
    Hi Pierre,

    The "setNumReduceTasks" method sets the number of reduce tasks to launch;
    it is equivalent to setting the "mapred.reduce.tasks" parameter. By
    contrast, the "mapred.tasktracker.reduce.tasks.maximum" parameter decides
    the number of tasks running *concurrently* on one node.
    And as Amareshwari mentioned, the
    "mapred.tasktracker.map/reduce.tasks.maximum" settings are cluster
    configuration and cannot be set per job. If you set
    mapred.tasktracker.map.tasks.maximum to 20, and the overall number of map
    tasks is larger than 20 * <number of nodes>, there will be 20 map tasks
    running concurrently on each node. As far as I know, you need to restart
    the tasktracker if you truly need to change this configuration.

    Best Regards,
    Carp
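
    (A minimal sketch of the distinction Carp draws, with illustrative values:
    the first two settings are equivalent per-job knobs for the total reduce
    count, while the per-node maximums are read only at tasktracker startup.)

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.mapreduce.Job;

        public class ReduceCountExample {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Per-job: total reduce tasks, settable either way.
                conf.setInt("mapred.reduce.tasks", 8);   // property form
                Job job = new Job(conf, "example");
                job.setNumReduceTasks(8);                // API form, same effect
                // Cluster-level, NOT settable per job (read at tasktracker start):
                //   mapred.tasktracker.map.tasks.maximum
                //   mapred.tasktracker.reduce.tasks.maximum
            }
        }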

  • Pierre ANCELOT at Jun 30, 2010 at 2:07 pm
    OK, well, thanks...
    I truly hoped a solution would exist for this.
    Thanks.

    Pierre.
  • Ken Goodhope at Jun 30, 2010 at 3:01 pm
    What you want to do can be accomplished in the scheduler. Take a look
    at the fair scheduler, specifically the user-extensible options. There
    you will find the ability to add some extra logic for deciding whether a
    task can be launched, on a per-job basis. It could be as simple as
    deciding that a particular job can't launch more than 12 tasks at a time.

    The capacity scheduler might be able to do this too, but I'm not sure.
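
    (One concrete form of Ken's suggestion, sketched under the assumption of a
    Fair Scheduler version that supports per-pool maxMaps/maxReduces caps; the
    pool name is hypothetical. Versions without these caps instead expose a
    pluggable LoadManager, via mapred.fairscheduler.loadmanager, for custom
    launch decisions.)

        <!-- fair scheduler allocations file (pool name is hypothetical) -->
        <allocations>
          <pool name="first-job-pool">
            <!-- cap this pool at 12 concurrently running map tasks -->
            <maxMaps>12</maxMaps>
          </pool>
        </allocations>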
  • Arun C Murthy at Jun 30, 2010 at 5:05 pm
    The CapacityScheduler has a feature called 'High RAM Jobs', wherein you
    can specify, for a given job, that a single map/reduce task needs more
    than one slot. Thus you could consume all the map/reduce slots on a
    given TT for a single task of your job. This should suffice.

    Arun
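
    (A minimal sketch of the 'High RAM Jobs' setup Arun describes, using the
    0.20-era CapacityScheduler memory settings with illustrative values: a job
    whose per-task memory requirement is twice the slot size occupies two
    slots per task.)

        <!-- Cluster side, mapred-site.xml: memory represented by one map slot -->
        <property>
          <name>mapred.cluster.map.memory.mb</name>
          <value>1024</value>
        </property>

        <!-- Job side: request 2048 MB per map task, so each map task of this
             job takes two map slots on a tasktracker -->
        <property>
          <name>mapred.job.map.memory.mb</name>
          <value>2048</value>
        </property>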

Discussion Overview
Group: common-user
Categories: hadoop
Posted: Jun 30, '10 at 9:35a
Active: Jun 30, '10 at 5:05p
Posts: 10
Users: 7
Website: hadoop.apache.org...
IRC: #hadoop
