Hi folks;
I have a small cluster, but each node is big: 8 cores, with lots
of I/O bandwidth. I'd like to increase the number of simultaneous map
and reduce tasks scheduled per node from the default of 2 to something
like 8.
My understanding is that I should be able to do this by increasing
mapred.tasktracker.reduce.tasks.maximum and
mapred.tasktracker.map.tasks.maximum, but doing so does not increase
the number of tasks. I've been running gridmix with these parameters
set to 4, but the average number of tasks per node stays at 4, with 2
reduce and 2 map.
Am I missing something? Do I need to adjust something else as well?

Thanks,
-Joel
welling@psc.edu
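For context, the settings in question live in each tasktracker's own conf/hadoop-site.xml (overriding hadoop-default.xml); a fragment like the following, with Joel's target value of 8, is what's being changed:

```xml
<!-- conf/hadoop-site.xml on each worker node -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>8</value>
</property>
```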


  • Arun C Murthy at Sep 23, 2008 at 6:49 pm

    On Sep 23, 2008, at 11:41 AM, Joel Welling wrote:

    Am I missing something? Do I need to adjust something else as well?
    Please ensure that _all_ machines (tasktrackers) have this updated
    configuration file... the above config knobs are used by the
    TaskTrackers and hence they need to have the updated configs.

    Arun
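A minimal sketch of that push step, assuming a conf/slaves host list and passwordless ssh; the hostnames and the /opt/hadoop path here are placeholders, and the echo makes it a dry run:

```shell
# Dry run: print the copy command for each worker node.
# Hostnames and paths are hypothetical; drop the echo to actually copy.
slaves="node1 node2 node3"        # on a real cluster: $(cat conf/slaves)
for host in $slaves; do
  echo scp conf/hadoop-site.xml "$host:/opt/hadoop/conf/"
done
```

On a real cluster, rsync of the whole conf directory works equally well; the point is only that every tasktracker host must see the updated file before the restart.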
  • Joel Welling at Sep 23, 2008 at 9:21 pm
Stopping and restarting the mapred service should push the new .xml file
out, should it not? I've run 'bin/stop-mapred.sh' and
'bin/start-mapred.sh', and I can see my new values in the
.../mapred/system/job_SomeNumber_SomeNumber/job.xml files
associated with the jobs. The mapred.tasktracker.map.tasks.maximum
values shown in those files are 8, but each worker node tasktracker
still uses the value 2. What file should contain the xml for the
tasktracker itself? Does the maximum map task number get set when the
task tracker is spawned, or can a new job reset the number?

    Thanks,
    -Joel
    On Tue, 2008-09-23 at 11:46 -0700, Arun C Murthy wrote:
    Please ensure that _all_ machines (tasktrackers) have this updated
    configuration file... the above config knobs are used by the
    TaskTrackers and hence they need to have the updated configs.

    Arun
  • Arun C Murthy at Sep 23, 2008 at 9:30 pm

    On Sep 23, 2008, at 2:21 PM, Joel Welling wrote:

Stopping and restarting the mapred service should push the new .xml file
out, should it not? I've run 'bin/stop-mapred.sh',
No, you need to run 'bin/stop-mapred.sh', push the updated config file
out to all the machines, and then run 'bin/start-mapred.sh'.

    You do see it in your job's config - but that config isn't used by the
    TaskTrackers. They use the config in their HADOOP_CONF_DIR; which is
    why you'd need to push it to all machines.

    Arun
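One way to confirm which value a tasktracker will actually read is to grep its own config file. A self-contained sketch, where a temp file stands in for a worker's $HADOOP_CONF_DIR/hadoop-site.xml:

```shell
# Write a sample hadoop-site.xml fragment to a temp file for illustration;
# on a worker node you would grep $HADOOP_CONF_DIR/hadoop-site.xml instead.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
EOF
# Print the property name plus the following line (the value).
grep -A 1 'mapred.tasktracker.map.tasks.maximum' "$conf"
```

If that check shows the old value on any node, that node never received the updated file, regardless of what the per-job job.xml says.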
  • Joel Welling at Sep 23, 2008 at 11:23 pm
    I think I've found my problem. At some point about a week ago, I must
    have tried to start new tasktracker processes on my worker nodes without
    killing the ones that were already there. The new processes died
    immediately because their sockets were already in use. The old
    processes then took over their roles, running happily with new
    JobTrackers and doing tasks as requested. The pid files that are
    supposed to point to the tasktrackers did not contain their pids,
    however, and 'bin/stop-mapred.sh' chooses its targets from the pid
    files. So I could do 'bin/stop-mapred.sh' all day long without killing
    them. I ended up killing them explicitly one node at a time.

    These tasktrackers knew the *old* config values that were in force when
    they were started, so pushing the new values out to the worker nodes had
    no effect.

    So. Is there any mechanism for killing 'rogue' tasktrackers? I'm a
    little surprised that they are killed via their pids rather than by
    sending them a kill signal via the same mechanism whereby they learn of
    new work.

    -Joel
    welling@psc.edu
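Joel's stale-pid failure mode can be reproduced in miniature. The sketch below mimics what the stop script effectively does (read the pid recorded in the pid file, signal it): once the recorded process is gone, the stop is a silent no-op, which is why finding the live daemons by hand (e.g. `ps aux | grep TaskTracker` on each node) and killing them was the only way out. The pidfile path here is illustrative only:

```shell
# Simulate a stale pid file: record a pid, let the process die, then
# apply the stop-script logic against the recorded pid.
sh -c 'exit 0' &
stale=$!
wait "$stale"                      # process is gone; its pid is now stale
pidfile=$(mktemp)                  # stands in for the tasktracker pid file
echo "$stale" > "$pidfile"
if kill -0 "$(cat "$pidfile")" 2>/dev/null; then
  echo "stopping recorded pid $stale"
else
  echo "recorded pid is stale; nothing gets stopped"
fi
```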

Discussion Overview
group: common-user
categories: hadoop
posted: Sep 23, '08 at 6:43p
active: Sep 23, '08 at 11:23p
posts: 5
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion
Joel Welling: 3 posts
Arun C Murthy: 2 posts
