Hadoop "remembering" old mapred.map.tasks
Hi,

Does Hadoop cache the settings from hadoop-*.xml between runs?
I'm using Hadoop 0.16.2 and initially set the number of map and reduce tasks to 8 each. After running a number of jobs I wanted to increase those numbers (to 23 maps and 11 reduces), so I changed the mapred.map.tasks and mapred.reduce.tasks properties in hadoop-site.xml. I then stopped everything (stop-all.sh) and copied the modified hadoop-site.xml to every node in the cluster. I also rebuilt the .job file and pushed that out to all nodes.
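
For reference, the relevant hadoop-site.xml entries look roughly like this (a minimal sketch in the standard Hadoop property format, using the values mentioned above):

    <configuration>
      <!-- Requested number of map tasks per job (a hint; the actual number also depends on input splits) -->
      <property>
        <name>mapred.map.tasks</name>
        <value>23</value>
      </property>
      <!-- Requested number of reduce tasks per job -->
      <property>
        <name>mapred.reduce.tasks</name>
        <value>11</value>
      </property>
    </configuration>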

However, when I start everything up again I *still* see a Map Task Capacity of 8, and the same for the Reduce Task Capacity.
Am I supposed to do something beyond the above to make Hadoop "forget" the old settings? I can't find *any* reference to mapred.map.tasks in any of the Hadoop files except my hadoop-site.xml, so I can't figure out why Hadoop is still stuck at 8.

Although the max capacity is shown as 8, when I run my jobs now I *do* see them broken up into 23 maps and 11 reduces (it was 8 before), but only 8 of them run in parallel. There are 4 dual-core machines in the cluster, for a total of 8 cores. Is Hadoop figuring this out on its own, and is that why it runs only 8 tasks in parallel despite my higher settings?

Thanks,
Otis


  • Otis Gospodnetic at Apr 21, 2008 at 7:35 pm
    It turns out Hadoop was not remembering anything and the answer is in the FAQ:

    http://wiki.apache.org/hadoop/FAQ#13
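
    For anyone else who hits this, my reading of that FAQ entry (so treat the details as a sketch rather than gospel): the Map/Reduce Task Capacity shown by the JobTracker has nothing to do with mapred.map.tasks. It is the number of tasktrackers multiplied by the per-node slot limits, mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, which I believe default to 2 in this release; 4 nodes x 2 slots is exactly the 8 I was seeing. Raising the capacity means setting something like the following in hadoop-site.xml on each node and restarting the tasktrackers (the value 4 below is purely illustrative):

        <!-- Maximum number of map tasks one tasktracker runs simultaneously (example value) -->
        <property>
          <name>mapred.tasktracker.map.tasks.maximum</name>
          <value>4</value>
        </property>
        <!-- Maximum number of reduce tasks one tasktracker runs simultaneously (example value) -->
        <property>
          <name>mapred.tasktracker.reduce.tasks.maximum</name>
          <value>4</value>
        </property>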

    Otis
    --
    Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

