Hi,
Does Hadoop cache the settings from hadoop-*.xml between runs?
I'm using Hadoop 0.16.2 and initially set the number of map and reduce tasks to 8 each. After running a number of jobs I wanted to increase those numbers (to 23 maps and 11 reduces), so I changed the mapred.map.tasks and mapred.reduce.tasks properties in hadoop-site.xml. I then stopped everything (stop-all.sh) and copied my modified hadoop-site.xml to all nodes in the cluster. I also rebuilt the .job file and pushed that out to all nodes.
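For reference, the change amounts to roughly the following fragment in hadoop-site.xml. This is a sketch of the two properties with the new values, not a verbatim copy of my file; the enclosing configuration element is assumed to be the usual wrapper for Hadoop config files.

```xml
<configuration>
  <!-- Requested number of map tasks per job (was 8) -->
  <property>
    <name>mapred.map.tasks</name>
    <value>23</value>
  </property>
  <!-- Requested number of reduce tasks per job (was 8) -->
  <property>
    <name>mapred.reduce.tasks</name>
    <value>11</value>
  </property>
</configuration>
```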
However, when I start everything up again I *still* see a Map Task Capacity of 8, and the same for Reduce Task Capacity.
Am I supposed to do something in addition to the above to make Hadoop "forget" my old settings? I can't find *any* references to mapred.map.tasks in any of the Hadoop files except for my hadoop-site.xml, so I can't figure out why Hadoop is still stuck on 8.
Although the max capacity is still shown as 8, when I run my jobs now I *do* see that they get broken up into 23 maps and 11 reduces (it was 8 before), but only 8 of them run in parallel. There are 4 dual-core machines in the cluster, for a total of 8 cores. Is Hadoop able to figure this out, and is that why it runs only 8 tasks in parallel despite my higher settings?
Thanks,
Otis