For our CPU-bound application, I set the value of mapred.tasktracker.tasks.maximum (number of map tasks per tasktracker) equal to the number of CPUs on a tasktracker. Unfortunately, I think this value has to be set per cluster, not per machine. This is okay for us because our machines have similar hardware, but it might be a problem if your machines have different numbers of CPUs.
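
A minimal sketch of the setting described above, assuming the value is supplied as a standard Hadoop `<property>` element in the site configuration file. The property name is the one from this post; the helper names `property_xml` and `tasks_for_this_machine` are hypothetical, and the script only prints a fragment to paste into the configuration by hand.

```python
import os
from xml.sax.saxutils import escape


def property_xml(name, value):
    """Render one Hadoop-style <property> element as text."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(str(value))}</value>\n"
        "</property>"
    )


def tasks_for_this_machine():
    """One task slot per CPU reported by the OS (at least 1)."""
    return os.cpu_count() or 1


if __name__ == "__main__":
    # Property name taken from the post; the value is this machine's CPU count.
    print(property_xml("mapred.tasktracker.tasks.maximum",
                       tasks_for_this_machine()))
```

If the value really is cluster-wide, as suspected above, running this on one representative tasktracker is enough for a homogeneous cluster.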

I filed HADOOP-1245 for this problem a long time ago, but I've since heard that Hadoop uses only the cluster-wide value for maps per tasktracker, not the hybrid model I describe in that issue. In any case, I never did any work on fixing it because I don't need heterogeneous clusters.


On 9/25/07 9:37 AM, "Ted Dunning" wrote:

> On 9/25/07 9:27 AM, "Bob Futrelle" wrote:
>
>> How does Hadoop handle multi-core CPUs? Does each core run a distinct copy
>> of the mapped app? Is this automatic, or need some configuration, or what?
>
> Works fine. You need to tell it how many maps to run per machine. I expect
> that this can be tuned per machine.
>
>> Or should I just spread Hadoop over some friendly machines already in my
>> College, buying nothing?
>
> Or both? You will get interesting results all three ways.

Discussion Overview
group: common-user @
posted: Sep 10, '07 at 11:56p
active: Sep 25, '07 at 7:35p


