eric baldeschwieler commented on HADOOP-3136:
Don't let me discourage you from getting some experimental data on
this! I'm sure we can find some simple heuristics that are better
than hat we have now. Even on a loaded cluster, the right decision
per node may require more global information. For example, is it a
good strategy to run all the tasks for the first job in the queue on a
subset of the nodes in the cluster? Maybe it is. But you probably
want to be thinking a bit about what the shuffle phase is going to
look like. You may not want very unbalanced racks / nodes. Or in
some cases perhaps that unbalance is optimal!
If a job only runs on 1/5 of the nodes on a cluster, you may amortize
a lot of startup costs that you pay per node over more tasks / node.
This might be really good! Costs to copy jars and start VMs for
example can be significant.
If you have 1000 maps to run on 1000 nodes, your might like to fill
every map slot on a subset of the nodes with at least 2 generations of
maps, rather than running one map / node. This could give you much
better throughput. This would support your idea.
On the other hand, you need to make sure you keep your node and rack
locality high, or that strategy might work really badly. One could
imagine your total locality going down a lot for your tail jobs, as we
just observed. Also, you might create shuffle hotspots, which could
kill your HDFS performance and slow down total throughput...
Assign multiple tasks per TaskTracker heartbeat
Project: Hadoop Core
Issue Type: Improvement
Reporter: Devaraj Das
Assignee: Arun C Murthy
Fix For: 0.20.0
Attachments: HADOOP-3136_0_20080805.patch, HADOOP-3136_1_20080809.patch, HADOOP-3136_2_20080911.patch
In today's logic of finding a new task, we assign only one task per heartbeat.
We probably could give the tasktracker multiple tasks subject to the max number of free slots it has - for maps we could assign it data local tasks. We could probably run some logic to decide what to give it if we run out of data local tasks (e.g., tasks from overloaded racks, tasks that have least locality, etc.). In addition to maps, if it has reduce slots free, we could give it reduce task(s) as well. Again for reduces we could probably run some logic to give more tasks to nodes that are closer to nodes running most maps (assuming data generated is proportional to the number of maps). For e.g., if rack1 has 70% of the input splits, and we know that most maps are data/rack local, we try to schedule ~70% of the reducers there.
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.