My sub-nodes have different hardware configurations: some nodes are
strong (more CPUs and more memory) and others are weak (fewer CPUs and
less memory). When I run a job, the tasks are distributed almost evenly
across all the sub-nodes. This makes the weak nodes very slow, and many
tasks on the weak nodes are killed. I am sure this slows down the whole
job, because a lot of tasks (more than 10) are processed twice.
Question: How can I configure Hadoop to distribute fewer tasks to the
weak nodes and more tasks to the strong nodes?
I configure the strong nodes with
and the weak nodes with
I have 4 nodes in total: one is the name node and job tracker, and the
other three are sub-nodes.
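
For reference, here is a minimal sketch of the kind of per-node setting involved. It assumes a Hadoop 1.x style cluster (JobTracker/TaskTracker), where each sub-node reads its slot limits from its own mapred-site.xml. The values are illustrative placeholders, not the settings from my nodes.

```xml
<?xml version="1.0"?>
<!-- mapred-site.xml on ONE sub-node (sketch; values are assumptions). -->
<!-- A stronger node would use larger numbers, a weaker node smaller ones. -->
<configuration>
  <!-- Maximum number of map tasks this TaskTracker runs at the same time -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <!-- Maximum number of reduce tasks this TaskTracker runs at the same time -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```

Because these limits are read by each TaskTracker from its local file, the file has to differ between the strong and the weak nodes, and a TaskTracker needs a restart to pick up a change.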