Aug 24, 2011 at 1:50 pm
A reducer's main work begins by pulling in map output files from all
the other tasktrackers. Because of this, assigning several reduce tasks
at once would tax the node in terms of network connections, since they
would all start connecting and pulling at about the same time. For this
reason, the scheduler was designed to assign only one reduce task per
heartbeat. That gives each reduce task some breathing room to finish a
round of connections before the next one arrives to do the same.
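A minimal Python sketch of the policy described above, assuming a
simplified tracker with fixed map and reduce slot counts. `TaskTracker`,
`assign_on_heartbeat`, and `max_reduces_per_heartbeat` are hypothetical
names for illustration only, not Hadoop's actual scheduler API. Map
slots are filled as far as possible, while reduce tasks are capped per
heartbeat.

```python
from dataclasses import dataclass


@dataclass
class TaskTracker:
    # Hypothetical, simplified view of a tasktracker's free slots.
    name: str
    free_map_slots: int
    free_reduce_slots: int


def assign_on_heartbeat(tracker, pending_maps, pending_reduces,
                        max_reduces_per_heartbeat=1):
    """Hypothetical sketch of one heartbeat's task assignment.

    Fills every free map slot, but hands out at most
    max_reduces_per_heartbeat reduce tasks, so reducers (which start by
    fetching map outputs from other tasktrackers) don't all open their
    connections in the same heartbeat.
    """
    maps, reduces = [], []
    # Map tasks: assign as many as there are free slots.
    while tracker.free_map_slots > 0 and pending_maps:
        maps.append(pending_maps.pop(0))
        tracker.free_map_slots -= 1
    # Reduce tasks: capped per heartbeat to stagger shuffle connection bursts.
    while (tracker.free_reduce_slots > 0 and pending_reduces
           and len(reduces) < max_reduces_per_heartbeat):
        reduces.append(pending_reduces.pop(0))
        tracker.free_reduce_slots -= 1
    return maps, reduces
```

With two free reduce slots and the default cap of one, the second reduce
task only starts on the following heartbeat. Raising
`max_reduces_per_heartbeat` lets both start their fetches together,
which is the kind of connection burst the one-per-heartbeat rule is
meant to avoid.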
On Wed, Aug 24, 2011 at 4:18 PM, Sudharsan Sampath wrote:
I see in the code that, while we assign a number of map tasks, we assign only
one reduce task per tasktracker during each heartbeat.
Is there a write-up somewhere on why this design decision was made?