|| at Jul 28, 2008 at 10:31 am
As far as I know, the reducer has three tasks: fetching the mappers'
results, sorting those results, and calling the reduce function.
As soon as some mappers finish, the reducer starts fetching their
results to save time.
Neither the sort nor the reduce function can start until all the
mappers have finished and all their results are available locally.
I don't know whether you can prevent the reducer from copying mapper
results before all mappers finish. In any case, it would be pointless,
since the reduce function itself does not run until then.
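
To make the three phases above concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API: the class ReducerSketch and its methods onMapFinished, allMapsFetched and sortAndReduce are hypothetical names invented for illustration, and the word-count style summing is only an assumed reduce function. It shows fetching as each mapper finishes, while sorting and reducing are refused until every mapper's output has been copied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a reducer's phases; not Hadoop code.
public class ReducerSketch {
    private final int totalMaps;   // number of mappers in the job
    private int fetchedMaps = 0;   // mappers whose output has been copied
    private final List<Map.Entry<String, Integer>> fetched = new ArrayList<>();

    public ReducerSketch(int totalMaps) {
        this.totalMaps = totalMaps;
    }

    // Phase 1 (fetch): may run as soon as any single mapper finishes.
    public void onMapFinished(List<Map.Entry<String, Integer>> mapOutput) {
        fetched.addAll(mapOutput);
        fetchedMaps++;
    }

    public boolean allMapsFetched() {
        return fetchedMaps == totalMaps;
    }

    // Phases 2 and 3 (sort, reduce): only allowed once every mapper's
    // output is available locally.
    public Map<String, Integer> sortAndReduce() {
        if (!allMapsFetched()) {
            throw new IllegalStateException(
                "only " + fetchedMaps + " of " + totalMaps + " map outputs fetched");
        }
        fetched.sort(Map.Entry.comparingByKey());   // sort by key
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : fetched) {
            // assumed reduce function: sum the values for each key
            result.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        ReducerSketch reducer = new ReducerSketch(2);
        reducer.onMapFinished(List.of(Map.entry("b", 1), Map.entry("a", 1)));
        System.out.println(reducer.allMapsFetched());   // false: one mapper still running
        reducer.onMapFinished(List.of(Map.entry("a", 2)));
        System.out.println(reducer.sortAndReduce());    // {a=3, b=1}
    }
}
```

In this sketch the early copy only moves data; the reduce function never sees a record until the last mapper has reported in, which is the behaviour described above.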
Hope that helps.
When using Hadoop, I noticed that the reduce step starts while the
mappers are still running. According to my project's requirements, the
reduce step should not start until all the mappers have finished.
Does anybody know how to achieve this with the Hadoop API, so that the
reducers start only after all the mappers have finished?