Sep 25, 2011 at 9:01 pm
Hi Arun and Harsh J
Thank you for your replies.
Yes, there will be two in the end. But while the map is running, there are more
The scenario I mentioned before will not occur with the default Hadoop
partitioner. If another partitioner does lead to the above problem, is there
any security policy that prevents it?
We all know that an unbalanced key distribution can lead to differences in
reduce tasks' execution times, even in a homogeneous environment.
It would be easier to rearrange unbalanced keys if each key occupied its own file.
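To make the skew concern concrete, here is a minimal sketch in Python (not Hadoop code). It counts how many records each reducer would receive under a hash-style partitioner. The names hash_partition and records_per_reducer are hypothetical, and the CRC32 hash is only a deterministic stand-in for whatever hash a real partitioner uses.

```python
from collections import Counter
import zlib


def hash_partition(key, num_reducers):
    # Assumption: CRC32 stands in for the key's hash so results are repeatable.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def records_per_reducer(keys, num_reducers, partition=hash_partition):
    # Count the records that land on each reducer; one list entry per reducer.
    counts = Counter(partition(k, num_reducers) for k in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


# One "hot" key repeated 90 times plus 10 distinct keys, spread over 4 reducers.
keys = ["hot"] * 90 + ["k%d" % i for i in range(10)]
loads = records_per_reducer(keys, 4)
print(loads)
```

All records of the same key go to the same reducer, so whichever reducer receives the hot key gets at least 90 of the 100 records, no matter how the other keys are spread.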
On Sun, Sep 25, 2011 at 2:55 PM, Arun C Murthy wrote:
There is only one file per map. Actually two: an output file and an index
file to quickly get the offset/length for a given reducer.
The index file is also cached in memory for performance.
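As an illustration of the layout described above, here is a minimal sketch in Python (not Hadoop's actual implementation or on-disk format). One output buffer holds every reducer's records back to back, and an index records the offset and length of each reducer's segment. The names toy_partition, write_map_output and read_partition are hypothetical, and the tab-separated text encoding is an assumption made for readability.

```python
import io


def toy_partition(key, num_reducers):
    # Assumption: a simple deterministic partitioner, used only for this sketch.
    return sum(key.encode("utf-8")) % num_reducers


def write_map_output(records, num_reducers, partition=toy_partition):
    # Group records by target reducer first.
    buckets = [[] for _ in range(num_reducers)]
    for key, value in records:
        buckets[partition(key, num_reducers)].append((key, value))

    out = io.BytesIO()  # stands in for the single map output file
    index = []          # stands in for the index file: one (offset, length) per reducer
    for bucket in buckets:
        start = out.tell()
        for key, value in sorted(bucket):
            out.write(("%s\t%s\n" % (key, value)).encode("utf-8"))
        index.append((start, out.tell() - start))
    return out.getvalue(), index


def read_partition(data, index, reducer):
    # A reducer needs only its own segment, located through the index.
    offset, length = index[reducer]
    return data[offset:offset + length].decode("utf-8")


records = [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4")]
data, index = write_map_output(records, 2)
print(index)
print(read_partition(data, index, 1))
```

In this layout the number of files does not grow with the number of distinct keys: adding more keys only makes the segments longer.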
On Sep 25, 2011, at 10:00 AM, He Chen wrote:
According to my understanding of Hadoop, it saves a MapReduce job's
intermediate results into files on the mapper's hard drive. Each key will
occupy a file. I am curious what will happen if the mapper's hard drive does not
have enough inodes to save the generated keys. Because every file needs a