When Hadoop is running in cluster mode, the output of the reducers is
saved in HDFS. Is MapReduce also location-aware about where that data
is saved?
For example, suppose we have TT1 running on Machine1 and TT2 running
on Machine2, and the HDFS replication factor is 3. The reduce task RT1
is running on TT1. When the reducer saves its output to HDFS, do 2
replicas of the output go to TT1 and the third one to TT2? Is this
what happens?
Thanks,
--
Pedro