1 - I'm trying to compare the size of a single map output on the map side and on
the reduce side. I made some code modifications in MR to see
what happens when the map saves its outputs and the reduce fetches
them, and I've noticed that the map output fetched by the reducer is
10 bytes smaller than the output saved by the map.
Running the wordcount example, I added the following log on the map side:
MAP:: Filename /tmp/dir/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_201008092215_0001/attempt_201008092215_0001_m_000002_0_2_m_0/output/spill_0.out
for map attempt_201008092215_0001_m_000002_0_2_m_0. FinalOutFileSize:
MAP:: Access time: 0
Modification Time: 1281384996000
And on the reduce side, I have the following log:
2010-08-09 22:16:42,591 INFO mapred.ReduceTask:2142 header:
attempt_201008092215_0001_m_000002_0_2_m_0, compressed len: 1517,
decompressed len: 1513
I deduce that the 10 missing bytes are the first 10 bytes of the map
output. What do these 10 bytes contain?
2 - If my deduction is correct, will the reduce always fetch 10 bytes
fewer than the saved map output?
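
To check the guess in question 1, the start of the spill file could be inspected directly. This is only a minimal sketch, assuming the file is still present on the local disk of the task tracker. It is plain Python rather than Hadoop code, and head_hex is a hypothetical helper name, not part of MR:

```python
from pathlib import Path


def head_hex(path, n=10):
    """Return (file size in bytes, first n bytes as a hex string).

    Hypothetical helper for illustration only: it reads the raw file
    and knows nothing about the map output file format.
    """
    p = Path(path)
    with p.open("rb") as f:
        head = f.read(n)  # may be shorter than n if the file is small
    hex_bytes = " ".join(f"{b:02x}" for b in head)
    return p.stat().st_size, hex_bytes
```

Calling head_hex on the spill_0.out path from the map log would give the on-disk size, to compare with the lengths in the reduce log, together with the 10 bytes in question.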