Hi,

1 - I'm trying to compare the size of a single map output on the map
side and on the reduce side. I modified the MR code to see what
happens when the map task saves its output and the reducer fetches
it, and I noticed that the map output fetched by the reducer is
10 bytes smaller than the output saved by the map (1513 vs. 1523 bytes).

Running the wordcount example, I added the following log on the map side:

MAP:: Filename /tmp/dir/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_201008092215_0001/attempt_201008092215_0001_m_000002_0_2_m_0/output/spill_0.out
for map attempt_201008092215_0001_m_000002_0_2_m_0. FinalOutFileSize:
1523
MAP:: Access time: 0
BlockSize: 33554432
Group: 1001
Len: 1523
Modification Time: 1281384996000
Owner: hadoop
Path: file:/tmp/dir/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_201008092215_0001/attempt_201008092215_0001_m_000002_0_2_m_0/output/spill_0.out
Permission: rw-r--r--
Replication1


And on the reduce side, I have the following log:

2010-08-09 22:16:42,591 INFO mapred.ReduceTask:2142 header:
attempt_201008092215_0001_m_000002_0_2_m_0, compressed len: 1517,
decompressed len: 1513

I deduce that the missing 10 bytes are the first 10 bytes of the map
output. What do these 10 bytes contain?


2 - If my deduction is correct, will the reduce always fetch 10 bytes
fewer than the saved map output?


--

Pedro


  • Allen Wittenauer at Aug 10, 2010 at 12:08 am

    On Aug 9, 2010, at 1:27 PM, Pedro Costa wrote:

    > 2 - If my deduction is correct, will the reduce always fetch 10 bytes
    > fewer than the saved map output?

    Why do you care?
  • Pedro Costa at Aug 10, 2010 at 8:24 am
    I would like to add my own feature to MR to ensure that the map
    output is transferred correctly to the reduce. Besides simply
    checking the CRC of the map output, I want to guarantee that its
    content hasn't been tampered with. I verify the file by hashing the
    map output on the map side. When the reduce task fetches the map
    output, it computes another hash over the file and compares the two.
    The two hashes must be equal, but for now they aren't, because the
    reducer fetches a map output that is 10 bytes smaller.

    I hope that my explanation was clear.

    My previous questions are still unanswered. :)

    Thanks,

    --
    Pedro
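
The hash comparison described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not Hadoop code: `digest_range`, its `offset` and `length` parameters, and the sample data are all hypothetical. The thread does not establish where the extra 10 bytes sit or what they contain, so the 10-byte prefix below is purely an assumption for illustration. The point is only that two digests can match when both sides hash exactly the same byte range, and that a whole-file digest on the map side will not match a digest of a shorter fetched stream.

```python
import hashlib
import io


def digest_range(stream, offset=0, length=None, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of `length` bytes starting at `offset`.

    If `length` is None, hash everything from `offset` to end of stream.
    """
    stream.seek(offset)
    hasher = hashlib.sha256()
    remaining = length
    while remaining is None or remaining > 0:
        to_read = chunk_size if remaining is None else min(chunk_size, remaining)
        chunk = stream.read(to_read)
        if not chunk:
            break  # end of stream reached
        hasher.update(chunk)
        if remaining is not None:
            remaining -= len(chunk)
    return hasher.hexdigest()


# Made-up data standing in for a map output; not real Hadoop output.
payload = b"hello 1\nworld 2\n" * 50
extra = b"\x00" * 10  # ASSUMPTION: 10 extra bytes placed at the front

saved = io.BytesIO(extra + payload)  # what the map side wrote
fetched = io.BytesIO(payload)        # what the reduce side received

# Hashing each side in full compares different byte sequences.
whole_file_match = digest_range(saved) == digest_range(fetched)

# Hashing the same byte range on both sides compares like with like.
same_range_match = (
    digest_range(saved, offset=len(extra), length=len(payload))
    == digest_range(fetched)
)

print("whole-file digests match:", whole_file_match)
print("same-range digests match:", same_range_match)
```

In a real job the offset and length would have to come from whatever Hadoop actually records for the fetched segment, which this thread leaves open.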
