Hello guys,
We would like to know where compression takes place for the MapOutputStream in the Map phase.

We guess there are two possible places in sortAndSpill() in MapTask.java:
Writer.append() or Writer.close()
Which one performs the compression?
Thanks very much in advance for your response~

See the lines marked with ****** below (from sortAndSpill() in MapTask.java).

for (int i = 0; i < partitions; ++i) {
  IFile.Writer<K, V> writer = null;
  try {
    writer = new Writer<K, V>(job, out, keyClass, valClass, codec,
                              spilledRecordsCounter);
    if (combinerRunner == null) {
      // ...
      key.reset(kvbuffer, kvindices[kvoff + KEYSTART],
                (kvindices[kvoff + VALSTART] -
                 kvindices[kvoff + KEYSTART]));
      /**************************************/
      writer.append(key, value); // The 1st possible place
      ++spindex;
      // ...
    } else {
      // ...
    }

    // close the writer
    /**************************************/
    writer.close(); // The 2nd possible place

--
Rui Hou (侯锐)
Institute of Technology, Chinese Academy of Sciences


  • Harsh Chouraria at Jul 5, 2011 at 10:48 pm
    Hello Rui Hou,

    If you look at the Writer constructor used here, you'll get your answer very easily. It takes a codec (a compression codec, to be specific) as an argument. The codec, if not null (it is null when compression is disabled), is then responsible for compressing the stream of data by wrapping the actual output stream.

    The codec variable is initialized during the MapOutputStream construction accordingly.
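    To illustrate that initialization: the codec ends up non-null only when map output compression is enabled in the job configuration. A minimal sketch using the old-style JobConf API of that era (the class name `EnableMapOutputCompression` is just illustrative; `setCompressMapOutput` and `setMapOutputCompressorClass` correspond to the `mapred.compress.map.output` and `mapred.map.output.compression.codec` properties):

    ```java
    import org.apache.hadoop.io.compress.DefaultCodec;
    import org.apache.hadoop.mapred.JobConf;

    public class EnableMapOutputCompression {
        public static JobConf configure() {
            JobConf job = new JobConf();
            // mapred.compress.map.output = true
            job.setCompressMapOutput(true);
            // mapred.map.output.compression.codec = org.apache.hadoop.io.compress.DefaultCodec
            job.setMapOutputCompressorClass(DefaultCodec.class);
            return job;
        }
    }
    ```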

    If you'd like to take a look, the code for how codecs work lives in the common code, one class per algorithm. For example, there's the DefaultCodec class.
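    To make the wrapping concrete: neither append() nor close() compresses "by itself" — once the codec wraps the raw output stream, every write compresses incrementally, and close() flushes the compressor's remaining buffered bytes. Here is a self-contained sketch of that pattern using java.util.zip.DeflaterOutputStream as a stand-in for a Hadoop codec stream (class and method names are illustrative, not Hadoop's):

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.DeflaterOutputStream;
    import java.util.zip.InflaterInputStream;

    public class CodecWrapSketch {
        // Analogue of IFile.Writer with a non-null codec: writes pass through
        // a compressing stream that wraps the raw output stream.
        public static byte[] writeCompressed(byte[][] records) throws IOException {
            ByteArrayOutputStream rawOut = new ByteArrayOutputStream();
            // Analogue of codec.createOutputStream(out):
            DeflaterOutputStream compressedOut = new DeflaterOutputStream(rawOut);
            for (byte[] rec : records) {
                compressedOut.write(rec); // "append": bytes enter the compressor here
            }
            compressedOut.close();        // "close": flush the trailing compressed bytes
            return rawOut.toByteArray();
        }

        // Decompress everything back, to verify the round trip.
        public static byte[] readDecompressed(byte[] compressed) throws IOException {
            InflaterInputStream in =
                new InflaterInputStream(new ByteArrayInputStream(compressed));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
    ```

    So the answer to "which one" is really "both, cooperatively": append() feeds the compressor as it writes, and close() finishes the compressed stream.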

    I hope this helps! :)

    P.S. Please do not cross-post to multiple lists while seeking an answer. For future MapReduce development questions such as this, please direct them to mapreduce-dev@hadoop.apache.org
    On 05-Jul-2011, at 7:50 PM, 侯锐 wrote:



Discussion Overview
group: common-dev @ hadoop.apache.org (IRC: #hadoop)
posted: Jul 5, 2011 at 8:45 PM; last active: Jul 5, 2011 at 10:48 PM
posts: 2; users: 2 (侯锐: 1 post, Harsh Chouraria: 1 post)
