I notice that the reduce > copy phase is very slow.

I would like to configure Hadoop to compress the map output.
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
<description></description>
</property>

<property>
<name>map.output.compression.type</name>
<value>RECORD</value>
<description></description>
</property>

I'm wondering whether anyone already uses this, or has statistics on the
improvement.

Any advice or feedback is welcome.

Thanks


  • Marco Nicosia at Aug 2, 2007 at 4:10 pm
    I have some purely subjective experience. I invite anyone with empirical
    evidence to pipe up if possible.

    It can be used, but there are currently a couple of important caveats:

    1] If your maps have a tremendous amount of output, the TaskTrackers will
    start producing OutOfMemory exceptions (and depending on which version
    you're using, subsequently hang).
    2] In our experience, you MUST compile native compression libraries, and
    include those in your distribution. If you use Java's compression, you will
    get wildly unpredictable performance, ranging from slow to "why do we even
    bother with computers!?"
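
    To illustrate caveat 2, here is a minimal sketch of naming the map-output
    codec explicitly. It assumes the mapred.map.output.compression.codec
    property and the org.apache.hadoop.io.compress.DefaultCodec class from
    Hadoop's classic mapred configuration. Neither appears in the original
    question, so check both against the hadoop-default.xml shipped with your
    version.

    ```xml
    <!-- Assumption: this property name is taken from the classic mapred
         configuration and may differ in your Hadoop version. -->
    <property>
    <name>mapred.map.output.compression.codec</name>
    <!-- DefaultCodec is zlib-based. It should use the native zlib bindings
         when the native Hadoop library is loaded, and fall back to Java's
         built-in implementation otherwise. -->
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
    <description>Codec used to compress map outputs.</description>
    </property>
    ```

    Setting the codec does not by itself guarantee that the native code path
    is used, so it is worth confirming in the TaskTracker logs whether the
    native library was actually loaded.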

    -- Marco

    On 8/2/07 08:53, "Emmanuel" wrote:

    I notice that the reduce > copy phase is very slow.

    I would like to configure Hadoop to compress the map output.
    <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
    <description></description>
    </property>

    <property>
    <name>map.output.compression.type</name>
    <value>RECORD</value>
    <description></description>
    </property>

    I'm wondering whether anyone already uses this, or has statistics on the
    improvement.

    Any advice or feedback is welcome.

    Thanks
    --
    Marco Nicosia - Kryptonite Grid
    Systems, Tools, and Services Group

Discussion Overview
group: common-user
category: hadoop
posted: Aug 2, '07 at 3:54p
active: Aug 2, '07 at 4:10p
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop
