FAQ
Hello!

How can I compress data using the Hadoop APIs?

I want to write Java code to compress the core files (the data I am going
to dump into HDFS) and then place them in HDFS. So, is using the APIs
sufficient? What about making the related changes in the hadoop-site.xml file?


--
Regards!
Sugandha
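
A minimal sketch of what such a Java program might look like, using the
CompressionCodec API to gzip a local file while writing it into HDFS. The
paths and the class name below are assumptions; the Configuration simply
picks up whatever hadoop-site.xml is on the classpath, so no extra config
changes are needed beyond fs.default.name pointing at the namenode.

    import java.io.InputStream;
    import java.io.OutputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.util.ReflectionUtils;

    public class CompressToHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // reads hadoop-site.xml from the classpath
            FileSystem hdfs = FileSystem.get(conf);     // HDFS if fs.default.name points at the namenode

            CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);

            Path src = new Path("file:///tmp/input.dat");        // local file to compress (assumed)
            Path dst = new Path("/user/sugandha/input.dat"
                    + codec.getDefaultExtension());              // HDFS target (assumed), ".gz" appended

            InputStream in = src.getFileSystem(conf).open(src);
            // Wrap the HDFS output stream with the codec so bytes are compressed as written.
            OutputStream out = codec.createOutputStream(hdfs.create(dst));
            IOUtils.copyBytes(in, out, 4096, true);      // copy and close both streams
        }
    }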


  • Sugandha Naolekar at Jul 9, 2009 at 12:32 pm

    ---------- Forwarded message ----------
    From: Sugandha Naolekar <sugandha.n87@gmail.com>
    Date: Thu, Jul 9, 2009 at 1:41 PM
    Subject: how to compress..!
    To: core-user@hadoop.apache.org
    Hello!

    How can I compress data using the Hadoop APIs?

    I want to write Java code to compress the core files (the data I am going
    to dump into HDFS) and then place them in HDFS. So, is using the APIs
    sufficient? What about making the related changes in the hadoop-site.xml file?


    --
    Regards!
    Sugandha



  • Alex Loddengaard at Jul 9, 2009 at 5:51 pm
    A few comments before I answer:
    1) Each time you send an email, we receive two emails. Is your mail client
    misconfigured?
    2) You already asked this question in another thread :). See my response
    there.

    Short answer: <http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.html>

    Alex
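
    For illustration, a minimal sketch of the SequenceFile route pointed to
    above: writing key/value records block-compressed into HDFS. The class
    name, the Text key/value types, and the output path are assumptions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.GzipCodec;

    public class SequenceFileWriteDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("/user/sugandha/data.seq");   // HDFS output path (assumed)

            SequenceFile.Writer writer = null;
            try {
                // Block-compressed SequenceFile using the gzip codec.
                writer = SequenceFile.createWriter(fs, conf, path,
                        Text.class, Text.class,
                        SequenceFile.CompressionType.BLOCK, new GzipCodec());
                writer.append(new Text("key-1"), new Text("first record"));
                writer.append(new Text("key-2"), new Text("second record"));
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
    }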
  • Jason hadoop at Jul 11, 2009 at 5:29 pm
    Here is the set of configuration parameters for compression, from 0.19.

    You can enable mapred.compress.map.output and mapred.output.compress, and set
    mapred.output.compression.type to BLOCK, for a good set of defaults.

    The compression codecs vary substantially by release, so I won't go into
    that. BZip2 is slow, gzip is medium, and LZO is fast; the compression ratios
    tend to be the inverse of the compression speed (the slower codecs compress
    more tightly).

    <property>
      <name>io.compression.codecs</name>
      <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
      <description>A list of the compression codec classes that can be used
      for compression/decompression.</description>
    </property>
    <property>
      <name>mapred.output.compress</name>
      <value>false</value>
      <description>Should the job outputs be compressed?</description>
    </property>
    <property>
      <name>mapred.output.compression.type</name>
      <value>RECORD</value>
      <description>If the job outputs are to be compressed as SequenceFiles,
      how should they be compressed? Should be one of NONE, RECORD or BLOCK.</description>
    </property>
    <property>
      <name>mapred.output.compression.codec</name>
      <value>org.apache.hadoop.io.compress.DefaultCodec</value>
      <description>If the job outputs are compressed, how should they be
      compressed?</description>
    </property>
    <property>
      <name>mapred.compress.map.output</name>
      <value>false</value>
      <description>Should the outputs of the maps be compressed before being
      sent across the network? Uses SequenceFile compression.</description>
    </property>
    <property>
      <name>mapred.map.output.compression.codec</name>
      <value>org.apache.hadoop.io.compress.DefaultCodec</value>
      <description>If the map outputs are compressed, how should they be
      compressed?</description>
    </property>
    <property>
      <name>io.seqfile.compress.blocksize</name>
      <value>1000000</value>
      <description>The minimum block size for compression in block-compressed
      SequenceFiles.</description>
    </property>
    <property>
      <name>io.seqfile.lazydecompress</name>
      <value>true</value>
      <description>Should values of block-compressed SequenceFiles be
      decompressed only when necessary?</description>
    </property>
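
    For illustration, a minimal sketch of flipping the same switches from a job
    driver with the 0.19-era JobConf API instead of editing hadoop-site.xml.
    The class name is an assumption, and main() only prints the resulting
    property values rather than submitting a job.

    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileOutputFormat;

    public class CompressionSettings {

        /** Apply the suggested compression defaults to a job configuration. */
        public static void configure(JobConf conf) {
            // mapred.compress.map.output / mapred.map.output.compression.codec
            conf.setCompressMapOutput(true);
            conf.setMapOutputCompressorClass(GzipCodec.class);

            // mapred.output.compress / mapred.output.compression.codec
            FileOutputFormat.setCompressOutput(conf, true);
            FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);

            // mapred.output.compression.type (only matters for SequenceFile output)
            SequenceFileOutputFormat.setOutputCompressionType(conf,
                    SequenceFile.CompressionType.BLOCK);
        }

        public static void main(String[] args) {
            JobConf conf = new JobConf(CompressionSettings.class);
            configure(conf);
            // Show what would be submitted with the job.
            System.out.println("mapred.compress.map.output = "
                    + conf.get("mapred.compress.map.output"));
            System.out.println("mapred.output.compress = "
                    + conf.get("mapred.output.compress"));
            System.out.println("mapred.output.compression.type = "
                    + conf.get("mapred.output.compression.type"));
        }
    }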



    --
    Pro Hadoop, a book to guide you from beginner to hadoop mastery,
    http://www.amazon.com/dp/1430219424?tag=jewlerymall
    www.prohadoopbook.com a community for Hadoop Professionals
