Here is the set of configuration parameters for compression from 0.19:

You can enable mapred.compress.map.output and mapred.output.compress,
and set mapred.output.compression.type to BLOCK, for a good set of
defaults.
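
As a rough sketch of those three overrides, assuming they go into
hadoop-site.xml (the file named in the original question) and that the
0.19 property names listed below apply to your release:

```xml
<!-- Sketch only: overrides for hadoop-site.xml, using the 0.19 property names -->
<configuration>
  <!-- Compress intermediate map outputs before they cross the network -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <!-- Compress the final job outputs -->
  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
  </property>
  <!-- For SequenceFile outputs, compress blocks of records rather than single records -->
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>
</configuration>
```

The codec properties are left at their defaults here, since the available
codecs depend on the release.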

The compression codecs vary substantially by release, so I won't go into
that.
BZip2 is slow, gzip is medium and LZO is fast; the compression ratios seem
to move inversely with the compression speed.

<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
<description>A list of the compression codec classes that can be used
for compression/decompression.</description>
</property>
<property>
<name>mapred.output.compress</name>
<value>false</value>
<description>Should the job outputs be compressed?
</description>
</property>
<property>
<name>mapred.output.compression.type</name>
<value>RECORD</value>
<description>If the job outputs are to be compressed as SequenceFiles, how
should they be compressed? Should be one of NONE, RECORD or BLOCK.
</description>
</property>
<property>
<name>mapred.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.DefaultCodec</value>
<description>If the job outputs are compressed, how should they be
compressed?
</description>
</property>
<property>
<name>mapred.compress.map.output</name>
<value>false</value>
<description>Should the outputs of the maps be compressed before being
sent across the network? Uses SequenceFile compression.
</description>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.DefaultCodec</value>
<description>If the map outputs are compressed, how should they be
compressed?
</description>
</property>
<property>
<name>io.seqfile.compress.blocksize</name>
<value>1000000</value>
<description>The minimum block size for compression in block compressed
SequenceFiles.
</description>
</property>
<property>
<name>io.seqfile.lazydecompress</name>
<value>true</value>
<description>Should values of block-compressed SequenceFiles be
decompressed only when necessary?
</description>
</property>

On Thu, Jul 9, 2009 at 10:50 AM, Alex Loddengaard wrote:

A few comments before I answer:
1) Each time you send an email, we receive two emails. Is your mail client
misconfigured?
2) You already asked this question in another thread :). See my response
there.

Short answer:
<http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.html>

Alex

On Thu, Jul 9, 2009 at 1:11 AM, Sugandha Naolekar <sugandha.n87@gmail.com>
wrote:
Hello!

How can I compress data using the Hadoop APIs?

I want to write Java code to compress the core files (the data I am going
to dump into HDFS) and then place them in HDFS. So, is using the APIs
sufficient?
What about making related changes in the hadoop-site.xml file?


--
Regards!
Sugandha


--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals

Discussion Overview
group: common-user @
category: hadoop
posted: Jul 9, '09 at 8:12a
active: Jul 11, '09 at 5:29p
posts: 4
users: 3
website: hadoop.apache.org...
irc: #hadoop