You can enable mapred.compress.map.output and mapred.output.compress,
and set mapred.output.compression.type to BLOCK, for a good set of
defaults.
The compression codecs vary substantially by release, so I won't go into
that.
BZip2 is slow, gzip is medium and LZO is fast; the compression ratios seem
to move inversely with the compression speed.
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
<description>A list of the compression codec classes that can be used
for compression/decompression.</description>
</property>
<property>
<name>mapred.output.compress</name>
<value>false</value>
<description>Should the job outputs be compressed?
</description>
</property>
<property>
<name>mapred.output.compression.type</name>
<value>RECORD</value>
<description>If the job outputs are to be compressed as SequenceFiles, how
should they be compressed? Should be one of NONE, RECORD or BLOCK.
</description>
</property>
<property>
<name>mapred.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.DefaultCodec</value>
<description>If the job outputs are compressed, how should they be
compressed?
</description>
</property>
<property>
<name>mapred.compress.map.output</name>
<value>false</value>
<description>Should the outputs of the maps be compressed before being
sent across the network? Uses SequenceFile compression.
</description>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.DefaultCodec</value>
<description>If the map outputs are compressed, how should they be
compressed?
</description>
</property>
<property>
<name>io.seqfile.compress.blocksize</name>
<value>1000000</value>
<description>The minimum block size for compression in block compressed
SequenceFiles.
</description>
</property>
<property>
<name>io.seqfile.lazydecompress</name>
<value>true</value>
<description>Should values of block-compressed SequenceFiles be
decompressed only when necessary.
</description>
</property>
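As a rough sketch of the settings suggested at the top of this message, the overrides in hadoop-site.xml might look like the following. The property names are the ones listed above; the enclosing configuration element and the choice to leave the codec properties at their defaults are assumptions, so check them against your release.

```xml
<!-- Sketch only: overrides for hadoop-site.xml, assuming the property
     names listed above are valid in your Hadoop release. -->
<configuration>
  <property>
    <!-- Compress intermediate map output before it is sent across the network -->
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <!-- Compress the final job output -->
    <name>mapred.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <!-- For SequenceFile output, compress in blocks rather than per record -->
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>
</configuration>
```

The codec properties (mapred.output.compression.codec and mapred.map.output.compression.codec) are left out on purpose here, since the available codecs differ between releases.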
On Thu, Jul 9, 2009 at 10:50 AM, Alex Loddengaard wrote:
A few comments before I answer:
1) Each time you send an email, we receive two emails. Is your mail client
misconfigured?
2) You already asked this question in another thread :). See my response
there.
Short answer:
<http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.html>
Alex
On Thu, Jul 9, 2009 at 1:11 AM, Sugandha Naolekar <sugandha.n87@gmail.com>
wrote:
Hello!
How do I compress data using the Hadoop APIs?
I want to write Java code to compress the core files (the data I am going
to dump into HDFS) and then place them in HDFS. So, the API usage is
sufficient.
What about making related changes in the hadoop-site.xml file?
--
Regards!
Sugandha
--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals