Hi Justin,

There's a lot of evidence "out on the web" that lz4 generally has higher throughput and better compression than snappy. That sounds like a win/win for lz4. Just google "snappy versus lz4", or have a look at the issue about adding lz4 support to Hadoop: https://issues.apache.org/jira/browse/HADOOP-7657. Based on this, we've decided to try lz4. If it presents compatibility problems, we'll use snappy, at least for some of our tables.

BTW, not sure but it sounds like you were implying that lz4 is GPL. This suggests otherwise: http://code.google.com/p/lz4/source/browse/trunk/lz4.h.

Cheers,

Michael

On Apr 15, 2013, at 3:53 PM, Justin Erickson wrote:

For splittable file formats such as SequenceFile, we support Snappy, GZIP, and BZIP compression. Snappy provides performance benefits similar to those of LZO/LZ4 but is better integrated with the rest of the Hadoop stack. It is also not restricted by a GPL license.

For text files, Impala supports LZO compression since it provides splittable compression.

To help us prioritize post-GA work, can you help us understand why you're using LZ4 compression over Snappy for sequence files?

Thanks,
Justin


On Mon, Apr 15, 2013 at 12:38 PM, Michael Allman wrote:
Hello,

Some of our Hive table data is stored as sequence files with lz4 block compression (using "set mapred.output.compression.codec=org.apache.hadoop.io.compress.Lz4Codec"). Impala does not seem to support this compression method. Is support for lz4 planned for a future release?

Cheers,

Michael
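
For readers trying to reproduce the setup described in the original message, a Hive session along the following lines would produce lz4 block-compressed sequence files. This is only a sketch: the codec line is quoted from the message, while the other two settings and the table names (events_raw, events_lz4) are assumptions added for illustration and are not taken from the thread.

```sql
-- Assumption: turn on compression for the files Hive writes as query output.
SET hive.exec.compress.output=true;
-- Assumption: compress SequenceFile output per block rather than per record.
SET mapred.output.compression.type=BLOCK;
-- From the original message: use the LZ4 codec shipped with Hadoop.
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.Lz4Codec;

-- Hypothetical tables, used only to show where the settings take effect.
CREATE TABLE events_lz4 (id BIGINT, payload STRING) STORED AS SEQUENCEFILE;
INSERT OVERWRITE TABLE events_lz4 SELECT id, payload FROM events_raw;
```

Swapping the codec class for org.apache.hadoop.io.compress.SnappyCodec would give the Snappy fallback mentioned in the reply above.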

Discussion Overview
group: impala-user @
categories: hadoop
posted: Apr 15, '13 at 11:26p
active: Apr 16, '13 at 12:25a
posts: 2
users: 2
website: cloudera.com
irc: #hadoop