Hi Justin,

There's a lot of evidence "out on the web" that lz4 generally has higher throughput and better compression than snappy. That sounds like a win/win for lz4. Just google "snappy versus lz4". Or have a look at the issue around adding lz4 support to Hadoop: https://issues.apache.org/jira/browse/HADOOP-7657. Based on this, we've decided to try lz4. If it presents compatibility problems, we'll use snappy, at least for some of our tables.

BTW, not sure but it sounds like you were implying that lz4 is GPL. This suggests otherwise: http://code.google.com/p/lz4/source/browse/trunk/lz4.h.

Cheers,

Michael

On Apr 15, 2013, at 3:53 PM, Justin Erickson wrote:

For splittable file formats such as SequenceFile, we support Snappy, GZIP, and BZIP compression. Snappy provides performance benefits similar to those of LZO/LZ4 but is better integrated with the rest of the Hadoop stack. It is also not restricted by a GPL license.

For text files, Impala supports LZO compression since it provides splittable compression.

To help us prioritize for post-GA, can you help us understand why you're using LZ4 compression over Snappy for sequence files?

Thanks,
Justin


On Mon, Apr 15, 2013 at 12:38 PM, Michael Allman wrote:
Hello,

Some of our Hive table data is stored as sequence files with lz4 block compression (using "set mapred.output.compression.codec=org.apache.hadoop.io.compress.Lz4Codec"). Impala does not seem to support this compression method. Is support for lz4 planned for a future release?

Cheers,

Michael


  • Justin Erickson at Apr 16, 2013 at 12:25 am
    Thanks. My recommendation for now is to stick with Snappy, as it's better
    integrated with the Hadoop/CDH stack and provides similar benefits. We're
    keeping an ear out for demand for and adoption of LZ4 for the future Impala
    roadmap, and we'll take your feedback into consideration.

    Thanks,
    Justin


Discussion Overview
group: impala-user @
categories: hadoop
posted: Apr 15, '13 at 11:26p
active: Apr 16, '13 at 12:25a
posts: 2
users: 2
website: cloudera.com
irc: #hadoop
