Integrate Snappy compression
----------------------------

Key: HADOOP-7206
URL: https://issues.apache.org/jira/browse/HADOOP-7206
Project: Hadoop Common
Issue Type: New Feature
Reporter: Eli Collins


Google released Zippy as an open source (APLv2) project called Snappy (http://code.google.com/p/snappy). This tracks integrating it into Hadoop.

{quote}
Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.
{quote}
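
To make the figures in the quote concrete, here is a rough sketch of the arithmetic in Python. It uses only the standard-library `zlib` module to obtain the "fastest mode of zlib" baseline; Snappy itself is not called. The function names are made up for illustration, and the 20% to 100% range and the 250 MB/sec and 500 MB/sec rates are simply the numbers quoted above, not measurements.

```python
import zlib
from typing import Tuple


def zlib_fastest_size(data: bytes) -> int:
    """Size in bytes of `data` compressed with zlib's fastest mode (level 1)."""
    return len(zlib.compress(data, 1))


def snappy_size_range(zlib_size: int) -> Tuple[int, int]:
    """Size range implied by the quote: 20% to 100% bigger than zlib's output.

    Integer arithmetic only: 6/5 is +20%, and doubling is +100%.
    """
    return (zlib_size * 6 // 5, zlib_size * 2)


def seconds_to_process(size_mb: float, rate_mb_per_sec: float) -> float:
    """Time to push `size_mb` megabytes through a codec at the given rate."""
    return size_mb / rate_mb_per_sec


if __name__ == "__main__":
    sample = b"hadoop snappy zlib " * 50_000  # hypothetical, highly repetitive input
    baseline = zlib_fastest_size(sample)
    low, high = snappy_size_range(baseline)
    print(f"zlib level 1: {baseline} bytes; implied Snappy range: {low} to {high} bytes")
    print(f"1000 MB at 250 MB/sec: {seconds_to_process(1000, 250)} s to compress")
    print(f"1000 MB at 500 MB/sec: {seconds_to_process(1000, 500)} s to decompress")
```

Real ratios and speeds depend heavily on the input, so the range is only what the quoted claim implies for a given zlib result.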

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


  • Tsz Wo (Nicholas), SZE (JIRA) at Jun 20, 2011 at 5:47 pm
    [ https://issues.apache.org/jira/browse/HADOOP-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE reopened HADOOP-7206:
    --------------------------------------------

    Integrate Snappy compression
    ----------------------------

    Key: HADOOP-7206
    URL: https://issues.apache.org/jira/browse/HADOOP-7206
    Project: Hadoop Common
    Issue Type: New Feature
    Affects Versions: 0.21.0
    Reporter: Eli Collins
    Assignee: T Jake Luciani
    Fix For: 0.23.0

    Attachments: HADOOP-7206-002.patch, HADOOP-7206.patch, v2-HADOOP-7206-snappy-codec-using-snappy-java.txt, v3-HADOOP-7206-snappy-codec-using-snappy-java.txt, v4-HADOOP-7206-snappy-codec-using-snappy-java.txt, v5-HADOOP-7206-snappy-codec-using-snappy-java.txt


  • Tsz Wo (Nicholas), SZE (JIRA) at Jun 20, 2011 at 9:03 pm
    [ https://issues.apache.org/jira/browse/HADOOP-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE resolved HADOOP-7206.
    --------------------------------------------

    Resolution: Fixed

    Let's fix the javadoc in HADOOP-7408. Thanks T Jake and Tom.
  • Alejandro Abdelnur (JIRA) at Jun 23, 2011 at 1:03 am
    [ https://issues.apache.org/jira/browse/HADOOP-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Alejandro Abdelnur reopened HADOOP-7206:
    ----------------------------------------


    After mulling over this issue a bit more, rereading Todd's comment a few times, and asking around among folks who deal with native libraries, I'm having second thoughts about the committed patch based on snappy-java.

    The snappy-java approach is tempting because it 'just works' (without having to install the snappy SO on your system). However, it has a serious drawback: the native code is not built on the target OS, only for the same architecture. Because of this, the build is not easily reproducible, as there is no knowledge of the OS used to build it. In addition, this can lead to dependencies that are not available on the running OS.

    The hadoop-snappy approach has the drawback that it requires an additional step (installing the snappy SO on the platform), but it addresses the drawbacks of the snappy-java approach: the native code is built on the target OS, which results in easily reproducible builds. Furthermore, the drawback is transient; it lasts only until snappy is available on the different OSes by default or through OS-driven updates.

    A secondary issue is that the snappy-java native library statically links snappy. As the snappy SO makes its way into standard Linux distributions, snappy-java will keep using a private copy of it instead of the one installed in the OS. The hadoop-snappy SO, on the other hand, dynamically links the snappy SO, so when the snappy SO is available in the OS, it can be consumed directly from there. (snappy-java could take care of this by switching to dynamically linking the snappy SO.)

    Because of this, I'd like to revert the snappy-java based patch and go for Issay's hadoop-snappy patch.
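
The distinction drawn above, between a library the OS provides and a private copy shipped with the application, can be illustrated with a small Python sketch. This is not how snappy-java or hadoop-snappy load their native code; it only shows the two sources a loader could draw from. The function name `pick_native_library` and the path `native/libsnappy.so` are hypothetical, and the lookup relies on the standard-library `ctypes.util.find_library`.

```python
import ctypes.util
from pathlib import Path
from typing import Optional, Tuple


def pick_native_library(name: str, bundled_copy: Path) -> Tuple[str, Optional[str]]:
    """Report which copy of a native library a loader would pick.

    Returns ("system", name_or_path) when the OS provides the library,
    ("bundled", path) when only the private copy exists,
    and ("missing", None) when neither is available.
    """
    # Ask the OS where a shared library with this name lives, if anywhere.
    system_copy = ctypes.util.find_library(name)
    if system_copy is not None:
        return ("system", system_copy)
    # Otherwise fall back to the private copy shipped with the application.
    if bundled_copy.is_file():
        return ("bundled", str(bundled_copy))
    return ("missing", None)


if __name__ == "__main__":
    # Hypothetical location of a private copy shipped alongside the application.
    print(pick_native_library("snappy", Path("native/libsnappy.so")))
```

A loader that prefers the system copy picks up OS-driven updates automatically, while one that always uses the bundled copy behaves like the statically linked case described above.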
    Integrate Snappy compression
    ----------------------------

    Key: HADOOP-7206
    URL: https://issues.apache.org/jira/browse/HADOOP-7206
    Project: Hadoop Common
    Issue Type: New Feature
    Affects Versions: 0.21.0
    Reporter: Eli Collins
    Assignee: Alejandro Abdelnur
    Fix For: 0.23.0

    Attachments: HADOOP-7206-002.patch, HADOOP-7206.patch, v2-HADOOP-7206-snappy-codec-using-snappy-java.txt, v3-HADOOP-7206-snappy-codec-using-snappy-java.txt, v4-HADOOP-7206-snappy-codec-using-snappy-java.txt, v5-HADOOP-7206-snappy-codec-using-snappy-java.txt



Discussion Overview
group: common-dev @
categories: hadoop
posted: Mar 22, '11 at 11:46p
active: Jun 23, '11 at 1:03a
posts: 4
users: 1
website: hadoop.apache.org...
irc: #hadoop
