Hello Cloudera list,

We recently set up a new CDH 4.1 cluster on Debian Squeeze using the
Debian packages, and we are unable to use the native zlib compression
libraries that should be included in libhadoop. In our MapReduce logs,
we see "WARN zlib.ZlibFactory: Failed to load/initialize native-zlib
library." Likewise, hadoop fs -text shows the same warning when reading
sequence files compressed with gzip.
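
For reference, a minimal way to reproduce the check against a single
gzip-compressed sequence file (the path below is a placeholder):

# prints the WARN/INFO zlib lines when reading one file
hadoop fs -text /path/to/file.seq 2>&1 | grep -i zlib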

Adding -verbose:jni to HADOOP_OPTS shows the following as part of the
output:
[Dynamic-linking native method
org.apache.hadoop.io.compress.zlib.ZlibCompressor.initIDs ...
JNI]
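
For reference, -verbose:jni can be enabled for a single command like
this; a sketch that assumes the stock bin/hadoop wrapper, which passes
HADOOP_OPTS through to the JVM (the file path is a placeholder):

export HADOOP_OPTS="$HADOOP_OPTS -verbose:jni"
hadoop fs -text /path/to/file.seq 2>&1 | grep "Dynamic-linking"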

On a development machine running CDH 4.0.1 (also on Debian Squeeze), we
see the following expected output, both when running the same code and
when using hadoop fs -text:
[Dynamic-linking native method
org.apache.hadoop.io.compress.zlib.ZlibCompressor.initIDs ...
JNI]
[Dynamic-linking native method
org.apache.hadoop.io.compress.zlib.ZlibDecompressor.initIDs ...
JNI]
INFO zlib.ZlibFactory: Successfully loaded & initialized
native-zlib library

The static blocks of ZlibCompressor and ZlibDecompressor both call
NativeCodeLoader to see if the native library "hadoop" can be loaded.
After turning on debugging for NativeCodeLoader, we see the following in
the output:
DEBUG util.NativeCodeLoader: Trying to load the custom-built
native-hadoop library...
DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
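
The NativeCodeLoader debug output above can be obtained with a one-off
command; a minimal sketch, assuming the bin/hadoop script honors the
HADOOP_ROOT_LOGGER variable (the file path is a placeholder):

# equivalently, set log4j.logger.org.apache.hadoop.util.NativeCodeLoader=DEBUG
# in log4j.properties
HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -text /path/to/file.seq 2>&1 \
  | grep NativeCodeLoader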

Examining the libhadoop.so file on the CDH 4.1 machine (using objdump;
a possible invocation is sketched after the symbol list below), all of
the necessary native methods for both ZlibCompressor and
ZlibDecompressor appear to be present:
* 0000000000004d50
<Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_setDictionary>
* 0000000000004e90
<Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_getBytesRead>
* 0000000000004ea0
<Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_getBytesWritten>
* 0000000000004eb0
<Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_reset>
* 0000000000004f30
<Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_end>
* 0000000000004fd0
<Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_deflateBytesDirect>
* 0000000000005460
<Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_init>
* 0000000000005680
<Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_initIDs>
* 00000000000058f0
<Java_org_apache_hadoop_io_compress_zlib_ZlibDecompressor_setDictionary>
* 0000000000005a40
<Java_org_apache_hadoop_io_compress_zlib_ZlibDecompressor_getBytesRead>
* 0000000000005a50
<Java_org_apache_hadoop_io_compress_zlib_ZlibDecompressor_getBytesWritten>
* 0000000000005a60
<Java_org_apache_hadoop_io_compress_zlib_ZlibDecompressor_getRemaining>
* 0000000000005a70
<Java_org_apache_hadoop_io_compress_zlib_ZlibDecompressor_reset>
* 0000000000005af0
<Java_org_apache_hadoop_io_compress_zlib_ZlibDecompressor_end>
* 0000000000005b90
<Java_org_apache_hadoop_io_compress_zlib_ZlibDecompressor_inflateBytesDirect>
* 0000000000006080
<Java_org_apache_hadoop_io_compress_zlib_ZlibDecompressor_init>
* 0000000000006260
<Java_org_apache_hadoop_io_compress_zlib_ZlibDecompressor_initIDs>
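
A possible objdump invocation for producing a listing like the one
above; a sketch only, assuming the Debian package installs the library
under /usr/lib/hadoop/lib/native/ (the exact flags used were not
recorded here):

# dump the dynamic symbol table and keep the zlib JNI entry points
objdump -T /usr/lib/hadoop/lib/native/libhadoop.so | grep -i zlib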


Given all of the above, it appears that libhadoop is being loaded
correctly and all of the necessary functions are present in libhadoop.
Since the -verbose:jni output on the CDH 4.1 machine does not contain
the ZlibDecompressor.initIDs line, but it does contain the
ZlibCompressor.initIDs line, we think that there may be a bug in the
native decompressor code that is preventing the decompressor from
loading.

I was hoping to avoid building a custom version of CDH with additional
debugging in ZlibDecompressor -- the Throwable from the initIDs method
is caught but not logged in any way when testing whether native
libraries are available -- so I was wondering if this gzip issue might
be a known issue with CDH 4.1. We tried using the native libhadoop
library from CDH 4.0.1 on the CDH 4.1 cluster, but this did not solve
the issue.

It should also be noted that the native snappy compression libraries
work perfectly on the CDH 4.1 machine:
DEBUG util.NativeCodeLoader: Trying to load the custom-built
native-hadoop library...
DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
WARN snappy.LoadSnappy: Snappy native library is available
INFO snappy.LoadSnappy: Snappy native library loaded
[Dynamic-linking native method
org.apache.hadoop.io.compress.snappy.SnappyDecompressor.initIDs ... JNI]
INFO compress.CodecPool: Got brand-new decompressor [.snappy]
INFO compress.CodecPool: Got brand-new decompressor [.snappy]
INFO compress.CodecPool: Got brand-new decompressor [.snappy]
INFO compress.CodecPool: Got brand-new decompressor [.snappy]
[Dynamic-linking native method
org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect ... JNI]

Another, possibly related issue is that, with the native libraries
failing in the manner described above, SequenceFileOutputFormat produces
gzip block-compressed sequence files that cannot be read by subsequent
jobs using SequenceFileInputFormat. The magic number check in
BuiltInGzipDecompressor fails when reading from the sequence file in the
second job, causing a "not a gzip file" error to be thrown. We do not
yet know whether this is related to the native library issue above; we
thought we would get the native libraries working first and then debug
the second issue if gzip is still not working correctly. We do not think
any of our code is causing the issue, since the same code works
perfectly under CDH 4.0.1.
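
One low-effort sanity check is to peek at the header of the first job's
output, since a sequence file header records the codec class name; a
sketch with a placeholder output path:

hadoop fs -cat /path/to/job1-output/part-00000 | head -c 300 | strings | grep -i codec
# expect something like org.apache.hadoop.io.compress.GzipCodec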

Any help further debugging and solving either of these issues would be
appreciated.

Thanks,

--Brandon

  • Colin McCabe at Oct 8, 2012 at 5:43 am
    I don't know of any reason why libz would be loaded on CDH4.0 but not
    on CDH4.1. There were some changes to the way snappy was loaded, but
    not libz. It's also very odd to see the ZlibCompressor functionality
    load, but not the ZlibDecompressor.

    The first thing to check is that you really do have libz.so installed
    on both test machines. We have had some issues in the past where
    apparent regressions were the result of misconfigurations on the new
    machines. (I realize this doesn't explain ZlibCompressor appearing to
    load.)

    One thing you might try is copying libz.so to the same directory as
    libhadoop.so. This will ensure that libz.so gets loaded if
    libhadoop.so does.
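
    A sketch of both checks, assuming standard Debian library paths and
    that libhadoop.so lives under /usr/lib/hadoop/lib/native/ (adjust to
    the actual layout):

    # is any libz present, and does the versionless dev symlink exist?
    ls -l /lib/libz.so* /usr/lib/libz.so* 2>/dev/null
    ldconfig -p | grep libz

    # drop a copy of libz next to libhadoop.so under the versionless name
    cp /usr/lib/libz.so.1 /usr/lib/hadoop/lib/native/libz.so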

    If that doesn't work, then I'm afraid you will have to do a rebuild to
    get access to that exception in ZlibDecompressor.java. I think this
    is part of the code that will have to improve in the future.

    sincerely,
    Colin


  • Brandon Vargo at Oct 8, 2012 at 5:53 pm

    Ok, thanks for the suggestion. Copying libz
    to /usr/lib/hadoop/lib/native/ did not help. However, it did help me
    find the solution.

    Upon further inspection, I noticed that libz was not getting mapped into
    the memory space of the hadoop java processes, as would be expected with
    dynamic linking. This is curious, since I thought the JVM debugging
    output was claiming that the native library was being used. It turns out
    that -verbose:jni prints its message upon method resolution, even if the
    resolution does not succeed. Thus, the JNI debug messages were saying
    that the JNI method was being looked up, not that the lookup had
    succeeded, as I had thought. The call to ZlibCompressor.initIDs was
    actually failing, which is why the ZlibDecompressor call never showed up
    in the logs.
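
    For anyone verifying the same thing, the mapping can be checked on a
    live JVM; a sketch, where the pgrep pattern used to pick the process
    is an assumption:

    PID=$(pgrep -f org.apache.hadoop | head -n 1)   # pick one hadoop JVM
    pmap "$PID" | grep libz || echo "libz is not mapped into $PID"
    # equivalently: grep libz /proc/$PID/maps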

    Installing zlib1g-dev made everything start to work. Among other things,
    the zlib1g-dev package in Debian contains /usr/lib/libz.so
    and /usr/lib/libz.a. The former is a symlink to the current libz.so
    version -- in our case, libz.so.1.2.3.4. On our CDH 4.0 machines,
    libz.so.1 being a symlink was sufficient. However, on the new machines,
    where libz.so.1 is also present, the library was not being resolved
    correctly by the hadoop java processes for some reason; all other uses
    of libz work fine. At any rate, creating a symlink from libz.so to
    libz.so.1.2.3.4 makes native gzip work under CDH 4.1 on Debian. I do not
    yet know if this is an artifact of the linking configuration being
    subtly different between our machines, a difference in the java versions
    between the two machines, or something else.
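
    In concrete terms, the two fixes described above look roughly like
    this; the library version is the one from these machines, so adjust
    to whatever ls -l /usr/lib/libz.so.1* reports (run as root):

    # option 1: install the -dev package, which provides /usr/lib/libz.so
    apt-get install zlib1g-dev

    # option 2: create the versionless symlink by hand
    ln -s /usr/lib/libz.so.1.2.3.4 /usr/lib/libz.so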

    We now get the expected output:

    [Dynamic-linking native method
    org.apache.hadoop.io.compress.zlib.ZlibCompressor.initIDs ... JNI]
    [Dynamic-linking native method
    org.apache.hadoop.io.compress.zlib.ZlibDecompressor.initIDs ... JNI]
    12/10/08 16:26:57 INFO zlib.ZlibFactory: Successfully loaded &
    initialized native-zlib library
    [Dynamic-linking native method
    org.apache.hadoop.io.compress.zlib.ZlibDecompressor.init ... JNI]
    12/10/08 16:26:57 INFO compress.CodecPool: Got brand-new decompressor
    [.gz]
    12/10/08 16:26:57 INFO compress.CodecPool: Got brand-new decompressor
    [.gz]
    12/10/08 16:26:57 INFO compress.CodecPool: Got brand-new decompressor
    [.gz]
    12/10/08 16:26:57 INFO compress.CodecPool: Got brand-new decompressor
    [.gz]
    [Dynamic-linking native method
    org.apache.hadoop.io.compress.zlib.ZlibDecompressor.reset ... JNI]
    [Dynamic-linking native method
    org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect ... JNI]

    Thanks for the help.

    --Brandon

  • Andy Isaacson at Oct 8, 2012 at 6:13 pm

    Thanks for reporting this! I'm glad you got a solution.

    Could you send me (off-list) the output of "dpkg -l > dpkg.out" and
    "ls -l /lib /usr/lib" from a system where you're seeing the failure to
    load zlib due to the missing libz.so symlink? I'd like to root cause
    the change to ensure that we don't have a hidden dependency in our
    build system.

    -andy

  • Andy Isaacson at Oct 9, 2012 at 1:23 am

    Brandon,

    Thanks for reporting this bug; it turns out to be caused by a cmake change:
    https://issues.apache.org/jira/browse/HADOOP-8901

    -andy

  • Colin McCabe at Oct 10, 2012 at 9:14 pm
    Hi Brandon,

    Thanks again for the very clear bug report. As Andy mentioned, this
    turns out to be a bug in 4.1.

    Basically, Hadoop is supposed to work with just libz.so.1 and
    libsnappy.so.1 installed, but instead we're searching only for
    libz.so (versionless) and libsnappy.so (versionless). If anyone
    else hits this problem, the workaround is to install zlib-devel
    or snappy-devel (the exact names depend on your distro); these
    packages contain the versionless variants of the aforementioned
    libraries.
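
    For completeness, a rough sketch of that workaround on the two common
    package managers; zlib1g-dev is the Debian name mentioned earlier in
    this thread, while libsnappy-dev is an assumed name for the Debian
    snappy development package:

    # Debian/Ubuntu
    apt-get install zlib1g-dev libsnappy-dev

    # RHEL/CentOS
    yum install zlib-devel snappy-devel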

    This should be fixed in 4.2.

    cheers,
    Colin


Discussion Overview
group: cdh-user
categories: hadoop
posted: Oct 5, 2012 at 6:56pm
active: Oct 10, 2012 at 9:14pm
posts: 6
users: 3
website: cloudera.com
irc: #hadoop
