Mikhail Bautin at Feb 10, 2012 at 7:22 pm

Hello,

We currently do not enable native Hadoop libraries in unit tests (at least
when running on Hadoop QA), but we do use them in production. Should we
try to close this discrepancy between tests and production? Some possible
approaches would be:

- Enable native libraries by default (e.g. import libhadoop.so into a
special location in the HBase repository and add
-Djava.library.path=/that/path to the unit test JVM options); a sketch of
this wiring follows below.
- In addition to the above, modify some unit tests (e.g. those using
Gzip compression) to run both with and without native libraries enabled.
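
A minimal sketch of the wiring for the first option, as a pom.xml surefire
fragment (the lib/native layout and the ${build.platform} property are
illustrative assumptions, not existing HBase build conventions):

    <!-- Hypothetical pom.xml fragment: point the test JVM at a checked-in
         native library directory so tests can load libhadoop.so. -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <configuration>
        <argLine>-Djava.library.path=${basedir}/lib/native/${build.platform}</argLine>
      </configuration>
    </plugin>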

Please let me know if this sounds reasonable.

Thanks,
--Mikhail


  • Ted Yu at Feb 11, 2012 at 3:26 am
    This makes sense.
    I like the second part of the proposal.
  • Jesse Yates at Feb 11, 2012 at 3:32 am
    We probably should have some tests that run both with and without them, just for completeness.
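
    A minimal sketch of such a dual-mode test, assuming JUnit 4
    parameterization and the "hadoop.native.lib" configuration switch from
    Hadoop 1.x (the class name and test data are illustrative, not existing
    HBase code):

        import static org.junit.Assert.assertArrayEquals;

        import java.io.ByteArrayInputStream;
        import java.io.ByteArrayOutputStream;
        import java.util.Arrays;
        import java.util.Collection;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.io.IOUtils;
        import org.apache.hadoop.io.compress.CompressionInputStream;
        import org.apache.hadoop.io.compress.CompressionOutputStream;
        import org.apache.hadoop.io.compress.GzipCodec;
        import org.apache.hadoop.util.NativeCodeLoader;
        import org.junit.Assume;
        import org.junit.Test;
        import org.junit.runner.RunWith;
        import org.junit.runners.Parameterized;
        import org.junit.runners.Parameterized.Parameters;

        @RunWith(Parameterized.class)
        public class TestGzipNativeAndPureJava {
          private final boolean useNative;

          public TestGzipNativeAndPureJava(boolean useNative) {
            this.useNative = useNative;
          }

          @Parameters
          public static Collection<Object[]> flavors() {
            // Run each test once without and once with native libraries.
            return Arrays.asList(new Object[][] { { false }, { true } });
          }

          @Test
          public void testGzipRoundTrip() throws Exception {
            // Skip the native flavor when libhadoop.so is not on java.library.path.
            Assume.assumeTrue(!useNative || NativeCodeLoader.isNativeCodeLoaded());

            Configuration conf = new Configuration();
            // Hadoop 1.x key that lets the codec factories use native zlib.
            conf.setBoolean("hadoop.native.lib", useNative);
            GzipCodec codec = new GzipCodec();
            codec.setConf(conf);

            byte[] data = "gzip round-trip test data".getBytes();
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            CompressionOutputStream out = codec.createOutputStream(compressed);
            out.write(data);
            out.close();

            CompressionInputStream in = codec.createInputStream(
                new ByteArrayInputStream(compressed.toByteArray()));
            byte[] roundTripped = new byte[data.length];
            IOUtils.readFully(in, roundTripped, 0, roundTripped.length);
            assertArrayEquals(data, roundTripped);
          }
        }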

    - Jesse Yates

    Sent from my iPhone.
  • Stack at Feb 11, 2012 at 4:52 am

    We used to be careful w/ native libs to ensure they were present and
    hbase would use them if available, but we let that drop a couple of
    versions ago and haven't picked it up since. Thanks for reviving them,
    Mikhail. I can help make sure this works on the apache build box.

    St.Ack
  • Mikhail Bautin at Feb 12, 2012 at 8:36 pm
    One difficulty with simply importing libhadoop.so into the HBase codebase is
    that the dynamic library is probably a bit different for different versions
    of Hadoop. Is there a way to pull the .so file from Maven for the
    configured Hadoop version? Ideally this should be done in a
    platform-independent way, too, but making it work on Linux would be the
    first step.

    Thanks,
    --Mikhail
  • Roman Shaposhnik at Feb 13, 2012 at 6:02 am

    I'm pretty sure it'll be next to impossible to do that reliably. There was
    a time when folks wanted to use Maven for native artifacts, but to the
    best of my knowledge that has been deemed "not such a good idea" (tm)
    after all.

    In fact, if you look at the Maven plugins dealing with the native side of
    things, most of them seem to be abandoned at this point.

    Personally, working with Maven has made me realize that the only reliable
    way to deal with an external dependency on a native artifact is to depend
    on the source artifact and always do the compilation during your own
    build. Your mileage may, of course, vary.

    Thanks,
    Roman.
  • Mikhail Bautin at Feb 13, 2012 at 9:58 am
    Then how about solving the issue for the most common case (the default
    version of Hadoop)? We can import the default version of libhadoop.so into
    the HBase codebase and load it in tests, as I mentioned. This can be
    considered a hack, but it would definitely increase test coverage.

    Thanks,
    --Mikhail
  • Roman Shaposhnik at Feb 13, 2012 at 6:53 pm

    You're not proposing importing a native binary into a source tree, are you?
    That won't be very reliable at all.

    We can probably come up with a number of workarounds here, but at the end
    of the day, unless you recompile the native bits here and now, chances
    are they won't be compatible with the OS you happen to be on.

    Thanks,
    Roman.
  • Todd Lipcon at Feb 13, 2012 at 7:16 pm
    Also keep in mind that it's not just the Hadoop version, but also the
    glibc version and host architecture. We'd have to publish built binaries
    for all combinations of architecture * hadoopVersion * glibcVersion.

    Maybe we should just get a copy of _one_ of these versions on the
    hudson build boxes, and have a new hudson job which runs whichever
    tests depend on the native code there?
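
    One way to wire up such a job, assuming JUnit 4.8 categories and a
    surefire <groups> filter (the marker interface and class names here are
    illustrative):

        import static org.junit.Assert.assertTrue;

        import org.apache.hadoop.util.NativeCodeLoader;
        import org.junit.Test;
        import org.junit.experimental.categories.Category;

        // Hypothetical marker interface: the dedicated build job would run
        // surefire with this category in <groups>, while the regular job
        // excludes it.
        interface NativeCodeTests {
        }

        @Category(NativeCodeTests.class)
        public class TestNativeCodeLoads {
          @Test
          public void testNativeHadoopIsLoaded() {
            // On the native-enabled build box this must pass, turning a
            // silent fallback to the pure-Java code paths into a failure.
            assertTrue(NativeCodeLoader.isNativeCodeLoaded());
          }
        }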

    -Todd
  • Mikhail Bautin at Feb 13, 2012 at 7:22 pm
    Would the following work as a complete solution for any platform? We can
    make this conditional on a new Maven profile.

    - Download the sources of the Hadoop version being used
    - Run "ant compile-native"
    - Add the directory containing libhadoop.so to java.library.path in the
    test JVM options (see the sketch below)
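
    A rough sketch of what that profile could look like; the profile id, the
    download URL, and the native output path (which varies by platform) are
    assumptions rather than existing build code:

        <!-- Hypothetical pom.xml profile: build libhadoop.so from the Hadoop
             source tarball and expose it to the test JVM. -->
        <profile>
          <id>native-hadoop</id>
          <build>
            <plugins>
              <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-antrun-plugin</artifactId>
                <executions>
                  <execution>
                    <id>build-libhadoop</id>
                    <phase>generate-test-resources</phase>
                    <goals>
                      <goal>run</goal>
                    </goals>
                    <configuration>
                      <target>
                        <!-- Fetch and unpack the sources of the configured
                             Hadoop version. -->
                        <get src="http://archive.apache.org/dist/hadoop/core/hadoop-${hadoop.version}/hadoop-${hadoop.version}.tar.gz"
                             dest="${project.build.directory}/hadoop-src.tar.gz"
                             skipexisting="true"/>
                        <untar src="${project.build.directory}/hadoop-src.tar.gz"
                               dest="${project.build.directory}"
                               compression="gzip"/>
                        <!-- Hadoop 1.x builds native code with Ant; newer
                             versions build with Maven instead. -->
                        <ant dir="${project.build.directory}/hadoop-${hadoop.version}"
                             target="compile-native"/>
                      </target>
                    </configuration>
                  </execution>
                </executions>
              </plugin>
              <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <configuration>
                  <!-- native.build.dir would point at the platform-specific
                       output, e.g. .../build/native/Linux-amd64-64/lib. -->
                  <argLine>-Djava.library.path=${native.build.dir}</argLine>
                </configuration>
              </plugin>
            </plugins>
          </build>
        </profile>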

    Thanks,
    --Mikhail
  • Todd Lipcon at Feb 13, 2012 at 7:42 pm

    Sort of, except that the compilation process differs based on the
    version; e.g., newer versions use Maven to build instead.

    -Todd

Discussion Overview
group: dev
categories: hbase, hadoop
posted: Feb 11, '12 at 3:23a
active: Feb 13, '12 at 7:42p
posts: 11
users: 6
website: hbase.apache.org
