Compressor tweaks corresponding to HDFS-2834, 3051?
In https://issues.apache.org/jira/browse/HDFS-2834, Todd says, "

This is also useful whenever a native decompression codec is being used. In those cases, we generally have the following copies:

1) Socket -> DirectByteBuffer (in SocketChannel implementation)
2) DirectByteBuffer -> byte[] (in SocketInputStream)
3) byte[] -> Native buffer (set up for decompression)
4*) decompression to a different native buffer (not really a copy - decompression necessarily rewrites)
5) native buffer -> byte[]

with the proposed improvement we can hopefully eliminate #2,#3 for all applications, and #2,#3,and #5 for libhdfs.
"


It seems like we need to tweak the Decompressor (and Compressor?) classes to accept DirectByteBuffer inputs and outputs rather than byte[] arrays to support this improvement.

Would the right next step be for me to open a JIRA in Common for this and take a first stab at defining the interface?
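
To make the idea concrete, here is a minimal sketch of what a buffer-to-buffer decompression call could look like. It is not the interface proposed in any JIRA: `ByteBufferDecompressor`, `ZlibByteBufferDecompressor`, and `compressToDirect` are hypothetical names invented for illustration, and the JDK's zlib `Inflater` (with its Java 11+ `ByteBuffer` overloads) stands in for a native Hadoop codec. The point is only that both the source and the destination can be direct buffers, so no intermediate byte[] is needed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DirectBufferDecompressSketch {

    /** Hypothetical interface; NOT part of Hadoop's existing Decompressor API. */
    interface ByteBufferDecompressor {
        /** Decompresses src's remaining bytes into dst and returns the bytes written. */
        int decompress(ByteBuffer src, ByteBuffer dst);
    }

    /** Stand-in codec built on the JDK 11+ ByteBuffer overloads of Inflater. */
    static final class ZlibByteBufferDecompressor implements ByteBufferDecompressor {
        @Override
        public int decompress(ByteBuffer src, ByteBuffer dst) {
            Inflater inflater = new Inflater();
            try {
                inflater.setInput(src); // reads straight from the (direct) buffer
                int total = 0;
                while (!inflater.finished() && dst.hasRemaining()) {
                    int n = inflater.inflate(dst); // writes straight into dst
                    if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                        break; // truncated input, or a preset dictionary is required
                    }
                    total += n;
                }
                return total;
            } catch (DataFormatException e) {
                throw new IllegalArgumentException("corrupt zlib stream", e);
            } finally {
                inflater.end(); // release the native zlib state
            }
        }
    }

    /** Demo helper: compresses a small byte[] into a direct buffer. */
    static ByteBuffer compressToDirect(byte[] data) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data);
            deflater.finish();
            // Assumption: 64 bytes of slack is enough for the small demo input.
            ByteBuffer out = ByteBuffer.allocateDirect(data.length + 64);
            while (!deflater.finished() && out.hasRemaining()) {
                deflater.deflate(out);
            }
            if (!deflater.finished()) {
                throw new IllegalStateException("output buffer too small");
            }
            out.flip();
            return out;
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] original = "byte[] copies avoided, byte[] copies avoided"
                .getBytes(StandardCharsets.UTF_8);
        ByteBuffer compressed = compressToDirect(original);
        ByteBuffer restored = ByteBuffer.allocateDirect(original.length + 16);

        int written = new ZlibByteBufferDecompressor().decompress(compressed, restored);

        // Copy out only to verify the round trip; a real caller could keep using the buffer.
        restored.flip();
        byte[] check = new byte[written];
        restored.get(check);
        System.out.println("round trip ok: " + Arrays.equals(original, check));
    }
}
```

With a signature along these lines, the data read from the socket into a DirectByteBuffer could be handed to the codec as-is, which is what would remove copies #2 and #3 in Todd's list. How such a method would coexist with the existing byte[]-based setInput/decompress calls is left open here.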

- Tim.

The information and any attached documents contained in this message
may be confidential and/or legally privileged. The message is
intended solely for the addressee(s). If you are not the intended
recipient, you are hereby notified that any use, dissemination, or
reproduction is strictly prohibited and may be unlawful. If you are
not the intended recipient, please contact the sender immediately by
return e-mail and destroy all copies of the original message.


  • Brian Bockelman at Mar 7, 2012 at 1:37 pm
    This one caught my eye when I originally read it:

    4*) decompression to a different native buffer (not really a copy - decompression necessarily rewrites)

    Actually, LZO decompression can be done in-place (an awfully neat trick!). It's a micro-optimization, but it could possibly save some buffer space.

    Brian
  • Robert Evans at Mar 7, 2012 at 3:17 pm
    I am a +1 on opening a new JIRA for a first stab at reducing the amount of data that gets copied around.

    --Bobby Evans


  • Tim Broberg at Mar 7, 2012 at 7:42 pm
    Consider it stabbed. https://issues.apache.org/jira/browse/HADOOP-8148

    I still need to think through how this applies to CompressorStream and DecompressorStream.

    - Tim.

Discussion Overview
group: common-dev @
categories: hadoop
posted: Mar 7, '12 at 7:27a
active: Mar 7, '12 at 7:42p
posts: 4
users: 3
website: hadoop.apache.org...
irc: #hadoop
