Hi All,
I've been browsing the RPC code for quite a while now, trying to find an entry point or interceptor slot that lets me handle an RPC call's response Writable after it has been sent over the wire.
Does anybody have an idea how to break into the RPC code from outside? All the interesting methods are private. :(

Background:
Heavy use of the RPC allocates a huge number of Writable objects. We have seen on multiple systems that the garbage collector can get so busy that the JVM almost freezes for seconds. Things like ZooKeeper sessions time out in those cases.
My idea is to create an object pool for Writables. Borrowing an object from the pool is simple since this happens in our custom code, but we do not know when the returned Writable has been sent over the wire and can be returned to the pool.
A dirty hack would be to override the write(out) method in the Writable, assuming that is the last thing done with the Writable, but it turns out that this method is called in other cases too, e.g. to measure throughput.

Any ideas?

Thanks,
Stefan
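For readers who want to picture the pool Stefan describes, here is a minimal sketch. `SimplePool`, `borrow`, `giveBack` and `idleCount` are hypothetical names made up for this illustration, and `StringBuilder` merely stands in for a reusable response Writable; none of this is Hadoop API. The hard part raised in the post, knowing when it is safe to call `giveBack`, is left open here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical bounded object pool; an illustration, not part of Hadoop. */
final class SimplePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxIdle;

    SimplePool(Supplier<T> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    /** Returns an idle instance if one exists, otherwise allocates a new one. */
    synchronized T borrow() {
        T obj = idle.pollFirst();
        return obj != null ? obj : factory.get();
    }

    /** Hands an instance back; it is dropped (left to the GC) if the pool is full. */
    synchronized void giveBack(T obj) {
        if (idle.size() < maxIdle) {
            idle.addFirst(obj);
        }
    }

    /** Number of instances currently waiting to be reused. */
    synchronized int idleCount() {
        return idle.size();
    }
}

final class SimplePoolDemo {
    public static void main(String[] args) {
        // StringBuilder stands in for a reusable response Writable (assumption).
        SimplePool<StringBuilder> pool = new SimplePool<>(StringBuilder::new, 16);

        StringBuilder response = pool.borrow();
        response.setLength(0);      // reset state left over from a previous use
        response.append("payload");

        // ... the response would be serialized and sent here ...

        pool.giveBack(response);    // only safe once the bytes are on the wire
        System.out.println("idle instances: " + pool.idleCount());
    }
}
```

The pool is capped so that a burst of responses cannot pin an unbounded number of objects in the old generation, which would trade one GC problem for another.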


  • Todd Lipcon at Dec 28, 2010 at 4:22 am
    Hi Stefan,

    Sounds interesting.

    Maybe you're looking for o.a.h.ipc.Server$Responder?

    -Todd
    --
    Todd Lipcon
    Software Engineer, Cloudera
  • Stefan Groschupf at Dec 28, 2010 at 9:00 pm
    Hi Todd,
    Right, that is the code I'm looking into. However, Responder is a private inner class and is instantiated directly with "responder = new Responder();".
    It would be great if the Responder implementation could be configured.
    Do you have any idea how to override the Responder?
    Thanks,
    Stefan

  • Todd Lipcon at Dec 29, 2010 at 1:57 am

    On Tue, Dec 28, 2010 at 1:00 PM, Stefan Groschupf wrote:

    > Do you have any idea how to override the Responder?

    Nope, it's not currently pluggable, nor do I think there's any compelling
    reason to make it pluggable. It's coupled quite tightly to the
    implementation right now.

    Perhaps you can hack something in a git branch, and if it has good results
    on something like NNBench it could be a general contribution?

    -Todd
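To make the discussion concrete, the kind of hook Stefan is asking for could look roughly like the sketch below if someone hacked it into a branch. Everything here is hypothetical: `HookedResponder`, its `Serializer` interface and the `afterSent` callback are invented for illustration and do not mirror Hadoop's `Server$Responder`, which, as Todd notes, is tightly coupled to the rest of the server.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Hypothetical responder with a post-send hook; NOT Hadoop's Server.Responder. */
final class HookedResponder<T> {

    /** Serializes a response into the stream (stands in for Writable.write). */
    interface Serializer<T> {
        void write(T response, OutputStream out) throws IOException;
    }

    private final Serializer<T> serializer;
    private final Consumer<T> afterSent;

    HookedResponder(Serializer<T> serializer, Consumer<T> afterSent) {
        this.serializer = serializer;
        this.afterSent = afterSent;
    }

    /** Writes the response, flushes, and only then notifies the hook. */
    void respond(T response, OutputStream channel) {
        try {
            serializer.write(response, channel);
            channel.flush();
        } catch (IOException e) {
            // The hook is deliberately not called when the write fails.
            throw new UncheckedIOException(e);
        }
        // From here on the RPC layer no longer needs the response object.
        afterSent.accept(response);
    }
}

final class HookedResponderDemo {
    public static void main(String[] args) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        HookedResponder<String> responder = new HookedResponder<>(
            (text, out) -> out.write(text.getBytes(StandardCharsets.UTF_8)),
            text -> System.out.println("safe to recycle: " + text));

        responder.respond("pong", wire);
        System.out.println("bytes written: " + wire.size());
    }
}
```

A blocking stream keeps the sketch short. A server that writes responses asynchronously would have to fire the callback only after the last byte of the buffer has been handed to the channel.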

  • Lance Norskog at Dec 29, 2010 at 2:43 am
    Are you connecting to this JVM with RMI? RMI does a very nasty thing
    with garbage collection: it forces a blocking collection every 60
    seconds. Really. You have to change this with a system property.
    --
    Lance Norskog
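Lance does not name the system property. On Sun/Oracle JVMs the RMI distributed-GC interval is, as far as I know, controlled by `sun.rmi.dgc.client.gcInterval` and `sun.rmi.dgc.server.gcInterval`, both in milliseconds. The sketch below sets them programmatically; the class name `RmiGcInterval`, the method `relax` and the one-hour value are assumptions chosen for illustration. In practice the properties are usually passed as `-D` options, because they must be set before RMI initializes.

```java
/** Sketch: lengthen the RMI distributed-GC interval before RMI is initialized. */
final class RmiGcInterval {
    static final String CLIENT = "sun.rmi.dgc.client.gcInterval";
    static final String SERVER = "sun.rmi.dgc.server.gcInterval";

    /** Sets both intervals (milliseconds) unless they were already supplied, e.g. via -D. */
    static void relax(long millis) {
        for (String key : new String[] {CLIENT, SERVER}) {
            if (System.getProperty(key) == null) {
                System.setProperty(key, Long.toString(millis));
            }
        }
    }

    public static void main(String[] args) {
        relax(3_600_000L); // one hour; an arbitrary illustrative value
        System.out.println(CLIENT + "=" + System.getProperty(CLIENT));
        System.out.println(SERVER + "=" + System.getProperty(SERVER));
    }
}
```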
  • Ted Dunning at Dec 29, 2010 at 6:48 am
    Nah... this is hadoop RPC
  • Stefan Groschupf at Dec 29, 2010 at 5:01 am
    Hi Todd,
    Thanks for the feedback.
    > Nope, it's not currently pluggable, nor do I think there's any compelling
    > reason to make it pluggable.

    Well, one could argue that with an interceptor or filter it would be very easy to add compression or encryption to RPC.
    But since the Nutch days, the code base has never been architected in an extensible or modular way.

    > Perhaps you can hack something in a git branch, and if it has good results
    > on something like NNBench it could be a general contribution?

    Thanks, but I'll pass on that offer. The days of waiting half a year to get a patch into the codebase are behind me. :)
    I think I will just replace Hadoop RPC with Netty.

    Cheers,
    Stefan
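The interceptor / filter argument can be illustrated with a small sketch of a payload filter chain that adds compression. `PayloadFilter`, `GzipFilter` and `FilterChain` are hypothetical names invented here; the thread does not show any such extension point in Hadoop RPC. Only the `java.util.zip` classes are real library API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical filter applied to serialized RPC payloads. */
interface PayloadFilter {
    byte[] encode(byte[] payload);
    byte[] decode(byte[] payload);
}

/** Compresses payloads with gzip. */
final class GzipFilter implements PayloadFilter {
    @Override public byte[] encode(byte[] payload) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray(); // complete only after the gzip stream is closed
    }

    @Override public byte[] decode(byte[] payload) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(payload))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }
}

/** Applies filters in order when encoding and in reverse order when decoding. */
final class FilterChain implements PayloadFilter {
    private final List<PayloadFilter> filters;

    FilterChain(PayloadFilter... filters) {
        this.filters = Arrays.asList(filters);
    }

    @Override public byte[] encode(byte[] payload) {
        byte[] current = payload;
        for (PayloadFilter f : filters) {
            current = f.encode(current);
        }
        return current;
    }

    @Override public byte[] decode(byte[] payload) {
        byte[] current = payload;
        for (int i = filters.size() - 1; i >= 0; i--) {
            current = filters.get(i).decode(current);
        }
        return current;
    }
}

final class FilterChainDemo {
    public static void main(String[] args) {
        PayloadFilter chain = new FilterChain(new GzipFilter());
        byte[] original = "response response response response".getBytes(StandardCharsets.UTF_8);
        byte[] wire = chain.encode(original);
        byte[] back = chain.decode(wire);
        System.out.println("round trip ok: " + Arrays.equals(original, back));
    }
}
```

An encryption filter would implement the same interface and be appended to the chain after compression, since encrypted bytes compress poorly.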
  • Ted Dunning at Dec 28, 2010 at 7:22 am
    I would be very surprised if allocation itself is the problem as opposed to
    good old-fashioned excess copying.

    It is very hard to write an allocator faster than the Java generational GC,
    especially if you are talking about objects that are ephemeral.

    Have you looked at the tenuring distribution?
  • Stefan Groschupf at Dec 28, 2010 at 8:00 pm
    Hi Ted,
    I don't think the problem is allocation; it is garbage collection.
    When the GC kicks in, everything freezes. Of course, changing the GC algorithm helps a little.
    Stefan


  • Ted Dunning at Dec 28, 2010 at 11:02 pm
    Knowing the tenuring distribution will tell a lot about that exact issue.
    Ephemeral collections take on average less than one instruction per
    allocation and the allocation itself is generally only a single instruction.
    For ephemeral garbage, it is extremely unlikely that you can beat that.

    So the real question is whether you are actually creating so much garbage
    that you are overwhelming the collector or whether the data is much longer
    lived than it should be. *That* can cause lots of collection costs.

    To tell how long data lives, you need to get the tenuring distribution:

    -XX:+PrintTenuringDistribution
        Prints details about the tenuring distribution to standard out. It can
        be used to show the tenuring threshold and the ages of objects in the
        new generation. It is also useful for observing the lifetime
        distribution of an application.
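As a complement to the flag, cumulative collector activity can also be read from inside the JVM through the standard `java.lang.management` beans. This is a minimal sketch: the class `GcStats` and its method names are made up for illustration, and it reports collection counts and times only, not the per-age tenuring histogram that the flag prints.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: summarize how often and how long the collectors have run so far. */
final class GcStats {

    /** Total number of collections across all collectors (undefined counts are skipped). */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount()); // -1 means "undefined"
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds across all collectors. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Sampling these totals before and after a burst of RPC traffic gives a rough idea of whether young-generation or old-generation collections dominate the pauses.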
Discussion Overview
group: common-user @
categories: hadoop
posted: Dec 28, '10 at 4:08a
active: Dec 29, '10 at 6:48a
posts: 10
users: 4
website: hadoop.apache.org...
irc: #hadoop
