Grokbase Groups HBase dev July 2012
Long running replication: possible improvements

Hi,
Replication works well when run over short spans, but its performance in a
long-running setup seems to degrade on the slave cluster side, to the point
that it made the slave cluster unresponsive in one of our testing
environments. As per a jstack on one node, all of its priority handlers were
blocked in the replicateLogEntries method, which is blocked because the
cluster is in bad shape (2 of 4 nodes died; root is unassigned; the node
which previously held it became unresponsive; and the only other remaining
node doesn't have any priority handler left to take care of the root region
assignment). The memory footprint of the process also increases (based on
`top`; unfortunately, no GC logs at the moment).

replicateLogEntries is a high-QOS method; the ReplicationSink's overall
behavior is to act as a normal HBase client and replicate the mutations into
its own cluster. This may take some time if a region is splitting, there is
a GC pause, etc. at the target region servers. The client then enters its
retry loop, and this blocks the priority handler serving that method.
Meanwhile, other master-cluster region servers are also shipping edits (to
this or other region servers), which makes the situation even worse.
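
Roughly, the path being described looks like this (a sketch from memory of
the 0.92/0.94 region server and sink code; annotation and signatures are
approximate):

    // Sketch only; names approximate the 0.92/0.94 code base.
    // The QosPriority annotation routes this RPC to the small pool of
    // priority handlers instead of the regular handlers.
    @QosPriority(priority = HConstants.HIGH_QOS)
    public void replicateLogEntries(final HLog.Entry[] entries) throws IOException {
      // ReplicationSink turns the WAL entries into Puts/Deletes and applies them
      // through a plain HTable, whose internal retry loop can hold this priority
      // handler for a long time if the slave cluster is struggling.
      this.replicationSink.replicateEntries(entries);
    }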

I wonder whether others have seen this before. Please share.

There is some scope for improvement on the sink side:

a) ReplicationSink#replicateLogEntries: make it a normal operation (no
high-QOS annotation), and have ReplicationSink periodically check whether
the client is still connected. If it is not, just throw an exception and
bail out; the client will resend the shipment anyway. This keeps the
handlers from blocking, and the cluster's normal operation will not be
impeded.

b) Have a thread pool in ReplicationSink and process per-table requests in
parallel. This should help in the case of multi-table replication.

c) Free the memory consumed by the shipped array as soon as the mutation
list is populated. Currently, if the call to multi is blocked (for any
reason), the region server enters its retry logic... and since the entries
of the WALEdit array are copied into Put/Delete objects, the array can be
freed.
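
As a rough illustration of (c), here is a sketch of dropping each shipped
entry as soon as its data has been copied into a client mutation (class and
method names as in the 0.92/0.94 client API, approximately; the nulling-out
is the proposed change, not existing behavior):

    // Sketch: build the mutation list and release each WAL entry immediately,
    // so the shipped array no longer pins memory while the client call retries.
    // Assumes this runs in a method that declares IOException.
    List<Row> mutations = new ArrayList<Row>();
    for (int i = 0; i < entries.length; i++) {
      for (KeyValue kv : entries[i].getEdit().getKeyValues()) {
        if (kv.isDelete()) {
          mutations.add(new Delete(kv.getRow(), kv.getTimestamp(), null));
        } else {
          Put put = new Put(kv.getRow());
          put.add(kv);
          mutations.add(put);
        }
      }
      entries[i] = null;  // proposed: free this slot now that it has been copied
    }
    // table.batch(mutations) may still block or retry for a long time, but the
    // handler no longer holds the original HLog.Entry[] contents while it waits.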

Looking forward to some more/better suggestions to make replication more
stable.

Thanks,
Himanshu


  • Jean-Daniel Cryans at Jul 26, 2012 at 11:58 pm

    On Wed, Jul 25, 2012 at 5:58 PM, Himanshu Vashishtha wrote:

    > Hi,
    > Replication works well when run over short spans, but its performance in
    > a long-running setup seems to degrade on the slave cluster side, to the
    > point that it made the slave cluster unresponsive in one of our testing
    > environments. As per a jstack on one node, all of its priority handlers
    > were blocked in the replicateLogEntries method, which is blocked because
    > the cluster is in bad shape (2 of 4 nodes died; root is unassigned; the
    > node which previously held it became unresponsive; and the only other
    > remaining node doesn't have any priority handler left to take care of
    > the root region assignment).

    See:
    https://issues.apache.org/jira/browse/HBASE-4280
    https://issues.apache.org/jira/browse/HBASE-5197
    https://issues.apache.org/jira/browse/HBASE-6207
    https://issues.apache.org/jira/browse/HBASE-6165

    Currently the best way to fix this would be to have a separate set of
    handlers completely.
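
    A dedicated pool along these lines exists in later HBase releases; assuming
    that property name, the sink-side knob would look roughly like this
    (normally set in hbase-site.xml rather than in code):

        // Sketch, assuming the later hbase.regionserver.replication.handler.count
        // property: give replicateLogEntries its own handlers so it cannot starve
        // the priority handlers needed for root/meta operations.
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("hbase.regionserver.replication.handler.count", 3);
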
    > The memory footprint of the process also increases (based on `top`;
    > unfortunately, no GC logs at the moment).

    You don't want to rely on top for that since it's a Java application.
    Set your Xms as big as your Xmx and your application will always use
    all the memory it's given.

    > replicateLogEntries is a high-QOS method; the ReplicationSink's overall
    > behavior is to act as a normal HBase client and replicate the mutations
    > into its own cluster. This may take some time if a region is splitting,
    > there is a GC pause, etc. at the target region servers. The client then
    > enters its retry loop, and this blocks the priority handler serving that
    > method. Meanwhile, other master-cluster region servers are also shipping
    > edits (to this or other region servers), which makes the situation even
    > worse.

    > I wonder whether others have seen this before. Please share.

    See my first answer.

    > There is some scope for improvement on the sink side:
    >
    > a) ReplicationSink#replicateLogEntries: make it a normal operation (no
    > high-QOS annotation), and have ReplicationSink periodically check whether
    > the client is still connected. If it is not, just throw an exception and
    > bail out; the client will resend the shipment anyway. This keeps the
    > handlers from blocking, and the cluster's normal operation will not be
    > impeded.

    It wasn't working any better before HBASE-4280 :)

    > b) Have a thread pool in ReplicationSink and process per-table requests
    > in parallel. This should help in the case of multi-table replication.

    Currently it's trying to apply the edits sequentially; going parallel
    would apply them in the wrong order. Note that when a region server
    fails we do continue to replicate the new edits while we also replicate
    the backlog from the old server, so currently it's not 100% perfect.

    > c) Free the memory consumed by the shipped array as soon as the mutation
    > list is populated. Currently, if the call to multi is blocked (for any
    > reason), the region server enters its retry logic... and since the
    > entries of the WALEdit array are copied into Put/Delete objects, the
    > array can be freed.

    So free up the entries array at each position after the Put or Delete
    was created? We could do that, although it's not a big saving
    considering that entries will be at most 64MB big. In production here
    we run with just 1 MB.
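
    The batch size mentioned above is capped on the source side; assuming the
    0.92/0.94 property name, the 1 MB production setting would be something
    like:

        // Sketch (assumption: property name as used by ReplicationSource):
        // cap each shipped batch at 1 MB instead of the 64 MB default.
        Configuration conf = HBaseConfiguration.create();
        conf.setLong("replication.source.size.capacity", 1024 * 1024);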

    J-D
  • Lars hofhansl at Jul 27, 2012 at 8:17 pm
    So part of the problem seems to be that the HBase client (HTable) used in ReplicationSink takes a long time to fail when something is slow/wrong in the slave cluster, correct?
    We faced similar problems with HTables in our app servers, where in some scenarios the client would be waiting in various retry loops for up to 20 minutes before finally throwing an exception (the worst case is when ZK is down or not reachable).


    So here, for the ReplicationSink's client, we could aggressively lower the various timeouts, set the ZK retry count to 0, etc., since the source will retry anyway; hence there would be less of a chance for the ReplicationSink to hog the priority handlers.

    For us, we brought the "time-to-exception" down to 20s (worst case).
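
    Concretely, a "fail fast" client for the sink might be configured along
    these lines (property names are the usual 0.92/0.94 client knobs; the
    values are only illustrative):

        // Sketch: trade retry patience for fast failure, since the master-side
        // source will re-ship the batch anyway.
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("hbase.client.retries.number", 1);   // fewer client-side retries
        conf.setInt("hbase.client.pause", 200);          // shorter back-off between retries (ms)
        conf.setInt("hbase.rpc.timeout", 10000);         // 10 s per RPC
        conf.setInt("zookeeper.recovery.retry", 0);      // don't retry failed ZK operations
        HTable sinkTable = new HTable(conf, "replicated_table");  // hypothetical table name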

    -- Lars


