  • Willy Chang at Sep 10, 2012 at 10:16 pm
    It appears to take 30 minutes or so for HBase to recover from the failure
    of the regionserver holding the ROOT role. Please let me know what options
    are available to recover more quickly from such a situation, as our
    applications/SLAs are impacted when this happens.

    It would also be good to be able to recover quickly from a failure of the
    regionserver which owns the .META. table. During HBase startup, a random
    server is elected to manage each of the ROOT and .META. tables (different
    servers). This creates a single point of failure. At the very least,
    perhaps we can find a way to force which server is selected for this role,
    perhaps even just via startup order. We could then assign a server which
    doesn't participate in flow tasks (no tasktracker), and so would be more
    stable. There may also be a config option for this. I am wondering whether
    there is a way to force election of a new ROOT/META owner within a minute
    or so instead of 30+ minutes.


  • Jeff Whiting at Sep 11, 2012 at 2:54 pm
    It shouldn't take 30 minutes to pick up the failure. I don't think there
    is a way to assign these regions to a specific server. I would look at the
    ZooKeeper session timeout: the regions are not reassigned until the
    session times out. If you continue to have problems, I recommend emailing
    the HBase mailing list. They are very good at helping with problems.
  • Willy Chang at Sep 11, 2012 at 5:31 pm
    Jeff - the ZooKeeper session timeout is 490000 ms. Our settings:

    zookeeper.session.timeout=490000
    hbase.zookeeper.property.tickTime=6000
    hbase.zookeeper.property.maxClientCnxns=1400
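
A rough sketch of the arithmetic behind these settings, offered as a reading aid rather than a diagnosis. It assumes the ZooKeeper server applies its documented defaults (minSessionTimeout = 2 x tickTime, maxSessionTimeout = 20 x tickTime) and that neither is overridden on this cluster; the thread does not say. The function name is made up for illustration.

```python
def effective_session_timeout_ms(requested_ms, tick_time_ms,
                                 min_session_timeout_ms=None,
                                 max_session_timeout_ms=None):
    """Clamp a client's requested session timeout into the server's range.

    Assumption: when not set explicitly, the server bounds default to
    2 * tickTime (minimum) and 20 * tickTime (maximum).
    """
    lower = 2 * tick_time_ms if min_session_timeout_ms is None else min_session_timeout_ms
    upper = 20 * tick_time_ms if max_session_timeout_ms is None else max_session_timeout_ms
    return max(lower, min(requested_ms, upper))


# Values quoted in the post above.
requested = 490000  # zookeeper.session.timeout (ms)
tick = 6000         # hbase.zookeeper.property.tickTime (ms)

effective = effective_session_timeout_ms(requested, tick)
print(f"requested: {requested / 1000:.0f} s")
print(f"effective: {effective / 1000:.0f} s")
```

Under those assumptions the requested 490 s would be capped at 120 s. Either figure is well short of the 30 minutes reported, so the session timeout alone may not account for the whole delay.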
  • Willy Chang at Sep 11, 2012 at 5:32 pm
    Can you also send me the address of the HBase mailing list?
  • Harsh J at Sep 11, 2012 at 5:36 pm
    Hey Willy,

    The mailing list for HBase users is user@hbase.apache.org. See
    http://hbase.apache.org/mail-lists.html for details on subscribing and
    on the other lists.
    --
    Harsh J

Discussion Overview
group: cdh-user @
category: hadoop
posted: Sep 10, '12 at 10:16p
active: Sep 11, '12 at 5:36p
posts: 5
users: 3
website: cloudera.com
irc: #hadoop