Grokbase Groups HBase user June 2011
Hi!

This morning, on our production system, we experienced a very bad behavior of HBase 0.20.6.

1- one of our region servers crashed
2- we restarted it successfully (no errors on the master or on the region servers)
3- but we discovered that our HBase clients were unable to recover from this situation:

Each time a get() was performed, but ONLY ON THE BIGGEST TABLES, our HBase clients triggered an exception (actually coming from the restarted region server):

org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: appInfo.aki519368.prod.capptain.com,801765cd68dcbfc04690770622c2edaa,1307369888185
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2269)
at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1732)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

More strange:

4- only clients READING from HBase triggered this exception: clients writing to HBase recovered from this failure without any error (and the writes were actually performed)

To fix this, we had to restart all our HBase clients reading from the BIGGEST TABLES. So we guess that the issue comes from the HBase client library or the region server itself.

We can reproduce this bug easily on our development servers: we kill a region server, we restart it, and clients trying to "get" from regions served by the killed/restarted region server get this exception until we restart them.

So my questions are:

Is this a known issue?
Has it been fixed in HBase 0.90?
Is it required to handle this exception in a special way on the client side (e.g. close / reopen the table)?

Thanks a lot


  • Stack at Jun 17, 2011 at 4:32 pm

    On Fri, Jun 17, 2011 at 6:42 AM, Vincent Barat wrote:
    > Each time a get() was performed, but ONLY ON THE BIGGEST TABLES, our HBase
    > clients triggered an exception (actually coming from the restarted region
    > server):
    > org.apache.hadoop.hbase.NotServingRegionException:
    > org.apache.hadoop.hbase.NotServingRegionException:
    > appInfo.aki519368.prod.capptain.com,801765cd68dcbfc04690770622c2edaa,1307369888185

    The RS is saying that it is not serving this region.

    Is this server carrying any regions?

    If you look in the .META. table, what does it have as the server for
    this region? Probably this restarted regionserver?

    Can you find the region on your cluster (grep it in your master logs;
    see what the last entries say about where it was deployed).
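When looking a region up in .META. or grepping for it in the master logs, it can help to split the region name from the exception into its parts. The following is a minimal sketch, assuming the usual `<table>,<start key>,<region id>` layout of a region name; `RegionName` is a hypothetical helper written for this illustration and is not part of the HBase client API. Reading the region id as a creation timestamp in milliseconds is also an assumption.

```java
import java.time.Instant;

// Hypothetical helper for illustration only; not an HBase class.
// Assumes region names look like "<table>,<start key>,<region id>".
final class RegionName {
    final String table;
    final String startKey;
    final long regionId;

    private RegionName(String table, String startKey, long regionId) {
        this.table = table;
        this.startKey = startKey;
        this.regionId = regionId;
    }

    static RegionName parse(String name) {
        // The start key may itself contain commas, so split on the
        // first and the last comma rather than on every comma.
        int first = name.indexOf(',');
        int last = name.lastIndexOf(',');
        if (first < 0 || first == last) {
            throw new IllegalArgumentException("not a region name: " + name);
        }
        String table = name.substring(0, first);
        String startKey = name.substring(first + 1, last);
        long regionId = Long.parseLong(name.substring(last + 1));
        return new RegionName(table, startKey, regionId);
    }

    // Assumption: the region id is the region's creation time in epoch milliseconds.
    Instant createdAt() {
        return Instant.ofEpochMilli(regionId);
    }
}
```

For the region in the exception above, `RegionName.parse(...)` yields the table `appInfo.aki519368.prod.capptain.com` and the start key `801765cd68dcbfc04690770622c2edaa`. The full region name is the string to search for in the master logs.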

    > More strange:
    >
    > 3- only client READING HBase triggered this exception: client writing to
    > HBase recovered without any error from this failure (and the writes were
    > effectively performed)

    The same client? (Would be odd if the same client had two different
    addresses for the same region.)


    > To fix this, we had to restart all our HBase clients reading from the
    > BIGGEST TABLES. So we guess that the issue come from the HBase client
    > library or the region server itself.

    If restarting clients only, that would seem to finger the client lib.

    > We reproduce this bug easily on our development servers: we kill a region
    > server, we restart it and clients trying to "get" from regions served by the
    > killed/restarted region server get this exception until we restart them.
    >
    > So my questions are:
    >
    > Is this a known issue?

    I'm not sure what the issue is. I don't think I've seen this one before.

    > Has it been fixed in HBase 0.90?

    There are over 1k fixes in 0.90 over 0.20.x.

    > Is it required to handle this exception in a special way on client side
    > (e.g. close / reopen the table)?

    No. NSRE is usually an exception that doesn't surface out of the
    client unless there is a problem.
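
The reason an NSRE normally stays inside the client can be pictured with a toy model: a client caches where each region lives, and when a server answers "not serving", the client drops the cached location, looks the region up again, and retries. The sketch below is only an illustration of that general idea. `Cluster`, `CachingClient`, and `NotServingException` are hypothetical names invented here, not HBase classes, and this is not a diagnosis of the 0.20.6 behavior described in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types are HBase classes.
final class NotServingException extends RuntimeException {
    NotServingException(String region) {
        super("not serving region: " + region);
    }
}

// Hypothetical view of a cluster, for illustration.
interface Cluster {
    // Authoritative lookup of the server hosting a region
    // (plays the role that .META. plays in HBase).
    String locate(String region);

    // Read a row from a region on a given server; throws
    // NotServingException if that server does not host the region.
    String get(String server, String region, String row);
}

final class CachingClient {
    private final Map<String, String> locationCache = new HashMap<>();
    private final Cluster cluster;
    private final int maxRetries;

    CachingClient(Cluster cluster, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        this.cluster = cluster;
        this.maxRetries = maxRetries;
    }

    String get(String region, String row) {
        NotServingException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            // Use the cached location if present, otherwise look it up.
            String server = locationCache.get(region);
            if (server == null) {
                server = cluster.locate(region);
                locationCache.put(region, server);
            }
            try {
                return cluster.get(server, region, row);
            } catch (NotServingException e) {
                // The cached location is stale: forget it so the next
                // attempt performs a fresh lookup.
                locationCache.remove(region);
                last = e;
            }
        }
        // Only reached when every attempt failed; surface the last error.
        throw last;
    }
}
```

In this model the exception reaches the caller only when every retry fails, for example when the authoritative lookup itself keeps returning a server that no longer hosts the region. A real client would also wait between attempts, which the sketch omits.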

    You say you can recreate easy enough. Can you try on 0.90.x?

    You should upgrade anyways.

    St.Ack
