Hi, HBasers!

We are running a small (4-node) staging HBase cluster and have got stuck
in a strange state: our 'page' table reports as neither enabled nor disabled:

HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.92.1-cdh4.0.0, rUnknown, Mon Jun 4 17:25:27 PDT 2012

hbase(main):001:0> is_enabled 'page'
false

0 row(s) in 0.6310 seconds

hbase(main):002:0> is_disabled 'page'
false

0 row(s) in 0.0420 seconds

*hbase hbck* does not report inconsistencies:

...
page is okay.
Number of regions: 183
Deployed on: hbase02dev.303net.pvt,60020,1345151162063
hbase03dev.303net.pvt,60020,1345151153104
hbase04dev.303net.pvt,60020,1345151152636
...
0 inconsistencies detected.
Status: OK

Now we are blocked from updating the schema for this table, since we cannot
disable it for '*alter table*':

hbase(main):003:0> disable 'page'

ERROR: org.apache.hadoop.hbase.TableNotEnabledException: org.apache.hadoop.hbase.TableNotEnabledException: page
    at org.apache.hadoop.hbase.master.handler.DisableTableHandler.(HMaster.java:1154)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336)

But we cannot enable it to leave this state:

hbase(main):001:0> enable 'page'

ERROR: org.apache.hadoop.hbase.TableNotDisabledException: org.apache.hadoop.hbase.TableNotDisabledException: page
    at org.apache.hadoop.hbase.master.handler.EnableTableHandler.(HMaster.java:1142)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336)

Data from the table is still available.

We tried restarting the whole cluster and restarting ZooKeeper. Nothing helped
to fix it. Has anybody seen this issue before?

I would appreciate any help on how to fix it.

--
Sincerely yours
Pavel Vozdvizhenskiy
Grid Dynamics / BigData

  • Michael Stack at Aug 16, 2012 at 11:47 pm

    On Thu, Aug 16, 2012 at 3:48 PM, Pavel Vozdvizhenskiy wrote:
    > I would appreciate any help on how to fix it.

    I've not come across this one before.

    What do you see if you list what's under /hbase/table? Does the table show
    there? You could try removing the znode. You can look by running
    ./bin/hbase zkcli

    St.Ack
  • Himanshu Vashishtha at Aug 17, 2012 at 12:40 am
    I am assuming you initiated the disable table request at the shell.
    Is it possible to get the master server logs from the time you initiated that request?

    I think the znode Stack is referring to is in DISABLING state;
    deleting it should resolve it, but it would be good to know the root cause.
    Can you check in the UI whether this table is hosted by regionservers?

    Himanshu
  • Pavel Vozdvizhenskiy at Aug 17, 2012 at 11:06 am
    Hi, Stack.

    Thank you very much for your help: we have resolved this issue!

    There was a stale *znode* in the ZooKeeper tree stating that the table is
    in 'enabling' state, but it was last modified several days ago:

    [zk: myserver:2181(CONNECTED) 0] ls /hbase/table
    [page]
    [zk: myserver:2181(CONNECTED) 1] ls /hbase/table/page
    []
    [zk: myserver:2181(CONNECTED) 2] get /hbase/table/page
    � 11174@myserver*ENABLING*
    cZxid = 0x31a4a
    ctime = Fri Aug 10 11:23:18 EDT 2012
    mZxid = 0x31c4f
    mtime = Fri Aug 10 11:24:57 EDT 2012
    pZxid = 0x31a4a
    cversion = 0
    dataVersion = 5
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 40
    numChildren = 0
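
The check described above (a table znode sitting in a transitional state with an old mtime) can be sketched in Python. This is only an illustration that parses text pasted from the zkcli `get` command; the function names and the one-hour threshold are assumptions and do not come from the thread or from HBase itself.

```python
from datetime import datetime, timedelta

# States a table znode should only hold briefly (per this thread).
TRANSITIONAL_STATES = ("ENABLING", "DISABLING")


def parse_zk_get(output):
    """Split pasted zkcli `get` output into (data_lines, stat_dict)."""
    data, stat = [], {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition(" = ")
        if sep:
            stat[key] = value          # e.g. "mtime" -> "Fri Aug 10 ..."
        elif line.strip():
            data.append(line.strip())  # the znode payload
    return data, stat


def parse_zk_time(text):
    """Parse 'Fri Aug 10 11:24:57 EDT 2012', ignoring the zone name."""
    parts = text.split()
    del parts[4]  # drop the zone token; strptime's %Z is not portable
    return datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")


def stale_transition(output, now, max_age=timedelta(hours=1)):
    """Return the transitional state if the znode has held it too long.

    max_age is an arbitrary threshold chosen for this sketch.
    """
    data, stat = parse_zk_get(output)
    payload = " ".join(data)
    age = now - parse_zk_time(stat["mtime"])
    for state in TRANSITIONAL_STATES:
        if state in payload and age > max_age:
            return state
    return None


# Payload and stat lines as shown in the thread (binary prefix omitted).
SAMPLE = """\
11174@myserver*ENABLING*
cZxid = 0x31a4a
ctime = Fri Aug 10 11:23:18 EDT 2012
mZxid = 0x31c4f
mtime = Fri Aug 10 11:24:57 EDT 2012
dataVersion = 5
numChildren = 0
"""

print(stale_transition(SAMPLE, now=datetime(2012, 8, 17, 11, 0)))  # ENABLING
```

A result of `None` means the znode is either in a stable state or changed recently enough that a transition may still be in progress.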

    I did the following to fix this issue:
    1) stopped HBase: master and all region servers;
    2) stopped ZooKeeper;
    3) made a backup of the ZooKeeper data (/var/lib/zookeeper);
    4) started ZooKeeper;
    5) removed the znode using the ZooKeeper CLI:
    [zk: hbase01dev.303net.pvt:2181(CONNECTED) 3] delete /hbase/table/page
    [zk: hbase01dev.303net.pvt:2181(CONNECTED) 4] ls /hbase/table/page
    Node does not exist: /hbase/table/page
    6) started HBase: master and all region servers.
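
Step 5 could also be scripted. The sketch below is hypothetical and not what was run in this thread: it assumes a client object exposing `exists`, `get` and `delete(path, version=...)` in the style of the kazoo library's `KazooClient`, keeps a copy of the znode bytes before deleting, and passes the data version so the delete fails if something else modified the znode in between. `FakeZk` is an in-memory stand-in so the example runs without a ZooKeeper server, and the payload is simplified.

```python
from collections import namedtuple

# Minimal stand-in for the stat object a ZooKeeper client returns.
Stat = namedtuple("Stat", "version")


class FakeZk:
    """In-memory stand-in for a ZooKeeper client (demonstration only)."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)  # path -> (data bytes, data version)

    def exists(self, path):
        return Stat(self.nodes[path][1]) if path in self.nodes else None

    def get(self, path):
        data, version = self.nodes[path]
        return data, Stat(version)

    def delete(self, path, version=-1):
        if version not in (-1, self.nodes[path][1]):
            raise RuntimeError("version mismatch")
        del self.nodes[path]


def remove_stale_znode(zk, path, expected_marker, backup):
    """Delete `path` only if its data still contains `expected_marker`.

    The znode bytes are stored in the `backup` dict before deletion.
    Returns True if the znode was deleted, False otherwise.
    """
    if zk.exists(path) is None:
        return False
    data, stat = zk.get(path)
    if expected_marker not in data:
        return False  # state changed since it was inspected; leave it alone
    backup[path] = data
    zk.delete(path, version=stat.version)
    return True


zk = FakeZk({"/hbase/table/page": (b"11174@myserverENABLING", 5)})
saved = {}
print(remove_stale_znode(zk, "/hbase/table/page", b"ENABLING", saved))  # True
print(zk.exists("/hbase/table/page"))  # None
```

The in-memory copy is not a substitute for the full backup of the ZooKeeper data directory made in step 3; it only guards the single znode being removed.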

    After this everything was fine: the table was showing as 'enabled' and
    'disable' worked as well:

    hbase(main):001:0> is_enabled 'page'
    true

    0 row(s) in 0.7120 seconds

    hbase(main):002:0> disable 'page'
    0 row(s) in 3.1160 seconds

    When it was in the hung state, the table was actually served by RSes: I
    could count rows, do scans, run MR jobs using the HBaseStorage Pig class,
    etc. What was blocked was updates to the table schema: alter did not work
    with the table not in disabled state, but disable did not work with the
    table not in enabled state.
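
The deadlock can be illustrated with a toy model in Python. This is not HBase code: the exception classes are local stand-ins that merely share names with the HBase ones, and treating a missing znode as "enabled" is an assumption inferred from this thread, where deleting the znode made `is_enabled` return true.

```python
class TableNotEnabledException(Exception):
    """Local stand-in; only the name mirrors the HBase exception."""


class TableNotDisabledException(Exception):
    """Local stand-in; only the name mirrors the HBase exception."""


def table_state(znode_data):
    """Map znode payload to a state; no znode is treated as ENABLED."""
    return "ENABLED" if znode_data is None else znode_data


def is_enabled(znode_data):
    return table_state(znode_data) == "ENABLED"


def is_disabled(znode_data):
    return table_state(znode_data) == "DISABLED"


def disable(znode_data):
    """Disabling is only accepted from the ENABLED state."""
    if not is_enabled(znode_data):
        raise TableNotEnabledException("page")
    return "DISABLED"


def enable(znode_data):
    """Enabling is only accepted from the DISABLED state."""
    if not is_disabled(znode_data):
        raise TableNotDisabledException("page")
    return "ENABLED"


# A znode stuck in ENABLING is neither enabled nor disabled,
# so both operations are rejected.
for action in (disable, enable):
    try:
        action("ENABLING")
    except Exception as exc:
        print(action.__name__, "->", type(exc).__name__)
# disable -> TableNotEnabledException
# enable -> TableNotDisabledException

# With the znode removed, the model reports ENABLED and disable succeeds.
print(is_enabled(None), disable(None))  # True DISABLED
```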

    All regions of the table were hosted by RSes. Here is an excerpt from the
    underlying HDFS structure:

    -rw-r--r-- 3 hbase hbase 1307 2012-08-10 11:24 /hbase/page/.tableinfo.0000000004
    drwxr-xr-x - hbase hbase 0 2012-08-10 11:24 /hbase/page/.tmp
    drwxr-xr-x - hbase hbase 0 2012-08-16 18:55 /hbase/page/01084884c5d8b61a5a1e529822563cae
    -rw-r--r-- 3 hbase hbase 523 2012-08-13 16:39 /hbase/page/01084884c5d8b61a5a1e529822563cae/.regioninfo
    drwxr-xr-x - hbase hbase 0 2012-08-16 19:57 /hbase/page/01084884c5d8b61a5a1e529822563cae/.tmp
    drwxr-xr-x - hbase hbase 0 2012-08-17 03:28 /hbase/page/01084884c5d8b61a5a1e529822563cae/s
    -rw-rw-rw- 3 jenkins supergroup 742993 2012-08-17 03:08 /hbase/page/01084884c5d8b61a5a1e529822563cae/s/11adf78853944d02a3e39c1eb0b631a3
    -rw-rw-rw- 3 jenkins supergroup 916762 2012-08-17 00:22 /hbase/page/01084884c5d8b61a5a1e529822563cae/s/a0a9c21d470549f9ab6c29d73d26ce8d
    -rw-r--r-- 3 hbase hbase 4713301 2012-08-16 18:55 /hbase/page/01084884c5d8b61a5a1e529822563cae/s/cf447b6576ad4cfe898dfee8e77c0e2c
    drwxr-xr-x - hbase hbase 0 2012-08-17 03:28 /hbase/page/01084884c5d8b61a5a1e529822563cae/t
    -rw-rw-rw- 3 jenkins supergroup 27844042 2012-08-17 00:22 /hbase/page/01084884c5d8b61a5a1e529822563cae/t/48a5c5cb10204854a7b76017145dfda7
    -rw-r--r-- 3 hbase hbase 697429695 2012-08-16 19:57 /hbase/page/01084884c5d8b61a5a1e529822563cae/t/58b9027f020548e880f0d8c3c636ce18
    -rw-rw-rw- 3 jenkins supergroup 15529996 2012-08-17 03:08 /hbase/page/01084884c5d8b61a5a1e529822563cae/t/bdc10a1f6285412caa60d23d745c1180
    ...

    The question I have now is whether I had to stop the whole HBase cluster
    or not. Is it safe to remove a stale *znode* while HBase is operating, if
    I am sure no compaction / splitting is going on?

    --
    Sincerely yours
    Pavel Vozdvizhenskiy
    Grid Dynamics / BigData


  • Michael Stack at Aug 17, 2012 at 2:00 pm

    On Fri, Aug 17, 2012 at 4:05 AM, Pavel Vozdvizhenskiy wrote:
    > The question I have now is whether I had to stop the whole HBase cluster
    > or not. Is it safe to remove a stale *znode* while HBase is operating,
    > if I am sure no compaction / splitting is going on?

    My guess is that it would have worked.

    What you did was better though. I like how you did the backup first.
    The restart would also for sure have cleared up any other strange state
    hangovers that may have come about because of the enabling/disabling
    issue.

    St.Ack

Discussion Overview
group: user @
categories: hbase, hadoop
posted: Aug 16, '12 at 10:49p
active: Aug 17, '12 at 2:00p
posts: 5
users: 3
website: hbase.apache.org
