Grokbase Groups HBase user June 2011
Hi all,
I am running HBase on a 6-node cluster. HBase comes up fine, and I can create
a test table, put rows, and scan. But I can't shut it down cleanly: the
stop-hbase command runs forever, printing dots, and I can see that a couple
of RegionServers are not terminating.

Here are the details:

5 RegionServers, 1 Master
3 ZooKeepers

hbase : 0.90.1-cdh3u0, r (both hadoop & hbase are Cloudera CDH3
distributions)
hadoop : 0.20.2-cdh3u0

master-log : http://pastebin.com/tBvJDPHc
rserver log : http://pastebin.com/EsWYAuUk
hbase-site.xml : http://pastebin.com/sU7EM2QK


During the shutdown, I see this in the region server logs:

2011-06-10 12:03:55,940 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 70236052
2011-06-10 12:03:58,942 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 70236052
....


Thanks very much for your help!
Sujee Maniyam
http://sujee.net


  • Stack at Jun 10, 2011 at 9:10 pm
    That looks like we're waiting on the shutdown of the -ROOT- region.
    Is that so? Is there anything earlier in the log on why it won't go down?
    St.Ack

  • Sujee Maniyam at Jun 10, 2011 at 9:26 pm
    It looks like this RS has the -ROOT- region. I initiated the shutdown
    with a kill <pid> command.
    Is there anything specific I should look for in the logs or config?

    Thanks
    http://sujee.net

  • Jean-Daniel Cryans at Jun 10, 2011 at 9:38 pm
    There's a DNS mismatch:

    devperf-sn10,60020,1307732557915
    devperf-sn10.pcs.hds.com,60020,1307732557915

    And 0.90 has a big regression with that (0.92 already has the fixes,
    but it's not released yet). Make sure your nodes all resolve the same
    hostnames per http://hbase.apache.org/book.html#dns

    BTW, the clue comes from lines like this one:

    2011-06-10 12:03:50,975 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: No HServerInfo found for devperf-sn10.pcs.hds.com,60020,1307732557915

    J-D
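
For anyone comparing server names from their own logs, here is a minimal Python sketch of the check J-D describes. It assumes the names follow the host,port,startcode layout seen in the two lines quoted above; the helper names (`ServerName`, `parse_server_name`, `same_server_different_host`) are made up for illustration and are not part of HBase.

```python
from typing import NamedTuple


class ServerName(NamedTuple):
    """One 'host,port,startcode' entry as it appears in the logs above."""
    host: str
    port: int
    startcode: int


def parse_server_name(text: str) -> ServerName:
    # Assumes exactly three comma-separated fields: host, port, startcode.
    host, port, startcode = text.strip().split(",")
    return ServerName(host, int(port), int(startcode))


def same_server_different_host(a: str, b: str) -> bool:
    """True when two entries share port and startcode but differ in hostname,
    which is the pattern of the DNS mismatch pointed out in this thread."""
    x, y = parse_server_name(a), parse_server_name(b)
    return (x.port, x.startcode) == (y.port, y.startcode) and x.host != y.host
```

Calling `same_server_different_host` with the two names from the post above flags them as one server registered under a short name and a fully qualified name.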
  • Sujee Maniyam at Jun 10, 2011 at 11:06 pm
    Jean,
    The DNS mismatch was the cause! Good eyes!

    Here is what I had to do:

    1) Changed hostnames to fully qualified ones on all machines
    file   : /etc/sysconfig/network
    before : devperf-sn6
    now    : devperf-sn6.pcs.hds.com


    2) Used fully qualified hostnames (FQHN) in 'hbase-site.xml'
    before : devperf-sn6
    now    : devperf-sn6.pcs.hds.com

    Even after a restart, ZooKeeper was still doing lookups on the old
    hostnames and erroring out, so I also had to do the following:

    3) Removed some shorthand aliases I had in /etc/hosts (on the master node)
    ip_address1 hmaster
    ip_address2 rs1
    I deleted these (and restarted the machine just to be sure)

    4) Deleted the ZooKeeper dir on the ZK machines (this one was not very obvious!)
    rm -rf /tmp/hbase-hadoop

    Only then did things start working!

    I am happy to document this in the wiki somewhere if it might help others.

    A) Are there any other best practices to keep DNS / host lookups straight?
    A2) Would it be safer if I used IP addresses? Or is reverse DNS
    required even then?

    B) I do miss the shorthand aliases in /etc/hosts. Is there a way to keep
    these aliases without interfering with HBase / ZooKeeper?

    Thanks for your help!
    http://sujee.net
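
The hostname changes in steps 1 to 3 of the post above come down to forward and reverse lookups agreeing on one name. Here is a minimal Python sketch of that check, using only the standard `socket` module. The function name `check_host` and its injectable resolver parameters are assumptions added for illustration so the logic can be tried without a live DNS setup.

```python
import socket


def check_host(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve name to an IP, map that IP back to a name, and compare.

    Returns (ip, canonical_name, consistent). The resolver arguments default
    to the real socket calls and exist only to make the check testable.
    """
    ip = forward(name)
    # gethostbyaddr returns (hostname, alias_list, ip_list); take the hostname.
    canonical = reverse(ip)[0]
    return ip, canonical, canonical.lower() == name.lower()
```

On a cluster node, `check_host(socket.gethostname())` would report whether the name the machine gives itself matches what the reverse lookup returns. A result ending in `False` on any node suggests the same kind of short-name versus fully-qualified-name split seen in this thread.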

  • Jean-Daniel Cryans at Jun 10, 2011 at 11:14 pm

    > I am happy to document this in the wiki somewhere if it might help others.
    We could add this to the document I referred to.

    > A) Are there any other best practices to keep DNS / host lookups straight?
    I'd ask a sysadmin, but I think you just need to keep things
    "consistent", meaning always in the same order.

    > A2) Would it be safer if I used IP addresses? Or is reverse DNS
    > required even then?
    Might not work in all cases.

    > B) I do miss the shorthand aliases in /etc/hosts. Is there a way to keep
    > these aliases without interfering with HBase / ZooKeeper?
    You should still be able to keep them in there; they just need to come
    second, after the FQDN.

    J-D
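
To illustrate the "FQDN first, alias second" ordering from the last answer, here is a rough Python sketch that scans /etc/hosts-style text and lists entries whose first name is not fully qualified. It is a heuristic under stated assumptions: a name counts as fully qualified if it contains a dot, loopback and IPv6 entries are skipped, and the function name and the 10.0.0.x addresses in the usage example are hypothetical.

```python
def misordered_entries(hosts_text):
    """Return (ip, names) pairs whose first name has no dot, i.e. where a
    short alias comes before (or instead of) a fully qualified name."""
    flagged = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        ip, names = fields[0], fields[1:]
        # Skip loopback and IPv6 entries to keep the heuristic simple.
        if ip.startswith("127.") or ":" in ip:
            continue
        if "." not in names[0]:
            flagged.append((ip, names))
    return flagged


# Hypothetical sample: only the first cluster entry follows the suggested order.
sample = """
127.0.0.1 localhost
10.0.0.6 devperf-sn6.pcs.hds.com devperf-sn6 rs1
10.0.0.7 rs2 devperf-sn7.pcs.hds.com
10.0.0.1 hmaster
"""
flagged = misordered_entries(sample)
```

In the sample, the `10.0.0.6` line keeps its aliases after the fully qualified name and is not flagged, while the `rs2` and `hmaster` lines are.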
