I think there is a problem in 0.90.6. Rolling restart seems broken.
By mistake I had the previous RC deployed on the cluster and had updated only the master.
My cluster would not start. The master would assign out -ROOT-, but it
would fail to open on the regionserver with this:
2012-02-27 20:16:09,559 DEBUG
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler:
Processing open of -ROOT-,,0.70236052
2012-02-27 20:16:09,561 DEBUG
org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:7003-0x135c07495b70002 Attempting to transition node
70236052/-ROOT- from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
2012-02-27 20:16:09,570 WARN
org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:7003-0x135c07495b70002 Attempt to transition the
unassigned node for 70236052 from M_ZK_REGION_OFFLINE to
RS_ZK_REGION_OPENING failed, the node existed but was in the state
M_SERVER_SHUTDOWN set by the server sv4r11s38:7001
2012-02-27 20:16:09,570 WARN
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed
transition from OFFLINE to OPENING for region=70236052
2012-02-27 20:16:09,570 WARN
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Region
was hijacked? It no longer exists, encodedName=70236052
See how it's treating a state of M_ZK_REGION_OFFLINE as if it were
M_SERVER_SHUTDOWN?
This seems to be because of this commit:
------------------------------------------------------------------------
r1244137 | tedyu | 2012-02-14 09:54:23 -0800 (Tue, 14 Feb 2012) | 3 lines
HBASE-5379 Backport HBASE-4287 to 0.90 - If region opening fails, try
to transition region back to "offline" in ZK (Ram)
It does this:
Index: src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
===================================================================
--- src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java (revision 1090348)
+++ src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java (working copy)
@@ -107,6 +107,7 @@
RS_ZK_REGION_CLOSED (2), // RS has finished closing a region
RS_ZK_REGION_OPENING (3), // RS is in process of opening a region
RS_ZK_REGION_OPENED (4), // RS has finished opening a region
+ RS_ZK_REGION_FAILED_OPEN (5), // RS failed to open a region
// Messages originating from Master to RS
M_RS_OPEN_REGION (20), // Master asking RS to open a region
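If you look at EventType in EventHandler, the constructor does nothing
with the passed value. That's a problem. It means the enum falls back on
the default ordinal, so adding the entry above into the middle of the
enum shifts every entry after it up by one. M_ZK_REGION_OFFLINE is just
before M_SERVER_SHUTDOWN, so an OFFLINE written by the updated master
reads as SERVER_SHUTDOWN on a regionserver still running the previous RC.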
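Here is a minimal sketch of that ordinal shift, assuming the event type travels between master and regionserver as its enum ordinal. The two enums are cut-down, hypothetical stand-ins for EventType before and after the backport. The class name, decodeOnOldServer, and the numbers given for M_ZK_REGION_OFFLINE and M_SERVER_SHUTDOWN are placeholders for illustration, not copied from the HBase source.

```java
class OrdinalShiftDemo {
    // Hypothetical stand-in for EventType as the old regionserver knows it.
    enum OldEventType {
        RS_ZK_REGION_CLOSED(2),
        RS_ZK_REGION_OPENING(3),
        RS_ZK_REGION_OPENED(4),
        M_RS_OPEN_REGION(20),
        M_ZK_REGION_OFFLINE(50),   // placeholder value
        M_SERVER_SHUTDOWN(70);     // placeholder value

        // The passed value is dropped, so only the ordinal identifies an entry.
        OldEventType(int value) { }
    }

    // Hypothetical stand-in for EventType on the updated master.
    enum NewEventType {
        RS_ZK_REGION_CLOSED(2),
        RS_ZK_REGION_OPENING(3),
        RS_ZK_REGION_OPENED(4),
        RS_ZK_REGION_FAILED_OPEN(5),   // the entry added by the backport
        M_RS_OPEN_REGION(20),
        M_ZK_REGION_OFFLINE(50),   // placeholder value
        M_SERVER_SHUTDOWN(70);     // placeholder value

        NewEventType(int value) { }
    }

    // Assumption: the sender writes ordinal(), the receiver indexes values().
    static OldEventType decodeOnOldServer(NewEventType sent) {
        return OldEventType.values()[sent.ordinal()];
    }

    public static void main(String[] args) {
        NewEventType sent = NewEventType.M_ZK_REGION_OFFLINE;
        // prints "sent M_ZK_REGION_OFFLINE, old server sees M_SERVER_SHUTDOWN"
        System.out.println("sent " + sent + ", old server sees " + decodeOnOldServer(sent));
    }
}
```

Entries before the insertion point (RS_ZK_REGION_OPENED, say) still decode to themselves; only the ones after it move. Identifying entries by the explicit number passed to the constructor, rather than by ordinal, would keep them stable when a new entry is added.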
It looks like we need to back out HBASE-5379 from the 0.90 branch and cut a new RC.
Does rolling restart work for you, Ram?
St.Ack
On Sat, Feb 18, 2012 at 11:25 PM, rama krishna wrote:
Hi Devs
The download of 0.90.6RC4 is available at
http://people.apache.org/~ramkrishna/0.90.6RC4/
The release has been signed by Stack as my key is not yet registered with web of trust.
The new issues added to 0.90 after RC3 are:
HBASE-5377 Fix licenses on the 0.90 branch.
HBASE-5379 Backport HBASE-4287 to 0.90 - If region opening fails, try to transition region back
to "offline" in ZK
HBASE-5396 Handle the regions in regionPlans while processing ServerShutdownHandler (Jieshan)
Improvements
HBASE-5327 Print a message when an invalid hbase.rootdir is passed (Jimmy Xiang)
HBASE-5197 [replication] Handle socket timeouts in ReplicationSource
to prevent DDOS
HBASE-5395 CopyTable needs to use GenericOptionsParser
I would like to freeze check-ins to 0.90 until this RC is released.
Please provide your votes on the release. The voting closes on 25th Feb.
I hope to release 0.90.6 before Feb ends.
Thanks to all who contributed, and looking forward to your support.
Regards
Ram