Grokbase Groups HBase user June 2011
Hey all,

Interesting problem I'm running into. It appears that our master has died
after a failed region assignment. Any ideas as to why?


We are running version 0.90.1 in this cluster.
H3M1 = master
H3S* = regionservers

Out on H3S3 the regionserver process is running (but didn't respond, per the
log):

2011-06-27 10:18:22,517 INFO
org.apache.hadoop.hbase.master.handler.DisableTableHandler: Attemping to
disable table ContentGuidance
2011-06-27 10:18:22,522 INFO
org.apache.hadoop.hbase.master.handler.DisableTableHandler: Offlining 1
regions.
2011-06-27 10:18:23,527 INFO
org.apache.hadoop.hbase.master.handler.DisableTableHandler: Disabled table
is done=true
2011-06-27 10:18:24,536 INFO
org.apache.hadoop.hbase.master.handler.TableEventHandler: Handling table
operation C_M_DELETE_TABLE on table ContentGuidance
2011-06-27 10:18:24,584 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Deleted region
ContentGuidance,,1298676132327.9d63b6173c7307c27aa9728f738f0ef3. from META
2011-06-27 10:19:17,296 WARN org.apache.hadoop.hbase.zookeeper.ZKTable:
Moving table ContentGuidance state to enabled but was already enabled
2011-06-27 10:19:17,313 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
HLog configuration: blocksize=64 MB, rollsize=60.8 MB, enabled=true,
flushlogentries=1, optionallogflushinternal=1000ms
2011-06-27 10:19:17,349 INFO
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs
-- HDFS-200
2011-06-27 10:19:17,350 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
New hlog
/hbase/ContentGuidance/f38bbc9bc672b5ea415b6dcc7e1da43a/.logs/hlog.1309191557313
2011-06-27 10:19:17,350 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
Using getNumCurrentReplicas--HDFS-826
2011-06-27 10:19:17,433 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Onlined ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a.;
next sequenceid=1
2011-06-27 10:19:17,438 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Added region
ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a. to META
2011-06-27 10:19:17,439 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a.
2011-06-27 10:19:17,439 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
IPC Server handler 87 on 60000.logSyncer exiting
2011-06-27 10:19:17,470 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 1 region(s)
round-robin across 4 server(s)
2011-06-27 10:19:17,472 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done
2011-06-27 10:19:17,480 INFO
org.apache.hadoop.hbase.master.AssignmentManager: H3S4,60020,1308946657703
unassigned znodes=1 of total=1
2011-06-27 10:19:21,530 INFO
org.apache.hadoop.hbase.master.handler.DisableTableHandler: Attemping to
disable table ContentGuidance
2011-06-27 10:19:21,536 INFO
org.apache.hadoop.hbase.master.handler.DisableTableHandler: Offlining 1
regions.
2011-06-27 10:19:22,543 INFO
org.apache.hadoop.hbase.master.handler.DisableTableHandler: Disabled table
is done=true
2011-06-27 10:19:23,544 INFO
org.apache.hadoop.hbase.master.handler.TableEventHandler: Handling table
operation C_M_DELETE_TABLE on table ContentGuidance
2011-06-27 10:19:23,551 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Deleted region
ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a. from META
2011-06-27 10:20:17,479 FATAL org.apache.hadoop.hbase.master.HMaster: Failed
assignment of regions to serverName=H3S3,60020,1308946657608,
load=(requests=0, regions=7, usedHeap=976, maxHeap=1011)
java.net.SocketTimeoutException: Call to H3S3/10..x.x.x3:60020 failed on
socket timeout exception: java.net.SocketTimeoutException: 60000 millis
timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10..x.x.x:34904
remote=H3S3/10.x.x.x:60020]
at
org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:802)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
at $Proxy7.openRegions(Unknown Source)
at
org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:566)
at
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:776)
at
org.apache.hadoop.hbase.master.AssignmentManager$SingleServerBulkAssigner.run(AssignmentManager.java:1310)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while
waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10..x.x.x:34904
remote=H3S3/10..x.x.x:60020]
at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:299)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readInt(DataInputStream.java:370)
at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:539)
at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:477)
2011-06-27 10:20:17,482 INFO org.apache.hadoop.hbase.master.HMaster:
Aborting
2011-06-27 10:20:18,390 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
server on 60000
2011-06-27 10:20:18,390 INFO org.apache.hadoop.hbase.master.CatalogJanitor:
H3M1:60000-CatalogJanitor exiting
2011-06-27 10:20:18,390 INFO org.apache.hadoop.hbase.master.HMaster$1:
H3M1:60000-BalancerChore exiting
2011-06-27 10:20:18,392 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server listener on 60000


  • Jean-Daniel Cryans at Jun 27, 2011 at 5:21 pm
    Yeah, the BulkAssigner uses an AssignmentManager.assign method that
    won't retry if it gets an exception, because originally it was only
    used during startup (e.g. if you can't assign the regions properly at
    startup, then kill yourself).

    In this case it happened to you when creating a table (it also uses
    the BulkAssigner now). To me it's a bug; the region should just be
    assigned somewhere else. Mind filing a JIRA?
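    The failure mode described here can be sketched in hypothetical Java
    (this is not the actual HBase 0.90 code; RegionServer, assignOrAbort,
    and assignWithFailover are illustrative names): a startup-style assign
    treats any exception as fatal, while the suggested fix would fall back
    to another server.

```java
import java.net.SocketTimeoutException;
import java.util.List;

class AssignSketch {
    // Hypothetical stand-in for a regionserver's RPC endpoint.
    interface RegionServer {
        void openRegion(String region) throws SocketTimeoutException;
    }

    // Startup-style behavior: any failure is fatal, so the caller
    // aborts (the master "kills itself") instead of retrying.
    static void assignOrAbort(RegionServer rs, String region) {
        try {
            rs.openRegion(region);
        } catch (SocketTimeoutException e) {
            throw new IllegalStateException(
                "Failed assignment of " + region + ", aborting master", e);
        }
    }

    // The behavior argued for in the reply: if one server times out,
    // just try to assign the region somewhere else.
    static boolean assignWithFailover(List<RegionServer> servers, String region) {
        for (RegionServer rs : servers) {
            try {
                rs.openRegion(region);
                return true;  // assigned successfully
            } catch (SocketTimeoutException e) {
                // this server is unresponsive; fall through to the next one
            }
        }
        return false;  // no server could open the region
    }
}
```

    Under this sketch, a SocketTimeoutException from one regionserver
    aborts the caller in the first variant but merely skips that server
    in the second.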

    As to how you got a 60-second timeout, you should take a look at the RS
    log (and next time it happens, also take a jstack).

    Thx,

    J-D
    On Mon, Jun 27, 2011 at 9:54 AM, Matt Davies wrote:
  • Jean-Daniel Cryans at Jun 28, 2011 at 9:15 pm
    Looks like I'll be fixing it in the context of HBASE-3984.

    J-D

    On Mon, Jun 27, 2011 at 10:21 AM, Jean-Daniel Cryans
    wrote:
  • Matt Davies at Jun 29, 2011 at 6:34 pm
    Thanks, J-D. Sorry - I got swamped and was going to file a JIRA, but it
    sounds like you don't need it now.


    On Tue, Jun 28, 2011 at 3:14 PM, Jean-Daniel Cryans wrote:


Discussion Overview
group: user
categories: hbase, hadoop
posted: Jun 27, '11 at 4:55p
active: Jun 29, '11 at 6:34p
posts: 4
users: 3
website: hbase.apache.org
