FAQ
(moving to the user mailing list, where it belongs)

My educated guess is that you had a GC pause that lasted for more than a
minute while a file was being written to. Even if the write wasn't
happening, your region server would have committed suicide anyways since it
was probably past it's lease timeout. In order to confirm that, we would
need to see the logs from a few minutes before the exception happened. Also,
please use pastebin.com (or similar) to store the logs.

Regarding the NPE, it's definitely not something that should happen but the
region server was already on it's way out.

J-D

2011/1/31 Zhou Shuaifeng <zhoushuaifeng@huawei.com>
2011-01-28 17:13:47,633

java.net.*SocketTimeoutException*: Call to c3s4.site/167.6.5.13:60020failed on
*socket timeout exception*: java.net.SocketTimeoutException: 60000 millis
timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/167.6.5.14:*50735*remote=c3s4.site/
167.6.5.13:60020]

2011-01-28 17:13:47,634 INFO org.apache.hadoop.hbase.regionserver.*HRegionServer:
STOPPED: Failed open of daughter*



According this “60000 millis timeout”, I checked the log at 17:12:47,
there is a null point exception and ClosedChannelException, is there any
relation?:



2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call getRegionInfo([B@3a3e8cfb) from 167.6.5.17:51035: output
error

2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call getRegionInfo([B@7a0a4ce5) from 167.6.5.14:50307: output
error

2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 2 on 60020 caught: java.nio.channels.ClosedChannelException

at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)

at
sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)

at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1336)

at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:724)

at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:789)

at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1080)



2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 4 on 60020 caught: java.nio.channels.ClosedChannelException

at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)

at
sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)

at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1336)

at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:724)

at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:789)

at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1080)



2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 5 on 60020 caught: java.nio.channels.ClosedChannelException

at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)

at
sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)

at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1336)

at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:724)

at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:789)

at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1080)



2011-01-28 17:12:47,097 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
regionserver:60020-0x22dcbbf01b40006 Retrieved 78 byte(s) of data from znode
/hbase/unassigned/1489005343;
data=region=ufdr_c,201101258613813763326#174006,1295973611272,
server=c3s1.site:60000, state=M_ZK_REGION_OFFLINE

2011-01-28 17:12:47,097 ERROR
org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while
processing event M_RS_OPEN_REGION

java.lang.NullPointerException

at
org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)

at
org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)

at
org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:672)

at
org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpening(ZKAssign.java:552)

at
org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpening(ZKAssign.java:545)

at
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.transitionZookeeperOfflineToOpening(OpenRegionHandler.java:296)

at
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:90)

at
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)

at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)





Zhou Shuaifeng(Frank)
HUAWEI TECHNOLOGIES CO.,LTD. [image: huawei_logo]




-------------------------------------------------------------------------------------------------------------------------------------
This e-mail and its attachments contain confidential information from
HUAWEI, which
is intended only for the person or entity whose address is listed above.
Any use of the
information contained herein in any way (including, but not limited to,
total or partial
disclosure, reproduction, or dissemination) by persons other than the
intended
recipient(s) is prohibited. If you receive this e-mail in error, please
notify the sender by
phone or email immediately and delete it!

*发件人**:* Zhou Shuaifeng
*发送时间:* 2011年1月31日 17:44
*收件人:* dev@hbase.apache.org
*抄送:* yanlijun@huawei.com; wei.zhi@huawei.com
*主题:* IPC Server Responder out put error causing RegionServer down



Hi,



when I update my hbase from 0.20.6 to 0.90, some error occur:

Below is logs on 167.6.5.13, because Meta table is served by this server,
so other regionservers want to connect this server to getRegionInfo, but got
some error:



2011-01-28 17:14:18,291 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call getRegionInfo([B@15e43aec) from 167.6.5.11:34887: output
error

2011-01-28 17:14:18,291 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call getRegionInfo([B@3ad8672e) from 167.6.5.14:50735: output
error



This caused the server on 11 and 14 socket timeout and shutdown:



2011-01-28 17:13:47,633 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
serverName=c3s5.site,60020,1296203449919, load=(requests=0, regions=2710,
usedHeap=4622, maxHeap=8165): *Failed open of daughter*ufdr_c,201101268613812388592#085317,1296205930314.cd36b19931bd2b2adcca176a88dbd6e5.

java.net.*SocketTimeoutException*: Call to c3s4.site/167.6.5.13:60020failed on
*socket timeout exception*: java.net.SocketTimeoutException: 60000 millis
timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/167.6.5.14:*50735*remote=c3s4.site/
167.6.5.13:60020]

2011-01-28 17:13:47,634 INFO org.apache.hadoop.hbase.regionserver.*HRegionServer:
STOPPED: Failed open of daughter*

* *

*Now, my question is what**’**s the cause of **“*output error” ?



Before the “output error” warning, there is an EOFException, is there any
relationship with this error?



2011-01-28 17:13:43,413 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
listener on 60020: readAndProcess threw exception java.io.EOFException.
Count of bytes read: 0

java.io.EOFException

at java.io.DataInputStream.readFully(DataInputStream.java:180)

at java.io.DataInputStream.readUTF(DataInputStream.java:592)

at java.io.DataInputStream.readUTF(DataInputStream.java:547)

at
org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:122)

at
org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:975)

at
org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:943)

at
org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:519)

at
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:313)

at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)

2011-01-28 17:13:43,413 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open
region: ufdr_c,201101258613814756901#232839,1295985296267.2025697489

2011-01-28 17:13:43,418 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
regionserver:60020-0x22dcbbf01b40006 Retrieved 92 byte(s) of data from znode
/hbase/unassigned/1489005343;
data=region=ufdr_c,201101258613813763326#174006,1295973611272,
server=c3s4.site,60020,1296203450194, state=RS_ZK_REGION_OPENING

2011-01-28 17:13:43,420 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
regionserver:60020-0x22dcbbf01b40006 Retrieved 92 byte(s) of data from znode
/hbase/unassigned/1497320702;
data=region=ufdr_c,201101268613810041680#085457,1296052066413,
server=c3s4.site,60020,1296203450194, state=RS_ZK_REGION_OPENING

2011-01-28 17:13:43,441 DEBUG
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
regionserver:60020-0x22dcbbf01b40006 Received ZooKeeper Event,
type=NodeDataChanged, state=SyncConnected, path=/hbase/unassigned/1489005343

2011-01-28 17:13:43,441 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x22dcbbf01b40006 Successfully transitioned node
1489005343 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING







Zhou Shuaifeng(Frank)

Search Discussions

Discussion Posts

Previous

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 3 of 3 | next ›
Discussion Overview
groupdev @
categorieshbase, hadoop
postedJan 31, '11 at 9:44a
activeJan 31, '11 at 6:23p
posts3
users2
websitehbase.apache.org

People

Translate

site design / logo © 2022 Grokbase