Hello,

We are running into a region server shutdown again during write loads (90
clients), with a "Connection reset by peer" issue. Any suggestions?

Setup: 30 nodes, HBase 0.90.0, Hadoop-append, CentOS, Dell 1950s with 6 GB RAM.

2011-02-04 02:36:16,808 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-4303650603271778933_2022254
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:122)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2547)

2011-02-04 02:36:16,809 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-4303650603271778933_2022254 bad datanode[0] 10.76.99.115:50010
2011-02-04 02:36:16,880 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=nafhdi12708mwh.io.askjeeves.info,60020,1296782105691, load=(requests=63, regions=165, usedHeap=2274, maxHeap=4070): Failed open of daughter compresstable,\x074k\xB6\x91\xC6\x98\x87,1296815758006.86cf8a61169de38e7ea72fb01c351eb1.
java.io.IOException: All datanodes XXXXXXXXXXXX:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2680)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:2172)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2371)

HDFS config:

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>5368709120</value>
</property>

<property>
  <name>dfs.datanode.handler.count</name>
  <value>100</value>
</property>

<property>
  <name>dfs.namenode.handler.count</name>
  <value>100</value>
</property>

<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>
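
(Note: dfs.datanode.max.xcievers only buys headroom if the OS open-file
limit for the account running the datanodes is raised to match. A minimal
sketch of the usual companion setting, assuming the daemons run as user
'hadoop' -- the user name is an assumption, substitute your own:)

# /etc/security/limits.conf -- raise the per-user file-descriptor cap
# for the account running the HDFS/HBase daemons ('hadoop' is an
# assumption here)
hadoop  -  nofile  32768

(If a datanode does hit the xceiver ceiling, its log says so explicitly,
with a message along the lines of "xceiverCount ... exceeds the limit of
concurrent xcievers".)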

GC OPTS: -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Xmn256m
-XX:CMSInitiatingOccupancyFraction=70
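
(For reference, these flags typically go into conf/hbase-env.sh via
HBASE_OPTS; a minimal sketch:)

# conf/hbase-env.sh -- JVM flags applied to the HBase daemons
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -Xmn256m -XX:CMSInitiatingOccupancyFraction=70"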

Thanks,
Charan


  • Stack at Feb 5, 2011 at 6:28 am
    Please put up more from that log so we can see more around this failed
    region open. Can you check the datanode on its side? Does it have
    errors? Is it the 'peer' referred to below? (Usually the address of
    the peer we are talking to is given.) Pastebin it all. Thanks.
    St.Ack
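
    (For anyone following along: a quick way to pull the datanode's side
    of that block is to grep its log on 10.76.99.115. The log path below
    is an assumption; adjust for your layout.)

    # on the suspect datanode, show every line mentioning the failed
    # block, with a little context around each hit
    grep -B2 -A2 'blk_-4303650603271778933' /var/log/hadoop/*datanode*.log*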
    On Fri, Feb 4, 2011 at 4:15 PM, charan kumar wrote:
    [quoted original message trimmed]
