More information, in case anyone comes across the same issue...
----------------------
vm-ddc5-42b3 - NN1
vm-F0CD-5B46 - NN2
After killing the PID of the active NameNode (NN1), I see the error below in
the NN2 log.
The standby NameNode vm-F0CD-5B46 is trying to connect to vm-ddc5-42b3, and
the connection is refused.
Is that the reason why vm-F0CD-5B46 is not able to become the ACTIVE NameNode
once vm-ddc5-42b3 (NN1, ACTIVE) goes down?
Unable to trigger a roll of the active NN
java.net.ConnectException: Call From vm-F0CD-5B46.nam.nsroot.net/10.49.216.121 to vm-ddc5-42b3.nam.nsroot.net:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
at org.apache.hadoop.ipc.Client.call(Client.java:1164)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy12.rollEditLog(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:137)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:268)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:310)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:452)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:523)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:476)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:570)
at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:220)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1213)
at org.apache.hadoop.ipc.Client.call(Client.java:1140)
... 10 more
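For anyone debugging the same thing: the trace above only says that nothing accepted the connection on vm-ddc5-42b3 port 8020, which would be expected once the NN1 process has been killed. A quick way to confirm that from the standby host is to try the port directly. This is only a minimal sketch in Python; can_connect is a helper name I made up (it is not part of Hadoop), and 8020 is simply the RPC port shown in the log.

```python
import socket

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only; it just checks whether
    something is listening, not whether a NameNode is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and name-resolution errors.
        return False
```

Calling can_connect("vm-ddc5-42b3.nam.nsroot.net", 8020) from NN2 should return False while NN1 is down and True once it is restarted. A False here by itself would not explain why NN2 stays in standby.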
-------MADHU
On Wednesday, March 13, 2013 5:58:08 PM UTC-5, Madhu M wrote:
Hello,
Any help would be appreciated.
I have the setup below for High Availability:
NN1 (active) + ZKFC1
NN2 (standby) + ZKFC2
Then I kill the active node with kill -9 on the NN1 process.
NN2 stays on standby instead of changing to ACTIVE status.
I see the error below in the log of Failover Controller FC1, which runs on NN1.
The error is printed in the log over and over:
6:45:51.232 PM INFO org.apache.hadoop.ipc.Client
Retrying connect to server: vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)
6:45:51.233 PM WARN org.apache.hadoop.ha.HealthMonitor
Transport-level exception trying to monitor health of NameNode at vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020: Call From vm-F0CD-5B46.nam.nsroot.net/10.49.216.121 to vm-F0CD-5B46.nam.nsroot.net:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
6:45:53.237 PM INFO org.apache.hadoop.ipc.Client
Retrying connect to server: vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)
6:45:53.237 PM WARN org
Thanks
Madhu
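When reading these ConnectException messages it helps to see at a glance which host is calling which. Below is a rough Python sketch that pulls the caller and target out of the "Call From ... to ..." text. The regular expression and the parse_call name are my own assumptions based only on the two messages in this thread; they are not a Hadoop API, and other Hadoop versions may format the message differently.

```python
import re

# Matches the "Call From <host>/<ip> to <host>:<port> failed" part of the
# ConnectException message as it appears in the logs in this thread.
CALL_RE = re.compile(
    r"Call From (?P<src_host>[^/\s]+)/(?P<src_ip>\S+)"
    r" to (?P<dst_host>[^:\s]+):(?P<dst_port>\d+) failed"
)

def parse_call(message):
    """Return caller/target details from a 'Call From ... to ...' message, or None."""
    match = CALL_RE.search(message)
    if match is None:
        return None
    info = match.groupdict()
    info["dst_port"] = int(info["dst_port"])
    # Hostnames are case-insensitive, so compare them lowercased.
    info["same_host"] = info["src_host"].lower() == info["dst_host"].lower()
    return info

# The two messages quoted in this thread, shortened to the relevant part.
nn2_log = ("Call From vm-F0CD-5B46.nam.nsroot.net/10.49.216.121 to "
           "vm-ddc5-42b3.nam.nsroot.net:8020 failed on connection exception")
zkfc_log = ("Call From vm-F0CD-5B46.nam.nsroot.net/10.49.216.121 to "
            "vm-F0CD-5B46.nam.nsroot.net:8020 failed on connection exception")

for name, text in (("NN2 log", nn2_log), ("failover controller log", zkfc_log)):
    call = parse_call(text)
    print(name, call["src_host"], "->", call["dst_host"], call["dst_port"],
          "same host:", call["same_host"])
```

For the NN2 log the caller and target are different machines. For the HealthMonitor message both sides are vm-F0CD-5B46, which would suggest a failover controller checking the NameNode on its own host rather than the remote one.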
--