Madhu,

Please attach the complete logs of NN2 and ZKFC2.
The active NameNode was killed, so it is expected that NN2 cannot
connect to NN1.

-Vinithra
On Wed, Mar 13, 2013 at 5:08 PM, Madhu M wrote:

Vinithra,

I see the error below in the NN2 log.

vm-ddc5-42b3 - NN1
vm-F0CD-5B46 - NN2

After killing the PID of NN1 (the active NameNode), I see the error below
in the NN2 log.

The standby NameNode vm-F0CD-5B46 is trying to check the connection with
vm-ddc5-42b3, and the connection is refused.
Is that the reason why vm-F0CD-5B46 is not able to become the ACTIVE
NameNode once vm-ddc5-42b3 (NN1, ACTIVE) goes down?


Unable to trigger a roll of the active NN
java.net.ConnectException: Call From vm-F0CD-5B46.nam.nsroot.net/10.49.216.121 to vm-ddc5-42b3.nam.nsroot.net:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
    at org.apache.hadoop.ipc.Client.call(Client.java:1164)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at $Proxy12.rollEditLog(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:137)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:268)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:310)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:452)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:523)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:476)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:570)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:220)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1213)
    at org.apache.hadoop.ipc.Client.call(Client.java:1140)
    ... 10 more
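
The root cause in the trace is a plain TCP "Connection refused" on port 8020. This means the host answered but nothing was listening on that port, which matches a killed NameNode process. As a rough illustration only, a probe like the following separates that case from an unreachable host. The function name `probe_rpc_port` and the 2-second timeout are assumptions made for this sketch and are not part of Hadoop.

```python
import socket


def probe_rpc_port(host: str, port: int = 8020, timeout: float = 2.0) -> str:
    """Classify a TCP connection attempt to host:port.

    Returns "open" if something accepts the connection, "refused" if the
    host answers but nothing listens on the port (the case in the trace),
    and "unreachable" for timeouts, DNS failures and other socket errors.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Must be caught before OSError, of which it is a subclass.
        return "refused"
    except OSError:
        return "unreachable"
```

Run from NN2 as `probe_rpc_port("vm-ddc5-42b3.nam.nsroot.net")`, a "refused" result would only confirm that NN1 is down. It would not explain why the failover did not happen.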

On Wednesday, March 13, 2013 6:57:27 PM UTC-5, Madhu M wrote:

Version: 4.1.0 (#361 built by jenkins on 20121023-1750 git:
c6bb3cddebc9f938f6e8eabb4241df955368f980)

QuorumJournal HA

Zookeeper is running without any errors

Below are the last two lines in the ZKFC2 log before I killed the NN1 PID.
Nothing was printed after I killed NN1, so ZKFC2 does not even seem to know
that NN1 went down.



7:28:59.893 PM INFO org.apache.hadoop.ha.ZKFailoverController
ZK Election indicated that NameNode at vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020 should become standby

7:29:00.001 PM INFO org.apache.hadoop.ha.ZKFailoverController
Successfully transitioned NameNode at vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020 to standby state
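
To check whether a ZKFC ever logged a later transition, the log can be scanned for the "Successfully transitioned" message quoted above. The following is only a sketch: the helper name `last_transition` is hypothetical, and it assumes that a transition to active is logged with the same wording as the standby message shown in this thread. It also tolerates the `**` and `<http://...>` debris that the mail archive added to the quoted lines.

```python
import re

# Pattern for the ZKFC message quoted in the thread:
#   "Successfully transitioned NameNode at <host>/<ip>:<port> to standby state"
# The optional " <...>" group skips the link the mail archive appended.
TRANSITION_RE = re.compile(
    r"Successfully transitioned NameNode at (?P<addr>\S+)(?: <\S+>)? "
    r"to (?P<state>\w+) state"
)


def last_transition(log_lines):
    """Return (address, state) for the last transition found, or None."""
    result = None
    for line in log_lines:
        match = TRANSITION_RE.search(line)
        if match:
            # Drop the "**" markers inserted by the mail archive, if any.
            addr = match.group("addr").replace("**", "")
            result = (addr, match.group("state"))
    return result
```

If the last result for ZKFC2 is still `standby` well after NN1 was killed, that agrees with the observation that ZKFC2 did not react to the failure.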


On Wednesday, March 13, 2013 6:35:47 PM UTC-5, Vinithra wrote:

Madhu,

Which version of CDH are you using? Have you set up QuorumJournal HA or
NFS HA? It would be more useful to see the logs of NN2 and ZKFC2. Also, is
your ZK service running without errors?

-Vinithra
On Wed, Mar 13, 2013 at 3:53 PM, Madhu M wrote:

Hello,

Any help would be appreciated.

I have the following setup for High Availability:

NN1 (active) + ZKFC1
NN2 (standby) + ZKFC2

Then I kill the active node with kill -9 on the NN1 process.
NN2 stays on standby instead of changing to ACTIVE status.

I see the error below in the log of Failover Controller FC1, which runs on
NN1.
The error is printed in the log endlessly.

6:45:51.232 PM INFO org.apache.hadoop.ipc.Client
Retrying connect to server: vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)

6:45:51.233 PM WARN org.apache.hadoop.ha.HealthMonitor
Transport-level exception trying to monitor health of NameNode at vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020: Call From vm-F0CD-5B46.nam.nsroot.net/10.49.216.121 to vm-F0CD-5B46.nam.nsroot.net:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

6:45:53.237 PM INFO org.apache.hadoop.ipc.Client
Retrying connect to server: vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)

6:45:53.237 PM WARN org.apache.hadoop.ha.HealthMonitor
Transport-level exception trying to monitor health of NameNode at vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020: Call From vm-F0CD-5B46.nam.nsroot.net/10.49.216.121 to vm-F0CD-5B46.nam.nsroot.net:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

Thanks

Madhu

Discussion Overview
groups: cm-users @
categories: hadoop
posted: Mar 13, '13 at 10:53p
active: Mar 14, '13 at 12:31a
posts: 5
users: 2 (Madhu M: 3 posts, Vinithra Varadharajan: 2 posts)
website: cloudera.com
irc: #hadoop
