Hello,

Any help would be appreciated.

I have the following setup for High Availability:

NN1 (active) + ZKFC1
NN2 (standby) + ZKFC2

I then kill the active NameNode (kill -9 on the NN1 process).
NN2 stays in standby instead of changing to ACTIVE status.

I see the errors below in the log of the Failover Controller FC1, which runs on NN1.
They are printed over and over without stopping:

6:45:51.232 PM INFO org.apache.hadoop.ipc.Client

Retrying connect to server: vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)

6:45:51.233 PM WARN org.apache.hadoop.ha.HealthMonitor

Transport-level exception trying to monitor health of NameNode at vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020: Call From vm-F0CD-5B46.nam.nsroot.net/10.49.216.121 to vm-F0CD-5B46.nam.nsroot.net:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

6:45:53.237 PM INFO org.apache.hadoop.ipc.Client

Retrying connect to server: vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)

6:45:53.237 PM WARN org.apache.hadoop.ha.HealthMonitor

Transport-level exception trying to monitor health of NameNode at vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020: Call From vm-F0CD-5B46.nam.nsroot.net/10.49.216.121 to vm-F0CD-5B46.nam.nsroot.net:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

Thanks

Madhu


  • Vinithra Varadharajan at Mar 13, 2013 at 11:36 pm
    Madhu,

    Which version of CDH are you using? Have you set up QuorumJournal HA or NFS
    HA? The logs of NN2 and ZKFC2 would be more informative here. Also, is your
    ZK service running without errors?

    -Vinithra
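One quick way to answer the ZooKeeper question is the `ruok` four-letter command, to which a healthy server replies `imok`. The following is a minimal Python sketch, not part of the original thread. It assumes the default client port 2181 and that four-letter commands are enabled on the server (newer ZooKeeper releases require them to be whitelisted). The function name `zk_is_ok` and the host names in the usage section are hypothetical.

```python
import socket


def zk_is_ok(host: str, port: int = 2181, timeout: float = 3.0) -> bool:
    """Send ZooKeeper's 'ruok' command and return True if the reply is 'imok'.

    Any connection problem (refused, timeout, DNS failure) is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            chunks = []
            # The server closes the connection after answering, so read until EOF.
            while True:
                data = sock.recv(64)
                if not data:
                    break
                chunks.append(data)
    except OSError:
        return False
    return b"".join(chunks) == b"imok"


if __name__ == "__main__":
    # Hypothetical ensemble members; replace with the real ZooKeeper hosts.
    for zk_host in ("zk1.example.com", "zk2.example.com", "zk3.example.com"):
        print(zk_host, "ok" if zk_is_ok(zk_host) else "NOT ok")
```

A reply of `imok` only shows that the server process is up and answering. It does not show that the ensemble has a quorum or that the ZKFC sessions are healthy, so the ZKFC and ZooKeeper logs are still needed.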
  • Madhu M at Mar 13, 2013 at 11:57 pm
    Version: 4.1.0 (#361 built by jenkins on 20121023-1750 git:
    c6bb3cddebc9f938f6e8eabb4241df955368f980)

    QuorumJournal HA

    ZooKeeper is running without any errors.

    Below are the last two lines in the ZKFC2 log before I killed the NN1 PID.
    Nothing was printed after I killed NN1. ZKFC2 doesn't even know that NN1
    went down.

    7:28:59.893 PM INFO org.apache.hadoop.ha.ZKFailoverController

    ZK Election indicated that NameNode at vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020 should become standby

    7:29:00.001 PM INFO org.apache.hadoop.ha.ZKFailoverController

    Successfully transitioned NameNode at vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020 to standby state

  • Madhu M at Mar 14, 2013 at 12:08 am
    Vinithra,

    The hosts are:

    vm-ddc5-42b3 - NN1
    vm-F0CD-5B46 - NN2

    After killing the PID of the active NameNode (NN1), I see the error below in
    the NN2 log.

    The standby NameNode vm-F0CD-5B46 is trying to connect to
    vm-ddc5-42b3, and the connection is refused.
    Is that the reason why vm-F0CD-5B46 is not able to become the ACTIVE NameNode
    once vm-ddc5-42b3 (NN1, ACTIVE) goes down?


    Unable to trigger a roll of the active NN
    java.net.ConnectException: Call From vm-F0CD-5B46.nam.nsroot.net/10.49.216.121 to vm-ddc5-42b3.nam.nsroot.net:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
    at org.apache.hadoop.ipc.Client.call(Client.java:1164)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at $Proxy12.rollEditLog(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:137)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:268)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:310)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:452)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
    Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:523)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:476)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:570)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:220)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1213)
    at org.apache.hadoop.ipc.Client.call(Client.java:1140)
    ... 10 more
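The two "Connection refused" messages in this thread name different targets, and that is easy to miss in long log lines. The following Python sketch (not from the original posters) pulls the calling host and the target host out of the "Call From ... to ..." wording and flags whether they are the same machine. The regular expression assumes exactly the message format quoted in this thread, and the function name `parse_refused_call` is hypothetical.

```python
import re

# Matches the Hadoop IPC wording seen in this thread:
# "Call From <host>/<ip> to <host>:<port> failed on connection exception"
CALL_RE = re.compile(
    r"Call From (?P<src>[^/\s]+)/(?P<src_ip>[\d.]+) "
    r"to (?P<dst>[^:\s]+):(?P<port>\d+) failed on connection exception"
)


def parse_refused_call(line):
    """Return source, target, port and a same-host flag, or None if no match."""
    match = CALL_RE.search(line)
    if match is None:
        return None
    source = match.group("src").lower()
    target = match.group("dst").lower()
    return {
        "source": source,
        "target": target,
        "port": int(match.group("port")),
        "same_host": source == target,
    }


# The two messages quoted in the thread (trailing text shortened).
HEALTH_MONITOR_LINE = (
    "Transport-level exception trying to monitor health of NameNode at "
    "vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020: Call From "
    "vm-F0CD-5B46.nam.nsroot.net/10.49.216.121 to "
    "vm-F0CD-5B46.nam.nsroot.net:8020 failed on connection exception"
)
EDIT_TAILER_LINE = (
    "java.net.ConnectException: Call From "
    "vm-F0CD-5B46.nam.nsroot.net/10.49.216.121 to "
    "vm-ddc5-42b3.nam.nsroot.net:8020 failed on connection exception"
)

if __name__ == "__main__":
    for log_line in (HEALTH_MONITOR_LINE, EDIT_TAILER_LINE):
        print(parse_refused_call(log_line))
```

For the HealthMonitor warning the source and target are the same host (vm-F0CD-5B46), while the EditLogTailer error targets vm-ddc5-42b3. The sketch only makes that comparison explicit. It does not by itself explain why failover did not happen, which is why the complete NN2 and ZKFC2 logs matter.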

  • Vinithra Varadharajan at Mar 14, 2013 at 12:31 am
    Madhu,

    Please attach the complete logs of NN2 and ZKFC2.
    The active NameNode was killed, so it is expected that NN2 cannot
    connect to NN1.

    -Vinithra

Discussion Overview
group: scm-users
categories: hadoop
posted: Mar 13, '13 at 10:53p
active: Mar 14, '13 at 12:31a
posts: 5
users: 2
website: cloudera.com
irc: #hadoop
