c6bb3cddebc9f938f6e8eabb4241df955368f980)
QuorumJournal HA
Zookeeper is running without any errors
Below are the last two lines in the ZKFC2 log before I killed the NN1 PID;
nothing was printed after I killed NN1. ZKFC2 doesn't even know that NN1
went down.
7:28:59.893 PM INFO org.apache.hadoop.ha.ZKFailoverController
ZK Election indicated that NameNode at vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020 should become standby
7:29:00.001 PM INFO org.apache.hadoop.ha.ZKFailoverController
Successfully transitioned NameNode at vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020 to standby state
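To check whether a ZKFC log really has no entries after the moment NN1 was killed, a small parser can help. This is only a sketch. It assumes the log is pasted in the two-line layout shown above (a timestamp/level/class line followed by the message line), and the names `parse_entries` and `entries_after` are made up for illustration.

```python
import re
from datetime import datetime

# Header line as pasted above, e.g.
# "7:28:59.893 PM INFO org.apache.hadoop.ha.ZKFailoverController"
HEADER = re.compile(r"^(\d{1,2}:\d{2}:\d{2}\.\d{3} [AP]M) (\w+) (\S+)$")


def parse_entries(text):
    """Pair each header line with the message line(s) that follow it."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%I:%M:%S.%f %p").time()
            entries.append({
                "time": stamp,
                "level": match.group(2),
                "logger": match.group(3),
                "message": "",
            })
        elif entries:
            # Any non-header line is treated as part of the previous message.
            previous = entries[-1]["message"]
            entries[-1]["message"] = (previous + " " + line).strip()
    return entries


def entries_after(entries, cutoff):
    """Return entries logged strictly after the given datetime.time cutoff."""
    return [entry for entry in entries if entry["time"] > cutoff]
```

Passing the ZKFC2 log text and the time of the kill to `entries_after` should return an empty list if ZKFC2 logged nothing afterwards. The timestamps carry no date, so this only works within a single day.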
On Wednesday, March 13, 2013 6:35:47 PM UTC-5, Vinithra wrote:
Madhu,
Which version of CDH are you using? Have you set up QuorumJournal HA or
NFS HA? It is more interesting to see the logs of NN2 and ZKFC2. Also, is
your ZK service running without errors?
-Vinithra
On Wed, Mar 13, 2013 at 3:53 PM, Madhu M <madhu.m...@gmail.com> wrote:
Hello,
Any help would be appreciated.
I have the following setup for High Availability:
NN1 (active) + ZKFC1
NN2 (standby) + ZKFC2
Then I kill the active NameNode with kill -9 on the NN1 process.
NN2 stays in standby instead of changing to ACTIVE status.
I see the error below in the log of the Failover Controller (FC1), which
runs on NN1.
The error is printed over and over in the log:
6:45:51.232 PM INFO org.apache.hadoop.ipc.Client
Retrying connect to server: vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)
6:45:51.233 PM WARN org.apache.hadoop.ha.HealthMonitor
Transport-level exception trying to monitor health of NameNode at vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020: Call From vm-F0CD-5B46.nam.nsroot.net/10.49.216.121 to vm-F0CD-5B46.nam.nsroot.net:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
6:45:53.237 PM INFO org.apache.hadoop.ipc.Client
Retrying connect to server: vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)
6:45:53.237 PM WARN org.apache.hadoop.ha.HealthMonitor
Transport-level exception trying to monitor health of NameNode at vm-F0CD-5B46.nam.nsroot.net/10.49.216.121:8020: Call From vm-F0CD-5B46.nam.nsroot.net/10.49.216.121 to vm-F0CD-5B46.nam.nsroot.net:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Thanks
Madhu
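The "Connection refused" in the HealthMonitor warning means nothing is accepting TCP connections on the NameNode RPC port, which is what one would expect once the NN1 process has been killed. The same condition can be checked from any host with a plain TCP probe. This is a minimal sketch, not part of Hadoop; the function name `probe_rpc_port` is made up for illustration.

```python
import socket


def probe_rpc_port(host, port, timeout=2.0):
    """Return 'open', 'refused', or 'unreachable' for a TCP endpoint."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but no process is listening on the port.
        return "refused"
    except OSError:
        # Covers timeouts, DNS failures, and routing errors.
        return "unreachable"
```

For example, `probe_rpc_port("vm-F0CD-5B46.nam.nsroot.net", 8020)` uses the host and port from the log lines above. Running it against both NameNode hosts shows which RPC ports are actually listening.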