What document are you looking at?

It would be worth your time to pause and review the documentation here
on HDFS NameNode HA using the Quorum Journal Manager, so you are able to
ask more specific questions once you have a better understanding:

http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html

Also, it might be worth setting up the free version of Cloudera Manager
and using it to configure the cluster. You can use it to configure
these things (NN HA and auto failover) in a small test cluster. Once
you have a better understanding of the service relationships by
examining what it sets up for you (ZooKeeper, JournalNodes, active and
standby NameNodes, etc.), you can go back to manual configuration of CDH
with more understanding, if you really need to do things manually.

Also consider reviewing some of the free online training content we have
available, so you are more confident with the stack of components and
with using Cloudera Manager to set up and configure the services.

http://university.cloudera.com/onlineresources/clouderamanager

Todd
On 10/3/13 8:50 PM, Siddharth Tiwari wrote:
There is no description of it in the document, Kumar. Can you help with
it? I am installing things from tarballs. Can you guide me on which one
to use and what to configure?

On Thursday, 3 October 2013 18:17:46 UTC-7, kumar y wrote:

Do you have the ZKFC (ZooKeeper Failover Controller) process running
on both NameNodes? ZKFC is needed for automatic failover to work.
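
For reference, automatic failover roughly takes the config and daemons
below (a minimal sketch; zk1/zk2/zk3 are placeholders for your own
ZooKeeper quorum hosts):

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>

    <!-- core-site.xml; zk1..zk3 are placeholder ZooKeeper hosts -->
    <property>
      <name>ha.zookeeper.quorum</name>
      <value>zk1:2181,zk2:2181,zk3:2181</value>
    </property>

    # initialize the HA state znode in ZooKeeper (run once, from one NN)
    sudo -u hdfs hdfs zkfc -formatZK
    # then start a ZKFC daemon on each NameNode host (tarball install)
    sbin/hadoop-daemon.sh start zkfc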


On Thu, Oct 3, 2013 at 6:07 PM, Siddharth Tiwari <siddha...@gmail.com> wrote:

Hi Kumar,

Now I can see both my NameNodes and can manually fail over,
but automatic failover doesn't work. They both stay in standby
mode; can you help? I have configured ZooKeeper and followed
the documentation.


On Wednesday, 2 October 2013 23:27:41 UTC-7, kumar y wrote:

Have you run "sudo -u hdfs hdfs namenode -bootstrapStandby" on the
standby NameNode?

Put nn1 in safe mode and then run the above command on nn2. If that
doesn't work, I would even suggest stopping nn1, tarring up the
NameNode metadata directory, extracting it on nn2 for the first
time, and then starting nn1 and nn2.
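
Roughly, that sequence would look like this (a sketch; the metadata
path is taken from your logs and assumes dfs.namenode.name.dir is
/opt/hadoop-data/hdfs/namenode):

    # on nn1
    sudo -u hdfs hdfs dfsadmin -safemode enter
    # on nn2: pull the initial image from nn1
    sudo -u hdfs hdfs namenode -bootstrapStandby

    # fallback: copy the metadata by hand
    # on nn1 (with the NameNode stopped):
    tar -C /opt/hadoop-data/hdfs -czf /tmp/nn-meta.tar.gz namenode
    scp /tmp/nn-meta.tar.gz nn2:/tmp/
    # on nn2:
    tar -C /opt/hadoop-data/hdfs -xzf /tmp/nn-meta.tar.gz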


On Wed, Oct 2, 2013 at 11:24 PM, Siddharth Tiwari wrote:

Thanks Kumar, it finally got resolved, but when I start the standby
I get this error:

2013-10-02 23:20:28,543 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.FileNotFoundException: No valid image files found
        at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.getLatestImages(FSImageTransactionalStorageInspector.java:144)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:610)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:274)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:639)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:476)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:403)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:437)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:613)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:598)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1169)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1233)
2013-10-02 23:20:28,546 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2013-10-02 23:20:28,547 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node1.sidbd.local/192.168.147.101
************************************************************/


On Wednesday, 2 October 2013 21:27:31 UTC-7, kumar y wrote:

Oh, then here is the link to the CDH docs:

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-High-Availability-Guide/cdh4hag_topic_2_1.html


On Wed, Oct 2, 2013 at 9:22 PM, Siddharth Tiwari wrote:

Thanks Kumar,

I am not upgrading; I am setting up a new cluster with this
configuration. Will this doc work for it?


On Wednesday, 2 October 2013 20:21:07 UTC-7, kumar y wrote:

Please follow this document:

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/PDF/CDH4-High-Availability-Guide.pdf

and you can post here if you face any issues during the upgrade process.


On Wed, Oct 2, 2013 at 8:18 PM, Siddharth Tiwari wrote:

Hi Kumar,

Thanks a lot for these steps.
How do I go about the second step? I think I am doing something
wrong there :(


On Wednesday, 2 October 2013 19:06:57 UTC-7, kumar y wrote:

Please correct the property below in your hdfs-site.xml: change
"mycluster" to "sidcluster". (This should not matter for the NN to
come up; it is used by DFS clients to find the active NN.)

<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
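
That is, with your nameservice ID the corrected property would read:

<property>
  <name>dfs.client.failover.proxy.provider.sidcluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>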



Also, please make sure that node1.sidbd.cluster and
node2.sidbd.cluster can be resolved to a valid IP.
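
A quick sanity check (a sketch; getent consults whatever
/etc/nsswitch.conf specifies, typically /etc/hosts and then DNS):

    getent hosts node1.sidbd.cluster
    getent hosts node2.sidbd.cluster
    # no output means the name does not resolve; add entries to
    # /etc/hosts or DNS before starting the NameNodes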

These are the steps that I usually follow to enable NN HA:

1) Stop the NameNode and add all the new HA-related properties to the
config files (core-site.xml and hdfs-site.xml).
2) Install the JournalNodes and start them.
3) Start the primary NameNode and then run:
sudo -u hdfs hdfs namenode -initializeSharedEdits
4) Push the same config to the standby NameNode and then run:
sudo -u hdfs hdfs namenode -bootstrapStandby
(only the first time, so it gets the metadata from the primary NN).
5) Start the standby NameNode. If automatic failover is not enabled,
you need to manually make one NN active (see the sketch below).
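
Putting steps 3-5 together as a command sketch (nn1/nn2 are assumed
NameNode service IDs from dfs.ha.namenodes.sidcluster; substitute
your own):

    # step 3, on the primary NameNode
    sudo -u hdfs hdfs namenode -initializeSharedEdits
    # step 4, on the standby NameNode (first time only)
    sudo -u hdfs hdfs namenode -bootstrapStandby
    # step 5, with automatic failover disabled, pick the active NN by hand
    sudo -u hdfs hdfs haadmin -transitionToActive nn1
    sudo -u hdfs hdfs haadmin -getServiceState nn1   # should print "active"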



On Wed, Oct 2, 2013 at 5:34 PM, Siddharth Tiwari wrote:

Sure Kumar, here are the links:

http://pastebin.com/embed_js.php?i=LZ1W8hAu -- core-site.xml

http://pastebin.com/embed_js.php?i=af0tND7k -- hdfs-site.xml

Please help.


On Wednesday, 2 October 2013 16:52:48 UTC-7, kumar y wrote:

Hi Siddharth,

Can you pastebin your core-site.xml and hdfs-site.xml?


On Wed, Oct 2, 2013 at 4:43 PM, Siddharth Tiwari wrote:

2013-10-02 16:41:02,669 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
2013-10-02 16:41:02,678 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: dfs.block.access.token.enable=false
2013-10-02 16:41:02,678 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: defaultReplication = 3
2013-10-02 16:41:02,679 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxReplication = 512
2013-10-02 16:41:02,679 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: minReplication = 1
2013-10-02 16:41:02,679 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxReplicationStreams = 2
2013-10-02 16:41:02,679 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
2013-10-02 16:41:02,679 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: replicationRecheckInterval = 3000
2013-10-02 16:41:02,679 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: encryptDataTransfer = false
2013-10-02 16:41:02,679 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxNumBlocksToLog = 1000
2013-10-02 16:41:02,684 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner = hadoop (auth:SIMPLE)
2013-10-02 16:41:02,684 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup = supergroup
2013-10-02 16:41:02,684 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled = true
2013-10-02 16:41:12,436 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Determined nameservice ID: sidcluster
2013-10-02 16:41:12,436 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: HA Enabled: true
2013-10-02 16:41:12,447 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Append Enabled: true
2013-10-02 16:41:12,609 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2013-10-02 16:41:12,611 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
2013-10-02 16:41:12,611 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
2013-10-02 16:41:12,611 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
2013-10-02 16:41:32,657 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /opt/hadoop-data/hdfs/namenode/in_use.lock acquired by nodename 34585@node1.sidbd.local
2013-10-02 16:41:42,976 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: No edit log streams selected.
2013-10-02 16:41:42,984 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loading image file /opt/hadoop-data/hdfs/namenode/current/fsimage_0000000000000000000 using no compression
2013-10-02 16:41:42,984 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files = 1
2013-10-02 16:41:42,985 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files under construction = 0
2013-10-02 16:41:42,985 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 121 loaded in 0 seconds.
2013-10-02 16:41:42,986 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded image for txid 0 from /opt/hadoop-data/hdfs/namenode/current/fsimage_0000000000000000000
2013-10-02 16:41:43,004 INFO org.apache.hadoop.hdfs.server.namenode.NameCache: initialized with 0 entries 0 lookups
2013-10-02 16:41:43,005 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 30394 msecs
2013-10-02 16:41:43,092 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for standby state
2013-10-02 16:41:43,195 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for active state
2013-10-02 16:41:43,195 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for standby state
2013-10-02 16:41:43,195 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for standby state
2013-10-02 16:41:43,195 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2013-10-02 16:41:43,196 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2013-10-02 16:41:43,196 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2013-10-02 16:41:43,197 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException: Failed on local exception: java.net.SocketException: Unresolved address; Host Details : local host is: "sidcluster"; destination host is: (unknown):0;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:763)
        at org.apache.hadoop.ipc.Server.bind(Server.java:403)
        at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:501)
        at org.apache.hadoop.ipc.Server.<init>(Server.java:1893)
        at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:970)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:375)
        at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:350)
        at org.apache.hadoop.ipc.RPC.getServer(RPC.java:695)
        at org.apache.hadoop.ipc.RPC.getServer(RPC.java:684)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.<init>(NameNodeRpcServer.java:247)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:460)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:439)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:613)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:598)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1169)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1233)
Caused by: java.net.SocketException: Unresolved address
        at sun.nio.ch.Net.translateToSocketException(Net.java:157)
        at sun.nio.ch.Net.translateException(Net.java:183)
        at sun.nio.ch.Net.translateException(Net.java:189)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:76)
        at org.apache.hadoop.ipc.Server.bind(Server.java:386)
        ... 14 more
Caused by: java.nio.channels.UnresolvedAddressException
        at sun.nio.ch.Net.checkAddress(Net.java:127)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:208)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        ... 15 more
2013-10-02 16:41:43,200 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2013-10-02 16:41:43,202 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node1.sidbd.local/192.168.147.101

Please help with this error.