The summary is quite inaccurate.
On Mon, Feb 14, 2011 at 8:48 AM, Mark Kerzner wrote:
Hi,
is it accurate to say that
- In 0.20 the Secondary NameNode acts as a cold spare; it can be used to
recreate HDFS if the Primary NameNode fails, but with a delay of
minutes if not hours, and there is also some data loss;
The Secondary NN is not a spare. It exists to offload some of the Primary's
work to another machine, namely checkpointing (merging the edit log into the
fsimage, sometimes called "log rollup"). This has been a source of constant
confusion (it was unfortunately named "secondary", and now we are stuck with
it).
The Secondary NN certainly cannot take over for the Primary. That is not its
purpose.
Yes, there is data loss: any edits made after the last checkpoint are gone if
you have to rebuild from the Secondary's copy.
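For reference, checkpointing in 0.20 is driven by a couple of properties,
roughly as in the sketch below; the property names are the stock 0.20 ones,
but the values and paths are made up for illustration:

  <!-- core-site.xml (illustrative values only) -->
  <property>
    <name>fs.checkpoint.period</name>
    <!-- seconds between checkpoints -->
    <value>3600</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <!-- where the Secondary NN keeps the merged image -->
    <value>/data/1/dfs/namesecondary</value>
  </property>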
- in 0.21 there are streaming edits to a Backup Node (HADOOP-4539), which
replaces the Secondary NameNode. The Backup Node can be used as a warm
spare, with failover being a matter of seconds. There can be multiple
Backup Nodes, for additional insurance against failure, and the previous
best practices still apply to it;
There is no "Backup NN" in the manner you are thinking of. It is completely
manual, and requires restart of the "whole world", and takes about 2-3 hours
to happen. If you are lucky, you may have only a little data loss (people
have lost entire clusters due to this -- from what I understand, you are far
better off resurrecting the Primary instead of trying to bring up a Backup
NN).
In any case, to run it the way you describe, you will have to:
(a) make sure that the primary is dead;
(b) edit hdfs-site.xml on *every* datanode to point to the new IP address of
the backup, and restart each datanode (a sketch of that change follows this
list);
(c) wait 2-3 hours for the block reports from every restarted DN to finish;
(d) only then restart all the TaskTrackers and the JobTracker so they connect
to the new NN;
(e) finally, restart all the clients (e.g. HBase, Oozie, etc.).
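To make step (b) concrete: the NameNode address the cluster uses is the
fs.default.name URI (in 0.20 it normally lives in core-site.xml). A minimal
sketch of the change, with a made-up hostname and port:

  <!-- point the cluster at the backup machine instead of the dead primary -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://backup-nn.example.com:8020</value>
  </property>

Every datanode (and later every TT, the JT, and the clients) has to see this
change before it is restarted.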
Many companies, including Yahoo! and Facebook, use a pair of NetApp filers to
hold the metadata (fsimage and edit log) that the NN writes. The two NetApp
filers run in "HA" mode with NVRAM copying. But the NN itself remains a single
point of failure, and there is probably still some data loss when it fails.
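In that setup the NN is simply told to write its metadata to more than one
directory, one of which is the NFS mount from the filer. A minimal sketch,
with made-up paths:

  <!-- hdfs-site.xml on the NameNode: one local copy,
       plus one copy on the NFS-mounted filer -->
  <property>
    <name>dfs.name.dir</name>
    <value>/data/1/dfs/name,/mnt/filer/dfs/name</value>
  </property>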
See Dhruba's blog post about the Avatar NN + some custom "stackable HDFS"
code on all the clients + ZooKeeper + the dual NetApp filers.
It lets Facebook do manual, controlled fail-over during software upgrades,
at the cost of some performance loss on the DataNodes (each DataNode has to
send 2x block reports, and block reports are expensive, so it limits the
DataNode a bit). The article does not talk about data loss when the
fail-over is initiated manually, so I don't know about that.
http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html

Thank you. Sincerely,
Mark