Greetings to everyone.
For some time now we have been testing the HDFS filesystem (without
Map/Reduce) for our cluster setup.
It is mostly OK. I encountered a FreeBSD support issue,
which I worked around by creating a custom shell-script "stat" command
that mimics the Linux behaviour.
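For reference, the workaround was roughly along these lines. This is only a sketch, not the exact script: it translates the handful of GNU coreutils `stat` format specifiers we ran into to their FreeBSD `stat -f` equivalents, and the specifier mapping below is an assumption covering just that subset, not a complete translation table.

```shell
#!/bin/sh
# Sketch of a GNU-style "stat" wrapper for FreeBSD.
# GNU stat takes -c/--format with specifiers like %s, %Y, %n;
# BSD stat takes -f with its own specifiers (%z, %m, %N, ...),
# so we rewrite the format string before calling the real stat.

# Map a GNU format string to the closest BSD equivalent
# (partial mapping: size, mtime, octal perms, name, owner, group).
translate_fmt() {
    printf '%s' "$1" | sed \
        -e 's/%s/%z/g'  \
        -e 's/%Y/%m/g'  \
        -e 's/%a/%Lp/g' \
        -e 's/%n/%N/g'  \
        -e 's/%U/%Su/g' \
        -e 's/%G/%Sg/g'
}

if [ $# -gt 0 ]; then
    case "$1" in
        -c|--format)
            fmt=$(translate_fmt "$2"); shift 2
            exec /usr/bin/stat -f "$fmt" "$@"
            ;;
        *)
            # Anything else is passed through to the native stat.
            exec /usr/bin/stat "$@"
            ;;
    esac
fi
```

Dropped into PATH ahead of the system `stat`, this lets callers that expect GNU semantics (such as Hadoop's shell helpers) run unchanged for the specifiers it covers.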
Now I've decided to look at reliability/HA options for the name node.
As far as I can see, the secondary name node should be replaced with
either a checkpoint node or a backup node. Unfortunately, the
documentation does not show any pros or cons of the two options, so I
decided to go the backup node way.
It seemed strange to me that the default configuration still uses the
deprecated secondary name node setup, and that all the scripts support
the secondary name node rather than the "new way".
I did the following:
1) I chose one node to be my backup node.
2) I created a conf/backup file containing my backup node's hostname.
3) I replaced the last line of the start-dfs.sh script (the line that
starts the secondary name node) with the following line:
"$HADOOP_COMMON_HOME"/bin/hadoop-daemons.sh --config $HADOOP_CONF_DIR
--hosts backup --script "$bin"/hdfs start namenode -backup $nameStartOpt
4) I added the dfs.http.address property to my hdfs-site.xml file;
until I did so, my backup node could not find my name node.
5) I created the name directory on the backup host.
6) I stopped the secondary name node.
7) I tried to start my backup node.
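For step 4, the property I added looks roughly like this (the hostname and port here are placeholders for our setup; dfs.http.address has to point at the name node's HTTP server so the backup node can fetch the image from it):

```
<!-- hdfs-site.xml: HTTP address of the primary name node.
     Host and port below are examples, not our real values. -->
<property>
  <name>dfs.http.address</name>
  <value>namenode.example.com:50070</value>
</property>
```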
Now I am getting java.io.FileNotFoundException messages on the backup
node, and the following on the name node:
2011-06-07 15:40:51,534 WARN
org.mortbay.log: /getimage: java.io.IOException: GetImage failed.
java.io.IOException: Inconsistent checkpoint fields.
LV = -24 namespaceID = 1842738969 cTime = 0; checkpointTime = 1307458455405.
Expecting respectively: -24; 1842738969; 0; 1307458455406
Is this OK? Will they sync after some time?
BTW: Is it correct that
http://hadoop.apache.org/common/docs/current/index.html points to the
0.20 documentation and not 0.21?
Best regards, Vitalii Tymchyshyn