Greetings all. I have been observing some interesting problems that
sometimes make an hbase start/restart very hard to achieve. Here is one
scenario:
Power goes out of a rack, and kills some datanodes, and some regionservers.
We power things back on, HDFS reports all datanodes back to normal,
and we cold restart hbase.
Obviously we have some log files in the /hbase/.logs directory on
HDFS. So, when the master starts, it scans that dir and attempts to
replay the logs and insert all the data into the region files. So far
so good.
Now, in some instances, we get this message:
20:47:37,343 WARN org.apache.hadoop.hbase.util.FSUtils: Waited
121173ms for lease recovery on
failed to create file
for DFSClient_hb_m_10.101.7.1:60000_1294029805305 on client
10.101.7.1, because this file is already being created by NN_Recovery
Those messages (in master.log) will spew continuously and hbase will
not start. My understanding is that the namenode, or maybe some
datanode, is holding a lease on a file, and the master is unable to
process it. Left by itself, the problem will not go away. The only way
to resolve it is to shut down the master and do:
hadoop fs -cp /hbase/.logs/* /tmp/.logs
hadoop fs -rm /hbase/.logs/*
hadoop fs -mv /tmp/.logs/* /hbase/.logs/
Start master, and things are back to normal (all logs replay, master starts).
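
For anyone repeating this by hand, the three steps above can be wrapped
so that the originals are removed only if the copy succeeded. This is a
rough sketch, not a tested recovery tool: the function name recopy_logs
and the HADOOP, LOG_DIR and TMP_DIR variables are made up for
illustration, and the paths are the ones used above.

```shell
#!/bin/sh
# Sketch only: wraps the manual cp/rm/mv workaround and stops at the
# first step that fails. HADOOP, LOG_DIR and TMP_DIR are illustrative
# variables, not HBase or Hadoop settings.
HADOOP="${HADOOP:-hadoop}"
LOG_DIR="${LOG_DIR:-/hbase/.logs}"
TMP_DIR="${TMP_DIR:-/tmp/.logs}"

recopy_logs() {
    # The globs are quoted so the local shell passes them through
    # unchanged and "hadoop fs" expands them against HDFS.
    $HADOOP fs -cp "$LOG_DIR/*" "$TMP_DIR" || return 1
    # Remove the originals only after the copy has succeeded.
    $HADOOP fs -rm "$LOG_DIR/*" || return 1
    # Move the fresh copies back under the original names.
    $HADOOP fs -mv "$TMP_DIR/*" "$LOG_DIR/" || return 1
}
```

Run recopy_logs with the master shut down, as described above, and
start the master only if it returns success.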
So, a question -- is there some sort of HDFS setting (or are we hitting
a bug?) to instruct the lease to be removed automatically? A timer
maybe? Can the master be granted the authority to copy a file under a
new name, and then replay it? It seems silly that the master shouldn't
be able to do that; after all, it's an hbase log file anyway.
Next, there is this situation:
2011-01-02 20:56:58,219 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_-1736208949609845257_8228359 failed because
recovery from primary datanode 10.101.1.6:50010 failed 6 times.
Pipeline was 10.101.6.1:50010, 10.103.5.8:50010, 10.103.5.6:50010,
10.101.1.6:50010. Marking primary datanode as bad.
Here /hbase/.logs/log_name exists, but the data is missing completely.
It seems this empty file persists after an hbase/hdfs crash. The only
solution is to perform the above (cp, rm, mv), or simply delete those
files by hand. Now, is it possible for the master to do that itself?
The master should be able to detect invalid files in the .logs/ dir and
get rid of them without operator interaction. Or is there some sort of
design element that I am simply missing?
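
In the meantime, the empty files can at least be found without
eyeballing the listing. A minimal sketch, assuming the usual
"hadoop fs -lsr" column layout (size in the fifth column, path in the
eighth) and paths without spaces; list_empty_logs is a made-up helper
name. It only lists and deletes nothing, since a log that was still open
for writing at crash time may report zero length while its last block
still holds data.

```shell
#!/bin/sh
# Sketch only: reads "hadoop fs -lsr" style output on stdin and prints
# the paths of zero-length files. Directories (permission string
# starting with "d") are skipped.
list_empty_logs() {
    awk '$1 !~ /^d/ && NF >= 8 && $5 == 0 { print $8 }'
}
```

Usage would be along the lines of:
hadoop fs -lsr /hbase/.logs | list_empty_logs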