For large clusters, it helps if the NN waits N more seconds after the
threshold percentage has been satisfied (that is, after enough blocks
have reached their minimum replica count), so that the remaining DNs
get some extra time to report their blocks as well. This helps ease the
initial client load the cluster receives. This is where the extension
is useful (and it is certainly tunable to a more suitable value).
For small clusters (a single rack or so), you can probably set it to 0
to skip the extra wait.
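To illustrate the effect of the extension, here is a rough toy model in
Python. It is not the NN's actual code: the function name, the
(time, blocks) report format, and the integer rounding of the threshold
are all assumptions made for the sketch.

```python
def simulate_safemode(reports, total_blocks, threshold_pct, extension_ms):
    """Toy model: return (exit_time_ms, blocks_reported_by_exit).

    reports is a list of (time_ms, newly_reported_blocks) tuples.
    exit_time_ms is None if the threshold is never reached.
    """
    # Assumption: the required block count is the truncated product.
    needed = int(total_blocks * threshold_pct)
    reported = 0
    exit_ms = None
    for t_ms, blocks in sorted(reports):
        # Reports arriving after the exit time no longer count.
        if exit_ms is not None and t_ms > exit_ms:
            break
        reported += blocks
        # Once the threshold is met, start the extension countdown.
        if exit_ms is None and reported >= needed:
            exit_ms = t_ms + extension_ms
    return exit_ms, reported


# Hypothetical cluster: 1000 blocks, one DN reporting late at t=5000 ms.
reports = [(1000, 500), (2000, 499), (5000, 1)]
print(simulate_safemode(reports, 1000, 0.999, 30000))  # (32000, 1000)
print(simulate_safemode(reports, 1000, 0.999, 0))      # (2000, 999)
```

With the 30-second extension the late DN has reported by the time
safemode ends; with an extension of 0, clients are let in while one
block is still unreported.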
However, if you're ever doing NN recovery work (one reason the NN may
have been down in the first place), I recommend setting the threshold
itself to a value above 1 (for example 1.1f), so that the NN doesn't
auto-exit safemode until you're sure that the new inode/block counts
are alright and that you haven't made any mistakes in the recovery
process. You can then exit safemode manually once you are sure. In
safemode, the NN does not issue block deletions, so mistakes (such as
accidentally starting with an old copy of the fsimage) would not cause
data loss.
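The reason a threshold above 1 works can be seen from a small sketch of
the exit check. This is a simplified stand-in, not the NN's real logic;
the function name and the rounding are assumptions.

```python
def can_auto_exit(reported_blocks, total_blocks, threshold_pct):
    """Toy check: has the fraction of reported blocks met the threshold?"""
    # Assumption: the required count is the truncated product.
    needed = int(total_blocks * threshold_pct)
    return reported_blocks >= needed


# Even with every block reported, a threshold of 1.1 is never met.
print(can_auto_exit(1000, 1000, 0.999))  # True
print(can_auto_exit(1000, 1000, 1.1))    # False
```

Since reported blocks can never exceed the total, the check stays false
and the NN remains in safemode until you leave it manually.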
On Fri, Sep 21, 2012 at 1:47 PM, Bertrand Dechoux wrote:
I would like to know the relevance of dfs.safemode.extension.
Why would someone wait after leaving the safemode?
Why is it recommended not to set it to 0 instead of 30000 (30 seconds)?