I modified the value of "dfs.safemode.threshold.pct" to zero, and now everything is OK.
The log file is below.
But I still have three questions:

1. Can I restore the percentage of blocks that should satisfy the minimal replication requirement
to 99.9%? With hadoop balancer? I feel that would be safer.

2. I can set "dfs.safemode.threshold.pct" to either "0" or "0f"; both values work, but which one is
better? I guess "0".

3. When HDFS starts up in safe mode, the log file should show
"The reported blocks 0 needs additional 2 blocks to reach the threshold 0.9990 of total blocks 3. Safe mode will 'not' be turned off automatically."
The word "not" is missing, right?

Ring

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at computeb-05.pcm/172.172.2.6

************************************************************/

2011-04-08 16:33:37,312 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG: host = computeb-05.pcm/172.172.2.6

STARTUP_MSG: args = []

STARTUP_MSG: version = 0.20.2-CDH3B4

STARTUP_MSG: build = -r 3aa7c91592ea1c53f3a913a581dbfcdfebe98bfe; compiled by 'root' on Mon Feb 21 17:31:12 EST 2011

************************************************************/

2011-04-08 16:33:37,441 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null

2011-04-08 16:33:37,443 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext

2011-04-08 16:33:37,464 INFO org.apache.hadoop.hdfs.util.GSet: VM type = 32-bit

2011-04-08 16:........................................


2011-04-08 16:33:37,832 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 4

2011-04-08 16:33:37,832 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of over-replicated blocks = 0

2011-04-08 16:33:37,832 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 0 secs.

2011-04-08 16:33:37,832 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes

2011-04-08 16:33:37,832 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 4 blocks

2011-04-08 16:33:37,835 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list

2011-04-08 16:33:37,849 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 9100





----- Original Message -----
From: "springring" <springring@126.com>
To: <common-user@hadoop.apache.org>
Cc: <hdfs-dev@hadoop.apache.org>
Sent: Friday, April 08, 2011 3:45 PM
Subject: Re:HDFS start-up with safe mode?

Hi,

I guess it is something about the "threshold 0.9990". When HDFS starts up,
it enters safe mode first, then checks some value (I don't know which value or percentage)
of my hadoop, and finds that value is below 99.9%, so safe mode will not turn off?

But the conclusion in the log file is "Safe mode will be turned off automatically"?

I'm lost.
___________________________________________________
2011-04-08 11:58:21,036 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON.
The reported blocks 0 needs additional 2 blocks to reach the threshold 0.9990 of total blocks 3. Safe mode will be turned off automatically.
________________________________________________________________________

----- Original Message -----
From: "springring" <springring@126.com>
To: <common-user@hadoop.apache.org>
Sent: Friday, April 08, 2011 2:20 PM
Subject: Fw: start-up with safe mode?

Hi,

When I start up Hadoop, the namenode log shows "STATE* Safe mode ON" like that; how do I turn it off?
I can turn it off with the command "hadoop dfsadmin -safemode leave" after start-up, but how can I start HDFS
out of safe mode in the first place?
Thanks.

Ring

the startup log________________________________________________________________

2011-04-08 11:58:20,655 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2011-04-08 11:58:20,657 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-04-08 11:58:20,678 INFO org.apache.hadoop.hdfs.util.GSet: VM type = 32-bit
2011-04-08 11:58:20,678 INFO org.apache.hadoop.hdfs.util.GSet: 2% max memory = 17.77875 MB
2011-04-08 11:58:20,678 INFO org.apache.hadoop.hdfs.util.GSet: capacity = 2^22 = 4194304 entries
2011-04-08 11:58:20,678 INFO org.apache.hadoop.hdfs.util.GSet: recommended=4194304, actual=4194304
2011-04-08 11:58:20,697 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hdfs
2011-04-08 11:58:20,697 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-04-08 11:58:20,697 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2011-04-08 11:58:20,701 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.block.invalidate.limit=1000
2011-04-08 11:58:20,701 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
2011-04-08 11:58:20,976 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-04-08 11:58:21,001 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 17
2011-04-08 11:58:21,007 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2011-04-08 11:58:21,007 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 1529 loaded in 0 seconds.
2011-04-08 11:58:21,007 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /tmp/hadoop-hdfs/dfs/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
2011-04-08 11:58:21,009 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 1529 saved in 0 seconds.
2011-04-08 11:58:21,022 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 1529 saved in 0 seconds.
2011-04-08 11:58:21,032 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 339 msecs
2011-04-08 11:58:21,036 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON.
The reported blocks 0 needs additional 2 blocks to reach the threshold 0.9990 of total blocks 3. Safe mode will be turned off automatically.


  • Harsh J at Apr 8, 2011 at 5:58 pm
    Hello,

    I'm not quite clear why you'd want to disable a consistency check such
    as the safemode feature. It is there to guarantee that your DFS is
    made ready only after it has sufficient blocks reported to start
    handling your dfs requests. If your NN ever goes into safemode later,
    it is vital that you take a look at logs and fsck reports to determine
    what's gone wrong.
    On Fri, Apr 8, 2011 at 3:06 PM, springring wrote:
    I modified the value of "dfs.safemode.threshold.pct" to zero, and now everything is OK.
    The log file is below.
    But I still have three questions:

    1. Can I restore the percentage of blocks that should satisfy the minimal replication requirement
    to 99.9%? With hadoop balancer? I feel that would be safer.
    The safemode is to guarantee that. That is why it is called the 'safe'
    mode. Not sure what you mean by the balancer thing.

    In production one never restarts the NameNode frequently, so I s'pose
    this is just to get rid of some development hassles?

    You may want to additionally lower the leave-safemode extension period
    from 30s to 0s to get rid of the check entirely, anyway.
  • Matthew Foley at Apr 8, 2011 at 7:27 pm
    1. Can I restore the percentage of blocks that should satisfy the minimal replication requirement
    to 99.9%? With hadoop balancer? I feel that would be safer.

    2. I can set "dfs.safemode.threshold.pct" to either "0" or "0f"; both values work, but which one is
    better? I guess "0".

    3. When HDFS starts up in safe mode, the log file should show
    "The reported blocks 0 needs additional 2 blocks to reach the threshold 0.9990 of total blocks 3. Safe mode will 'not' be turned off automatically."
    The word "not" is missing, right?

    Regarding your follow-on questions, in case it wasn't clear from my first message (copied below):

    At startup time, the namenode reads its namespace from disk (the FSImage and edits files). This includes all the HDFS filenames and block lists that it should know, but not the mappings of block replicas to datanodes. Then it waits in safe mode for all or most of the datanodes to send their Initial Block Reports, which let the namenode build its map of which blocks have replicas in which datanodes. It keeps waiting until dfs.namenode.safemode.threshold-pct of the blocks that it knows about from FSImage have been reported from at least dfs.namenode.replication.min (default 1) datanodes [so that's a third config parameter I didn't mention earlier]. If this threshold is achieved, it will post a log that it is ready to leave safe mode, wait for dfs.namenode.safemode.extension seconds, then automatically leave safe mode and generate replication requests for any under-replicated blocks (by default, those with replication < 3).
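    The threshold arithmetic described above can be sketched in a few lines of Python (a simplified back-of-envelope model, not actual NameNode code; the names safemode_status, block_threshold, and needed_additional are mine). Plugging in the numbers from the startup log earlier in this thread (3 total blocks, threshold 0.9990, 0 reported), it reproduces the "needs additional 2 blocks" figure:

```python
def safemode_status(total_blocks, reported_blocks, threshold_pct):
    """Simplified model of the NameNode's safe-mode threshold check.

    The NameNode stays in safe mode until reported_blocks reaches
    int(total_blocks * threshold_pct), mirroring the truncating
    arithmetic suggested by the log message above.
    """
    block_threshold = int(total_blocks * threshold_pct)
    needed_additional = max(block_threshold - reported_blocks, 0)
    return block_threshold, needed_additional

# Numbers from the startup log: 3 total blocks, threshold 0.9990, 0 reported.
threshold, needed = safemode_status(3, 0, 0.9990)
print(threshold, needed)  # 2 2  -- "needs additional 2 blocks"

# With the threshold set to 0, the check passes immediately.
print(safemode_status(3, 0, 0.0))  # (0, 0)
```

    Note the truncating conversion: with 3 total blocks, 0.999 * 3 = 2.997 truncates to 2, which is why the log asks for 2 additional blocks rather than all 3.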

    If it doesn't reach the "safe replication for all known blocks" threshold, then it will not leave safe mode automatically. It logs the condition and waits for an admin to decide what to do, because generally it means whole datanodes or sets of datanodes did not come up or are not able to communicate with the namenode. Hadoop wants a human to look at the situation before hadoop starts trying to madly generate re-replication commands for under-replicated blocks, and deleting blocks with zero replicas available.

    BTW, this thread is better for hdfs-user than hdfs-dev.
    --Matt

    On Apr 8, 2011, at 11:52 AM, Matthew Foley wrote:

    From: Matthew Foley <mattf@yahoo-inc.com>
    Date: April 8, 2011 11:52:00 AM PDT
    To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
    Cc: Matthew Foley <mattf@yahoo-inc.com>
    Subject: Re: start-up with safe mode?

    Hi Ring,

    The purpose of starting up with safe mode enabled is to prevent replication thrashing before and during Initial Block Reports from all the datanodes. Consider this thought experiment:
    - a cluster with 100 datanodes and replication 3
    - so any pair of datanodes only have approximately 2% overlap in their block content
    - the datanodes don't all start up exactly simultaneously
    - When the first two datanodes start up, if the cluster weren't in safe mode, 98% of their blocks would be declared under-replicated, and they would immediately be asked to replicate them ALL to each other!
    - When the third datanode starts up, it gets even worse, since it also only has a 2% overlap with each of the other two.
    - It just continues getting worse until many of the datanodes are registered, and the rate slows down for introduction of new blocks with only one found replica.
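    The roughly 2% overlap figure in this thought experiment comes from simple counting; here is my own back-of-envelope sketch (a model of the experiment, nothing from Hadoop itself):

```python
# Back-of-envelope model of the thought experiment: with replication
# factor r on n datanodes, a block held by one datanode has its other
# r-1 replicas spread over the remaining n-1 nodes, so the expected
# fraction of one node's blocks also present on another given node is
# (r-1)/(n-1).
def expected_pairwise_overlap(n_datanodes, replication):
    return (replication - 1) / (n_datanodes - 1)

overlap = expected_pairwise_overlap(100, 3)
print(f"{overlap:.1%}")  # ~2% overlap -> ~98% of blocks look under-replicated
```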

    While safe mode is on, the namenode doesn't attempt to change anything in the namespace or blockspace, including which datanodes have replicas of which blocks, although it does accept Block Reports from the datanodes telling which blocks they have. So the above described replication storm doesn't happen during safe mode. All (or almost all) the datanodes are allowed to register and give their Block Reports. THEN the namenode scans for blocks that truly are under-replicated. It gives a 30-second warning, then leaves safe mode, and generates appropriate replication requests to fix any under-replicated blocks now known.

    That said, you can modify this behavior with the configuration parameters
    dfs.namenode.safemode.threshold-pct
    and dfs.namenode.safemode.extension
    These default to 0.999 (100% minus delta), and 30000 (30 sec), respectively (defined in DFSConfigKeys).
    Search for them in the docs for details.

    If you want non-default values, you'd typically set them in hdfs-site.xml in your namenode's config directory.
    Setting them both to 0 will give you what you are asking for, but it probably isn't what you want :-)
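    For reference, non-default values would typically go in hdfs-site.xml roughly like this (a sketch only; on 0.20.x/CDH3 the effective property names appear to be the older spellings dfs.safemode.threshold.pct and dfs.safemode.extension, as the original poster's experiment suggests, while the dfs.namenode.* names quoted above are the newer spellings):

```xml
<!-- hdfs-site.xml sketch: safe-mode threshold settings at their defaults.
     Shown with the older 0.20.x property names; newer releases use
     dfs.namenode.safemode.threshold-pct / dfs.namenode.safemode.extension. -->
<property>
  <name>dfs.safemode.threshold.pct</name>
  <!-- default 0.999f: leave safe mode once 99.9% of known blocks report -->
  <value>0.999f</value>
</property>
<property>
  <name>dfs.safemode.extension</name>
  <!-- default 30000 ms: linger 30s after the threshold is reached -->
  <value>30000</value>
</property>
```

    Setting the threshold to 0, as discussed, skips the wait entirely, which is usually only appropriate on a development cluster.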

    --Matt


