FAQ
While this is not an SCM-specific issue, I thought I'd ask here first since
I installed with SCM. I can move the question to user@hadoop.apache.org if
that's more appropriate.
--- info ----
- 10-node cluster used for testing
- version = 2.0.0-cdh4.2.0
- namenode is also a datanode (that machine is zip4)
-----
I had IP address issues with the nodes. I removed a problem node from the
cluster,
  - planned to do some cleanup on that node
  - then add it back
  - then rebalance the cluster
But I deleted the dfs.data.dir on the namenode/datanode.
--> it's been a long day.

CM shows the zip4 datanode as fine (!?), but the namenode on zip4 is stopped.
I tried
   "hadoop namenode -recover"
   - this generated:
      hdfs.StateChange: STATE* Safe mode is ON.
     So I tried "hdfs dfsadmin -safemode leave" to turn safe mode off, as
suggested. No go:
      ... to zip4:8020 failed on connection exception
   - the recover attempt then failed with:
      WARN common.Storage: Storage directory /tmp/hadoop-linux/dfs/name does
not exist
   - which caused:
      InconsistentFSStateException: Directory /tmp/hadoop-linux/dfs/name is
in an inconsistent state: storage directory does not exist or....

This is a test cluster, but it has a lot of good test data. I would prefer
not to lose the data, but if I do it's not the end of the world.
  Any suggestions?
thanks
John

  • Harsh J at Jul 20, 2013 at 1:20 am
    Are you facing an issue with the NameNode startup right now? What is
    it logging when you try to start it?

    If you've deleted dfs.data.dir, then there usually isn't a problem as
    you should have replicas elsewhere. The only loss would be some of the
    single-replica blocks, if you had single-replica files.
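
    If you want to confirm that, something like the below (a rough sketch;
    run once the NN is back up, as the HDFS superuser, typically hdfs) will
    report any missing or corrupt blocks:

        # Overall HDFS health report; the summary at the end lists missing,
        # corrupt and under-replicated blocks
        sudo -u hdfs hdfs fsck /

        # Confirm which datanodes are live and how much each is storing
        sudo -u hdfs hdfs dfsadmin -report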

    If you've deleted dfs.name.dir, you need to place back a current/
    directory from the secondary namenode backup and boot up.

    The "namenode -recover" is to only be used for editlog corruptions,
    which am not sure you're running into, based on your details.
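
    Roughly, that restore looks like the sketch below. The paths and
    hostnames are placeholders (use whatever dfs.name.dir and the SNN
    checkpoint dir really are on your hosts), so treat this as a hedged
    outline rather than exact commands:

        # On the secondary namenode host: the checkpoint dir
        # (fs.checkpoint.dir / dfs.namenode.checkpoint.dir) should contain
        # a current/ directory with a recent fsimage and edits
        ls /data/dfs/snn/current                      # placeholder path

        # Copy that current/ directory into the (empty) dfs.name.dir on the
        # NN host, fix ownership, then start the NameNode
        scp -r /data/dfs/snn/current nn-host:/data/dfs/nn/
        ssh nn-host 'chown -R hdfs:hdfs /data/dfs/nn'
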
    --
    Harsh J
  • John Meza at Jul 20, 2013 at 2:10 am
    Yes, the problem is unchanged.
    With "hadoop namenode -recover" it returns:

    13/07/19 16:07:15 INFO hdfs.StateChange: STATE* Safe mode is ON.
    Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
    13/07/19 16:07:15 WARN common.Storage: Storage directory /tmp/hadoop-linux/dfs/name does not exist
    13/07/19 16:07:15 INFO namenode.MetaRecoveryContext: RECOVERY FAILED: caught exception
    org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-linux/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:295)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:201)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:592)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:435)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:397)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.doRecovery(NameNode.java:1064)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1136)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1205)
    13/07/19 16:07:15 FATAL namenode.NameNode: Exception in namenode join
    org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-linux/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:295)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:201)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:592)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:435)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:397)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.doRecovery(NameNode.java:1064)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1136)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1205)
    13/07/19 16:07:15 INFO util.ExitUtil: Exiting with status 1
    13/07/19 16:07:15 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at zip4.esri.com/10.47.102.147
    ************************************************************/


    I'll look for a copy of the dfs.name.dir directory in the secondary
    namenode backup and then retry "namenode -recover"?

    thanks for the quick reply,
    John

  • Harsh J at Jul 20, 2013 at 2:44 am
    Hi John,

    But *what* is the problem exactly? Why are you trying to run the
    -recover function? That isn't clear. The -recover tool is for a very
    specific case of edit log corruption, and will not solve the problem
    of an erased directory.

    Anyhow, to just answer your question: on the NN, go to
    /var/run/cloudera-scm-agent/process/, run "ls -ltr *NAMENODE*" to
    find the latest dir, cd into it, export HADOOP_CONF_DIR=$PWD, then run
    "hadoop namenode -recover" and it will pick up the proper name directory
    configs.
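
    In shell terms, roughly (the exact process directory name is whatever
    the ls output shows on your host; you will likely need root to read
    the agent's process dirs):

        cd /var/run/cloudera-scm-agent/process/
        ls -ltrd *NAMENODE*            # newest NAMENODE process dir is listed last
        cd <newest-NAMENODE-dir>       # placeholder for that directory
        export HADOOP_CONF_DIR=$PWD    # point the CLI at the real NN configs
        hadoop namenode -recover       # now resolves the real dfs.name.dir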

    --
    Harsh J
  • John Meza at Jul 20, 2013 at 2:53 am
    Sorry. The problem: the namenode is stopped and doesn't start.

    Why is it looking in the tmp directory?
  • Harsh J at Jul 20, 2013 at 3:10 am
    What does your NN startup log say about why it refuses to start? Have
    you looked there first?

    Your -recover run looked at the /tmp directory because it was run
    without the service configs that carry the right name directory.
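
    If it helps to see why: with no HADOOP_CONF_DIR, the stock defaults put
    the name dir under hadoop.tmp.dir (/tmp/hadoop-<user>), which matches
    the /tmp/hadoop-linux/dfs/name in your log. A quick check of what a
    given config resolves to (assuming the getconf tool is present in your
    CDH4 install):

        # With empty/default configs this prints the /tmp fallback
        hdfs getconf -confKey dfs.namenode.name.dir

        # With the CM process dir exported it should print the real name dir
        export HADOOP_CONF_DIR=/var/run/cloudera-scm-agent/process/<newest-NAMENODE-dir>
        hdfs getconf -confKey dfs.namenode.name.dir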

    --
    Harsh J
  • John Meza at Jul 25, 2013 at 12:43 am
    The cluster could be restarted, but unfortunately the data is lost. Some
    reasons:
    - NN, SNN and a DN were on the same machine
    - dfs.name.dir on the NN, dfs.data.dir for the local DN, and
      fs.checkpoint.dir for the SNN all shared a parent directory --> which
      was deleted
      - so "hdfs namenode -importCheckpoint" was not an option (see the
        sketch below)

    I'll have to get new test data.
    thanks for looking at this.
    John

