Hi,

My Hadoop version is based on the hadoop 0.20.2 release, patched with
HADOOP-4675, 5745, MAPREDUCE-1070, 551, 1089 (for ganglia 3.1, fair
scheduler preemption, and HDFS append support), and patched with
HADOOP-6099, HDFS-278, Patches-from-Dhruba-Borthakur, HDFS-200
(for scribe support).

Last Friday I found that the clocks on some of my test Hadoop cluster nodes
were wrong: they were several hours ahead of the correct time.
So I ran the following command, and added it to the crontab:
/usr/bin/rdate -s time-b.nist.gov
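
(For reference, a cron entry for this kind of periodic sync might look like the
sketch below; the hourly schedule is an assumption, since the original entry is
not shown. Note that "rdate -s" steps the clock in a single jump, so a node that
has drifted hours ahead will move backwards in time.)

# Hypothetical crontab entry: step-sync the clock from NIST once an hour.
0 * * * * /usr/bin/rdate -s time-b.nist.gov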

Then my Hadoop cluster's namenode crashed after I restarted it,
and I don't know whether this is related to changing the time.
The error log:

2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 196
2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0
2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 29
2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of over-replicated blocks = 41
2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 69 secs.
2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode is OFF.
2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 1 racks and 5 datanodes
2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 29 blocks
2011-02-12 18:44:46,886 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.1.14:50010 to replicate blk_-8806907658071633346_1750 to datanode(s) 192.168.1.83:50010
2011-02-12 18:44:46,887 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.1.83:50010 to replicate blk_-7689075547598626554_1800 to datanode(s) 192.168.1.10:50010
2011-02-12 18:44:46,887 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.1.84:50010 to replicate blk_-7587424527299099175_1717 to datanode(s) 192.168.1.10:50010
2011-02-12 18:44:46,887 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.1.84:50010 to replicate blk_-6925943363757944243_1909 to datanode(s) 192.168.1.13:50010
2011-02-12 18:44:46,888 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.1.14:50010 to replicate blk_-6835423500788375545_1928 to datanode(s) 192.168.1.10:50010
2011-02-12 18:44:46,888 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.1.83:50010 to replicate blk_-6477488774631498652_1742 to datanode(s) 192.168.1.84:50010
2011-02-12 18:44:46,889 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: ReplicationMonitor thread received Runtime exception. java.lang.IllegalStateException: generationStamp (=1) == GenerationStamp.WILDCARD_STAMP
java.lang.IllegalStateException: generationStamp (=1) == GenerationStamp.WILDCARD_STAMP
at org.apache.hadoop.hdfs.protocol.Block.validateGenerationStamp(Block.java:148)
at org.apache.hadoop.hdfs.protocol.Block.compareTo(Block.java:156)
at org.apache.hadoop.hdfs.protocol.Block.compareTo(Block.java:30)
at java.util.TreeMap.put(TreeMap.java:545)
at java.util.TreeSet.add(TreeSet.java:238)
at org.apache.hadoop.hdfs.server.namenode.DatanodeDescriptor.addBlocksToBeInvalidated(DatanodeDescriptor.java:284)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.invalidateWorkForOneNode(FSNamesystem.java:2743)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.computeInvalidateWork(FSNamesystem.java:2419)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.computeDatanodeWork(FSNamesystem.java:2412)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:2357)
at java.lang.Thread.run(Thread.java:619)

2011-02-12 18:44:46,892 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop5/192.168.1.84
************************************************************/

Thanks,
Jameson
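
For context on the fatal exception above: the throw happens in
Block.validateGenerationStamp (Block.java:148), which Block.compareTo calls
before blocks are ordered, so a single bad block aborts the TreeSet insert in
DatanodeDescriptor.addBlocksToBeInvalidated and takes down the
ReplicationMonitor thread. A minimal sketch of that check, reconstructed from
the log above rather than from the verbatim 0.20-append source:

// Sketch reconstructed from the stack trace and exception message; the exact
// 0.20-append source may differ. In the real tree this logic lives in
// org.apache.hadoop.hdfs.protocol.Block.
class BlockSketch {
    // The value 1 seen in the message is GenerationStamp.WILDCARD_STAMP, a
    // placeholder stamp that must never reach an ordered comparison.
    static final long WILDCARD_STAMP = 1;

    static void validateGenerationStamp(long generationStamp) {
        if (generationStamp == WILDCARD_STAMP) {
            throw new IllegalStateException("generationStamp (=" + generationStamp
                    + ") == GenerationStamp.WILDCARD_STAMP");
        }
    }
}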


  • Todd Lipcon at Feb 14, 2011 at 3:58 pm
    Hi Jameson,

    My first instinct is that you have an incomplete patch series for hdfs
    append, and that's what caused your problem. There were many bug fixes along
    the way for hadoop-0.20-append and maybe you've missed some in your manually
    patched build.

    -Todd

    --
    Todd Lipcon
    Software Engineer, Cloudera
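
    (A quick way to test the incomplete-patch-series hypothesis is to grep the
    source tree for a symbol or configuration key that a given patch
    introduces; a hypothetical check, using the config key that comes up later
    in this thread:)

    # Hypothetical check: is the hunk that a patch should have added present?
    grep -n "dfs.client.max.block.acquire.failures" \
        src/hdfs/org/apache/hadoop/hdfs/DFSClient.java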
  • Jameson Li at Feb 15, 2011 at 3:51 am
    Hi Todd,

    Thanks very much. I think you are right.

    I used the hadoop-0.20-append patches mentioned here:
    http://github.com/lenn0x/Hadoop-Append

    After reading the patch 0002-HDFS-278.patch
    (https://github.com/lenn0x/Hadoop-Append/blob/master/0002-HDFS-278.patch),
    I found that the file "src/hdfs/org/apache/hadoop/hdfs/DFSClient.java" in
    my cluster does not contain these lines:

    this.maxBlockAcquireFailures =
        conf.getInt("dfs.client.max.block.acquire.failures",
                    MAX_BLOCK_ACQUIRE_FAILURES);


    It just looks like this:
    this.maxBlockAcquireFailures = getMaxBlockAcquireFailures(conf);
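
    (For comparison: in trees that have this helper it is, assuming the usual
    0.20 definition, a thin static wrapper around the very same configuration
    lookup, so the two forms should be functionally equivalent. A sketch:)

    // Sketch of the assumed DFSClient helper; it reads the same key and
    // default that the patch expects to find inlined. The default value
    // below is an assumption (stock 0.20 uses 3).
    private static final int MAX_BLOCK_ACQUIRE_FAILURES = 3;

    public static int getMaxBlockAcquireFailures(Configuration conf) {
        return conf.getInt("dfs.client.max.block.acquire.failures",
                           MAX_BLOCK_ACQUIRE_FAILURES);
    }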

    So I changed the 0002-HDFS-278.patch
    (https://github.com/lenn0x/Hadoop-Append/blob/master/0002-HDFS-278.patch),
    and the diff between the original 0002-HDFS-278.patch and the new patch
    after my change is:
    diff 0002-HDFS-278.patch ../hadoop-new/patch-origion/0002-HDFS-278.patch
    0a1,10
    > From 56463073cf051f1e11b4d3921542979e53daead4 Mon Sep 17 00:00:00 2001
    > From: Chris Goffinet <cg@chrisgoffinet.com>
    > Date: Mon, 20 Jul 2009 17:20:13 -0700
    > Subject: [PATCH 2/4] HDFS-278
    >
    > ---
    >  src/hdfs/org/apache/hadoop/hdfs/DFSClient.java |   70 ++++++++++++++++++++++--
    >  1 files changed, 64 insertions(+), 6 deletions(-)
    >
    > diff --git a/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java b/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
    2,3c12,13
    < --- src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
    < +++ src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
    ---
    > --- a/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
    > +++ b/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
    19,20c29,32
    < @@ -188,5 +192,7 @@ public class DFSClient implements FSConstants, java.io.Closeable {
    <      this.maxBlockAcquireFailures = getMaxBlockAcquireFailures(conf);
    ---
    > @@ -167,7 +171,9 @@ public class DFSClient implements FSConstants, java.io.Closeable {
    >      this.maxBlockAcquireFailures =
    >          conf.getInt("dfs.client.max.block.acquire.failures",
    >                      MAX_BLOCK_ACQUIRE_FAILURES);
    118a131,133
    > --
    > 1.6.3.1
    >
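
    (One practical difference visible in this diff: the original patch keeps
    git-style a/ and b/ path prefixes while the modified one uses bare paths,
    which changes the strip level needed to apply them; hypothetical
    invocations from the source root:)

    # Hypothetical: the git-format original needs -p1 to drop a/ and b/,
    # while the modified patch with bare paths applies with -p0.
    patch -p1 < ../hadoop-new/patch-origion/0002-HDFS-278.patch
    patch -p0 < 0002-HDFS-278.patch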

    Did I miss some of the hadoop-0.20-append patches?
    How can I recover my NN and get it working so that I can export the data?

  • Jameson Li at Feb 16, 2011 at 10:02 am
    I have reverted my test cluster to its state from before the patching,
    and everything is working normally again.

    Thanks.


