I have a 5-node cluster running CDH 4.1.3 in a high-availability
configuration, and I would like to perform a rolling upgrade to CDH 4.2.0.
However, when I tried to decommission an HDFS datanode, the node entered
the "decommissioning" state and has remained there for nearly two hours.
The namenode logs repeat the following messages:
...
2013-03-12 18:13:00,018 WARN
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault:
Not able to place enough replicas, still in need of 1 to reach 1
For more information, please enable DEBUG log level on
org.apache.commons.logging.impl.Log4JLogger
2013-03-12 18:13:02,317 INFO
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Block:
blk_103468439867474828_41668, Expected Replicas: 3, live replicas: 2,
corrupt replicas: 0, decommissioned replicas: 1, excess replicas: 0, Is
Open File: false, Datanodes having this block: 172.17.1.9:50010
172.17.1.94:50010 172.17.1.151:50010 , Current Datanode:
172.17.1.151:50010, Is current datanode decommissioning: true
2013-03-12 18:13:03,019 WARN
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault:
Not able to place enough replicas, still in need of 1 to reach 1
For more information, please enable DEBUG log level on
org.apache.commons.logging.impl.Log4JLogger
...
On the web console, the datanode in question is listed as decommissioning
on the active HDFS namenode, but it is still listed as live on the
standby namenode. Is HDFS high availability interfering with the
decommissioning process, and if so, what can I do to decommission the
datanode and proceed with the rolling upgrade?
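
For context, here is roughly how the decommission was initiated, using the
standard exclude-file procedure (the exclude-file path and the datanode
hostname below are placeholders, not my actual values):

# hdfs-site.xml points dfs.hosts.exclude at an exclude file, e.g.:
#
#   <property>
#     <name>dfs.hosts.exclude</name>
#     <value>/etc/hadoop/conf/dfs.exclude</value>
#   </property>

# Add the datanode to the exclude file (placeholder hostname)
echo "dn3.example.com" >> /etc/hadoop/conf/dfs.exclude

# Tell the namenode to re-read the include/exclude lists, which puts
# the datanode into the "decommissioning" state
hdfs dfsadmin -refreshNodes

Whether the standby namenode also needs the updated exclude file and its own
-refreshNodes in an HA setup is part of what I am unsure about.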