Decommissioning never completes when the node being decommissioned has under-replicated blocks that cannot be replicated to the expected replication level

Key: HDFS-1590
URL: https://issues.apache.org/jira/browse/HDFS-1590
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.20.2
Environment: Linux
Reporter: Mathias Herberts
Priority: Minor

On a test cluster with 4 DNs and a default replication level of 3, I recently attempted to decommission one of the DNs. Right after modifying the dfs.hosts.exclude file and running 'dfsadmin -refreshNodes', I could see the blocks being replicated to other nodes.

After a while, the replication stopped but the node was not marked as decommissioned.

When running an 'fsck -files -blocks -locations' I saw that all files had a replication of 4 (which is logical given there are 4 DNs), but some of the files had an expected replication set to 10 (those were job.jar files from M/R jobs).

I ran 'fs -setrep 3' on those files and shortly after the namenode reported the DN as decommissioned.

Shouldn't this case be checked by the NameNode when decommissioning a node? I.e., consider a node decommissioned if, for each block on the node being decommissioned, either of the following is true:

1. It is replicated at or above the expected replication level.
2. It is replicated as much as possible given the available nodes, even though it is less replicated than expected.
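A minimal sketch of the proposed check (illustrative only, not actual NameNode code; the function name and parameters are hypothetical):

```python
def block_safe_for_decommission(live_replicas_elsewhere,
                                expected_replication,
                                available_datanodes):
    """Return True if the decommissioning node's copy of a block is
    no longer needed, under the two proposed conditions.

    live_replicas_elsewhere: replicas on nodes NOT being decommissioned
    expected_replication:    the file's configured replication factor
    available_datanodes:     DNs that remain after decommissioning
    """
    # Condition 1: already replicated to the expected level elsewhere.
    if live_replicas_elsewhere >= expected_replication:
        return True
    # Condition 2: replicated as much as the remaining cluster allows
    # (each DN holds at most one replica of a given block).
    if live_replicas_elsewhere >= available_datanodes:
        return True
    return False

# The scenario above: 4 DNs, one being decommissioned, leaves 3 available.
# A job.jar block with expected replication 10 has at most 3 replicas
# elsewhere, so condition 2 would let decommissioning proceed.
print(block_safe_for_decommission(3, 10, 3))   # → True
print(block_safe_for_decommission(2, 3, 3))    # → False (can still replicate)
```

With such a check, the job.jar files with an expected replication of 10 would not block decommissioning on a 4-node cluster.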


Discussion Overview
group: hdfs-dev
posted: Jan 21, '11 at 4:00p
active: Jan 21, '11 at 4:00p
users in discussion: 1 — Mathias Herberts (JIRA): 1 post
