[ https://issues.apache.org/jira/browse/HADOOP-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Shvachko updated HADOOP-3002:
----------------------------------------
Attachment: DelBlocksInSafeMode.patch
This is the patch that postpones removal of blocks until the safe mode is off.
The main reason for the deletions was that block report processing removed blocks that do not belong
to any file directly, bypassing the regular mechanism that first adds invalid blocks to recentInvalidateSets
and then schedules them for deletion via heartbeats.
# I changed block report processing to just place invalid blocks into recentInvalidateSets
and not return any commands to data-nodes. This also optimizes processReport() because it
no longer scans the block report a second time looking for invalid blocks.
# I changed heartbeat processing because it never checked the safe mode and would schedule
replications or deletions if there were any in the pending lists.
During startup the pending lists are empty, but in manual safe mode that may not be the case.
So now the only commands that are allowed when safe mode is on are requests for block reports
and distributed upgrade commands.
It is not clear why some code in handleHeartbeat() is inside the synchronized section and some is not.
I placed everything inside the synchronized section.
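The two changes above can be illustrated with a minimal sketch. This is not the code from DelBlocksInSafeMode.patch: the class SafeModeNameNodeSketch, its fields, and its method signatures are hypothetical simplifications that only borrow the names recentInvalidateSets, processReport() and handleHeartbeat() from the description. It assumes blocks are plain long ids and data-nodes are identified by a string.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified stand-in for the name-node; not the real Hadoop classes. */
class SafeModeNameNodeSketch {
    // Blocks that belong to some file (assumed representation: block ids).
    private final Set<Long> blocksInFiles = new HashSet<>();
    // Invalid blocks per data-node, waiting to be scheduled for deletion.
    private final Map<String, Set<Long>> recentInvalidateSets = new HashMap<>();
    // Start in safe mode, as during name-node startup.
    private boolean safeMode = true;

    synchronized void addFileBlock(long blockId) {
        blocksInFiles.add(blockId);
    }

    synchronized void setSafeMode(boolean on) {
        safeMode = on;
    }

    /** Block report: only record invalid blocks; no commands go back to the data-node. */
    synchronized void processReport(String dataNode, List<Long> reportedBlocks) {
        for (long blockId : reportedBlocks) {
            if (!blocksInFiles.contains(blockId)) {
                recentInvalidateSets
                    .computeIfAbsent(dataNode, k -> new HashSet<>())
                    .add(blockId);
            }
        }
    }

    /**
     * Heartbeat: returns the block ids the data-node should delete.
     * While safe mode is on, nothing is scheduled and the pending set is kept.
     * The whole method is synchronized, mirroring "placed everything inside".
     */
    synchronized List<Long> handleHeartbeat(String dataNode) {
        if (safeMode) {
            return Collections.emptyList();
        }
        Set<Long> pending = recentInvalidateSets.remove(dataNode);
        if (pending == null) {
            return Collections.emptyList();
        }
        List<Long> toDelete = new ArrayList<>(pending);
        Collections.sort(toDelete); // deterministic order for the example
        return toDelete;
    }
}
```

In this sketch a block report received during safe mode only fills the pending set; the deletions are handed out by the first heartbeat after safe mode is turned off. Block report requests and distributed upgrade commands, which the patch still allows in safe mode, are left out for brevity.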
HDFS should not remove blocks while in safemode.
------------------------------------------------
Key: HADOOP-3002
URL: https://issues.apache.org/jira/browse/HADOOP-3002
Project: Hadoop Core
Issue Type: Bug
Components: dfs
Affects Versions: 0.16.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
Priority: Blocker
Fix For: 0.17.0, 0.18.0
Attachments: DelBlocksInSafeMode.patch
I noticed that data-nodes are removing blocks during a rather prolonged distributed upgrade when the name-node is in safe mode.
This happened on my experimental cluster with an accelerated block report rate.
By definition in safe mode the name-node should not
- accept client requests to change the namespace state, and
- schedule block replications and/or block removal for the data-nodes.
We don't want any unnecessary replications until all blocks are reported during startup.
We also don't want to remove blocks if safe mode is entered manually.
In heartbeat processing we explicitly verify that the name-node is in safe-mode and do not return any block commands to the data-nodes.
Block reports can also return block commands, which should be banned during safe mode.