Improve Datanode startup time

Key: HDFS-1443
URL: https://issues.apache.org/jira/browse/HDFS-1443
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node
Affects Versions: 0.20.2
Reporter: Matt Foley
Assignee: Matt Foley
Fix For: 0.22.0

One of the factors slowing down cluster restart is the startup time for the Datanodes. In particular, if Upgrade is needed, the Datanodes must do a Snapshot and this can take 5-15 minutes per volume, serially. Thus, for a 4-disk datanode, it may be 45 minutes before it is ready to send its initial Block Report to the Namenode. This is an umbrella bug for the following four pieces of work to improve Datanode startup time:

1. Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file. This is the biggest villain, responsible for 90% of that 45-minute delay. See subordinate bug for details.

2. Refactor Upgrade process in DataStorage to run volume-parallel. There is already a bug open for this, HDFS-270, and the volume-parallel work in DirectoryScanner from HDFS-854 is a good foundation to build on.

3. Refactor the FSDir() and getVolumeMap() call chains in FSDataset, so they share data and run volume-parallel. Currently the two constructors for the in-memory directory tree and the replicas map run THREE full scans of the entire disk: once in FSDir(), once in recoverTempUnlinkedBlock(), and once in addToReplicasMap(). During each scan, a new File object is created for each of the 100,000 or so items in the native file system (for a 50,000-block node). This impacts GC as well as disk traffic.

4. Make getGenerationStampFromFile() more efficient. Currently this routine is called by addToReplicasMap() for every block file in the directory tree, and it does a full listing of each file's containing directory on every call. This is the equivalent of doing many MORE full disk scans. The underlying disk I/O buffers probably prevent disk thrashing, but we are still creating bazillions of unnecessary File objects that need to be GC'ed. There is a simple refactoring that prevents this.
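
As a rough illustration of the kind of refactoring item 4 has in mind (a sketch only, not the actual patch): list each directory once, and build a block-ID-to-generation-stamp map from the returned names, instead of re-listing the directory for every block file. The class and method names below (GenStampIndex, indexGenerationStamps) are hypothetical, and the sketch assumes metadata files are named blk_<blockId>_<genStamp>.meta. Using File.list(), which returns plain name strings, also avoids creating a File object per entry.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of HDFS: indexes generation stamps for one directory.
class GenStampIndex {
    private static final String BLOCK_PREFIX = "blk_";   // assumed block file prefix
    private static final String META_SUFFIX = ".meta";   // assumed metadata file suffix

    /** Builds blockId -> generation stamp from a single listing of file names. */
    static Map<Long, Long> indexGenerationStamps(String[] names) {
        Map<Long, Long> stamps = new HashMap<>();
        if (names == null) {
            return stamps; // File.list() returns null if the directory cannot be read
        }
        for (String name : names) {
            if (!name.startsWith(BLOCK_PREFIX) || !name.endsWith(META_SUFFIX)) {
                continue; // not a metadata file
            }
            // Strip prefix and suffix, leaving "<blockId>_<genStamp>".
            String core = name.substring(BLOCK_PREFIX.length(),
                                         name.length() - META_SUFFIX.length());
            int sep = core.lastIndexOf('_');
            if (sep <= 0) {
                continue; // no generation stamp component
            }
            try {
                long blockId = Long.parseLong(core.substring(0, sep));
                long genStamp = Long.parseLong(core.substring(sep + 1));
                stamps.put(blockId, genStamp);
            } catch (NumberFormatException e) {
                // Malformed name; skip it rather than fail the whole scan.
            }
        }
        return stamps;
    }

    /** Lists the directory exactly once, as names only (no File per entry). */
    static Map<Long, Long> indexGenerationStamps(File dir) {
        return indexGenerationStamps(dir.list());
    }
}
```

With a map like this built once per directory, a caller such as addToReplicasMap() could look up each block's generation stamp in memory, so the number of directory listings would scale with the number of directories rather than the number of block files.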


Discussion Overview
group: hdfs-dev @
posted: Oct 7, '10 at 9:18p
active: Oct 7, '10 at 9:18p