Hello Hadoop Users list:

We are running Hadoop version 0.18.2. My team lead has asked me to investigate a question about Hadoop's handling of offline DataNodes. Specifically, we would like to know how long a node can be offline before it is totally rebuilt when it is re-added to the cluster.
From what I've been able to determine from the documentation, it appears that the NameNode will simply begin scheduling block replication on its remaining cluster members. If the offline node comes back online and reports all its blocks as uncorrupted, then the NameNode just cleans up the "extra" blocks.
In other words, there is no explicit handling based on the length of the outage. How much re-replication and cleanup takes place simply depends on how long the node was away.
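
To make my reading concrete, here is a toy Python model of the behavior described above. It is only a sketch under stated assumptions: `ToyNameNode`, its method names, the `dead_after_s` threshold, and the policy of dropping the returning node's copy are all hypothetical and are not taken from Hadoop's code or configuration.

```python
# Toy model only: every name and value here is illustrative, not Hadoop's API.
REPLICATION = 3  # assumed target number of replicas per block


class ToyNameNode:
    def __init__(self, nodes, dead_after_s):
        self.live = set(nodes)
        self.dead_after_s = dead_after_s  # hypothetical "declare dead" threshold
        self.blocks = {}  # block id -> set of nodes holding a replica

    def add_block(self, block, holders):
        self.blocks[block] = set(holders)

    def node_silent_for(self, node, seconds):
        """Node sent no heartbeat for `seconds`; return blocks that were re-replicated."""
        if seconds < self.dead_after_s:
            return []  # still considered alive, so nothing is scheduled
        self.live.discard(node)
        copied = []
        for block in sorted(self.blocks):
            holders = self.blocks[block]
            if node not in holders:
                continue
            holders.discard(node)
            # Copy the block to live nodes that do not already hold it.
            spare = sorted(self.live - holders)
            while len(holders) < REPLICATION and spare:
                holders.add(spare.pop(0))
                if block not in copied:
                    copied.append(block)
        return copied

    def node_returns(self, node, reported_blocks):
        """Node rejoins with an intact block report; return its now-extra replicas."""
        self.live.add(node)
        excess = []
        for block in sorted(reported_blocks):
            holders = self.blocks.get(block)
            if holders is None:
                continue
            holders.add(node)
            if len(holders) > REPLICATION:
                holders.discard(node)  # toy policy: drop the returning copy
                excess.append(block)
        return excess
```

In this model a short outage changes nothing, while a long one triggers re-replication and the returning node's copies are then treated as extras. The node is never "rebuilt" as such, which is the behavior I would like confirmed.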

Anyone care to shed some light on this?

Joseph Hammerman

Group: common-user @
Posted: May 26, '09 at 10:10p
Active: May 26, '09 at 11:32p