I'm wondering what an application that holds a reference to a FileSystem
object should do when a NameNode or DataNode fails.
* Does the FileSystem handle all of this itself (e.g. reconnect logic)?
* Do I need to get a new FileSystem using FileSystem.get(Configuration)?
* Does the FileSystem need to be closed before re-getting?
* Do the answers to these questions depend on whether it's a NameNode or
DataNode that's failed?

In short, how does an application (not a Hadoop job -- just an app using
HDFS) properly recover from a NameNode or DataNode failure? I haven't
figured out the magic juju yet and my applications are not handling DFS
outages gracefully.

Thanks,
Brian


  • Ariel Rabkin at Mar 2, 2009 at 5:48 am
    DataNode failures should be transparent to the client. A NameNode failure
    brings down the whole HDFS and results in a noticeable outage. Replicating
    the NameNode is on the long-term roadmap, but my impression is that it
    won't happen very soon.

    --Ari

    --
    Ari Rabkin asrabkin@gmail.com
    UC Berkeley Computer Science Department
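
Going by the reply above, DataNode failures should not need special handling, while a NameNode outage will show up in the application as failed calls. One way an application might cope is to retry the failing call with a growing delay until the NameNode is back. The sketch below is only an illustration under that assumption: RetryOnOutage, IoCall and withRetry are hypothetical names, not part of Hadoop, and the thread does not say whether the existing FileSystem reference stays usable after an outage or has to be re-obtained.

```java
import java.io.IOException;

// Hypothetical helper, not part of Hadoop: retries an operation that may
// throw IOException, doubling the wait between attempts.
final class RetryOnOutage {

    // An operation that may fail with an IOException, as HDFS client calls do.
    interface IoCall<T> {
        T call() throws IOException;
    }

    private RetryOnOutage() {}

    static <T> T withRetry(IoCall<T> op, int maxAttempts, long initialBackoffMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Wait before the next attempt, then double the delay.
                    Thread.sleep(backoff);
                    backoff *= 2;
                }
            }
        }
        // Every attempt failed: surface the most recent error to the caller.
        throw last;
    }
}
```

With Hadoop on the classpath, a call could look like RetryOnOutage.withRetry(() -> fs.exists(path), 5, 1000L), where fs is the application's FileSystem and path is a Path. The attempt count and delay here are arbitrary, and a long NameNode outage will still end in an IOException that the application has to handle.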

Discussion Overview
group: common-user @
categories: hadoop
posted: Feb 27, '09 at 1:30a
active: Mar 2, '09 at 5:48a
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion

Ariel Rabkin: 1 post Brian Long: 1 post
