Just a bit of feedback here.

One of our Hadoop 0.16.4 namenodes ran out of disk space today. No
second (backup) namenode was in place. Both the fsimage and edits
files seem to have gotten corrupted. After quite a bit of debugging
and fiddling with a hex editor we managed to resurrect the files and
continue with just minor loss.

Thankfully this only happened on a development cluster, not on
production. But isn't that something that should NEVER happen?

cheers
--
Torsten

  • Raghu Angadi at Jul 30, 2008 at 6:37 pm
    You should always have more than one location (preferably on different
    disks) for fsimage and editslog.

    A few months back I proposed keeping a checksum for each record in
    fsimage and editslog, so that the NameNode could recover transparently
    from such corruption when more than one copy is available. It was not
    prioritized since no such failures had been observed.

    You should certainly report these cases; that will help the feature
    gain more traction.

    Raghu.
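
The thread does not spell out the checksum proposal, so the following is only a minimal sketch of the general idea, not Hadoop's actual on-disk format. The record framing (length, payload, CRC32) and the names `encode_record`, `decode_records`, and `recover` are assumptions made for illustration. It shows how a reader holding several copies of the same log could take each record from whichever copy still verifies.

```python
import struct
import zlib


def encode_record(payload: bytes) -> bytes:
    """Frame a record as: 4-byte length, payload, 4-byte CRC32 of the payload."""
    header = struct.pack(">I", len(payload))
    trailer = struct.pack(">I", zlib.crc32(payload))
    return header + payload + trailer


def decode_records(blob: bytes):
    """Yield each payload, or None where the checksum fails.

    Stops at a truncated tail (for example after a write that hit a full disk).
    Assumption: the length fields themselves are intact; a real format would
    need to protect them as well.
    """
    pos = 0
    while pos + 4 <= len(blob):
        (length,) = struct.unpack_from(">I", blob, pos)
        end = pos + 4 + length + 4
        if end > len(blob):
            break  # incomplete record at the end of the file
        payload = blob[pos + 4 : pos + 4 + length]
        (crc,) = struct.unpack_from(">I", blob, pos + 4 + length)
        yield payload if zlib.crc32(payload) == crc else None
        pos = end


def recover(copies):
    """Merge copies of one log, taking each record from the first copy where it verifies."""
    decoded = [list(decode_records(c)) for c in copies]
    count = max((len(d) for d in decoded), default=0)
    merged = []
    for i in range(count):
        good = next(
            (d[i] for d in decoded if i < len(d) and d[i] is not None), None
        )
        if good is None:
            raise ValueError(f"record {i} is corrupt in every copy")
        merged.append(good)
    return merged
```

With two copies damaged in different places, `recover` can still rebuild the full log; if a record is bad in every copy it raises instead of silently dropping it.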

  • Torsten Curdt at Jul 30, 2008 at 10:04 pm

    On Jul 30, 2008, at 20:35, Raghu Angadi wrote:

    > You should always have more than one location (preferably on
    > different disks) for fsimage and editslog.

    On production we do frequent backups. Is there a mechanism inside
    Hadoop to do something like that now? The "more than one location"
    bit sounds a little like that.

    > A few months back I had a proposal to keep checksums for each record
    > on fsimage and editslog and NameNode would recover transparently
    > from such corruptions when more than one copy is available.
    > It didn't come up in priority since there were no such failures
    > observed.
    >
    > You should certainly report these cases; that will help the feature
    > gain more traction.

    Will file a bug report tomorrow.

    cheers
    --
    Torsten
  • Raghu Angadi at Jul 30, 2008 at 10:25 pm

    Torsten Curdt wrote:

    > On Jul 30, 2008, at 20:35, Raghu Angadi wrote:
    >
    >> You should always have more than one location (preferably on different
    >> disks) for fsimage and editslog.
    >
    > On production we do frequent backups. Is there a mechanism inside
    > Hadoop to do something like that now? The "more than one location"
    > bit sounds a little like that.

    You can specify multiple directories for "dfs.name.dir", in which case
    fsimage and editslog are written to multiple places. If one of these
    goes bad, you can use the other one.

    See http://wiki.apache.org/hadoop/FAQ#15

    Raghu.
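
As an illustration of the `dfs.name.dir` setting described above, a configuration fragment might look like the following. This is a sketch only: the file name `hadoop-site.xml` is the site override file used by Hadoop releases of that era, and both directory paths are hypothetical placeholders for two separate physical disks.

```xml
<!-- hadoop-site.xml: paths below are hypothetical; point them at separate disks -->
<property>
  <name>dfs.name.dir</name>
  <!-- Comma-separated list: fsimage and edits are written to every directory listed -->
  <value>/disk1/hadoop/name,/disk2/hadoop/name</value>
</property>
```

Putting the directories on different disks matters here: two directories on the same volume would both be affected by a single disk-full incident like the one reported.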
  • Konstantin Shvachko at Jul 30, 2008 at 10:48 pm
    You should also run a secondary name-node, which takes namespace
    checkpoints and shrinks the edits log file. This is exactly the case
    where the checkpoint image comes in handy.
    http://wiki.apache.org/hadoop/FAQ#7
    In the most recent release you can start the primary node directly
    from the secondary image. In older releases you need to move some
    files around.
    --Konstantin
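
For the secondary name-node, the checkpoint location and interval are controlled by configuration. The fragment below is a hedged sketch: `fs.checkpoint.dir` and `fs.checkpoint.period` are the property names from the default configuration of Hadoop 0.x/1.x releases, but check the `hadoop-default.xml` shipped with your version before relying on them. The directory path is a hypothetical placeholder.

```xml
<!-- hadoop-site.xml on the secondary name-node host; the path is hypothetical -->
<property>
  <name>fs.checkpoint.dir</name>
  <!-- Where the secondary name-node stores the checkpoint image it builds -->
  <value>/disk3/hadoop/namesecondary</value>
</property>
<property>
  <name>fs.checkpoint.period</name>
  <!-- Seconds between two checkpoints -->
  <value>3600</value>
</property>
```

A shorter period means less edit-log history is at risk if the primary's files are damaged, at the cost of more frequent checkpoint work.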


Discussion Overview
group: common-user @
categories: hadoop
posted: Jul 30, '08 at 6:09p
active: Jul 30, '08 at 10:48p
posts: 5
users: 3
website: hadoop.apache.org...
irc: #hadoop
