Hi.

I know there were some hard-to-find bugs with replication set to 2 that
caused data loss for HDFS users.

Was there any progress on these issues, and were any fixes introduced?

Regards.
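
For reference, "replication set to 2" is the standard HDFS replication factor. Below is a minimal sketch of how it is typically configured with the stock Hadoop client API; the class name and path are placeholders, not anything from this thread.

    // Sketch only: setting the HDFS replication factor to 2, assuming the
    // standard Hadoop client API. The path used here is a placeholder.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Cluster-wide default; normally set as dfs.replication in hdfs-site.xml.
            conf.setInt("dfs.replication", 2);

            FileSystem fs = FileSystem.get(conf);
            // Replication can also be changed per file after it has been written.
            fs.setReplication(new Path("/user/example/data.dat"), (short) 2);
        }
    }

The same per-file change can also be made from the command line with hadoop fs -setrep.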


  • Brian Bockelman at Apr 10, 2009 at 4:31 pm
    Most of the issues were resolved in 0.19.1 -- I think 0.20.0 is going
    to be even better.

    We run about 300TB @ 2 replicas, and haven't had file loss that was
    Hadoop's fault since about January.

    Brian
    On Apr 10, 2009, at 11:11 AM, Stas Oskin wrote:

    Hi.

    I know there were some hard-to-find bugs with replication set to 2 that
    caused data loss for HDFS users.

    Was there any progress on these issues, and were any fixes introduced?

    Regards.
  • Stas Oskin at Apr 10, 2009 at 6:53 pm
    2009/4/10 Brian Bockelman <bbockelm@cse.unl.edu>
    Most of the issues were resolved in 0.19.1 -- I think 0.20.0 is going to be
    even better.

    We run about 300TB @ 2 replicas, and haven't had file loss that was
    Hadoop's fault since about January.

    Brian

    And are you running 0.19.1?

    Regards.
  • Stas Oskin at Apr 10, 2009 at 6:55 pm
    Actually, now I remember that you posted some time ago about your university
    losing about 300 files.
    So the situation has improved since then, I presume?

    2009/4/10 Stas Oskin <stas.oskin@gmail.com>
    2009/4/10 Brian Bockelman <bbockelm@cse.unl.edu>
    Most of the issues were resolved in 0.19.1 -- I think 0.20.0 is going to
    be even better.

    We run about 300TB @ 2 replicas, and haven't had file loss that was
    Hadoop's fault since about January.

    Brian

    And are you running 0.19.1?

    Regards.
  • Brian Bockelman at Apr 10, 2009 at 7:02 pm

    On Apr 10, 2009, at 1:54 PM, Stas Oskin wrote:

    Actually, now I remember that you posted some time ago about your university
    losing about 300 files.
    So the situation has improved since then, I presume?

    Yup! The only files we lose now are due to multiple simultaneous
    hardware losses. Since January: 11 files lost to accidentally reformatting
    2 nodes at once, and 35 to a night with 2 dead nodes. Make no mistake -
    HDFS with 2 replicas is *not* an archive-quality file system. HDFS
    does not replace tape storage for long-term storage.

    Brian

    2009/4/10 Stas Oskin <stas.oskin@gmail.com>
    2009/4/10 Brian Bockelman <bbockelm@cse.unl.edu>
    Most of the issues were resolved in 0.19.1 -- I think 0.20.0 is going to
    be even better.

    We run about 300TB @ 2 replicas, and haven't had file loss that was
    Hadoop's fault since about January.

    Brian
    And are you running 0.19.1?

    Regards.
  • Brian Bockelman at Apr 10, 2009 at 7:03 pm

    On Apr 10, 2009, at 1:53 PM, Stas Oskin wrote:

    2009/4/10 Brian Bockelman <bbockelm@cse.unl.edu>
    Most of the issues were resolved in 0.19.1 -- I think 0.20.0 is going to be
    even better.

    We run about 300TB @ 2 replicas, and haven't had file loss that was
    Hadoop's fault since about January.

    Brian
    And are you running 0.19.1?

    0.19.1 with a few convenience patches (mostly, they improve logging so
    the local file system researchers can play around with our data
    patterns).

    Brian
  • Todd Lipcon at Apr 10, 2009 at 7:07 pm

    On Fri, Apr 10, 2009 at 12:03 PM, Brian Bockelman wrote:

    0.19.1 with a few convenience patches (mostly, they improve logging so the
    local file system researchers can play around with our data patterns).
    Hey Brian,

    I'm curious about this. Could you elaborate a bit on what kind of stuff
    you're logging? I'm interested in what FS metrics you're looking at and how
    you instrumented the code.

    -Todd
  • Brian Bockelman at Apr 10, 2009 at 7:24 pm

    On Apr 10, 2009, at 2:06 PM, Todd Lipcon wrote:

    On Fri, Apr 10, 2009 at 12:03 PM, Brian Bockelman <bbockelm@cse.unl.edu>
    wrote:

    0.19.1 with a few convenience patches (mostly, they improve logging so the
    local file system researchers can play around with our data patterns).
    Hey Brian,

    I'm curious about this. Could you elaborate a bit on what kind of stuff
    you're logging? I'm interested in what FS metrics you're looking at and how
    you instrumented the code.

    -Todd

    No clue what they're doing *with* the data, but I know what we've
    applied to HDFS to get the data. We apply both of these patches:
    http://issues.apache.org/jira/browse/HADOOP-5222
    https://issues.apache.org/jira/browse/HADOOP-5625

    These add the duration and offset to each read, and each read is then
    logged through the HDFS audit mechanisms. We've been pulling the logs
    through the web interface and putting them back into HDFS, then
    processing them (actually, today we've been playing with log
    collection via Chukwa).

    There is a student who is looking at our cluster's I/O access
    patterns, and there are a few folks who work on designing metadata
    caching algorithms and love to see application traces. Personally,
    I'm interested in hooking the logfiles up to our I/O accounting system
    so I can keep historical records of transfers and compare them to our
    other file systems.

    Brian
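
As a rough illustration of the post-processing Brian describes, the sketch below assumes (this is not confirmed in the thread) that the patched 0.19.1 build writes each read as a clienttrace-style line of key: value pairs including op, offset, and duration fields; the exact field names and units depend on how HADOOP-5222 and HADOOP-5625 were applied.

    // Hypothetical example: summarise per-read durations from audit/clienttrace
    // logs, assuming "key: value" pairs such as "op: HDFS_READ", "offset: N",
    // and "duration: N". The field names and units are assumptions, not taken
    // from the patches themselves.
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Map;

    public class ReadTraceSummary {
        public static void main(String[] args) throws Exception {
            long reads = 0;
            long totalDuration = 0;
            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.contains("op: HDFS_READ")) {
                    continue; // assumed marker for read operations
                }
                // Split the line into "key: value" fields.
                Map<String, String> fields = new HashMap<String, String>();
                for (String part : line.split(",\\s*")) {
                    String[] kv = part.split(":\\s*", 2);
                    if (kv.length == 2) {
                        fields.put(kv[0].trim(), kv[1].trim());
                    }
                }
                if (fields.containsKey("duration")) {
                    reads++;
                    totalDuration += Long.parseLong(fields.get("duration"));
                }
            }
            in.close();
            System.out.println("reads=" + reads + ", mean duration="
                    + (reads == 0 ? 0 : totalDuration / reads));
        }
    }

In practice the same parsing could just as well run as a MapReduce job once the logs are back in HDFS, which fits the workflow described above.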
