Hi,

What are the options for keeping a copy of the data in an HDFS instance
in sync with a backup file system that is not HDFS? Are there rsync-like
tools that transfer only the deltas, or would one have to implement
that oneself (e.g. by writing a Java program that accesses both
file systems)?

Thanks in advance,

Robert

P.S.: Why would one want that? E.g. to have a completely redundant copy
that, in case of a systematic failure (e.g. data corruption due to a bug),
offers a backup not affected by that problem.
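
To make the "implement that oneself" option concrete, here is a minimal sketch of a file-level delta sync. It is an illustration only: it uses two local directories as stand-ins for the two file systems, the names `list_files` and `sync` are made up for this example, and it compares size and modification time rather than file contents. It copies whole changed files, not block-level deltas, and it does not remove files from the backup that were deleted at the source.

```python
import os
import shutil


def list_files(root):
    """Map each file's path relative to root to its (size, mtime) pair."""
    entries = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            # Whole seconds only, to tolerate differing timestamp resolution.
            entries[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime))
    return entries


def sync(source_root, backup_root):
    """Copy new or changed files to the backup; return the relative paths copied."""
    source = list_files(source_root)
    backup = list_files(backup_root)
    copied = []
    for rel, meta in sorted(source.items()):
        # A file is transferred only if it is missing or its metadata differs.
        if backup.get(rel) != meta:
            dest = os.path.join(backup_root, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            # copy2 also copies the modification time, so the next run sees a match.
            shutil.copy2(os.path.join(source_root, rel), dest)
            copied.append(rel)
    return copied
```

Running `sync` twice in a row should copy nothing the second time. A real version would replace the `os` and `shutil` calls with the client APIs of the two file systems and would probably compare checksums as well.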


  • Ted Dunning at May 16, 2008 at 5:10 pm
    Why not go to the next step and use a second cluster as the backup?

  • Jim R. Wilson at May 16, 2008 at 6:44 pm
    There was some chatter on the HBase list about a dual HDFS/S3 driver
    class which would write to both but only read from HDFS. Of course,
    having this functionality at the Hadoop level would be better than in
    a subsidiary project.

    Maybe Hadoop could offer the ability to specify a secondary filesystem
    in hadoop-site.xml? Candidates might include S3, NFS, or of course,
    another HDFS in a geographically isolated location.

    -- Jim R. Wilson (jimbojw)
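
The dual-driver idea above (write to both, read from one) can be sketched roughly as follows. This is not the driver discussed on the HBase list and not a Hadoop API: `DictStore` is a hypothetical in-memory stand-in for a filesystem client, and `DualWriteStore` is a made-up name for the wrapper.

```python
class DictStore:
    """In-memory stand-in for a filesystem client (hypothetical, for illustration)."""

    def __init__(self):
        self.files = {}

    def write(self, path, data):
        self.files[path] = bytes(data)

    def read(self, path):
        return self.files[path]


class DualWriteStore:
    """Writes go to both stores; reads are served by the primary only."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def write(self, path, data):
        # Primary first: if this raises, the secondary is never touched.
        self.primary.write(path, data)
        self.secondary.write(path, data)

    def read(self, path):
        return self.primary.read(path)


# Usage: the secondary keeps its own copy even if the primary's copy goes bad.
store = DualWriteStore(DictStore(), DictStore())
store.write("/logs/part-0", b"payload")
```

One open design question in a wrapper like this is what to do when the second write fails after the first succeeded, since the two copies then diverge until something reconciles them.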
  • Robert Krüger at May 16, 2008 at 10:56 pm
    The reasoning was that in the event of system-inherent failures (i.e.
    bugs in HDFS which corrupt the files), a backup system set up with a
    completely different technology would not be affected by the same bug
    and would prevent the failure from becoming catastrophic. This sounds
    (and in our case probably is) a bit paranoid, but it is common practice
    e.g. in the aerospace industry for really critical systems.



Discussion Overview
group: common-user @
category: hadoop
posted: May 16, '08 at 1:34p
active: May 16, '08 at 10:56p
posts: 4
users: 3
website: hadoop.apache.org...
irc: #hadoop
