FAQ: Hadoop hardware failure recovery
I am very new to Hadoop. I am considering setting up a Hadoop cluster
consisting of 5 nodes where each node has 3 internal hard drives. I
understand HDFS has a configurable redundancy feature but what happens if
an entire drive crashes (physically) for whatever reason? How does Hadoop
recover, if it can, from this situation? What else should I know before
setting up my cluster this way? Thanks in advance.

  • Mohammad Tariq at Aug 10, 2012 at 6:44 pm
    Hello Aji,

    Hadoop's redundancy feature replicates data across the entire cluster.
    So even if an entire disk is gone, or even the entire machine for that
    matter, your data is still there on other node(s). But we need to keep
    one thing in mind: the 'master' node is the single point of failure in
    a Hadoop cluster. If the machine running the master process(es) is
    down, you are stuck. For more detail you can visit the home page at:
    redundancy feature
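
    (An illustrative sketch, not from the original mails: how a client can
    request a specific replication factor when writing a file through the
    plain Hadoop Java API. The class name, path, and factor of 3 are
    made-up example values; the cluster-wide default comes from the
    dfs.replication setting in hdfs-site.xml.)

        // Illustrative sketch only: write a file with an explicit replication
        // factor of 3 so every block ends up on three different DataNodes.
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class ReplicatedWrite {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();      // picks up core-site.xml / hdfs-site.xml
                FileSystem fs = FileSystem.get(conf);
                Path file = new Path("/user/aji/example.txt"); // hypothetical path
                // bufferSize 4096, replication 3, block size 128 MB
                FSDataOutputStream out =
                        fs.create(file, true, 4096, (short) 3, 128L * 1024 * 1024);
                out.writeUTF("hello hdfs");
                out.close();
                fs.close();
            }
        }

    If one disk or DataNode holding a copy later dies, the remaining copies
    keep the file readable, and the NameNode schedules a replacement copy on
    another node.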

    Regards,
    Mohammad Tariq

  • Ted Dunning at Aug 10, 2012 at 6:56 pm
    Hadoop's file system was (mostly) copied from the concepts of Google's old
    file system.

    The original paper is probably the best way to learn about that.

    http://research.google.com/archive/gfs.html


  • Anil gupta at Aug 10, 2012 at 7:12 pm
    Hi Aji,

    Adding onto what Mohammad Tariq said: if you use Hadoop 2.0.0-alpha,
    the NameNode is not a single point of failure. However, Hadoop 2.0.0 is
    not of production quality yet (it's in alpha).
    The NameNode used to be a single point of failure in releases prior to
    Hadoop 2.0.0.

    HTH,
    Anil Gupta
  • Mohammad Tariq at Aug 10, 2012 at 7:17 pm
    Anil is quite right. Hadoop HA is not yet production ready, and since
    you are just beginning your Hadoop journey I thought it best not to
    mention it. If you want to use HA, just pull it from trunk and do a
    build.

    Regards,
    Mohammad Tariq

  • Harsh J at Aug 12, 2012 at 12:44 pm
    Mohammad,

    The HA feature is very much functional if you try it. I would not say
    it is "not production ready", for I see it in use at several places
    already, including on my humble little MBP (two local NNs, just for
    the fun of it) :)


    --
    Harsh J
  • Arun C Murthy at Aug 12, 2012 at 7:07 pm
    Yep, hadoop-2 is alpha but is progressing nicely...

    However, if you have access to some 'enterprise HA' utilities (VMware HA or Linux-HA) you can get *very decent* production-grade high availability in hadoop-1.x too (both the NameNode for HDFS and the JobTracker for MapReduce).

    Arun
    --
    Arun C. Murthy
    Hortonworks Inc.
    http://hortonworks.com/
  • Aji Janis at Aug 13, 2012 at 1:58 pm
    Thank you everyone for all the feedback and suggestions. It's good to
    know these details as I move forward.

    Piling on to the question, I am curious whether any of you have
    experience with Accumulo (a requirement for me, hence not optional). I
    was wondering whether data loss from a physical crash of a hard drive
    would, in that case too, be handled by Hadoop (HDFS, I should say). Any
    suggestions, and/or pointers to where I could find some specs on this,
    would be really appreciated!


    Thank you again for all the pointers.
    -Aji
  • Harsh J at Aug 13, 2012 at 2:56 pm
    Aji,

    The best place to ask would be Apache Accumulo's own user lists, which
    you can subscribe to at http://accumulo.apache.org/mailing_list.html

    That said, if Accumulo bases itself on HDFS, then its data safety
    should be the same, or nearly the same, as what HDFS itself can offer.

    Note that with the 2.1.0 (upcoming) and later releases of HDFS, we
    offer a working hsync() API that lets you write files with a guarantee
    that the data has been written to disk (like the fsync() *nix call).
    You can read some more about this in an earlier thread:
    http://search-hadoop.com/m/ATVOETSy4X1
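
    (Again an illustrative sketch rather than anything from the mails,
    assuming a 2.x or later Hadoop client on the classpath; the class name,
    path, and record text are made up.)

        // Illustrative sketch only: hflush() pushes data out of client buffers
        // to the DataNodes; hsync() additionally asks the DataNodes to sync
        // the data to their disks.
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class DurableWrite {
            public static void main(String[] args) throws Exception {
                FileSystem fs = FileSystem.get(new Configuration());
                FSDataOutputStream out = fs.create(new Path("/user/aji/wal.log")); // hypothetical path
                out.writeBytes("record-1\n");
                out.hflush();  // visible to other readers, but may still sit in memory
                out.hsync();   // also asks each DataNode to fsync the data to disk
                out.close();
                fs.close();
            }
        }

    Note Steve's caveat further down the thread: the sync guarantee is only
    as strong as what the underlying (possibly virtualized) storage actually
    does with a flush.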

    HTH, and do let us know what you find on the Accumulo side.


    --
    Harsh J
  • Steve Loughran at Aug 13, 2012 at 3:08 pm
    On 13 August 2012 07:55, Harsh J wrote:


    Note that with the 2.1.0 (upcoming) and later releases of HDFS, we
    offer a working hsync() API that lets you write files with a guarantee
    that the data has been written to disk (like the fsync() *nix call).

    A guarantee that the OS *thinks* it has been written to the HDD.

    For anyone using Hadoop, or any other program (e.g. MySQL), in a
    virtualized environment: even when the OS thinks it has flushed a
    virtual disk, be aware that you may have to set some VM parameters to
    say "when we said 'flush to disk', we meant it":
    https://forums.virtualbox.org/viewtopic.php?t=13661
  • Harsh J at Aug 13, 2012 at 3:43 pm
    Hey Steve,

    Interesting, thanks for pointing that out! I didn't know that it
    disables this by default :)

    --
    Harsh J
  • Steve Loughran at Aug 13, 2012 at 4:11 pm

    It's always something to watch out for: someone implementing a disk
    filesystem, OS, or VM environment discovers that they get great
    benchmark numbers if they make flushing asynchronous, and thinks "most
    people don't need it anyway". That's mostly true (and some programs do
    over-flush), but if you do want to be sure your data is saved to disk,
    these people are being dangerous rather than helpful.

    I don't think it's an issue if you are saving to network-mounted
    storage, which can include the storage of the host OS. If you do some
    experiments, NFS chat to the host system is usually as fast as working
    with a virtual HDD in the same host OS, which has extra layers of
    indirection. Virtual HDDs can also get fragmented even when the VM
    thinks it has just allocated big linear blocks; you need to defragment
    the virtual HDD and then the physical disk image to correct that.
  • Jeffrey Buell at Aug 14, 2012 at 12:16 am
    This is never an issue on vSphere. The ESXi hypervisor does not send the completion interrupt back to the guest until the IO is finished, so if the guest OS thinks an IO has been flushed to disk, it really has been flushed to disk. hsync() will work in an ESXi VM exactly as in a native OS.

    The physical storage layer might lie about completion (e.g., most SANs with redundant battery-backed caches), but this applies equally to native and virtualized OSes.

    It is always tempting to implement some kind of write caching in the virtualization layer to try to improve storage performance, but of course this comes at the cost of safety and predictability.

    Jeff

  • Harsh J at Aug 12, 2012 at 12:46 pm
    Aji,

    Your question seems to be about loss of data (blocks). Hadoop recovers
    by automatically detecting the blocks you have lost in that disk
    failure and re-replicating them from the redundant copies still
    available elsewhere in the cluster (controlled via the configurable
    replication factor).
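
    (An illustrative sketch, not from the mails: re-replication itself needs
    no user action, but you can inspect or raise a file's replication factor
    from the Java API. The class name, path, and target factor of 3 are
    made-up example values.)

        // Illustrative sketch only: check and change replication on an existing file.
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class ReplicationCheck {
            public static void main(String[] args) throws Exception {
                FileSystem fs = FileSystem.get(new Configuration());
                Path file = new Path("/user/aji/example.txt"); // hypothetical path
                short current = fs.getFileStatus(file).getReplication();
                System.out.println("current replication: " + current);
                // Ask for 3 replicas; the NameNode copies blocks in the background
                // until each block of the file has 3 live replicas again.
                fs.setReplication(file, (short) 3);
                fs.close();
            }
        }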

    --
    Harsh J
  • Mohammad Tariq at Aug 12, 2012 at 5:48 pm
    Hello Harsh,

    Thanks a lot for keeping me updated.

    @Aji: Apologies for unintentionally misleading you.

    Regards,
    Mohammad Tariq


