HBase and Consistency in CAP
Why is HBase considered high in consistency, and why does it give up
partition tolerance? My understanding is that the failure of one data node
still doesn't impact clients, as they would re-adjust the list of
available data nodes.

  • Jean-Daniel Cryans at Dec 2, 2011 at 7:59 pm
    No, data is only served by one region server (even if it resides on
    multiple data nodes). If it dies, clients need to wait for the log
    replay and region reassignment.

    J-D
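    To illustrate the client-side behavior J-D describes: during log replay and
    region reassignment the client does not fail over to another copy of the
    region (there is none); it just retries the same location until the region
    is reopened elsewhere. A minimal sketch against the old Java client, where
    the table and row names are purely illustrative and the two properties are
    the standard client retry settings:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;
        import org.apache.hadoop.hbase.client.Get;
        import org.apache.hadoop.hbase.client.HTable;
        import org.apache.hadoop.hbase.client.Result;
        import org.apache.hadoop.hbase.util.Bytes;

        public class RetryingGet {
          public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // How many times the client retries an operation, and the base pause
            // (ms) between retries. While a dead region server's regions are being
            // reassigned, reads and writes to those regions wait behind these
            // retries -- there is no second serving replica to fall back to.
            conf.setInt("hbase.client.retries.number", 10);
            conf.setLong("hbase.client.pause", 1000L);

            HTable table = new HTable(conf, "mytable");           // table name is illustrative
            Result r = table.get(new Get(Bytes.toBytes("row1"))); // blocks/retries during reassignment
            System.out.println("result: " + r);
            table.close();
          }
        }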
  • Mohit Anchlia at Dec 2, 2011 at 8:02 pm
    Where can I read more on this specific subject?

    Based on your answer I have more questions, but I want to read more
    specific information about how it works and why it's designed that
    way.
  • Jean-Daniel Cryans at Dec 2, 2011 at 8:03 pm
    Get the HBase book:
    http://www.amazon.com/HBase-Definitive-Guide-Lars-George/dp/1449396100

    And/Or read the Bigtable paper.

    J-D
  • Ian Varley at Dec 2, 2011 at 8:16 pm
    Mohit,

    Yeah, those are great places to go and learn.

    To fill in a bit more on this topic: "partition-tolerance" usually refers to the idea that you could have a complete disconnection between N sets of machines in your data center, but still be taking writes and serving reads from all the servers. Some "NoSQL" databases can do this (to a degree), but HBase cannot; the master and ZK quorum must be accessible from any machine that's up and running the cluster.

    Individual machines can go down, as J-D said, and the master will reassign those regions to another region server. So, imagine you had a network switch fail that disconnected 10 machines in a 20-machine cluster; you wouldn't have 2 baby 10-machine clusters, like you might with some other software; you'd just have 10 machines "down" (and probably a significant interruption while the master replays logs on the remaining 10). That would also require that the underlying HDFS cluster (assuming it's on the same machines) was keeping replicas of the blocks on different racks (which it does by default), otherwise there's no hope.

    HBase makes this trade-off intentionally, because in real-world scenarios, there aren't too many cases where a true network partition would be survived by the rest of your stack, either (e.g. imagine a case where application servers can't access a relational database server because of a partition; you're just down). The focus of HBase fault tolerance is recovering from isolated machine failures, not the collapse of your infrastructure.

    Ian


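    One detail behind the "significant interruption" Ian mentions: the master
    only starts log replay and reassignment once it notices the region server is
    gone, which happens when that server's ZooKeeper session expires. A minimal
    sketch of the relevant setting (in practice it lives in hbase-site.xml on the
    servers; the 60-second value is only an example, and the defaults of that era
    were larger):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;

        public class FailureDetectionTimeout {
          public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            // A region server is declared dead once its ZooKeeper session expires,
            // so this timeout is a floor on how long a crash goes unnoticed before
            // the master can split the write-ahead logs and reassign the regions.
            conf.setInt("zookeeper.session.timeout", 60000); // 60s, example value only
            System.out.println("session timeout ms = "
                + conf.getInt("zookeeper.session.timeout", -1));
          }
        }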
  • Mohit Anchlia at Dec 2, 2011 at 8:55 pm
    Thanks for the overview. It's helpful. Can you also help me understand
    why two region servers for the same row keys can't be running on the
    nodes where the blocks are replicated? I am assuming all the
    logs/HFiles etc. are already being replicated, so if one region server
    fails the other region server can still take reads and writes.
  • Ian Varley at Dec 2, 2011 at 9:42 pm
    The simple answer is that HBase isn't architected such that 2 region servers can simultaneously host the same region. In addition to being much simpler from an architecture point of view, that also allows for user-facing features that would be difficult or impossible to achieve otherwise: single-row put atomicity, atomic check-and-set operations, atomic increment operations, etc.--things that are only possible if you know for sure that exactly one machine is in control of the row.

    Ian

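    As a concrete illustration of the single-owner operations Ian lists, here is
    a minimal sketch against the old (HTable-based) Java client; the table,
    family, qualifier, and row names are all illustrative:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;
        import org.apache.hadoop.hbase.client.HTable;
        import org.apache.hadoop.hbase.client.Put;
        import org.apache.hadoop.hbase.util.Bytes;

        public class AtomicRowOps {
          public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "users");
            byte[] row = Bytes.toBytes("user42");
            byte[] cf  = Bytes.toBytes("info");

            // Single-row put: all cells in this Put become visible atomically,
            // because exactly one region server owns the row.
            Put put = new Put(row);
            put.add(cf, Bytes.toBytes("email"), Bytes.toBytes("a@example.com"));
            put.add(cf, Bytes.toBytes("status"), Bytes.toBytes("active"));
            table.put(put);

            // Atomic check-and-set: the new Put is applied only if info:status
            // still holds "active" at the moment the region server checks it.
            Put deactivate = new Put(row);
            deactivate.add(cf, Bytes.toBytes("status"), Bytes.toBytes("disabled"));
            boolean applied = table.checkAndPut(row, cf, Bytes.toBytes("status"),
                Bytes.toBytes("active"), deactivate);

            // Atomic increment: serialized by the single region server hosting
            // the row, so concurrent increments are never lost.
            long logins = table.incrementColumnValue(row, cf, Bytes.toBytes("logins"), 1L);

            System.out.println("check-and-put applied: " + applied + ", logins: " + logins);
            table.close();
          }
        }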
  • Mohit Anchlia at Dec 2, 2011 at 9:48 pm
    Thanks. I'm having a bit of trouble understanding how a random node
    failure is different from a network partition. In both cases isn't there
    an impact clearly visible to the user (the time it takes to fail over
    and replay the logs)?
  • Andrew Purtell at Dec 3, 2011 at 8:32 am

    From: Mohit Anchlia <mohitanchlia@gmail.com>
    I'm having a bit of trouble understanding how a random node
    failure is different from a network partition. In both cases isn't there
    an impact clearly visible to the user (the time it takes to fail over
    and replay the logs)?

    I think you are conflating things a bit.
    Partition tolerance in CAP, shorthanded, is the ability of a system to survive message loss (due to server failure, network problem, etc.). BigTable/HBase does this of course. A server failure or message loss does not toast the database.

    Availability is the dimension of CAP that you are pondering here ("time it takes to failover and replay logs"). Recovery from message loss / server failure includes the time it takes to fail over and replay the logs.

    BigTable does trade some availability to achieve a stronger level of consistency than would be possible otherwise. The Google paper includes some discussion of this design rationale.


    Best regards,

        - Andy

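    To make Andy's trade-off concrete from a client's point of view: a read that
    follows a successful write always returns that write, but while the hosting
    region server is being failed over the same call simply blocks on retries and
    may eventually fail, rather than serving possibly stale data from elsewhere.
    A minimal sketch (table and column names are illustrative):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;
        import org.apache.hadoop.hbase.client.Get;
        import org.apache.hadoop.hbase.client.HTable;
        import org.apache.hadoop.hbase.client.Put;
        import org.apache.hadoop.hbase.client.RetriesExhaustedException;
        import org.apache.hadoop.hbase.util.Bytes;

        public class ConsistencyOverAvailability {
          public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "accounts");
            byte[] row = Bytes.toBytes("user42");
            byte[] cf  = Bytes.toBytes("info");

            Put put = new Put(row);
            put.add(cf, Bytes.toBytes("balance"), Bytes.toBytes("100"));
            table.put(put);

            try {
              // Strong consistency: a read after the put above sees "100",
              // because the same single region server serves both operations.
              byte[] v = table.get(new Get(row)).getValue(cf, Bytes.toBytes("balance"));
              System.out.println("balance = " + Bytes.toString(v));
            } catch (RetriesExhaustedException e) {
              // Reduced availability: if that region server just died, the get
              // blocks on retries during log replay/reassignment and can fail,
              // instead of returning possibly stale data from another node.
              System.err.println("region temporarily unavailable: " + e.getMessage());
            }
            table.close();
          }
        }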
  • Todd Lipcon at Dec 3, 2011 at 8:49 am
    You may also be interested in this article by my colleague Henry:
    http://www.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/

    The short summary of the article is that CAP isn't "C, A, or P, choose
    two," but rather "When P happens, choose A or C."

    Partitions, like death and taxes, are unavoidable -- think of machine
    death as just a partition of that machine out into the networking
    equivalent of the afterlife. So it's up to the system designer to
    decide if, when that happens, we give up availability or give up
    consistency.

    In HBase's case we choose consistency, so we have to give up some availability.

    -Todd

    --
    Todd Lipcon
    Software Engineer, Cloudera
