Add internal status monitoring to RegionServer
----------------------------------------------

Key: HBASE-1964
URL: https://issues.apache.org/jira/browse/HBASE-1964
Project: Hadoop HBase
Issue Type: Improvement
Components: client
Affects Versions: 0.20.1
Reporter: elsif


When a hadoop/hbase cluster is under heavy load it will inevitably reach a tipping point where data is lost or corrupted. A
graceful method is needed to put the cluster into safe mode until more resources can be added or the load on the cluster has been
reduced.

St.Ack has suggested the following short-term task: "Meantime, it should be possible to have a cron run a script that checks
cluster resources from time-to-time -- e.g. how full hdfs is, how much each regionserver is carrying -- and when it determines the needle is in the red,
flip the cluster to be read-only."
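
The cron-driven check suggested above could be sketched roughly as follows. This is a minimal illustration, not an HBase facility: the thresholds, the function names, and the `set_read_only` callback are all hypothetical, and the "DFS Used%" line format is an assumption about the text of a dfsadmin-style report. How the cluster is actually flipped to read-only is left to the caller.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical thresholds; tune them for the cluster at hand.
MAX_DFS_USED_FRACTION = 0.85
MAX_REGIONS_PER_SERVER = 200


def parse_dfs_used_fraction(report_text: str) -> Optional[float]:
    """Extract the first 'DFS Used%: NN.NN%' figure from report text.

    The line format is an assumption; returns None when no such line exists.
    """
    match = re.search(r"DFS Used%:\s*(\d+(?:\.\d+)?)%", report_text)
    return float(match.group(1)) / 100.0 if match else None


def needle_in_red(dfs_used_fraction: float,
                  regions_per_server: Dict[str, int]) -> List[str]:
    """Return the reasons the cluster looks overloaded (empty list if healthy)."""
    reasons = []
    if dfs_used_fraction >= MAX_DFS_USED_FRACTION:
        reasons.append("hdfs is %.0f%% full" % (dfs_used_fraction * 100))
    for server, count in sorted(regions_per_server.items()):
        if count > MAX_REGIONS_PER_SERVER:
            reasons.append("%s carries %d regions" % (server, count))
    return reasons


def check_once(dfs_used_fraction: float,
               regions_per_server: Dict[str, int],
               set_read_only: Callable[[List[str]], None]) -> bool:
    """One cron tick: call set_read_only(reasons) if any threshold is crossed."""
    reasons = needle_in_red(dfs_used_fraction, regions_per_server)
    if reasons:
        set_read_only(reasons)  # hypothetical hook supplied by the operator
        return True
    return False
```

A cron entry would collect the two inputs by whatever means the site already uses, then call `check_once` with a callback that performs the site-specific switch to read-only.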

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Andrew Purtell (JIRA) at Nov 10, 2009 at 2:29 pm
    [ https://issues.apache.org/jira/browse/HBASE-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775432#action_12775432 ]

    Andrew Purtell commented on HBASE-1964:
    ---------------------------------------

    bq. When a hadoop/hbase cluster is under heavy load it will inevitably reach a tipping point where data is lost or corrupted

    We take exception to this statement. One can corrupt an Oracle database by overcommitting RAM such that the kernel panics in get_free_page (on Linux).

    bq. A graceful method is needed to put the cluster into safe mode until more resources can be added or the load on the cluster has been reduced.

    There is no substitute for competent monitoring and administration of production systems, especially ones that try to support terascale or petascale storage and computation over tens or hundreds of servers. That said, HBase certainly has opportunities to sense overloading and take self-preserving actions where it currently does not.
  • Andrew Purtell (JIRA) at Feb 4, 2010 at 4:42 am
    [ https://issues.apache.org/jira/browse/HBASE-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Andrew Purtell updated HBASE-1964:
    ----------------------------------

    Affects Version/s: (was: 0.20.1)
    Fix Version/s: 0.21.0
    Assignee: Andrew Purtell
    Summary: Enter temporary "safe mode" to ride over transient FS layer problems (was: Add internal status monitoring to RegionServer)

    Refocus this issue as "Enter temporary "safe mode" to ride over transient FS layer problems", as part of ride over restart.
  • Andrew Purtell (JIRA) at Feb 4, 2010 at 4:44 am
    [ https://issues.apache.org/jira/browse/HBASE-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Andrew Purtell updated HBASE-1964:
    ----------------------------------

    Issue Type: Sub-task (was: Improvement)
    Parent: HBASE-2183

Discussion Overview
group: dev @
categories: hbase, hadoop
posted: Nov 9, '09 at 7:17p
active: Feb 4, '10 at 4:44a
posts: 4
users: 1
website: hbase.apache.org

1 user in discussion

Andrew Purtell (JIRA): 4 posts
