Name node should notify administrator when struggling with replication
----------------------------------------------------------------------

Key: HADOOP-3323
URL: https://issues.apache.org/jira/browse/HADOOP-3323
Project: Hadoop Core
Issue Type: Improvement
Components: dfs
Reporter: Robert Chansler


Name node performance suffers if either the replication queue is too big or the available space at the data nodes is too small. In either case, the administrator should be notified.

If the situation is really desperate, the name node should perhaps enter safe mode.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Chris Douglas (JIRA) at Jul 15, 2008 at 10:33 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613759#action_12613759 ]

    Chris Douglas commented on HADOOP-3323:
    ---------------------------------------

    After some discussion, it's become clear that this may be completed in two parts:

    1. A brief health check the namenode can perform itself
    2. A metrics-based solution tracking namenode throughput over time, capable of inferring more complex and nuanced desperation

    Work on (2) will fall out of a generalized metrics reporting and alerting mechanism to be completed in concert with HADOOP-3719. The particular set of metrics and implementation will remain in this JIRA. Specifically, the implementation will likely correlate the size of the replication queue (FSNamesystemMetrics::pendingReplicationBlocks) with Datanode metrics tracking replicated blocks (DataNodeMetrics::blocksReplicated) aggregated across the cluster. The intent would be to track replication throughput, presuming that slow replication at the datanodes, a slow-draining replication queue, and low storage capacity would accurately capture the conditions called out here.
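
The correlation described above can be illustrated with a minimal sketch. It assumes the two metrics are sampled at fixed intervals and estimates how long the replication queue would take to drain at the observed throughput. The `Sample` structure, the function names, and the drain-time threshold are hypothetical and are not part of Hadoop or of the proposed implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One polling interval of cluster metrics (hypothetical structure)."""
    seconds: float          # length of the polling interval
    pending_blocks: int     # replication queue size at the end of the interval
    blocks_replicated: int  # blocks replicated by all datanodes in the interval


def drain_estimate(samples: List[Sample]) -> Optional[float]:
    """Seconds needed to drain the current queue at the observed throughput.

    Returns None when no replication was observed, so no rate can be computed.
    """
    total_seconds = sum(s.seconds for s in samples)
    total_replicated = sum(s.blocks_replicated for s in samples)
    if not samples or total_seconds <= 0 or total_replicated == 0:
        return None
    throughput = total_replicated / total_seconds  # blocks per second
    return samples[-1].pending_blocks / throughput


def is_struggling(samples: List[Sample], max_drain_seconds: float) -> bool:
    """True when the queue is non-empty and draining too slowly (or not at all)."""
    if not samples or samples[-1].pending_blocks == 0:
        return False
    estimate = drain_estimate(samples)
    return estimate is None or estimate > max_drain_seconds
```

A monitor built along these lines would alert when `is_struggling` returns true for several consecutive windows. Low storage capacity would need a separate signal that this sketch does not cover.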

    In a separate JIRA, (1) will track a ping-like facility for querying the baseline health of the Namenode. In particular, it will verify that all expected threads are alive, perform inexpensive sanity checks on data structures, etc. Administrators periodically running this check can configure/attach to the notification scheme used in their deployment.
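
As a rough illustration of such a ping-like check, the sketch below reports threads that are not alive and sanity checks that fail. It is written in Python for brevity rather than in the namenode's own code. The function name and the thread and check names passed to it are hypothetical, and the actual facility may look quite different.

```python
import threading
from typing import Callable, Dict, List


def health_check(
    expected_threads: Dict[str, threading.Thread],
    sanity_checks: Dict[str, Callable[[], bool]],
) -> List[str]:
    """Return a list of problems; an empty list means the baseline looks healthy."""
    problems = []
    # Every thread the service is expected to run must still be alive.
    for name, thread in expected_threads.items():
        if not thread.is_alive():
            problems.append(f"thread not alive: {name}")
    # Each sanity check is a cheap callable that returns True when it passes.
    for name, check in sanity_checks.items():
        if not check():
            problems.append(f"sanity check failed: {name}")
    return problems
```

An administrator's script could run this periodically and forward any non-empty result to whatever notification scheme the deployment already uses.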
  • Chris Douglas (JIRA) at Jul 15, 2008 at 10:46 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas reassigned HADOOP-3323:
    -------------------------------------

    Assignee: Mac Yang

Discussion Overview
group: common-dev @
categories: hadoop
posted: Apr 29, '08 at 12:29a
active: Jul 15, '08 at 10:46p
posts: 3
users: 1
website: hadoop.apache.org...
irc: #hadoop