Namenode Web UI capacity report is inconsistent with Balancer
-------------------------------------------------------------

Key: HADOOP-4430
URL: https://issues.apache.org/jira/browse/HADOOP-4430
Project: Hadoop Core
Issue Type: Bug
Affects Versions: 0.19.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
Fix For: 0.19.0


The solution to 2816 made the following changes:
- The definition of Total Capacity changed from (the disk space of all the data directories) to (the disk space of all the data directories - the reserved space).
- A new element, Present Capacity, was added to the report. It is set to (Used Capacity + Remaining Capacity).
- The reported Used Percentage changed from (Used Capacity)/(Total Capacity) to (Used Capacity)/(Present Capacity).
- All of these changes are displayed on the Namenode Web UI.

Balancer functionality
The Balancer script is started with a threshold parameter. It tries to move blocks from nodes whose Used % is more than (Cluster average + threshold) to nodes whose Used % is less than (Cluster average - threshold). Essentially, the balancer brings the Used % of every datanode to within (Cluster average +/- threshold).
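
The threshold rule can be illustrated with a small Python sketch. This is not the Balancer's actual code: the function name, the node names, and the use of a plain mean of per-node percentages as the "Cluster average" are all assumptions made for illustration.

```python
def classify_nodes(used_pct_by_node, threshold):
    """Split datanodes into over-utilized, under-utilized, and balanced lists.

    Assumption: the cluster average is the plain mean of the per-node
    Used % values; the real Balancer may compute it differently.
    """
    avg = sum(used_pct_by_node.values()) / len(used_pct_by_node)
    # Nodes above (average + threshold) are candidates to move blocks from.
    over = sorted(n for n, p in used_pct_by_node.items() if p > avg + threshold)
    # Nodes below (average - threshold) are candidates to move blocks to.
    under = sorted(n for n, p in used_pct_by_node.items() if p < avg - threshold)
    # Everything else is already within (average +/- threshold).
    balanced = sorted(n for n in used_pct_by_node
                      if n not in over and n not in under)
    return avg, over, under, balanced


# Hypothetical three-node cluster with a threshold of 10 percentage points.
nodes = {"dn1": 80.0, "dn2": 50.0, "dn3": 20.0}
avg, over, under, balanced = classify_nodes(nodes, threshold=10.0)
print(avg, over, under, balanced)
```

In this hypothetical cluster the average is 50%, so dn1 is a source of blocks, dn3 is a target, and dn2 is left alone.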

Inconsistencies due to the change in 2816
When MapReduce jobs run, they generate temporary files, which can eat away a lot of space from Present Capacity, so the difference between Total Capacity and Present Capacity can be huge. Currently the balancer computes Used Percentage as (Used Capacity)/(Total Capacity), so the Used % the balancer uses can differ significantly from the Used % displayed on the Namenode Web UI. When the balancer is done balancing, the Used % values on the Namenode Web UI might still appear unbalanced.
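
A small worked example shows how far the two percentages can drift apart. The numbers below are invented for illustration and the helper name is hypothetical; only the two formulas come from the description above.

```python
def used_pct(used, capacity):
    """Return used space as a percentage of the given capacity."""
    return 100.0 * used / capacity if capacity > 0 else 0.0


# Hypothetical datanode, all sizes in GB.
total_capacity = 1000   # disk space of the data directories minus reserved space
used_capacity = 300     # space used by DFS
remaining_capacity = 200  # what is left after temporary files took 500 GB

# Present Capacity as defined by 2816: Used Capacity + Remaining Capacity.
present_capacity = used_capacity + remaining_capacity

balancer_pct = used_pct(used_capacity, total_capacity)   # what the balancer sees
web_ui_pct = used_pct(used_capacity, present_capacity)   # what the Web UI shows

print(balancer_pct, web_ui_pct)
```

With these made-up figures the balancer treats the node as 30% used while the Web UI reports 60%, which is the kind of mismatch described above.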



--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Suresh Srinivas (JIRA) at Oct 16, 2008 at 6:42 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640259#action_12640259 ]

    Suresh Srinivas commented on HADOOP-4430:
    -----------------------------------------

    Proposed solution:
    - The definition of "Configured Capacity" from 2816 will be retained.
    - "DFS Used %" will be changed from (DFS Used)/(Present Capacity) to (DFS Used)/(Configured Capacity).
    - "Present Capacity", introduced in 2816, is the same as "Configured Capacity" as long as the temporary files generated by MapReduce do not take more than the reserved space. When the temporary files use more than the reserved space, "Present Capacity" is reduced correspondingly. With this change, "Present Capacity" is removed from the report. Instead, the space used by temporary files beyond the reserved space is reported as "Non DFS Used".
    - A new "DFS Remaining %" will be added to explicitly indicate the percentage of space remaining for DFS use.
    - Currently a percentage factor, defined by "dfs.datanode.du.pct", is used to reduce the actual remaining space when calculating DFS Remaining. This does not serve any purpose (see the comments in 2816) and will be removed.

    Here are the definitions of the data reported on the Web UI:
    Configured Capacity: (Disk space of all the data directories - Reserved space as defined by dfs.datanode.du.reserved)
    DFS Used: Space used by DFS
    Non DFS Used: 0 if the temporary files do not exceed the reserved space. Otherwise, the size by which the temporary files exceed the reserved space and encroach on the DFS configured space.
    DFS Remaining: (Configured Capacity - DFS Used - Non DFS Used)
    DFS Used %: (DFS Used / Configured Capacity) * 100
    DFS Remaining %: (DFS Remaining / Configured Capacity) * 100
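
The definitions above can be checked with a short Python sketch. It is only an illustration of the formulas, not the code in the patch: the function name, parameter names, and sample numbers are assumptions.

```python
def capacity_report(disk_space, reserved, dfs_used, temp_files):
    """Compute the proposed Web UI figures from raw sizes (any consistent unit).

    disk_space: total size of all data directories
    reserved:   reserved space (the value of dfs.datanode.du.reserved)
    dfs_used:   space used by DFS
    temp_files: space taken by temporary (non-DFS) files
    """
    configured = disk_space - reserved
    # Non DFS Used is only the part of the temporary files beyond the reserve.
    non_dfs_used = max(0, temp_files - reserved)
    remaining = configured - dfs_used - non_dfs_used
    return {
        "Configured Capacity": configured,
        "DFS Used": dfs_used,
        "Non DFS Used": non_dfs_used,
        "DFS Remaining": remaining,
        "DFS Used %": 100.0 * dfs_used / configured,
        "DFS Remaining %": 100.0 * remaining / configured,
    }


# Hypothetical node: 1100 GB of disk, 100 GB reserved, 300 GB of DFS data,
# and 250 GB of temporary files (150 GB more than the reserve).
report = capacity_report(disk_space=1100, reserved=100, dfs_used=300, temp_files=250)
for name, value in report.items():
    print(f"{name}: {value}")
```

Because both percentages share Configured Capacity as the denominator, the value that the Web UI shows for DFS Used % uses the same basis as the balancer's (Used Capacity)/(Total Capacity) calculation described in the issue.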

  • Robert Chansler (JIRA) at Oct 16, 2008 at 6:42 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Chansler updated HADOOP-4430:
    ------------------------------------

    Priority: Blocker (was: Major)

    We really want this in 0.19 to avoid operational confusion. A successful execution of the rebalancer should result in the appearance of balance on the home page!

  • Devaraj Das (JIRA) at Oct 17, 2008 at 5:51 am
    [ https://issues.apache.org/jira/browse/HADOOP-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Devaraj Das updated HADOOP-4430:
    --------------------------------

    Component/s: dfs
  • Suresh Srinivas (JIRA) at Oct 17, 2008 at 6:46 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Suresh Srinivas updated HADOOP-4430:
    ------------------------------------

    Attachment: HADOOP-4430.patch

    The changes are based on the solution presented in an earlier comment.

    Here is the test-patch result:
    [exec] +1 overall.

    [exec] +1 @author. The patch does not contain any @author tags.

    [exec] +1 tests included. The patch appears to include 3 new or modified tests.

    [exec] +1 javadoc. The javadoc tool did not generate any warning messages.

    [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.

  • Suresh Srinivas (JIRA) at Oct 17, 2008 at 6:48 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Suresh Srinivas updated HADOOP-4430:
    ------------------------------------

    Release Note:
    Incompatible changes:
    1) Config parameter dfs.datanode.du.pct is no longer used and is removed from hadoop-default.xml.

    2) The Namenode Web UI has the following changes:
    The following parameters are removed:
    * Total Capacity

    The following parameters are added to both Cluster Summary and Datanode information:
    * Configured Capacity - This is the total disk space of all the data directories minus the reserved capacity defined by
    * Non DFS Used - This indicates the disk space taken by non-DFS files
    * DFS Remaining % - This is the remaining % of Configured Capacity available for DFS use

    The following parameters are modified:
    * DFS Used % - This is changed from % of Total Capacity to % of Configured Capacity

    Hadoop Flags: [Incompatible change]
    Status: Patch Available (was: Open)
  • Hairong Kuang (JIRA) at Oct 17, 2008 at 9:00 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640662#action_12640662 ]

    Hairong Kuang commented on HADOOP-4430:
    ---------------------------------------

    1. DatanodeInfo.append() line 190: u should be nonDFSUsed.
    2. FSNamesystem.getCapacityUsedNonDFS() line 3306: the return should be outside the synchronized block.
    3. FSNamesystem.getCapacityRemainingPercent() line 3324: the calculation is not consistent with that in DatanodeInfo.getRemainingPercent().
  • Suresh Srinivas (JIRA) at Oct 17, 2008 at 9:44 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Suresh Srinivas updated HADOOP-4430:
    ------------------------------------

    Attachment: HADOOP-4430.patch

    Thanks Hairong for the review. I have attached a new patch with the suggested changes.
  • Suresh Srinivas (JIRA) at Oct 17, 2008 at 10:51 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Suresh Srinivas updated HADOOP-4430:
    ------------------------------------

    Attachment: HADOOP-4430.patch

    Previous patch does not build. Attaching a new one.
  • Hairong Kuang (JIRA) at Oct 17, 2008 at 10:55 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640692#action_12640692 ]

    Hairong Kuang commented on HADOOP-4430:
    ---------------------------------------

    +1. The patch looks good to me.
  • Hadoop QA (JIRA) at Oct 18, 2008 at 8:09 am
    [ https://issues.apache.org/jira/browse/HADOOP-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640757#action_12640757 ]

    Hadoop QA commented on HADOOP-4430:
    -----------------------------------

    +1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12392389/HADOOP-4430.patch
    against trunk revision 705831.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 3 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 core tests. The patch passed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3492/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3492/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3492/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3492/console

    This message is automatically generated.
  • Suresh Srinivas (JIRA) at Oct 20, 2008 at 6:14 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Suresh Srinivas updated HADOOP-4430:
    ------------------------------------

    Release Note:
    Incompatible changes:
    This change modifies/retains the changes made in 2816 as follows:
    1) Present Capacity, added in 2816, is removed from the Web UI.
    2) The change of Total Capacity to Configured Capacity, and its definition from 2816, is retained in the Web UI.
    3) The data node protocol change to report Configured Capacity instead of Total Capacity is retained.
    4) DFS Used % was calculated as a percentage of Present Capacity. It is changed to a percentage of Configured Capacity.

    Other incompatible changes:
    1) Config parameter dfs.datanode.du.pct is no longer used and is removed from hadoop-default.xml.

    2) The Namenode Web UI has the following additional changes:
    The following parameters are added to both Cluster Summary and Datanode information:
    * Non DFS Used - This indicates the disk space taken by non-DFS files
    * DFS Remaining % - This is the remaining % of Configured Capacity available for DFS use


    Namenode Web UI capacity report is inconsistent with Balancer
    -------------------------------------------------------------

    Key: HADOOP-4430
    URL: https://issues.apache.org/jira/browse/HADOOP-4430
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.19.0
    Reporter: Suresh Srinivas
    Assignee: Suresh Srinivas
    Priority: Blocker
    Fix For: 0.19.0

    Attachments: HADOOP-4430.patch, HADOOP-4430.patch, HADOOP-4430.patch


    Solution to 2816 changed
    - Total Capacity definition from (the disk space of all data directories) to (the disk space of all the data directories - the reserved space)
    - We added a new element Present Capacity to the report. It is set to (Used Capacity + Remaining Capacity)
    - We changed the Used Percentage reported from (Used Capacity)/(Total Capacity) to (Used Capacity)/(Present Capacity)
    - All these changes are displayed on Namenode Web UI.
    Balancer functionality
    Balancer script is started with a threshold parameter. It tries to move the blocks from the nodes that have Used % that is more than (Cluster average + threshold) to the nodes that have less than (Cluster average - threshold). Essentially balancer gets all the datanodes used % to with in (the Cluster average +/- threshold).
    Inconsistencies due to the change in 2816
    When MapReduce jobs run, they generate temporary files, which can take a lot of space away from Present Capacity, so the difference between Total Capacity and Present Capacity can be huge. Currently the balancer computes Used Percentage as (Used Capacity)/(Total Capacity). The Used % the balancer uses could therefore differ significantly from the Used % displayed on the Namenode Web UI. When the balancer is done balancing, the Used % shown by the Namenode might still appear unbalanced.
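A small worked example may make the mismatch concrete. The figures below are invented purely for illustration and the function names are hypothetical; the two formulas are the ones quoted in the description above.

```python
def used_pct_of_total(used: float, total: float) -> float:
    # What the balancer uses: (Used Capacity) / (Total Capacity).
    return 100.0 * used / total


def used_pct_of_present(used: float, remaining: float) -> float:
    # What the Web UI shows after 2816: (Used Capacity) / (Present Capacity),
    # where Present Capacity = Used Capacity + Remaining Capacity.
    return 100.0 * used / (used + remaining)


# Illustrative datanode: 1000 units of Total Capacity, 300 used by DFS,
# 400 taken by MapReduce temporary files, leaving 300 remaining.
total, used, temp_files = 1000.0, 300.0, 400.0
remaining = total - used - temp_files

print(used_pct_of_total(used, total))        # balancer's view
print(used_pct_of_present(used, remaining))  # Web UI's view
```

With these made-up numbers the balancer sees the node as 30% used while the Web UI reports 50%, so a cluster the balancer considers balanced can still look uneven in the UI.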
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Oct 20, 2008 at 6:18 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-4430:
    ----------------------------------

    Resolution: Fixed
    Hadoop Flags: [Incompatible change, Reviewed] (was: [Incompatible change])
    Status: Resolved (was: Patch Available)

    I just committed this. Thank you, Suresh!
  • Raghu Angadi (JIRA) at Oct 20, 2008 at 6:20 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641094#action_12641094 ]

    Raghu Angadi commented on HADOOP-4430:
    --------------------------------------

    What is the worst case possible if some one upgrades without noting the changes?
  • Suresh Srinivas (JIRA) at Oct 20, 2008 at 6:50 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641113#action_12641113 ]

    Suresh Srinivas commented on HADOOP-4430:
    -----------------------------------------

    This change is mainly related to Web UI. It provides better clarity to how the file system capacity is represented on Web UI. This should not affect any functionality post upgrade.
  • Robert Chansler (JIRA) at Oct 21, 2008 at 11:41 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Chansler updated HADOOP-4430:
    ------------------------------------

    Release Note:
    Changed reporting in the NameNode Web UI to more closely reflect the behavior of the re-balancer. Removed the no-longer-used config parameter dfs.datanode.du.pct from hadoop-default.xml.


    was:
    Incompatible changes:
    This change modifies/retains the changes made in 2816 as follows:
    1) Present Capacity added in 2816 is removed from the Web UI
    2) Change of Total Capacity to Configured Capacity and its definition from 2816 is retained in the Web UI
    3) Data node protocol change to report Configured Capacity instead of Total Capacity is retained.
    4) DFS Used% was calculated as a percentage of Present Capacity. It is changed to percentage of Configured Capacity.

    Other incompatible changes:
    1) Config parameter dfs.datanode.du.pct is no longer used and is removed from the hadoop-default.xml.

    2) Namenode Web UI has the following additional changes:
    The following parameters are added to both Cluster Summary and Datanode information:
    * Non DFS Used - This indicates the disk space taken by non-DFS files
    * DFS remaining % - This is remaining % of Configured Capacity available for DFS use


    Hadoop Flags: [Incompatible change, Reviewed] (was: [Reviewed, Incompatible change])
  • Hudson (JIRA) at Oct 23, 2008 at 9:58 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642315#action_12642315 ]

    Hudson commented on HADOOP-4430:
    --------------------------------

    Integrated in Hadoop-trunk #640 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/640/])

