FAQ
More improvement on block placement performance
-----------------------------------------------

Key: HADOOP-5638
URL: https://issues.apache.org/jira/browse/HADOOP-5638
Project: Hadoop Core
Issue Type: Improvement
Components: dfs
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.21.0


Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O(n), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.

I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O(n) in average.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Search Discussions

  • Hairong Kuang (JIRA) at Apr 7, 2009 at 10:54 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-5638:
    ----------------------------------

    Description:
    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.

    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.

    was:
    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O(n), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.

    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O(n) in average.

    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Apr 7, 2009 at 11:43 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-5638:
    ----------------------------------

    Attachment: excludedList.patch

    This patch changed the data type of the excluded list from ArrayList to HashMap.
    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Apr 7, 2009 at 11:51 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696830#action_12696830 ]

    Hairong Kuang commented on HADOOP-5638:
    ---------------------------------------

    I did the same experiment described in HADOOP-5603. On a full cluster with 3150 nodes, the patch reduced the amount of time on placing a block on two datanodes from 2.1s to less than 0.1s.
    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Tsz Wo (Nicholas), SZE (JIRA) at Apr 8, 2009 at 11:00 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697283#action_12697283 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-5638:
    ------------------------------------------------

    The type of excludedNodes in countNumOfAvailableNodes(..) should be Collection<Node>. Everything else looks good.
    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Apr 8, 2009 at 11:30 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-5638:
    ----------------------------------

    Attachment: excludedList1.patch

    This patch incorporates Nicholas' comment plus fixes a few javadoc typos.
    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch, excludedList1.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Apr 8, 2009 at 11:30 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-5638:
    ----------------------------------

    Status: Patch Available (was: Open)
    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch, excludedList1.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hadoop QA (JIRA) at Apr 9, 2009 at 10:31 am
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697471#action_12697471 ]

    Hadoop QA commented on HADOOP-5638:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12405025/excludedList1.patch
    against trunk revision 763502.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 3 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    -1 contrib tests. The patch failed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/170/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/170/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/170/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/170/console

    This message is automatically generated.
    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch, excludedList1.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Apr 9, 2009 at 8:10 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-5638:
    ----------------------------------

    Status: Patch Available (was: Open)
    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch, excludedList1.patch, excludedList2.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Apr 9, 2009 at 8:10 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-5638:
    ----------------------------------

    Status: Open (was: Patch Available)
    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch, excludedList1.patch, excludedList2.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Apr 9, 2009 at 8:10 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-5638:
    ----------------------------------

    Attachment: excludedList2.patch

    This failed test seemed to be caused by port collision. Maybe there was a concurrent running of another junit test. The new patch fixed this.
    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch, excludedList1.patch, excludedList2.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hadoop QA (JIRA) at Apr 10, 2009 at 8:14 am
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697766#action_12697766 ]

    Hadoop QA commented on HADOOP-5638:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12405099/excludedList2.patch
    against trunk revision 763728.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 3 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    -1 contrib tests. The patch failed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/178/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/178/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/178/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/178/console

    This message is automatically generated.
    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch, excludedList1.patch, excludedList2.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Suresh Srinivas (JIRA) at Apr 10, 2009 at 11:27 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698010#action_12698010 ]

    Suresh Srinivas commented on HADOOP-5638:
    -----------------------------------------

    {{ReplicationTargetChooser.java}} line 205, there is no need to check for {{containsKey()}}, you can just overwrite the the existing entry (if there is any) by calling just {{put()}}. This change can be done in other places as well.

    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch, excludedList1.patch, excludedList2.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Apr 15, 2009 at 6:50 am
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-5638:
    ----------------------------------

    Attachment: excludedList3.patch

    This patch incorporates Suresh's review comment.
    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch, excludedList1.patch, excludedList2.patch, excludedList3.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Apr 15, 2009 at 6:50 am
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-5638:
    ----------------------------------

    Status: Patch Available (was: Open)
    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch, excludedList1.patch, excludedList2.patch, excludedList3.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Apr 15, 2009 at 6:50 am
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-5638:
    ----------------------------------

    Status: Open (was: Patch Available)
    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch, excludedList1.patch, excludedList2.patch, excludedList3.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Suresh Srinivas (JIRA) at Apr 16, 2009 at 6:21 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699791#action_12699791 ]

    Suresh Srinivas commented on HADOOP-5638:
    -----------------------------------------

    +1

    Nits - There are several typos {{replca}}, {{datanodesthat}}, {{choosen}}, {{availabe}}, {{tranverses}}, {{relicated}}

    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch, excludedList1.patch, excludedList2.patch, excludedList3.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Apr 16, 2009 at 10:06 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-5638:
    ----------------------------------

    Attachment: excludedList4.patch

    This patch additionally fixed the typos that Suresh pointed out.
    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch, excludedList1.patch, excludedList2.patch, excludedList3.patch, excludedList4.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Apr 17, 2009 at 12:30 am
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699943#action_12699943 ]

    Hairong Kuang commented on HADOOP-5638:
    ---------------------------------------

    Ant patch passed:
    [exec] +1 overall.
    [exec]
    [exec] +1 @author. The patch does not contain any @author tags.
    [exec]
    [exec] +1 tests included. The patch appears to include 3 new or modified tests.
    [exec]
    [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
    [exec]
    [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    [exec]
    [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
    [exec]
    [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
    [exec]
    [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

    Ant test-core passed except for crashed TestBackupNode reported at HADOOP-5573.
    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch, excludedList1.patch, excludedList2.patch, excludedList3.patch, excludedList4.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Apr 17, 2009 at 12:34 am
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-5638:
    ----------------------------------

    Resolution: Fixed
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

    I've just committed this!
    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch, excludedList1.patch, excludedList2.patch, excludedList3.patch, excludedList4.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hudson (JIRA) at Apr 20, 2009 at 9:43 am
    [ https://issues.apache.org/jira/browse/HADOOP-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700726#action_12700726 ]

    Hudson commented on HADOOP-5638:
    --------------------------------

    Integrated in Hadoop-trunk #811 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/811/])

    More improvement on block placement performance
    -----------------------------------------------

    Key: HADOOP-5638
    URL: https://issues.apache.org/jira/browse/HADOOP-5638
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: 0.21.0

    Attachments: excludedList.patch, excludedList1.patch, excludedList2.patch, excludedList3.patch, excludedList4.patch


    Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.
    I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupcommon-dev @
categorieshadoop
postedApr 7, '09 at 10:52p
activeApr 20, '09 at 9:43a
posts21
users1
websitehadoop.apache.org...
irc#hadoop

1 user in discussion

Hudson (JIRA): 21 posts

People

Translate

site design / logo © 2023 Grokbase