Grokbase Groups HBase dev June 2009
Extend TestHeapSize and ClassSize to do "deep" sizing of Objects
----------------------------------------------------------------

Key: HBASE-1590
URL: https://issues.apache.org/jira/browse/HBASE-1590
Project: Hadoop HBase
Issue Type: Improvement
Affects Versions: 0.20.0
Reporter: Jonathan Gray
Fix For: 0.20.0


As discussed in HBASE-1554 there is a bit of a disconnect between how ClassSize calculates the heap size and how we need to calculate heap size in our implementations.

For example, the LRU block cache can be sized via ClassSize, but it is a shallow sizing. There is a backing ConcurrentHashMap that is the largest memory consumer. However, ClassSize only counts that as a single reference. But in our heapSize() reporting, we want to include *everything* within that Object.

This issue is to resolve that dissonance. We may need to create an additional ClassSize.estimateDeep(), we may need to rethink our HeapSize interface, or we may just leave it as is. The two primary goals of all this testing are to 1) ensure that if something is changed and the sizing is not updated, our tests fail, and 2) ensure our sizing is as accurate as possible.
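The gap can be made concrete with a toy sketch (illustrative Java only, not HBase's actual ClassSize; the header, reference, and per-entry numbers are assumed 64-bit JVM approximations): a shallow estimate charges the backing ConcurrentHashMap as a single 8-byte reference, while a heapSize()-style deep estimate must also charge everything the map holds.

```java
import java.util.concurrent.ConcurrentHashMap;

// Toy illustration of the shallow/deep gap: the shallow figure never moves
// as the cache fills, while the deep figure is dominated by the cached data.
public class ShallowVsDeep {
    static final long HEADER = 16, REF = 8; // assumed 64-bit JVM sizes
    static final long PER_ENTRY = 48;       // assumed cost of one map entry

    final ConcurrentHashMap<String, byte[]> backing = new ConcurrentHashMap<>();

    long shallowSize() {  // what a plain field walk sees: one reference
        return HEADER + REF;
    }

    long deepSize() {     // what heapSize() should report
        long entries = 0;
        for (byte[] v : backing.values()) {
            entries += PER_ENTRY + HEADER + v.length; // entry + array + payload
        }
        return shallowSize() + entries;
    }

    public static void main(String[] args) {
        ShallowVsDeep cache = new ShallowVsDeep();
        cache.backing.put("block-1", new byte[64 * 1024]);
        System.out.println("shallow=" + cache.shallowSize()
            + " deep=" + cache.deepSize());
    }
}
```

With one 64 KB block cached, the shallow figure stays at a couple of dozen bytes while the deep figure is dominated by the cached bytes, which is exactly the disconnect described above.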

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Jonathan Gray (JIRA) at Jun 30, 2009 at 2:41 am
    [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725440#action_12725440 ]

    Jonathan Gray commented on HBASE-1590:
    --------------------------------------

I'm not sure we need to do this anymore. The patch going in for HBASE-1591 cleaned up LruBlockCache heap sizing; it now works well and is accurate.

    Remaining issues are:

    - How do we really ensure sizing of the protected members of things like ConcurrentHashMap (Entry and Segment)? We could use SizeOf, but I would rather try some hackery/reflection business so we can dig in with ClassSize.
    - Review the MemStore heapSize() implementation; it has the same issue as above, for ConcurrentSkipListMap.
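The reflection business mentioned in the first bullet can be sketched roughly as follows (an assumption-laden sketch, not HBase's ClassSize: header and reference sizes are assumed 64-bit values, and field alignment/padding is ignored). The useful property is that Class.getDeclaredFields() sees private and protected fields, and Class.forName() can load non-public inner classes such as a map's entry class:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Rough shallow size from declared instance fields, walking up the
// superclass chain. Sizes are assumed 64-bit values; padding is ignored.
public class ReflectiveClassSize {
    static final int HEADER = 16, REF = 8;

    static long estimateShallow(Class<?> clazz) {
        long size = HEADER;
        for (Class<?> c = clazz; c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                Class<?> t = f.getType();
                if (t == long.class || t == double.class) size += 8;
                else if (t == int.class || t == float.class) size += 4;
                else if (t == short.class || t == char.class) size += 2;
                else if (t == byte.class || t == boolean.class) size += 1;
                else size += REF; // any object reference
            }
        }
        return size;
    }

    public static void main(String[] args) throws Exception {
        // Works on non-public inner classes too; note the inner-class name
        // varies by JDK version (HashMap$Entry on older JDKs).
        Class<?> node = Class.forName("java.util.HashMap$Node");
        System.out.println("HashMap node ~ " + estimateShallow(node) + " bytes");
    }
}
```

This only sizes the class layout, not what an instance points at; a deep estimate still has to multiply by entry counts or traverse a live object.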
  • stack (JIRA) at Jun 30, 2009 at 3:50 am
    [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725458#action_12725458 ]

    stack commented on HBASE-1590:
    ------------------------------

    Move it out of 0.20.0?
  • Jonathan Gray (JIRA) at Jun 30, 2009 at 4:45 am
    [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725469#action_12725469 ]

    Jonathan Gray commented on HBASE-1590:
    --------------------------------------

    Let's keep it open; I would like to get Erik's input tomorrow.

    We need to address the above two issues, and this is as fine a place as any.
  • Erik Holstad (JIRA) at Jul 1, 2009 at 9:52 pm
    [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726234#action_12726234 ]

    Erik Holstad commented on HBASE-1590:
    -------------------------------------

    I have been working on a deepClassSize and have a working version of it. A couple of things make the whole concept of checking the size of a class, rather than an object, hard. Take TreeMap as an example: it has a reference to the root entry, which in turn has references to its left, right, and parent entries, so how do you know when to stop?
    Of the two main goals we already have 1), so we have 2) left.
    One thing we could do is lift some test code using Instrumentation.getObjectSize() into a test, so we don't have to include the jar. The problem is then how we should run it, since at the moment it requires -javaagent:/home/erik/src/tgzs/SizeOf.jar. I will see if I can work around this, to be able to use it in a unit test.
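For reference, the agent hook being described here is small. A minimal sketch (class and jar names are illustrative, and the jar needs a `Premain-Class: SizeAgent` manifest entry to be loadable as an agent):

```java
import java.lang.instrument.Instrumentation;

// Loaded via -javaagent:sizeagent.jar: the JVM calls premain before main
// and hands over the Instrumentation instance, whose getObjectSize()
// returns the JVM's own (shallow) size for a live object.
public class SizeAgent {
    private static volatile Instrumentation inst;

    public static void premain(String agentArgs, Instrumentation i) {
        inst = i;
    }

    public static long sizeOf(Object o) {
        if (inst == null) {
            throw new IllegalStateException(
                "Instrumentation unavailable: run with -javaagent:sizeagent.jar");
        }
        return inst.getObjectSize(o);
    }
}
```

Note that getObjectSize() is itself shallow, so a deep estimate would still have to traverse the object graph and sum the per-object sizes.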

  • Nitay Joffe (JIRA) at Jul 1, 2009 at 9:57 pm
    [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726239#action_12726239 ]

    Nitay Joffe commented on HBASE-1590:
    ------------------------------------

    What if you maintain a Set&lt;Object&gt; of references that have already been counted? That way you can traverse any data structure and check whether you need to recurse. For example, when you get to the 'parent' reference you will see it has already been counted, so you just count the reference itself without recursing into it.
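That traversal can be sketched like so (illustrative code, not a committed implementation; the per-object numbers are crude assumed 64-bit values). The important detail is identity semantics: an equals()-based HashSet would wrongly merge distinct-but-equal objects, so the visited set is backed by an IdentityHashMap.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

public class DeepSizer {
    static final long HEADER = 16, REF = 8; // assumed 64-bit JVM values

    static long deepSize(Object root) {
        if (root == null) return 0;
        // Identity-based visited set: each object is charged exactly once,
        // so cycles (e.g. a TreeMap entry's parent pointer) terminate and
        // back-references cost only the reference itself.
        Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Object> stack = new ArrayDeque<>();
        stack.push(root);
        long total = 0;
        while (!stack.isEmpty()) {
            Object o = stack.pop();
            if (!visited.add(o)) continue; // already counted, stop here
            total += HEADER;
            if (o.getClass().isArray()) {
                int len = Array.getLength(o);
                if (o.getClass().getComponentType().isPrimitive()) {
                    total += (long) len * 8; // crude: 8 bytes per element
                } else {
                    total += (long) len * REF;
                    for (int i = 0; i < len; i++) {
                        Object e = Array.get(o, i);
                        if (e != null) stack.push(e);
                    }
                }
                continue;
            }
            for (Class<?> c = o.getClass(); c != null; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers())) continue;
                    if (f.getType().isPrimitive()) { total += 8; continue; } // crude
                    total += REF;
                    try {
                        f.setAccessible(true);
                        Object v = f.get(o);
                        if (v != null) stack.push(v);
                    } catch (ReflectiveOperationException | RuntimeException e) {
                        // Field not readable (e.g. JDK internals on newer
                        // JVMs): charge the reference and move on.
                    }
                }
            }
        }
        return total;
    }
}
```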
  • stack (JIRA) at Jul 1, 2009 at 11:05 pm
    [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726260#action_12726260 ]

    stack commented on HBASE-1590:
    ------------------------------

    @holstad is Instrumentation.getObjectSize() a sizeof call? SizeOf is GPL, right? Let me know if you want me to work on the build to add this like we have for Clover, where you point at a SizeOf install and then run an ant task with -javaagent:/home/erik/src/tgzs/SizeOf.jar. We could run this as part of the Hudson build (I think; maybe GPL code is disallowed up on Hudson ... would have to see), or we could run it as part of a release.
  • Erik Holstad (JIRA) at Jul 1, 2009 at 11:36 pm
    [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726272#action_12726272 ]

    Erik Holstad commented on HBASE-1590:
    -------------------------------------

    @Nitay, yes, that would work if we were checking the sizes of Objects, but right now we are just dealing with classes, so it is very hard to take that approach.

    @Stack, yup, it is GPL. I just wasn't sure how you would add specific JVM arguments to Hudson; I have been trying to get it to work from within Eclipse without setting the arguments, but without any luck so far. It seems some of these tools ship with the Sun JDK but not with OpenJDK until 7. So if we can run it with the arguments for now, that would be really nice.


  • Jonathan Gray (JIRA) at Jul 2, 2009 at 8:55 pm
    [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Jonathan Gray updated HBASE-1590:
    ---------------------------------

    Fix Version/s: (was: 0.20.0)
    0.20.1

    Punting to 0.20.1 ... doing something for this will be useful, but let's not hold up 0.20.0.

    What needs to be done for 0.20.0 will now be handled over in HBASE-1607.
  • Jonathan Gray (JIRA) at Aug 12, 2009 at 10:54 pm
    [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Jonathan Gray resolved HBASE-1590.
    ----------------------------------

    Resolution: Won't Fix
    Assignee: Jonathan Gray

    All testing on 0.20 shows we are more than okay with respect to our heap sizing. We will open a new issue against 0.21 if we do need any further improvements.
  • stack (JIRA) at Aug 12, 2009 at 11:40 pm
    [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack reopened HBASE-1590:
    --------------------------


    Mind if I keep this open, JG? I think it would be sweet to integrate the SizeOf jar, fetching it if the user asks for it. Maybe when we move the build to Ivy it wouldn't be too hard.
  • stack (JIRA) at Aug 12, 2009 at 11:40 pm
    [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HBASE-1590:
    -------------------------

    Priority: Minor (was: Major)
    Fix Version/s: (was: 0.20.1)
    0.21.0
    Assignee: stack (was: Jonathan Gray)

    Assigned it to myself, moved it to 0.21, and made it minor.


Discussion Overview
group: dev @
categories: hbase, hadoop
posted: Jun 29, '09 at 9:03p
active: Aug 12, '09 at 11:40p
posts: 12
users: 1
website: hbase.apache.org

