Suspected memory leak
I have noticed some memory leak problems in my HBase client.
RES has increased to 27g:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12676 root 20 0 30.8g 27g 5092 S 2 57.5 587:57.76 /opt/java/jre/bin/java -Djava.library.path=lib/.

But I am not sure whether the leak comes from the HBase client jar itself or from our own client code.

These are some of the JVM parameters:
-Xms15g -Xmn12g -Xmx15g -XX:PermSize=64m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=65 -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=1 -XX:+CMSParallelRemarkEnabled

Does anyone have experience with this? I need to continue digging :)



From: Gaojinchao
Sent: November 30, 2011 11:02
To: user@hbase.apache.org
Subject: Suspected memory leak

In the HBase client process, I found that the heap keeps growing.
I used the command 'cat smaps' (under /proc/<pid>) to get the heap size.
It seems that after the thread pool in HTable has released its idle threads, putting data again with the put(List) API makes the memory grow.

Does anyone have experience with this?

Below are successive samples of the HBase client heap:
C3S31:/proc/18769 # cat smaps
4010a000-4709d000 rwxp 00000000 00:00 0 [heap]
Size: 114252 kB
Rss: 114044 kB
Pss: 114044 kB

4010a000-4709d000 rwxp 00000000 00:00 0 [heap]
Size: 114252 kB
Rss: 114044 kB
Pss: 114044 kB

4010a000-48374000 rwxp 00000000 00:00 0 [heap]
Size: 133544 kB
Rss: 133336 kB
Pss: 133336 kB

4010a000-49f20000 rwxp 00000000 00:00 0 [heap]
Size: 161880 kB
Rss: 161672 kB
Pss: 161672 kB

4010a000-4c5de000 rwxp 00000000 00:00 0 [heap]
Size: 201552 kB
Rss: 201344 kB
Pss: 201344 kB
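
For reference, below is a minimal sketch of the write pattern being described, using the 0.90.x-era client API. The table name, column family, and row contents are hypothetical placeholders, not taken from the actual application.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PutListExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // "test_table" and "cf" are hypothetical names
            HTable table = new HTable(conf, "test_table");
            List<Put> puts = new ArrayList<Put>();
            for (int i = 0; i < 1000; i++) {
                Put p = new Put(Bytes.toBytes("row-" + i));
                p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
                puts.add(p);
            }
            // the put(List) call in question; repeat after the HTable thread
            // pool has had time to release its idle threads
            table.put(puts);
            table.close();
        }
    }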


  • Shrijeet Paliwal at Dec 1, 2011 at 2:26 am
    Gaojinchao,

    I had filed this some time ago,
    https://issues.apache.org/jira/browse/HBASE-4633
    But after some recent insights into our application code, I am inclined to
    think the leak (or memory 'hold') is in our application. Still, it will be good to
    check either way.
    I need to update the JIRA with my findings. See if the description of the issue I
    posted there matches yours; if not, maybe you can add your story in detail.

    -Shrijeet

  • Bijieshan at Dec 1, 2011 at 2:31 am
    Hi Shrijeet,

    I think that JIRA is relevant to trunk but not to 0.90.x, since there is no timeout mechanism in 0.90.x. Right?
    We found this problem in 0.90.x.

    Thanks,

    Jieshan.

  • Shrijeet Paliwal at Dec 1, 2011 at 2:39 am
    Jieshan,
    We backported https://issues.apache.org/jira/browse/HBASE-2937 to 0.90.3

    -Shrijeet


  • Bijieshan at Dec 1, 2011 at 2:45 am
    Shrijeet,

    Thanks, and nice to hear that. I will analyze your patch; maybe it can help. I think this is a native-memory leak problem.

    Jieshan.

  • Vladimir Rodionov at Dec 1, 2011 at 5:27 pm
    You can create several heap dumps of the JVM process in question and compare the heap allocations.
    To create a heap dump:

    jmap -dump:format=b,file=heap.hprof <pid>

    To analyze it:
    1. jhat
    2. VisualVM
    3. any commercial profiler

    One note: -Xmn12g ??? How long are your minor-collection GC pauses?

    Best regards,
    Vladimir Rodionov
    Principal Platform Engineer
    Carrier IQ, www.carrieriq.com
    e-mail: vrodionov@carrieriq.com

    ________________________________________
    From: Ramkrishna S Vasudevan [ramkrishna.vasudevan@huawei.com]
    Sent: Wednesday, November 30, 2011 6:51 PM
    To: user@hbase.apache.org; dev@hbase.apache.org
    Subject: RE: Suspected memory leak

    Adding dev list to get some suggestions.

    Regards
    Ram


  • Stack at Dec 1, 2011 at 7:27 pm
    Make sure it's not the issue that Jonathan Payne identified a while
    back: https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
    St.Ack
  • Kihwal Lee at Dec 1, 2011 at 10:23 pm
    Adding to the excellent write-up by Jonathan:
    Since a finalizer is involved, it takes two GC cycles to collect these objects. Due to a bug (or bugs) in the CMS GC, collection may not happen and the heap can grow really big. See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details.

    Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the socket-related objects were being collected properly. This option forces the concurrent marker to run as a single thread. This was for HDFS, but I think the same applies here.

    Kihwal
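
    A minimal toy sketch (not HDFS or HBase code) of the two-cycle behavior described above: the first collection only discovers the finalizable object and queues its finalizer; the memory is actually reclaimed on a later cycle, after the finalizer has run.

        public class FinalizerDemo {
            private final byte[] payload = new byte[1024 * 1024]; // make the object visible in the heap

            @Override
            protected void finalize() {
                System.out.println("finalize() ran; reclaimable on the NEXT cycle");
            }

            public static void main(String[] args) throws InterruptedException {
                new FinalizerDemo();  // immediately unreachable
                System.gc();          // cycle 1: object discovered, finalizer queued
                Thread.sleep(1000);   // give the finalizer thread time to run
                System.gc();          // cycle 2: the payload is actually reclaimed
            }
        }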

  • Bijieshan at Dec 2, 2011 at 7:38 am
    Thank you all.
    I think it's the same problem as the one in the link provided by Stack: the heap size has stabilized, but the non-heap size keeps growing. So I don't think it is the CMS GC bug.
    We have also looked at the contents of the problematic memory section; all the records contain info like the following:
    "|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||"
    "BBZHtable_UFDR_058,048342220093168-02570"
    ........

    Jieshan.

  • Gaojinchao at Dec 4, 2011 at 3:58 am
    Thank you for your help.

    This issue appears to be a configuration problem:
    1. The HBase client uses the NIO (socket) API, which uses direct memory.
    2. The default -XX:MaxDirectMemorySize value is equal to the -Xmx value, so if no full GC occurs, direct memory is never reclaimed. Unfortunately, the GC configuration parameters of our client do not produce any full GC.

    This is only a preliminary result; all tests are still running, and we will report back any further results.
    Finally, I will update our story in https://issues.apache.org/jira/browse/HBASE-4633.

    If our digging is correct, should we set a default value for -XX:MaxDirectMemorySize to prevent this situation?


    Thanks
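
    A minimal toy sketch of the mechanism described above; the heap and direct-memory sizes are illustrative, e.g. run with -Xmx64m -XX:MaxDirectMemorySize=64m. Each allocateDirect call reserves native memory that is freed only after a GC collects the small heap-side ByteBuffer object, so a GC configuration that never produces a full collection lets the native footprint accumulate.

        import java.nio.ByteBuffer;

        public class DirectMemoryDemo {
            public static void main(String[] args) {
                for (int i = 0; i < 10000; i++) {
                    // 1 MB of native memory per iteration; the Java-side
                    // reference is dropped immediately, but the native block
                    // is freed only after GC reclaims the dead ByteBuffer.
                    ByteBuffer.allocateDirect(1024 * 1024);
                }
                // Note: when the MaxDirectMemorySize limit is reached, the JDK
                // itself attempts a System.gc() before throwing OutOfMemoryError,
                // which is why a full GC "rescues" this pattern -- and why a
                // client whose settings never produce a full GC keeps growing.
            }
        }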

  • Sandy Pratt at Dec 5, 2011 at 9:55 pm
    Gaojinchao,

    I'm not certain, but this looks a lot like some of the issues I've been dealing with lately (namely, non-Java-heap memory leakage).

    First, -XX:MaxDirectMemorySize doesn't seem to be a solution. This flag is poorly documented, and moreover the problem appears to be related to releasing/reclaiming resources rather than over-allocating them. See http://bugs.sun.com/bugdatabase/view_bug.do;jsessionid=ae283c11508fb97ede5fe27a1554b?bug_id=4469299

    Second, you may wish to experiment with "-XX:+UseParallelGC -XX:+UseParallelOldGC" rather than CMS GC. I have been trying this recently on some of my app servers and Hadoop servers, and it certainly does fix the problem of non-Java-heap growth. The concern with parallel GC is that full GCs (which are, it would seem, the solution to the non-heap memory problem) take too long. Personally, I consider this reasoning fallacious, since a full GC is bound to occur sooner or later, and when using the CMS GC with this bug in effect, they can be fatal (and even without this bug, CMS uses a single thread for a full GC AFAIK). The numbers for parallel GC on a 2G heap are not terrible, even without tuning, even with old processors (max pause 2.8 sec, avg pause 1 sec for a full GC, with minor collections outnumbering the major at least 3:1, total overhead 1.3%). If your application can tolerate a second or two of latency once in a while, you can switch to ParallelOldGC and call it a day.

    The fact that some installations are trying to deal with ~24GB heaps sounds like a design issue to me; HBase and Hadoop are already designed to scale horizontally, and this emphasis on scaling vertically just because the hardware comes in a certain size sounds misguided. But not having that hardware, I might be missing something.

    Finally, you might look at changing the vm.swappiness parameter in the Linux kernel (I think it's in sysctl.conf). I have set swappiness to 0 for my servers, and I'm happy with it. I don't know the exact mechanism, but it certainly appears that there's a memory pressure feedback of some sort going on between the kernel and the JVM. Perhaps it has to do with the total commit charge appearing lower (just physical instead of physical + swap) when swappiness is low. I'd love to hear from someone with a deep understanding of OS memory allocation about this.

    Hope this helps,
    Sandy

