Hello,

I just wanted to make sure that I'm interpreting a series of common issues correctly.

I saw ZK expirations causing regionserver failures, and this in a GC log of one of the regionservers:

16237.033: [GC[YG occupancy: 22353 K (38336 K)]16245.298: [Rescan (parallel) , 0.0264040 secs]16245.325: [weak refs processing, 0.0000970 secs] [1 CMS-remark: 1465176K(3282456K)] 1487530K(3320792K), 0.0266760 secs] [Times: user=0.02 sys=0.01, real=8.29 secs]

5328.127: [GC[YG occupancy: 27822 K (38336 K)]5334.773: [Rescan (parallel) , 0.0156270 secs]5334.788: [weak refs processing, 0.0003130 secs] [1 CMS-remark: 1144288K(2375464K)] 1172111K(2413800K), 0.0161190 secs] [Times: user=0.02 sys=0.00, real=6.66 secs]

I noted the rather large delta between the user/sys times & the real times here:

[Times: user=0.02 sys=0.00, real=6.66 secs]
[Times: user=0.02 sys=0.01, real=8.29 secs]

So I'm assuming this is the second of the two common causes of GC issues?
That is, CPU or I/O bound M/R tasks are starving the GC of CPU time?

Just wanted to check that I was stringing the logic (and logs) together correctly.

Thanks!

Take care,
-stu
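
The comparison in the post above (user + sys CPU time versus real wall-clock time) can be checked mechanically. The following is only a minimal sketch: the function name `stall_seconds` and the regular expression are illustrative assumptions, not part of HBase or the JVM tooling. It extracts the `[Times: ...]` block from a GC log line and reports how much wall-clock time is not accounted for by CPU time. A large gap is consistent with the GC threads waiting to be scheduled (or the process being swapped out), but it does not by itself prove which.

```python
import re

# Matches the trailer the JVM prints at the end of a GC log entry, e.g.
# "[Times: user=0.02 sys=0.01, real=8.29 secs]"
TIMES_RE = re.compile(
    r"\[Times: user=(?P<user>[\d.]+) sys=(?P<sys>[\d.]+), real=(?P<real>[\d.]+) secs\]"
)


def stall_seconds(line):
    """Return (cpu, real, real - cpu) for a GC log line.

    Returns None if the line carries no [Times: ...] block.
    """
    match = TIMES_RE.search(line)
    if match is None:
        return None
    cpu = float(match["user"]) + float(match["sys"])
    real = float(match["real"])
    return cpu, real, real - cpu


if __name__ == "__main__":
    # The two trailers quoted in the post above.
    samples = [
        "[Times: user=0.02 sys=0.00, real=6.66 secs]",
        "[Times: user=0.02 sys=0.01, real=8.29 secs]",
    ]
    for sample in samples:
        cpu, real, gap = stall_seconds(sample)
        print(f"cpu={cpu:.2f}s real={real:.2f}s unaccounted={gap:.2f}s")
```

For both entries nearly all of the pause is unaccounted wall-clock time, which is the pattern the question is about.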

  • Abhijit Pol at Nov 10, 2010 at 2:16 am
    What do the GC settings in your hbase-env.sh look like? Did you add or
    remove anything from the out-of-the-box hbase-env.sh?

    Try running this on the RS and watch the last column; each increment
    should be small:
    sudo -u <RS_USER> jstats -gcutil <RS_PID> 1000
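
"Watch the last column" can also be automated. The sketch below is a rough illustration, not an official tool: it assumes the last column of the `-gcutil` output is the cumulative GC time (GCT) in seconds, the helper name `gct_increments` is hypothetical, and the sample rows are invented purely for demonstration (they are not from the cluster discussed here). It computes the increase in that column between consecutive samples so that a sudden multi-second jump stands out.

```python
def gct_increments(jstat_lines):
    """Yield the increase in the last column between consecutive samples.

    Rows whose last field is not a number (such as the header row) and
    blank rows are skipped.
    """
    previous = None
    for line in jstat_lines:
        fields = line.split()
        if not fields:
            continue
        try:
            gct = float(fields[-1])
        except ValueError:
            continue  # header row, e.g. ending in "GCT"
        if previous is not None:
            yield gct - previous
        previous = gct


if __name__ == "__main__":
    # Invented sample output, for illustration only.
    sample = """\
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00  42.10  63.20  44.60  59.80   1200   14.250    30    2.100   16.350
  0.00  42.10  71.90  44.60  59.80   1200   14.250    30    2.100   16.350
 38.70   0.00   5.40  44.70  59.80   1201   14.262    30    2.100   16.362
 38.70   0.00   9.10  44.70  59.80   1201   14.262    31   10.390   24.652
"""
    THRESHOLD_SECS = 1.0  # hypothetical alert threshold, tune to taste
    for delta in gct_increments(sample.splitlines()):
        flag = "  <-- large jump" if delta > THRESHOLD_SECS else ""
        print(f"+{delta:.3f}s{flag}")
```

To use it against a live process, the same function could be fed from `sys.stdin` with the jstat output piped into the script.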


  • Ted Yu at Nov 10, 2010 at 4:11 am
    I think there is a typo below: jstats should be jstat.

    On Tue, Nov 9, 2010 at 6:16 PM, Abhijit Pol wrote:

    What do the GC settings in your hbase-env.sh look like? Did you add or
    remove anything from the out-of-the-box hbase-env.sh?

    Try running this on the RS and watch the last column; each increment
    should be small:
    sudo -u <RS_USER> jstats -gcutil <RS_PID> 1000


  • Stuart Smith at Nov 14, 2010 at 2:19 am
    Hello Abhijit,

    Thanks for the tip. The only problem is that the regionserver dies when this happens, so there's no regionserver pid :)

    It only happens when I run a particular M/R task, so I'm going to try one more time with no tasks running on the region servers.

    Take care,
    -stu

    --- On Tue, 11/9/10, Abhijit Pol wrote:
    From: Abhijit Pol <apol@rocketfuel.com>
    Subject: Re: Is this indicative of a GC CPU starvation?
    To: user@hbase.apache.org
    Date: Tuesday, November 9, 2010, 9:16 PM

    What do the GC settings in your hbase-env.sh look like? Did you add or
    remove anything from the out-of-the-box hbase-env.sh?

    Try running this on the RS and watch the last column; each increment
    should be small:
    sudo -u <RS_USER> jstats -gcutil <RS_PID> 1000



Discussion Overview
group: user @
categories: hbase, hadoop
posted: Nov 9, '10 at 6:53p
active: Nov 14, '10 at 2:19a
posts: 4
users: 3
website: hbase.apache.org
