[jira] Created: (HBASE-3303) Lower hbase.regionserver.handler.count from 25 back to 10
J-D,

Your hypothesis is interesting.

I took the same step -- changed 100 -> 10 -- to reduce the probability that regionservers would OOME under the high write load generated by an application simulation I have been developing to model an application we plan to deploy. (Stack, this is the next generation of the monster that led us to find the problem with ByteArrayOutputStream buffer management in the 0.19 time frame. It's baaaaack, bigger than before.)

Reducing handler.count did move the needle, but sooner or later they are all dead, whether at 4G heap or 8G heap... and the usual GC tuning tricks are not helping.

When I get back from this latest tour of Asia next week I need to dig in with jhat and jprofiler.

Best regards,

- Andy


--- On Thu, 12/2/10, Jean-Daniel Cryans (JIRA) wrote:
From: Jean-Daniel Cryans (JIRA) <jira@apache.org>
Subject: [jira] Created: (HBASE-3303) Lower hbase.regionserver.handler.count from 25 back to 10
To: issues@hbase.apache.org
Date: Thursday, December 2, 2010, 2:02 PM
Lower hbase.regionserver.handler.count from 25 back to 10
---------------------------------------------------------


Key: HBASE-3303
URL: https://issues.apache.org/jira/browse/HBASE-3303
Project: HBase
Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Fix For: 0.90.0


With HBASE-2506 in mind, I tested a low-memory environment (2GB of heap) with a lot of concurrent writers using the default write buffer, to verify whether a lower number of handlers actually helps reduce the occurrence of full GCs. Very unscientifically, at this moment I think it's safe to say that yes, it helps.

With the defaults, I saw a region server struggling more and more because the random inserters at some point filled up all the handlers, which were all BLOCKED trying to sync the WAL. It's safe to say that each of those clients carried a payload that the GC cannot get rid of, and one that we don't account for (as opposed to the MemStore and the block cache).

With a much lower setting of 5, I didn't see that situation.

It kind of confirms my hypothesis, but I need to do more proper testing. In the meantime, in order to stem the onslaught of users who write to the ML complaining about either GCs or OOMEs, I think we should set the handlers back to the original value (10) for 0.90.0 and add some documentation about configuring hbase.regionserver.handler.count.

I'd like to hear others' thoughts.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
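
[Editor's note: a minimal sketch of the setting under discussion, not from the thread. The class name is invented, and in a deployment hbase.regionserver.handler.count would normally be set in hbase-site.xml on each region server; the pinned-payload math assumes the 2 MB hbase.client.write.buffer default of that era.]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HandlerCountSketch {
    public static void main(String[] args) {
        // Loads hbase-default.xml / hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();

        // The change proposed in HBASE-3303: back to 10, down from 25.
        conf.setInt("hbase.regionserver.handler.count", 10);

        // Rough math behind the blocked-handler concern (assumption: each
        // BLOCKED handler pins roughly one client write buffer's worth of
        // request payload, ~2 MB): 25 handlers x 2 MB = ~50 MB of
        // unaccounted-for heap, vs. 10 x 2 MB = ~20 MB, on a 2 GB heap
        // that also holds the MemStore and the block cache.
        int handlers = conf.getInt("hbase.regionserver.handler.count", 25);
        long pinned = handlers * 2L * 1024 * 1024;
        System.out.println("worst-case pinned payload: "
            + pinned / (1024 * 1024) + " MB");
    }
}
```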


  • Jean-Daniel Cryans at Dec 2, 2010 at 11:22 pm
    Hey Andrew,

    They were still all dead? From session expiration or OOME? Or HDFS issues?

    J-D

  • Todd Lipcon at Dec 2, 2010 at 11:29 pm

    On Thu, Dec 2, 2010 at 3:21 PM, Jean-Daniel Cryans wrote:

    Hey Andrew,

They were still all dead? From session expiration or OOME? Or HDFS issues?

I've found the same in my load testing - it's a compaction pause for me.
Avoiding heap fragmentation seems to be basically impossible.

    -Todd

    --
    Todd Lipcon
    Software Engineer, Cloudera
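
[Editor's note: a hypothetical sketch, not HBase code, of the allocation pattern behind the fragmentation Todd describes - many medium-lived byte arrays of varying sizes freed out of allocation order, which a non-compacting old-generation collector like CMS cannot coalesce.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class CmsFragmentationSketch {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        List<byte[]> live = new ArrayList<byte[]>();
        // Mimic MemStore-style churn: allocate varied-size values, then
        // release every other one, leaving odd-sized holes in the old gen.
        for (int round = 0; round < 100; round++) {
            for (int i = 0; i < 10000; i++) {
                live.add(new byte[64 + rnd.nextInt(8 * 1024)]);
            }
            List<byte[]> survivors = new ArrayList<byte[]>();
            for (int j = 1; j < live.size(); j += 2) {
                survivors.add(live.get(j)); // keep every other buffer
            }
            live = survivors;
        }
        System.out.println("surviving buffers: " + live.size());
        // Run with e.g. -Xmx256m -XX:+UseConcMarkSweepGC and GC logging to
        // watch promotion failures degrade into full (compacting) GCs.
    }
}
```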
  • Andrew Purtell at Dec 2, 2010 at 11:39 pm
Usually splits for me. Quite similar. If I'm reading the GC log correctly, it's usually in the middle of a normal parallel CMS collection.

    Best regards,

    - Andy


  • Andrew Purtell at Dec 2, 2010 at 11:55 pm
    Wow I should spend more time proofreading mails I send < 8am.

Restated: I usually see OOME around splits. It seems the heap pressure between that and the client load is too much. Using LZO compression causes OOME sooner on average, of course. If I'm reading the GC log correctly, this usually happens in the middle of a normal parallel CMS collection. Either the GC is just not keeping up or there is a real leak here. Like I said, I need to dig in with jhat and jprofiler.

My application simulation performs a transitive web page fetch and writes all of the content as a List<Put> in one transaction. There are 100 fetch worker threads, each with a private pool of 8 threads for fetching transitive resources in parallel. I seed the list with the Alexa top 1M web sites and run it up in EC2, where they have fat ingress pipes. So the workload can be extreme, but it is a reasonable approximation of what we might see in production. I've tried clusters of 5 or 10 c1.xlarge instances in EC2 and it doesn't seem to matter: with one client it's OK; add another, and after we get to the point where there are ~10 regions on each RS, they fall like dominos.

    0.20.6 does not do this, interestingly, though the good results with 0.20.6 are with an earlier version of my simulation. But I will retest with it next week.

    Best regards,

    - Andy
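
[Editor's note: to make the write pattern above concrete, a hypothetical sketch using the 0.20/0.90-era client API (HTable.put(List<Put>)). The class, row-key scheme, and column names are invented for illustration, not Andrew's actual code.]

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PageBatchWriter {
    /** Writes one fetched page and its transitive resources as one batch. */
    public static void writePage(HTable table, String pageUrl,
            List<byte[]> resources) throws IOException {
        List<Put> puts = new ArrayList<Put>(resources.size());
        for (int i = 0; i < resources.size(); i++) {
            Put put = new Put(Bytes.toBytes(pageUrl + "#" + i));
            put.add(Bytes.toBytes("content"), Bytes.toBytes("raw"),
                resources.get(i));
            puts.add(put);
        }
        table.put(puts);       // one batched submission per page...
        table.flushCommits();  // ...flushed explicitly, as "one transaction"
    }
}
```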



  • Andrew Purtell at Dec 2, 2010 at 11:37 pm
    OOME

    Best regards,

    - Andy



Discussion Overview
group: dev
categories: hbase, hadoop
posted: Dec 2, '10 at 11:18p
active: Dec 2, '10 at 11:55p
posts: 6
users: 3
website: hbase.apache.org
