Thanks in advance for any help. I've been quite pleased with HBase for this current project, and until this problem it has worked quite well.

The test cluster is CDH3b3 on 7 nodes:
5 data nodes with 48GB RAM, 8 cores, 4 disks,
2 masters with 8 cores, 2 disks, 24GB RAM for master/zookeeper/namenode

My hbase.hregion.max.filesize is set to 1GB, the open-file ulimit to 32k, and xceivers to 4096; the HBase heap is 8GB.

I'm testing GZ compression on two tables, each of which currently still has only one region. My tests run fine when compression is off, so this is definitely related to compression. When I start loading data (via thrift, many clients) it loads great for a while, then the region servers slow to a crawl. When this happens, the two regionservers hosting the tables use ~110-160% CPU and block writes. One regionserver has occasional bursts of activity, but its log is mostly very repetitive; here is a sample:

http://pastebin.com/WSc8aZFQ

The other active regionserver looks to be continuously compacting:

http://pastebin.com/3ifVKaX2


The master log is quite boring with this being repeated:

2011-01-08 00:48:58,419 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 10.56.24.8:60020, regionname: -ROOT-,,0.70236052, startKey: <>}
2011-01-08 00:48:58,424 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {server: 10.56.24.8:60020, regionname: -ROOT-,,0.70236052, startKey: <>} complete
2011-01-08 00:48:58,444 INFO org.apache.hadoop.hbase.master.ServerManager: 5 region servers, 0 dead, average load 1.6
2011-01-08 00:49:04,810 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.56.24.7:60020, regionname: .META.,,1.1028785192, startKey: <>}
2011-01-08 00:49:04,820 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 6 row(s) of meta region {server: 10.56.24.7:60020, regionname: .META.,,1.1028785192, startKey: <>} complete
2011-01-08 00:49:04,820 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned


At this point loading slows to a trickle (requests are 0 in the web UI). I can see infrequent bursts of loading, but only very small amounts. Each table has only one region (and there are only two other tables, each also with only one region).

I've compiled and tested the native GZ compression codecs on the nodes, and the nodes have plenty of CPU, IO and memory available, with no swapping. Any suggestions? Please let me know if you need any other info.

thanks!
-chris


  • Sandy Pratt at Jan 10, 2011 at 6:55 pm
    Chris,

    I'm curious if this happens when hbase.hregion.max.filesize is set to the default 256m. Have you tested it?

    Sandy
    -----Original Message-----
    From: Christopher Tarnas On Behalf Of Chris Tarnas
    Sent: Friday, January 07, 2011 23:07
    To: user@hbase.apache.org
    Subject: Strange regionserver behavior with GZ compression
  • Christopher Tarnas at Jan 11, 2011 at 4:22 pm
    I have not tested GZ compression with a 256MB region size yet. When I start a
    new round of testing I will. Thanks for the idea,

    -chris
  • Stack at Jan 10, 2011 at 7:53 pm
    Odd. Mind thread dumping the regionserver a few times during a
    compaction and pastebinning the dumps so we can see where it's spending
    time? (Your compaction numbers are bad.)

    St.Ack
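
One way to act on the request above is to capture several thread dumps (for example with the JDK's jstack tool) and count which method each RUNNABLE thread is executing. The Python sketch below is only an illustration under stated assumptions: it expects dumps in the usual HotSpot text layout, and the function names top_frames and summarize are made up for this example, not part of HBase or the JDK.

```python
import re
import sys
from collections import Counter

# Matches a stack-frame line such as "\tat pkg.Class.method(File.java:123)"
# and captures the fully qualified method name.
FRAME = re.compile(r"^\s+at\s+([^\s(]+)\(")


def top_frames(dump_text):
    """Count the innermost stack frame of every RUNNABLE thread in one dump."""
    counts = Counter()
    # Threads in a dump are separated by blank lines.
    for block in re.split(r"\n\s*\n", dump_text):
        if "java.lang.Thread.State: RUNNABLE" not in block:
            continue
        for line in block.splitlines():
            match = FRAME.match(line)
            if match:
                counts[match.group(1)] += 1
                break  # only the innermost frame is of interest here
    return counts


def summarize(dumps):
    """Merge the per-dump counts and return them sorted, most frequent first."""
    total = Counter()
    for text in dumps:
        total.update(top_frames(text))
    return total.most_common()


if __name__ == "__main__":
    # Each command-line argument is the path of one saved thread dump.
    texts = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            texts.append(handle.read())
    for frame, hits in summarize(texts):
        print(f"{hits:4d}  {frame}")
```

Saved under a hypothetical name such as dump_tally.py, it could be run as `python dump_tally.py dump1.txt dump2.txt dump3.txt`. A frame that dominates across dumps taken during a compaction is a reasonable first suspect, though it is a hint rather than proof.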
  • Christopher Tarnas at Jan 11, 2011 at 4:18 pm
    Hi Stack,

    Thanks for taking a look. I think I caught a regionserver compacting:

    http://pastebin.com/y9BQaVeJ

    http://pastebin.com/ZMxwEX5j

    thanks again,
    -chris
  • Christopher Tarnas at Jan 12, 2011 at 9:46 pm
    I'm running a test now without any GZ compression enabled and I am seeing the
    same pauses in loading. Any more ideas? I will try dropping my region size down
    to 256MB next. Currently I cannot sustain writes via thrift for
    more than a few seconds before it all pauses.

    -chris

  • Christopher Tarnas at Jan 12, 2011 at 11:47 pm
    More details on what I am seeing:

    I set the region size back to the default (256MB) and got much better
    performance, with fewer pauses for compaction. I loaded until I hit about 150
    total regions in the table I am loading now (30 per regionserver) and then
    set hbase.hregion.max.filesize back up to 1GB (1073741824 is the actual
    setting I used). After restarting the cluster I ran another load test. There
    were many more pauses for compactions that halted the whole cluster, and I
    got roughly 50% of the write speed I had before. Compression was not
    enabled.

    thanks for any help,
    -chris
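
For reference, hbase.hregion.max.filesize takes a byte count, so the 256MB and 1GB region sizes compared above correspond to 268435456 and 1073741824. The following is a minimal Python sketch of that arithmetic, which also prints each value in the usual <property> layout of hbase-site.xml. The helper names to_bytes and property_xml are hypothetical and exist only for this illustration.

```python
from xml.sax.saxutils import escape

# Binary units, matching the 1GB = 1073741824 value quoted in the thread.
UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(size):
    """Parse a string such as '256MB' or '1GB' into a byte count."""
    text = size.strip().upper()
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)  # no unit suffix: treat the input as a plain byte count


def property_xml(name, value):
    """Render one <property> element in the Hadoop/HBase site-file layout."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # The two region sizes compared in the load tests above.
    for label in ("256MB", "1GB"):
        print(property_xml("hbase.hregion.max.filesize", to_bytes(label)))
```

Computing the value rather than typing it by hand avoids an off-by-a-digit setting when switching between the two sizes for repeated tests.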

Discussion Overview
group: user @
categories: hbase, hadoop
posted: Jan 8, '11 at 7:07a
active: Jan 12, '11 at 11:47p
posts: 7
users: 3
website: hbase.apache.org