FAQ

[HBase-user] Row get very slow

Damien Hardy
Nov 10, 2011 at 11:13 am
Hello there.


When I want to get a row by its row key, the response is very slow (sometimes up to 15 seconds).
What is wrong with my HTable?
Here are some examples to illustrate my problem:

hbase(main):030:0> get 'logs', '_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA==', { COLUMN => 'body:body', VERSIONS => 1 }
COLUMN                         CELL
 body:body                     timestamp=1320919979701, value=Nov 10 11:05:41 haproxy[15469]: ... [haproxy logs] ...
1 row(s) in 6.0310 seconds

hbase(main):031:0> scan 'logs', { STARTROW => '_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA==', LIMIT => 1 }
ROW                            COLUMN+CELL
 _f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA==  column=body:body, timestamp=1320919979701, value=Nov 10 11:05:41 haproxy[15469]: ... [haproxy logs] ...
1 row(s) in 2.7160 seconds

hbase(main):032:0> get 'logs', '_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA=='
COLUMN                         CELL
 body:body                     timestamp=1320919979701, value=Nov 10 11:05:41 haproxy[15469]: ... [haproxy logs] ...
1 row(s) in 5.0640 seconds

hbase(main):033:0> describe 'logs'
DESCRIPTION                                                                                              ENABLED
 {NAME => 'logs', FAMILIES => [{NAME => 'body', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '536870912', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} true
1 row(s) in 0.0660 seconds

hbase(main):025:0> get 'logs', '_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA==', { COLUMN => 'body:body', TIMERANGE => [1320919900000, 1320920000000] }
COLUMN                         CELL
 body:body                     timestamp=1320919979701, value=Nov 10 11:05:41 haproxy[15469]: ... [haproxy logs] ...
1 row(s) in 0.0630 seconds


A scan is always faster than a get, which I find strange.

I get a normal response time when I specify the timestamp range (TIMERANGE).

The table has about 200 regions distributed over 2 nodes (with the full stack
on each: HDFS / HBase master + regionserver / ZooKeeper).
Region size is 2 GB now.

Recently I increased the region size from the default (128 MB if I remember
correctly) to 2 GB to get a smaller number of regions (I had 3500 regions).

I changed hbase.hregion.max.filesize to 2147483648, restarted my whole
cluster, created a new table, and copied the data from the old table to the
new one via Pig => fewer regions => I'm happy \o/
But on my old table the plain get response was very fast, like the get with
the timestamp range specified on the new table.

Does the region size affect HBase response time that much?

A get on another table that was not rebuilt after the config change (regions
not merged) is still fast.

Thank you,

--
Damien

10 responses

  • Lars hofhansl at Nov 10, 2011 at 6:44 pm
    "BLOCKSIZE => '536870912'"


    You set your blocksize to 512mb? The default is 64k (65536), try to set it to something lower.

    -- Lars
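
    For reference, a minimal shell sketch of that change, assuming a 0.90-era shell where altering a column family requires disabling the table first; '65536' is the 64k default mentioned above, and the final major_compact rewrites the existing HFiles so the new block size actually takes effect:

    hbase> disable 'logs'
    hbase> alter 'logs', { NAME => 'body', BLOCKSIZE => '65536' }
    hbase> enable 'logs'
    hbase> major_compact 'logs'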
  • Arvind Jayaprakash at Nov 13, 2011 at 3:13 pm
    A common confusion is b/w MAX_FILESIZE and BLOCKSIZE. Given that
    MAX_FILESIZE is not listed on :60010/master.jsp, one tends to assume
    BLOCKSIZE represents that value.
    On Nov 10, lars hofhansl wrote:
    "BLOCKSIZE => '536870912'"


    You set your blocksize to 512mb? The default is 64k (65536), try to set it to something lower.
  • Stack at Nov 13, 2011 at 5:52 pm

    On Sun, Nov 13, 2011 at 7:13 AM, Arvind Jayaprakash wrote:
    A common confusion is b/w MAX_FILESIZE and BLOCKSIZE. Given that
    MAX_FILESIZE is not listed on :60010/master.jsp, one tends to assume
    BLOCKSIZE represents that value.
    We should fix that. What would you like to see, Arvind?
    St.Ack
  • Arvind Jayaprakash at Nov 14, 2011 at 6:15 pm

    On Nov 13, Stack wrote:
    On Sun, Nov 13, 2011 at 7:13 AM, Arvind Jayaprakash wrote:
    A common confusion is b/w MAX_FILESIZE and BLOCKSIZE. Given that
    MAX_FILESIZE is not listed on :60010/master.jsp, one tends to assume
    BLOCKSIZE represents that value.
    We should fix that. What would you like to see, Arvind?
    Looks like Santa is ahead of schedule this year ...

    (1) I've always found it hard to find all configurable "per-table"
    properties listed in documentation. So that would be a good thing to
    have.

    (2) Also, having all of the per-table properties listed on the HBase
    master page would create more awareness of at least the terms, if not how
    to twiddle around with them.


    The problem with the specific parameter in question has to do with how
    the mind runs wild. A lot of HBase design-related documents/discussions
    mention the term "region size". It is very hard to imagine that
    MAX_FILESIZE (which is hardly mentioned anywhere) is what actually refers
    to region size, and it is easy to miss that BLOCKSIZE, which appears so
    prominently on the master page (or in the output of scanning the .META.
    table for the nerdier folks), is an entirely different beast.

    Once we address #1 & #2, it becomes easier to yell "Didn't you RTFM" at
    anyone who gets confused :-)
  • Damien Hardy at Nov 14, 2011 at 8:53 am

    On 13/11/2011 16:13, Arvind Jayaprakash wrote:
    A common confusion is b/w MAX_FILESIZE and BLOCKSIZE. Given that
    MAX_FILESIZE is not listed on :60010/master.jsp, one tends to assume
    BLOCKSIZE represents that value.
    On Nov 10, lars hofhansl wrote:
    "BLOCKSIZE => '536870912'"


    You set your blocksize to 512mb? The default is 64k (65536), try to set it to something lower.

    Hello,

    Thank you for the answer. I have just altered my table and launched a
    major_compact to make it effective.

    I thought that increasing the FILESIZE of HBase somehow implied changes to
    the BLOCKSIZE of my tables, and to prevent unbalanced parameters I
    increased it too ... #FAIL.

    The question is: for what kind of application should BLOCKSIZE be changed
    (increased or decreased)?

    Thank you.

    --
    Damien
  • Doug Meil at Nov 14, 2011 at 3:33 pm
    Hi there-

    re: "The question is : in what application BLOCKSIZE should be changed
    (increased or decreased) ?"


    See.. http://hbase.apache.org/book.html#schema.creation

    and...

    http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html
  • Lars hofhansl at Nov 14, 2011 at 7:25 pm
    Did it speed up your queries? As you can see from the followup discussions here, there is some general confusion around this.

    Generally there are 2 sizes involved:
    1. HBase Filesize
    2. HBase Blocksize

    #1 sets the maximum size of a region before it is split. Default used to be 512mb, it's now 1g (but usually it should be even larger)

    #2 is the size of the blocks inside the HFiles. Smaller blocks mean better random access, but larger block indexes. I would only increase that if you have large cells.

    -- Lars
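
    To make the two knobs concrete, a sketch in the 0.90-era shell syntax (alters assume the table is disabled; the MAX_FILESIZE table attribute shown here is the per-table counterpart of the hbase.hregion.max.filesize setting Damien changed, and BLOCKSIZE is the per-family HFile block size):

    hbase> alter 'logs', METHOD => 'table_att', MAX_FILESIZE => '2147483648'   # 1: max region size before a split (2 GB)
    hbase> alter 'logs', { NAME => 'body', BLOCKSIZE => '65536' }              # 2: HFile block size (64k)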
  • Sam Seigal at Nov 14, 2011 at 7:38 pm
    If you are not too concerned with random access time, but want more
    efficient scans, is increasing the block size then a good idea ?
  • Stack at Nov 14, 2011 at 10:22 pm

    On Mon, Nov 14, 2011 at 11:37 AM, Sam Seigal wrote:
    If you are not too concerned with random access time, but want more
    efficient scans, is increasing the block size then a good idea ?
    I'd say leave things as they are unless you have a problem.

    For your case, where random read latency is not so important and you
    are only scanning, upping the block size should not change your scan
    latencies and it will make the hfile indices smaller (if you double
    the blocksize to 128k, your indices should be halved -- you can see
    index sizes in your regionserver UI).

    St.Ack
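
    A rough back-of-the-envelope illustration of that point, assuming one index entry per block and about 2 GB of store file data per region:

    2 GB / 64 KB blocks  = 32768 block index entries
    2 GB / 128 KB blocks = 16384 block index entries  (index roughly halved)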
  • Damien Hardy at Nov 15, 2011 at 8:49 am
    Hi,

    It sped it up, definitely :)

    hbase(main):002:0> get 'logs', '_f:squid_t:20111114110759_b:squid_s:204-taDiFMcQaPzN13dDOZ99PA=='
    COLUMN                         CELL
     body:body                     timestamp=1321265279234, value=Nov 14 11:00:24 haproxy[15470]: ... [haproxy syslogs] ...
    1 row(s) in 0.0170 seconds

    Thank you again for the help and explanations.

    Regards,

    --
    Damien


