HBase dev mailing list, July 2012
Subject: How to merge regions?
Hi all,



My use case: I have several tables with keys starting with a timestamp.
These tables have data retention set to 30 days.

Table size is around 1 TB (3 TB replicated) and data is inserted regularly
(~200 MB every 5 minutes).

File size is set to 1 GB. These tables have been in use for almost half a
year, and a table now has around 6k partitions, 40% of which are empty.

The problem: the number of regions per region server is now pretty high.



Questions:

Which approach is better?

- to merge adjacent empty partitions into a bigger one?

- to merge empty partitions into non-empty partitions?

Also, I'm wondering why region merging is not part of major compactions and
why it's necessary to stop the entire fleet to solve this problem.

  • Michael Stack at Jul 16, 2012 at 10:02 pm

    On Sat, Jul 14, 2012 at 4:40 PM, Ionut Ignatescu wrote:
    > - to merge adjacent empty partitions into a bigger one?
    >
    > - to merge empty partitions into non-empty partitions?
    >
    > Also, I'm wondering why region merging is not part of major compactions
    > and why it's necessary to stop the entire fleet to solve this problem.

    It's something that should have been done a long time ago, but none of us
    has taken it on properly. J-D did an online merge hackup script,
    attached to the online merge issue, that worked for our purposes and
    helped out some others, but beyond that, online merge needs loving.

    It should be easier in your case given 40% of the regions are empty.
    Are you OK w/ a bit of scripting to edit the .META. table? Are all
    40% at the end of the table (given they are aged out)? Can you just
    cut the empty tail off the table by deleting all empty regions off the
    end from .META. (and from HDFS), and then add back a single region
    whose start key is the end key of the last non-empty region and whose
    end key is the empty row, to put back a scan stopper?

    St.Ack
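
The tail-trimming suggestion above comes down to simple bookkeeping, sketched here in Python. This is only a model: regions are plain tuples, nothing touches .META. or HDFS, and the `trim_empty_tail` helper and the tuple layout are illustrative assumptions rather than HBase APIs.

```python
from typing import List, Optional, Tuple

# (start key, end key, is_empty); b"" as an end key means "end of table".
Region = Tuple[bytes, bytes, bool]


def trim_empty_tail(
    regions: List[Region],
) -> Tuple[List[Region], List[Region], Optional[Region]]:
    """Split a list of regions sorted by start key into (kept, removed, replacement)."""
    cut = len(regions)
    while cut > 0 and regions[cut - 1][2]:
        cut -= 1  # walk back over the trailing empty regions
    kept, removed = regions[:cut], regions[cut:]
    if not removed:
        return kept, removed, None  # no empty tail, nothing to do
    # The replacement starts where the last non-empty region ends
    # and runs to the end of the table (the "scan stopper").
    start = kept[-1][1] if kept else b""
    return kept, removed, (start, b"", True)
```

The `removed` list is what would be deleted from .META. and HDFS, and `replacement` is the single empty region to add back.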
  • Ionut Ignatescu at Jul 17, 2012 at 5:08 pm
    Thanks for your answer, @Stack!

    Your approach seems very sane, but I want to add/clarify some aspects:

    - I created the table with 32 regions, and the first byte of my key is
    used to distribute writes. As a result, the empty regions are at the
    beginning of the initial regions, but seen across the entire range,
    these regions are in the middle of the key domain.

    - Let's say I have several consecutive empty regions. I'm considering two
    options in this situation: merge the empty partitions into a bigger one,
    or attach them to the first non-empty region. I think for the second
    approach I have to disable the table, but I don't see any other problem.
    Is this right?
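
To see why the aged-out regions land in the middle of the key domain, here is a small Python sketch. The thread only says that a leading byte spreads the writes and that a timestamp follows, so the exact encoding below (one bucket byte plus an 8-byte big-endian millisecond timestamp) and all helper names are assumptions.

```python
import struct
from typing import Tuple

NUM_BUCKETS = 32      # the table was created with 32 regions
RETENTION_DAYS = 30   # data older than this ages out
DAY_MS = 24 * 60 * 60 * 1000


def make_row_key(bucket: int, timestamp_ms: int) -> bytes:
    """Assumed layout: 1 bucket byte followed by an 8-byte big-endian timestamp."""
    return bytes([bucket]) + struct.pack(">Q", timestamp_ms)


def live_key_range(bucket: int, now_ms: int) -> Tuple[bytes, bytes]:
    """Smallest and largest keys that can still hold data in one bucket."""
    oldest = max(0, now_ms - RETENTION_DAYS * DAY_MS)
    return make_row_key(bucket, oldest), make_row_key(bucket, now_ms)


def is_expired_region(end_key: bytes, bucket: int, now_ms: int) -> bool:
    """A region whose (exclusive) end key is at or below the oldest live key is empty."""
    if not end_key:
        return False  # an empty end key means "end of table", which is never expired
    oldest_live, _ = live_key_range(bucket, now_ms)
    return end_key <= oldest_live
```

Because keys sort by bucket byte first, the expired regions of bucket 5 sit between the live data of bucket 4 and the live data of bucket 5, which is the "middle of the key domain" effect described above.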



    Thanks,

    Ionut I.
  • Michael Stack at Jul 18, 2012 at 1:50 pm

    On Tue, Jul 17, 2012 at 7:08 PM, Ionut Ignatescu wrote:
    > - Let's say I have several consecutive empty regions. I'm considering two
    > options in this situation: merge the empty partitions into a bigger one,
    > or attach them to the first non-empty region. I think for the second
    > approach I have to disable the table, but I don't see any other problem.
    > Is this right?

    If you do the former, you do not have to take any "data" offline. You
    can remove all the empty regions from .META. (and their region dirs in
    HDFS) and then, when done, make a new empty region that spans the gap
    left by the regions just removed.
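
As a rough illustration of this first option, the Python sketch below collapses every run of adjacent empty regions into one empty region covering the same key range. It is a model only: the `Region` tuple and the `collapse_empty_runs` function are hypothetical names, and the real work would be the .META. and HDFS edits described above.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    start_key: bytes  # inclusive; b"" means start of table
    end_key: bytes    # exclusive; b"" means end of table
    empty: bool


def collapse_empty_runs(regions: List[Region]) -> List[Region]:
    """Replace each run of adjacent empty regions with one region spanning the gap.

    The input must be sorted by start key and form a contiguous chain.
    """
    result: List[Region] = []
    for region in regions:
        prev = result[-1] if result else None
        if region.empty and prev is not None and prev.empty:
            # Extend the previous empty region so it also covers this one.
            result[-1] = Region(prev.start_key, region.end_key, True)
        else:
            result.append(region)
    return result
```

Non-empty regions pass through untouched, which matches the point that no data has to go offline for this option.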

    For the latter, you will need to take the table offline, or just close
    the region being operated on while you do your operation. Changing
    a region's metadata, such as its start key, changes its hash,
    which means that the region's dir name changes in HDFS (the name in
    HDFS is a hash of the region's metadata). Because of this, you will
    have to move the content of the current region into the new region
    (you'll then have to close and reopen the region so we open the files
    that were just moved into the new location).
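
The effect of a changed start key on the directory name can be modeled in a few lines of Python. This is a simplified stand-in, not HBase's own code: the thread only says the name is a hash of the region's metadata, so hashing the string `<table>,<start key>,<region id>` with MD5, and the `region_dir_name` helper itself, are assumptions made for illustration.

```python
import hashlib


def region_dir_name(table: str, start_key: bytes, region_id: int) -> str:
    """Model of a region dir name: a hex digest over the region's identifying metadata.

    Assumption: the hashed input is '<table>,<start key>,<region id>'.
    """
    name = table.encode() + b"," + start_key + b"," + str(region_id).encode()
    return hashlib.md5(name).hexdigest()


# Widening a region so that it starts at b"b" instead of b"c" changes the digest,
# so the store files would have to move from the old directory to the new one.
old_dir = region_dir_name("mytable", b"c", 1342000000000)
new_dir = region_dir_name("mytable", b"b", 1342000000000)
```

Here `old_dir` and `new_dir` differ even though only the start key changed, which is why the second option needs the region closed while its files are moved.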

    Are you doing scans across the whole table? If so, while you operate
    on .META., there will be times when there are holes that a scan
    cannot cross. So, can you disable these scans while you are doing your
    table ops?

    St.Ack
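
A quick way to reason about such holes is to check that the region key ranges still form an unbroken chain from the empty start key to the empty end key. The Python sketch below does that over plain (start, end) pairs; `find_holes` is a hypothetical helper that assumes the regions do not overlap, and it does not read .META. itself.

```python
from typing import List, Tuple

# (start key inclusive, end key exclusive); b"" marks the open end of the table.
KeyRange = Tuple[bytes, bytes]


def find_holes(ranges: List[KeyRange]) -> List[KeyRange]:
    """Return the gaps in a region chain as (gap start, gap end) pairs.

    Assumes the regions do not overlap one another.
    """
    if not ranges:
        return [(b"", b"")]  # no regions at all: the whole key space is a hole
    holes: List[KeyRange] = []
    expected_start = b""  # the first region must start at the empty key
    for start, end in sorted(ranges, key=lambda r: r[0]):
        if start != expected_start:
            holes.append((expected_start, start))
        expected_start = end
    if expected_start != b"":  # the last region must end at the empty key
        holes.append((expected_start, b""))
    return holes
```

Running a check like this before and after each batch of edits shows whether a full-table scan would currently hit a gap.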
  • Ionut Ignatescu at Jul 21, 2012 at 5:19 am
    Hi again,

    Please provide me with some extra info:
    - What do I have to put in the new record added to .META.?
    Basically, the region info has all the info from the HRegionInfo object
    created using startKey, endKey and the table + column descriptor. But how
    do I set the other columns? Or are they necessary?
    - Also, do I have to create the folders for the column families in HDFS?
    Ex: /hbase/table/regionXX/colFam1
        /hbase/table/regionXX/colFam2


    Regards,
    Ionut I.

  • Michael Stack at Jul 21, 2012 at 9:18 am

    On Sat, Jul 21, 2012 at 7:18 AM, Ionut Ignatescu wrote:
    > Hi again,
    >
    > Please provide me with some extra info:
    > - What do I have to put in the new record added to .META.?
    > Basically, the region info has all the info from the HRegionInfo object
    > created using startKey, endKey and the table + column descriptor. But how
    > do I set the other columns? Or are they necessary?

    The .META. info:regioninfo column has a serialized HRegionInfo in it.
    To get the HRegionInfo serialized bytes, see line #284 in
    bin/region_mover.rb, e.g.:

    #284 bytes = Writables.getBytes(r)

    See how the top of the script imports Writables.

    You can then put the above bytes into the meta table. Suggest you
    practise on a standalone instance rather than the prod install.

    If you want to java it, check out HBaseFsck for how it does .META. edits.

    See also bin/copy_table.rb and how it does .META. edits.

    > - Also, do I have to create the folders for the column families in HDFS?
    > Ex: /hbase/table/regionXX/colFam1
    >     /hbase/table/regionXX/colFam2

    You don't need to. They'll be created if absent. You just need to
    get the .META. edits right.

    St.Ack
  • Ionut Ignatescu at Jul 17, 2012 at 10:55 pm
    Thanks for your answer!

    Your approach seems very sane, but I want to add/clarify some aspects:

    - I created the table with 32 regions, and the first byte of my key is
    used to distribute writes. As a result, the key domain now has some empty
    regions. Basically, these empty regions are at the beginning of the
    initial regions, but seen across the entire range, they are in the middle
    of the key domain.

    - Let's say I have several consecutive empty regions. I'm considering two
    options in this situation: merge the empty partitions into a bigger one,
    or attach them to the first non-empty region. I think for the second
    approach I have to disable the table, but I don't see any other problem.
    Is this right?



    Thanks,

    Ionut I.
