HBase user mailing list, June 2011
Subject: What is the best practice of using flushCommit in multithreaded mode
Hi folks,

I need to load 1 million queue messages into an HBase table in 30 minutes.

As "HBase: The Definitive Guide" suggests, I use the client API with flushCommits().

I launch, say, 20 threads. Each thread has its own queue connection and its own HBase instance; it reads messages from the queue and inserts them into HBase. At the end of its life, each thread calls hbaseTable.flushCommits().

It seems to work fine, except for:

1. Each flushCommits() call takes quite a long time.
2. Occasionally, flushCommits() causes a WrongRegionException.

Can someone please share the best practice for this situation? In particular, should each thread call flushCommits(), or should only the main thread call it?

Thanks,

RX


  • Jean-Daniel Cryans at Jun 1, 2011 at 5:17 pm
    Inline.

    J-D
    On Wed, Jun 1, 2011 at 6:34 AM, Xu, Richard wrote:
    > 1. Each flushCommits() call takes quite a long time.

    Take a look at http://hbase.apache.org/book/performance.html. It will
    take time if you are splitting and moving regions a lot.

    > 2. Occasionally, flushCommits() causes a WrongRegionException.

    That should never happen. When it does, it's because there's a hole in
    your .META. table, and that doesn't just come out of nowhere; usually
    it's due to a misconfiguration.

    Which version are you running?

    J-D
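One common way to reduce region splitting during a bulk load is to create the table with pre-computed split keys, so writes are spread over several regions from the start. The sketch below only computes such keys. It assumes (this is not stated in the thread) that row keys begin with a uniformly distributed two-digit lowercase hex prefix; `SplitKeys` and `hexSplits` are hypothetical names, not HBase API. The resulting strings would then be converted to bytes and passed to whatever table-creation call your HBase version provides for split keys.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of HBase: evenly spaced split points over the prefix space 00..ff. */
final class SplitKeys {
    private SplitKeys() {}

    /**
     * Returns regionCount - 1 split keys that divide the 256 possible
     * two-digit hex prefixes into regionCount roughly equal ranges.
     * Assumption: row keys start with a uniformly distributed hex prefix.
     */
    static List<String> hexSplits(int regionCount) {
        if (regionCount < 1 || regionCount > 256) {
            throw new IllegalArgumentException("regionCount must be between 1 and 256");
        }
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < regionCount; i++) {
            int boundary = i * 256 / regionCount; // first prefix of the i-th range
            splits.add(String.format("%02x", boundary));
        }
        return splits;
    }
}
```

For example, `SplitKeys.hexSplits(4)` yields the three keys `40`, `80`, and `c0`, giving four key ranges. If the real row keys are not uniformly distributed (sequential IDs or timestamps, for instance), evenly spaced split points will not balance the load and the keys would need to be chosen from the actual key distribution.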
  • Xu, Richard at Jun 1, 2011 at 6:28 pm
    Thanks a lot, J-D.

    I was using 0.90.2.
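On the question of which thread should flush: in the HBase client of that era the write buffer belongs to the table instance, and a table instance is not meant to be shared between threads, so each thread has to flush its own instance; the main thread cannot flush buffers it does not own. The sketch below illustrates that pattern without depending on HBase. `BatchSink`, `WorkerBuffer`, and `LoadDemo` are hypothetical stand-ins (the sink plays the role of the table, `flush()` the role of `flushCommits()`), and the buffer size of 64 rows is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a table handle; not an HBase type. */
interface BatchSink {
    void writeBatch(List<String> rows);
}

/** One instance per thread: the buffer is never shared, so it needs no locking. */
final class WorkerBuffer {
    private final BatchSink sink;
    private final int maxBuffered;
    private final List<String> pending = new ArrayList<>();

    WorkerBuffer(BatchSink sink, int maxBuffered) {
        this.sink = sink;
        this.maxBuffered = maxBuffered;
    }

    /** Buffers one row and flushes when the buffer is full. */
    void put(String row) {
        pending.add(row);
        if (pending.size() >= maxBuffered) {
            flush();
        }
    }

    /** Sends whatever is buffered; a no-op when the buffer is empty. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.writeBatch(new ArrayList<>(pending));
        pending.clear();
    }
}

final class LoadDemo {
    /** Drains an in-memory queue with threadCount workers; returns how many rows the sink received. */
    static int run(int messages, int threadCount, int maxBuffered) {
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < messages; i++) {
            queue.add("msg-" + i);
        }
        AtomicInteger written = new AtomicInteger();
        BatchSink sink = rows -> written.addAndGet(rows.size()); // thread-safe counter stands in for the server

        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            Thread worker = new Thread(() -> {
                WorkerBuffer buffer = new WorkerBuffer(sink, maxBuffered);
                try {
                    String msg;
                    while ((msg = queue.poll()) != null) {
                        buffer.put(msg);
                    }
                } finally {
                    buffer.flush(); // each worker flushes its own remainder before exiting
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return written.get();
    }
}
```

Calling `LoadDemo.run(1000, 20, 64)` returns 1000: every queued message reaches the sink because each worker flushes in its `finally` block. For scale, the stated target of 1 million messages in 30 minutes works out to roughly 556 messages per second overall, or about 28 per second per thread with 20 threads.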

