Grokbase Groups HBase user March 2010
Hi,
I started 200 clients (spread across 20 machines) to run NewHTableTest,
shown below; the run took 983 seconds.
The META table resides in just one region, and that machine's CPU and
network traffic are very high while NewHTableTest runs, so I suspect a
bottleneck at ZooKeeper or the META table server.
Any suggestions?


My Cluster:
1U servers (4 cores, 12 GB RAM): 20
ZooKeeper nodes: 3
region servers: 10
regions: 1500


public void NewHTableTest() throws IOException {
    // Open 30 IndexedTable handles (Table1 .. Table30); each constructor
    // triggers region-location discovery against the .META. table.
    IndexedTable[] tables = new IndexedTable[30];
    for (int i = 0; i < 30; i++) {
        tables[i] = new IndexedTable(config, Bytes.toBytes("Table" + (i + 1)));
    }
}
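For reference, a 200-client run like the one described above can be driven with a plain `java.util.concurrent` harness such as the sketch below. `ConcurrencyHarness` and `openTables` are hypothetical names; `openTables` is only a stand-in for the table-opening code (it sleeps instead of talking to HBase), so the harness runs without a live cluster.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConcurrencyHarness {
    // Stand-in for NewHTableTest: a real run would open the 30 IndexedTables
    // here; we sleep instead so the harness runs without a cluster.
    static void openTables() throws InterruptedException {
        Thread.sleep(10);
    }

    // Runs `clients` concurrent copies of openTables() and returns elapsed ms.
    static long run(int clients) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        CountDownLatch done = new CountDownLatch(clients);
        long start = System.nanoTime();
        for (int i = 0; i < clients; i++) {
            pool.submit(() -> {
                try {
                    openTables();
                } catch (InterruptedException ignored) {
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();          // wait for all clients to finish
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("200 clients took " + run(200) + " ms");
    }
}
```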




Fleming Chiu(邱宏明)
707-6128
y_823910@tsmc.com
週一無肉日吃素救地球(Meat Free Monday Taiwan)


---------------------------------------------------------------------------
TSMC PROPERTY
This email communication (and any attachments) is proprietary information
for the sole use of its
intended recipient. Any unauthorized review, use or distribution by anyone
other than the intended
recipient is strictly prohibited. If you are not the intended recipient,
please notify the sender by
replying to this email, and then delete this email and any copies of it
immediately. Thank you.
---------------------------------------------------------------------------


  • Jean-Daniel Cryans at Mar 1, 2010 at 7:25 pm
    In this particular case a lot of things come into play:

    - Creating a table is a long process because the client sleeps a lot:
    6 seconds before 0.20.3, 2 seconds in 0.20.3, and even less than that
    in the current head of the branch.

    - In 0.20, without the HDFS-200 patch, HDFS doesn't support fs syncs,
    so we force memstore flushes at something like 8MB so that you don't
    lose too much data on that very important table (hopefully in 0.21
    it's supported, no data loss, yeah!). So all those memstore flushes can
    account for a lot of traffic and can generate a lot more compactions.

    What exactly is your test trying to show? I'm really not sure... that
    tables with very small memstores take edits at a slower rate?

    J-D

    2010/2/28 <y_823910@tsmc.com>:

  • Y_823910 at Mar 2, 2010 at 1:38 am
    Hi,

    We treat HBase as a data grid.
    Many HBase Java clients in our compute grid (GridGain) fetch data from
    HBase concurrently. Our data is normalized data from Oracle, and the
    computing code does joins and some aggregations.
    So our POC job is: load the tables' data from HBase -> compute on the
    data (join & aggregation) -> save the results back to HBase.
    It does very well when we run 10 jobs with 10 concurrent clients; that
    took 53 sec. We expected our 20 machines to finish in about 60 sec
    when running 200 jobs (200 concurrent clients), but in fact all the
    clients blocked in the following code:
    IndexedTable idxTable1 = new
    IndexedTable(config, Bytes.toBytes("Table1"));
    The results were not satisfactory:
    200 clients:  839 sec
    400 clients: 1801 sec
    We estimated that about 85% of the time was spent in new IndexedTable
    once the client count rose to 200.
    That is to say: can HBase really serve well while hundreds of clients
    connect to it concurrently?
    Just new a table in your code and run it concurrently in threads or on
    another distributed computing platform; maybe then you can see what's
    wrong with it.
    If HBase only targets a handful of web-server connections, that's OK,
    but to serve a thousand concurrent connections the way an RDBMS can,
    the HBase architecture seems to need adjustment.
    That's my opinion!



  • Jean-Daniel Cryans at Mar 2, 2010 at 2:01 am
    Ah, I understand now, thanks for the context. So I interpreted your
    first test wrong: you are basically hitting .META. with a lot of
    random reads from lots of clients that have completely empty caches
    when the test begins.

    So here you hit some pain points we currently have WRT random reads.
    But first, I'd like to point out that HBase isn't your typical RDBMS
    where you can just point at the machine to read from and be done with it.
    Here the client has to figure out the region locations by itself, doing
    location discovery using the .META. table. Normally that would be fast,
    but a couple of issues are slowing down concurrent reads on hot rows:

    - We don't use pread: https://issues.apache.org/jira/browse/HBASE-2180
    - Random reading locks whole rows (among other things); this will be
    fixed in: https://issues.apache.org/jira/browse/HBASE-2248
    - Reading from .META. is really slowed down when it has more than 1
    store file: https://issues.apache.org/jira/browse/HBASE-2175

    The first two are actively being worked on; the third still needs
    investigation and may be just a symptom.

    What this means for you is that, if possible, you should try to reuse
    JVMs across jobs in order to use warmed-up caches. For example, run the
    same test but call the same code twice, and you should see each new
    HTable be created much faster in the second batch.

    Another option would be to implement a new feature in the HBase client
    that warms it up using scanners (I think there's a JIRA about it).

    J-D
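The warm-up effect J-D describes can be illustrated with a plain-Java memoized lookup: the first request per table pays the expensive ".META. round trip", and repeats are served from the cache. `LocationCache` and `lookupInMeta` are illustrative names, not the real HBase client internals (which keep this cache inside the connection).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrates why a warmed-up cache helps: the first lookup per table is
// expensive; every later lookup for the same table is a cheap map hit.
public class LocationCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    final AtomicInteger metaLookups = new AtomicInteger();

    // Stand-in for a .META. scan that resolves a table to a server address.
    private String lookupInMeta(String table) {
        metaLookups.incrementAndGet();
        return "regionserver-for-" + table;
    }

    public String locate(String table) {
        // computeIfAbsent only calls lookupInMeta on a cache miss.
        return cache.computeIfAbsent(table, this::lookupInMeta);
    }
}
```

Opening the same 30 tables twice through such a cache performs only 30 expensive lookups, not 60, which is why the second batch of HTable creations is so much faster.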

    2010/3/1 <y_823910@tsmc.com>:
  • Alvin C.L Huang at Mar 2, 2010 at 3:04 am
    @J-D
    I like the idea of 'warm up'.

    I wonder whether it is possible to clone client caches across JVMs
    (a cache of hot regions, or the cache of a running job).

    --
    Alvin C.-L., Huang / 黃俊龍
    ATC, ICL, ITRI, Taiwan
    T: 886-3-59-14625
    本信件可能包含工研院機密資訊,非指定之收件者,請勿使用或揭露本信件內容,並請銷毀此信件。
    This email may contain confidential information. Please do not use or
    disclose it in any way and delete it if you are not the intended recipient.

    On 2 March 2010 10:00, Jean-Daniel Cryans wrote:

  • Jean-Daniel Cryans at Mar 2, 2010 at 3:17 am
    Alvin,

    That feature doesn't exist currently, and I don't see a clean way of
    doing it, as those regions will change location over time (though on a
    normal production system it shouldn't vary that much). But someone
    motivated could do the following:

    - Add a new method to HConnectionManager.TableServers that dumps the
    content of its cachedRegionLocations map to a specified table/row
    key/fam:qual.
    - Add another method that can load that same content back from the
    specified location.

    That's hacky, but I'm pretty sure it wouldn't be that hard to do.
    Another way would be to add a public getter for cachedRegionLocations
    and use the normal HTable to do the exact same thing.

    J-D
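The dump/load idea above can be sketched with plain Java serialization: turn a table-to-server map into bytes (which a client could then store wherever it likes, e.g. the well-known row J-D mentions) and restore it in another JVM. `CacheDump` is a hypothetical class; the real cachedRegionLocations map holds HBase-specific types, not plain strings, so this only illustrates the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

// Sketch of the dump/load idea: serialize a location map to bytes so another
// client can start with a warm cache instead of rediscovering every region.
public class CacheDump {
    static byte[] dump(Map<String, String> locations) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new HashMap<>(locations)); // HashMap is Serializable
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static Map<String, String> load(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Map<String, String>) ois.readObject();
        }
    }
}
```

As J-D notes, any such snapshot goes stale when regions move, so a loaded cache is only a hint: the client must still fall back to .META. when a cached location turns out to be wrong.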

    2010/3/1 Alvin C.L Huang <alvincl.huang@gmail.com>:
  • Y_823910 at Mar 2, 2010 at 3:02 am
    If I just start one client to fetch the META information (as a string)
    and then inject it into the other clients, would that work?
    Thanks


Discussion Overview
group: user
categories: hbase, hadoop
posted: Mar 1, '10 at 8:23a
active: Mar 2, '10 at 3:17a
posts: 7
users: 3
website: hbase.apache.org
