FAQ
Hi,

There are 2 region servers (2 GB memory) and 5 data nodes in my cluster.
I want to test HBase read performance by writing a program with the HBase
client.
In that code, I use a secondary index to scan for the data I need.
It took 80 sec to fetch 5243 rows, which was very cool!

Then I deployed the program to two more machines to test how HBase
handles reads from concurrent clients.
Each client fetches the same data (5243 rows).
The results are as follows:
1 concurrent client read: 80 sec
2 concurrent client read: 104 sec
3 concurrent client read: 232 sec
As shown above, adding more concurrent reading clients seems to
lower HBase performance too much.
Any opinions?
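
To put the timings above in perspective, here is a small arithmetic sketch in Java. It uses only the figures from this post (5243 rows per client; 80, 104 and 232 seconds). The class and method names are made up for illustration.

```java
import java.util.Locale;

public class ReadScaling {
    static final int ROWS_PER_CLIENT = 5243; // rows each client fetched, from the post

    // Rows per second seen by a single client.
    public static double perClientThroughput(double seconds) {
        return ROWS_PER_CLIENT / seconds;
    }

    // Total rows per second delivered to all clients together.
    public static double aggregateThroughput(int clients, double seconds) {
        return clients * perClientThroughput(seconds);
    }

    // How many times longer a run took than the single-client run.
    public static double slowdown(double seconds, double baselineSeconds) {
        return seconds / baselineSeconds;
    }

    public static void main(String[] args) {
        int[] clients = {1, 2, 3};
        double[] seconds = {80, 104, 232}; // wall-clock times reported in the post
        for (int i = 0; i < clients.length; i++) {
            System.out.printf(Locale.ROOT,
                "%d client(s): %.1f rows/s per client, %.1f rows/s aggregate, %.2fx slower%n",
                clients[i],
                perClientThroughput(seconds[i]),
                aggregateThroughput(clients[i], seconds[i]),
                slowdown(seconds[i], seconds[0]));
        }
    }
}
```

With these figures the aggregate rate rises from about 65.5 to about 100.8 rows/s with two clients, then drops to about 67.8 rows/s with three, so the third client adds load without adding throughput.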




Fleming Chiu(邱宏明)
707-6128
y_823910@tsmc.com
Meat-free Monday: eat vegetarian, save the Earth (Meat Free Monday Taiwan)


---------------------------------------------------------------------------
TSMC PROPERTY
This email communication (and any attachments) is proprietary information
for the sole use of its
intended recipient. Any unauthorized review, use or distribution by anyone
other than the intended
recipient is strictly prohibited. If you are not the intended recipient,
please notify the sender by
replying to this email, and then delete this email and any copies of it
immediately. Thank you.
---------------------------------------------------------------------------

  • Stack at Jan 5, 2010 at 6:45 am
    Were the clients all running in a single process? If so, try running them as
    distinct processes.
    St.Ack
  • Stack at Jan 5, 2010 at 6:46 am
    My guess is that you have too little data. Try adding 500k rows. What is
    your schema like? What size is your data?
    St.Ack
  • Y_823910 at Jan 5, 2010 at 6:50 am
    No, I dispatched that program to three different machines.
  • Y_823910 at Jan 5, 2010 at 7:14 am
    Our data size is about 6 GB, with more than 500k rows.
    The schema has only two column families and a few
    qualifiers (keeping the Oracle columns).
    We are going to have thousands of clients fetching data from HBase.
    It became very slow even when we only increased to 3 clients.
    We tried scaling out to 4 region servers; unfortunately, it got worse than
    before.
    Would it help to set hbase.regionserver.handler.count to 20?
    <property>
    <name>hbase.regionserver.handler.count</name>
    <value>10</value>
    <description>Count of RPC Server instances spun up on RegionServers
    Same property is used by the HMaster for count of master handlers.
    Default is 10.
    </description>
    </property>


  • Stack at Jan 5, 2010 at 7:22 am
    Well, with only 3 clients, it's probably not the handler count, though, yes, on a
    loaded cluster you should up the handlers all around (in HBase and in
    HDFS). Check out the performance page on the wiki. Is there anything there that can
    help? 3 clients having this much trouble is a bit odd, going by folks'
    experience. Can you figure out where the time is being spent?

    Thanks,
    St.Ack
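
One low-tech way to see where the time goes is to wrap each phase of the client in a timer and compare the totals. The sketch below is a hypothetical helper, not part of HBase; the phase names and the placeholder lambdas are assumptions and would be replaced by the real client calls.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class PhaseTimer {
    // Accumulated nanoseconds per named phase, in first-seen order.
    private final Map<String, Long> totals = new LinkedHashMap<>();

    // Runs the work, adds its elapsed time to the phase total, returns its result.
    // Calls that throw checked exceptions need to be wrapped by the caller.
    public <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            totals.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    // Total time recorded for a phase, or 0 if the phase never ran.
    public long totalNanos(String phase) {
        return totals.getOrDefault(phase, 0L);
    }

    // Prints one line per phase with its total in milliseconds.
    public void report() {
        totals.forEach((phase, nanos) ->
            System.out.printf("%-12s %8.1f ms%n", phase, nanos / 1e6));
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        // Placeholders: swap these lambdas for the real table setup and scan.
        String handle = timer.time("open-table", () -> "handle");
        int rows = timer.time("scan", () -> handle.length());
        timer.report();
        System.out.println("rows=" + rows);
    }
}
```

If one phase dominates the total and grows with the number of clients, that phase is the place to look first.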

  • Y_823910 at Jan 5, 2010 at 7:54 am
    My reading steps are as follows.
    The results of each scan become the scan condition for the next one.
    Did it become so slow because multiple users scan the index table?
    Has anyone experienced this? (Multiple users concurrently scanning the same data
    slowing HBase performance.)

    One index value
      -> scan Table1 --- Table1-idxColumn -> results
      -> scan Table2 --- Table2-idxColumn -> results
      ...
      -> scan Table5 --- Table5-idxColumn
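
The chained lookup described above can be sketched with in-memory maps standing in for the index tables. This is only a toy model under stated assumptions: the map contents, the key names and the `ChainedIndexScan` class are invented, and a real client would issue an HBase scan where this code does a map lookup.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ChainedIndexScan {
    // Each stage maps an index value to the values found for it; those values
    // become the scan conditions of the next stage.
    public static Set<String> run(String startValue,
                                  List<Map<String, List<String>>> stages) {
        Set<String> current = new LinkedHashSet<>();
        current.add(startValue);
        for (Map<String, List<String>> index : stages) {
            Set<String> next = new LinkedHashSet<>();
            for (String key : current) {
                // One lookup per intermediate value (one scan in the real client).
                next.addAll(index.getOrDefault(key, List.of()));
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        // Toy data standing in for Table1-idxColumn and Table2-idxColumn.
        Map<String, List<String>> idx1 = Map.of("lot-1", List.of("w1", "w2"));
        Map<String, List<String>> idx2 =
            Map.of("w1", List.of("r1"), "w2", List.of("r2", "r3"));
        System.out.println(run("lot-1", List.of(idx1, idx2))); // prints [r1, r2, r3]
    }
}
```

Note that the number of lookups in each stage equals the number of results from the previous stage, so any fixed cost paid per scan is multiplied accordingly.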


  • Y_823910 at Jan 7, 2010 at 6:56 am
    Hi,

    I've found the root cause of why multiple reading clients lower HBase
    performance.
    I was always creating a new HTable in a shared function, which makes
    the region server holding the meta information
    very busy!
    After fixing the following code, the read performance is fantastic.
    1 concurrent client read: 27 sec
    2 concurrent client read: 28 sec
    4 concurrent client read: 36 sec

    public Vector<String> ScanHBase(String tablename, String columnfamily,
            String KeyColumn, String StartKeyValue, String StopKeyValue)
            throws IOException {
        HTable table = new HTable(config, tablename); // bad: creates a new HTable on every call

        .
        .
        .
    }
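
The post does not show the corrected code, so the following is only a sketch of one way to avoid the repeated construction: open each table handle once and reuse it on later calls. `TableCache` is a hypothetical helper, and a plain `String` stands in for the real handle so the example runs without HBase on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class TableCache<T> {
    private final Map<String, T> handles = new HashMap<>();
    private final Function<String, T> opener;
    private int opens = 0;

    // opener is the expensive step; in the thread's code it would wrap
    // new HTable(config, name), including handling of its IOException.
    public TableCache(Function<String, T> opener) {
        this.opener = opener;
    }

    // Returns the cached handle, opening it only on first use of that name.
    public T get(String tableName) {
        return handles.computeIfAbsent(tableName, name -> {
            opens++;
            return opener.apply(name);
        });
    }

    // Number of times the expensive opener actually ran.
    public int opens() {
        return opens;
    }

    public static void main(String[] args) {
        // A String stands in for the real table handle in this sketch.
        TableCache<String> cache = new TableCache<>(name -> "handle:" + name);
        for (int i = 0; i < 1000; i++) {
            cache.get("Table1-idxColumn");
        }
        System.out.println("calls=1000 opens=" + cache.opens()); // prints calls=1000 opens=1
    }
}
```

This cache uses a plain `HashMap` and is meant for use from a single thread; whether a real table handle may be shared across threads should be checked against the client documentation for the HBase version in use.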

  • Jean-Daniel Cryans at Jan 7, 2010 at 7:13 pm
    Yeah, instantiating an HTable is very expensive since it pings .META.
    once; glad you could resolve your issue!

    J-D

Discussion Overview
group: user @
categories: hbase, hadoop
posted: Jan 4, '10 at 9:43a
active: Jan 7, '10 at 7:13p
posts: 9
users: 3
website: hbase.apache.org
