HBase Multi-Threaded Writes
Dear All,

I have created two multi-threaded clients to perform concurrent writes to HBase, with the initial expectation that multiple threads would let me write faster. One client uses the native HBase API and the other uses the Thrift API.

To my surprise, write performance dropped consistently for both clients when compared to single-threaded ingestion, and it keeps degrading as I increase the number of threads. With a single ingestion thread both clients perform far better, but I intend to use HBase in a multi-threaded environment, and that is where I am facing performance problems.

Since I am relatively new to HBase, please do excuse me if I am asking something very basic, but any suggestions around this would be extremely helpful.

Thanks and Regards
Pankaj Misra


________________________________

Impetus Ranked in the Top 50 India’s Best Companies to Work For 2012.

Impetus webcast ‘Designing a Test Automation Framework for Multi-vendor Interoperable Systems’ available at http://lf1.me/0E/.


NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.


  • Doug Meil at Sep 19, 2012 at 8:19 pm
    Hi there,

    You haven't described much about your environment, but there are two
    things you might want to consider for starters:

    1) Is the table pre-split? (I.e., if it isn't, there is one region)
    2) If it is, are all the writes hitting the same region?

    For other write tips, see this...

    http://hbase.apache.org/book.html#perf.writing
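
As an illustration of the pre-splitting suggestion in the reply above, here is a minimal sketch of how split keys might be computed. It assumes, hypothetically, that row keys begin with a two-character lowercase hex prefix ("00" to "ff"); the class and method names are made up for this example. In HBase 0.94 the resulting array would typically be handed to `HBaseAdmin.createTable(HTableDescriptor, byte[][])`, which is left out here so the sketch runs without a cluster.

```java
import java.nio.charset.Charset;

public class SplitKeys {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    /**
     * Returns numRegions - 1 split keys that divide the two-hex-digit
     * prefix space 00..ff into roughly equal ranges.
     * Assumption: row keys start with such a prefix.
     */
    public static byte[][] hexSplits(int numRegions) {
        if (numRegions < 2 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in [2, 256]");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Boundary between region i-1 and region i, as a hex prefix.
            int boundary = i * 256 / numRegions;
            splits[i - 1] = String.format("%02x", boundary).getBytes(UTF8);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four regions need three split keys.
        for (byte[] key : hexSplits(4)) {
            System.out.println(new String(key, UTF8));
        }
    }
}
```

With four regions this prints the three boundaries `40`, `80` and `c0`. Split keys only help if the actual row keys are spread across those ranges; keys that all sort into one range still hit a single region.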
  • Pankaj Misra at Sep 19, 2012 at 8:49 pm
    Thank you so much Doug.

    You are right, there is only one region to start with, as I am not pre-splitting the table. So for a given set of writes, all of them hit the same region.

    I will have the table pre-split as described, and test again. Will the number of region servers also impact write performance?

    My environment is HBase 0.94.1 with Hadoop 0.23.1, running on Oracle JVM 1.6. I am running HBase in pseudo-distributed mode. Please find below my hbase-site.xml, which has a very basic configuration.
    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://localhost:9000/hbase</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>localhost</value>
      </property>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
    </configuration>


    Thanks and Regards
    Pankaj Misra
  • Doug Meil at Sep 19, 2012 at 9:01 pm
    re: "pseudo-distributed mode"

    Ok, so you're doing a local test. The benefits you get with multiple
    regions per table that are spread across multiple RegionServers are that
    you can engage more of the cluster in your workload. You can't really do
    that on a local test.
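
To make the point about spreading work across regions concrete, here is a small simulation that needs no HBase installation. It assumes a hypothetical table split at the keys `40`, `80` and `c0`, and compares sequential row keys with keys that carry a hash-derived two-hex-digit prefix ("salting", a technique not mentioned in the thread and shown only as one possible approach). All names are illustrative. Plain ASCII strings are compared with `String.compareTo`, which for such keys matches the byte-wise ordering HBase uses.

```java
import java.util.Arrays;

public class RegionSpread {
    /** Index of the region whose key range contains rowKey, given sorted split keys. */
    public static int regionFor(String rowKey, String[] sortedSplits) {
        int region = 0;
        for (String split : sortedSplits) {
            // A region starts at its split key (inclusive).
            if (rowKey.compareTo(split) >= 0) {
                region++;
            }
        }
        return region;
    }

    /** Prefixes the key with a hash-derived bucket in 00..ff (an assumed salting scheme). */
    public static String salt(String rowKey) {
        int bucket = (rowKey.hashCode() & 0x7fffffff) % 256;
        return String.format("%02x-%s", bucket, rowKey);
    }

    /** Counts how many of numKeys generated row keys fall into each region. */
    public static int[] countPerRegion(boolean salted, int numKeys, String[] splits) {
        int[] counts = new int[splits.length + 1];
        for (int i = 0; i < numKeys; i++) {
            String key = String.format("row-%06d", i);
            counts[regionFor(salted ? salt(key) : key, splits)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] splits = {"40", "80", "c0"};
        System.out.println("sequential: " + Arrays.toString(countPerRegion(false, 1000, splits)));
        System.out.println("salted:     " + Arrays.toString(countPerRegion(true, 1000, splits)));
    }
}
```

The sequential keys all land in the last region, while the salted keys are spread over all four. Salting has a cost: readers must be able to recompute the prefix, and range scans over the original key order become harder. In pseudo-distributed mode every region is still served by the same RegionServer, so a spread like this is not expected to show its full benefit until the regions live on separate machines.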
  • Doug Meil at Sep 19, 2012 at 9:02 pm
    You probably want to do a review of these chapters too...

    http://hbase.apache.org/book.html#architecture
    http://hbase.apache.org/book.html#datamodel
    http://hbase.apache.org/book.html#schema

Discussion Overview
group: user @
categories: hbase, hadoop
posted: Sep 19, '12 at 6:53p
active: Sep 19, '12 at 9:02p
posts: 5
users: 2
website: hbase.apache.org