[HBase-user] HBase's pros and cons

Y_823910
Jun 22, 2010 at 3:03 am
Hi there,

I would like to get a few words about HBase's pros and cons
that may help my boss decide whether to adopt
HBase in production.

Pros : High-volume data random access
Scale-out with commodity machines
Fault tolerance
Free license

Cons : No security control
Data loss risk
Need to redesign the data schema
Lack of aggregate functions (Max, Min, Avg...)
Multiple-client concurrent read/write performance
No commercial support now

Any suggestions or corrections would be appreciated!


Fleming Chiu(邱宏明)
Cloudera Certification for Hadoop Map/Red Developer
TEL: 707-2260
Email: y_823910@tsmc.com
Be Veg! Go Green! Save the planet!


---------------------------------------------------------------------------
TSMC PROPERTY
This email communication (and any attachments) is proprietary information
for the sole use of its intended recipient. Any unauthorized review, use or
distribution by anyone other than the intended recipient is strictly
prohibited. If you are not the intended recipient, please notify the sender
by replying to this email, and then delete this email and any copies of it
immediately. Thank you.
---------------------------------------------------------------------------
5 responses

  • Todd Lipcon at Jun 22, 2010 at 3:08 am
    Hi Fleming,

    Lots has been written about this if you look through the archives. Just a
    few notes below:

    2010/6/21 <y_823910@tsmc.com>
    > Cons : No security control

    Andrew Purtell and his team at Trend Micro are working on this in the next
    quarter or two. Please refer to:
    https://issues.apache.org/jira/browse/HBASE-1697

    > Data loss risk

    This is essentially fixed in our next major release, assuming you are
    running the right build of HDFS. More is coming to the user list next week on
    this subject, but with proper sync() support in HDFS, data loss should never
    happen unless there are bugs. Bugs that do cause data loss will be treated
    with the highest priority.

    > Redesign data schema
    > Lack of aggregate functions (Max, Min, Avg...)
    > Multiple client concurrent read/write performance

    Not sure what you mean by this. There have been some performance bugs in
    the past with contention, but we've improved and will continue to improve
    performance.

    > No commercial support now

    Cloudera is beginning to offer commercial support for HBase with CDH3. Let
    me know off-list if I can put you in touch with our sales people (I don't
    want to make the community list a sales forum!)


    --
    Todd Lipcon
    Software Engineer, Cloudera
  • Alex kamil at Jun 22, 2010 at 4:01 am
    Fleming,

    I'd add integration with Hadoop & MapReduce support for data mining tasks as
    one of the major pros.

    On the other side, the size and complexity of the code base, API, and
    configuration is something to consider. Look at the size of the source code
    in KLOC, the number of Hadoop/HBase/ZooKeeper configuration parameters, etc.
    The Hadoop/HBase project is a bit of a heavyweight in this regard. See if you
    can make sense of these parameters before deploying into production. Compare
    it to similar or more lightweight systems which might do the job; you will
    have to support it, after all.

    I'd also recommend running this benchmark on your hardware before making
    any decisions: http://wiki.github.com/brianfrankcooper/YCSB/

    Cheers
    Alex
    http://www.columbia.edu/~ak2834/
  • Andrew Purtell at Jun 24, 2010 at 5:52 am

    Just want to add I think this is a really important point:


    > compare to similar or more lightweight systems which might do
    > the job, you will have to support it after all.

    I presume you are looking at Hadoop and HBase because you have a Big Data problem (or Medium Data problem: http://techcrunch.com/2010/03/16/big-data-freedom/). But if this is not the case, there are other tools with a smaller footprint and more familiar semantics you should consider, such as MySQL, or some of the document-oriented "NoSQL" offerings.

    If you have a search-intensive application you might want to look at something like Solr. Along those lines, for an architecture which seems to merge Solr and HBase profitably, I encourage you to check out Outerthought's LilyCMS: http://lilycms.org/lily/, http://lilycms.org/lily/381-OTC.html

    - Andy
  • Michael Segel at Jun 22, 2010 at 4:01 pm
    The real con to HBase is its maturity, or rather its lack of maturity.

    It's still pretty much 'bleeding edge'. The issue is that, when comparing to alternatives, your RDBMS and hierarchical database technology is 20-30+ years old. (Revelation, Pick, U2/Universe are all examples of hierarchical databases).

    So you have a limited pool of resources contributing to this.

    And while this is the major negative, it's important to point out that HBase has evolved dramatically over the past 3 years, and as more companies adopt HBase, the evolution should accelerate.


  • Andrew Purtell at Jun 24, 2010 at 5:42 am

    Trend Micro will provide a solution for this:

    > Cons : No security control

    by end of 2010 Q3.

    - Andy



