Hi there,
I would like to get a few opinions on HBase's pros and cons that might
help my boss decide whether to adopt HBase in production.
Pros: Random access to high-volume data
      Scale-out on commodity machines
      Fault tolerance
      Free license
Cons: No security control
      Risk of data loss
      Data schema must be redesigned
      Lack of aggregate functions (Max, Min, Avg, ...)
      Performance under concurrent reads/writes from multiple clients
      No commercial support yet
Any suggestions or corrections would be appreciated!
Fleming Chiu (邱宏明)
Cloudera Certification for Hadoop Map/Red Developer
TEL: 707-2260
Email: y_823910@tsmc.com
Be Veg! Go Green! Save the planet!
---------------------------------------------------------------------------
TSMC PROPERTY
This email communication (and any attachments) is proprietary information
for the sole use of its intended recipient. Any unauthorized review, use or
distribution by anyone other than the intended recipient is strictly
prohibited. If you are not the intended recipient, please notify the sender
by replying to this email, and then delete this email and any copies of it
immediately. Thank you.
---------------------------------------------------------------------------
[HBase-user] HBase's pros and cons
Todd Lipcon at Jun 22, 2010 at 3:08 am
Hi Fleming,
Lots has been written about this if you look through the archives. Just a
few notes below:
2010/6/21 <y_823910@tsmc.com>
> Cons: No security control
Andrew Purtell and his team at Trend Micro are working on this in the next
quarter or two. Please refer to:
https://issues.apache.org/jira/browse/HBASE-1697
> Risk of data loss
This is essentially fixed in our next major release, assuming you are
running the right build of HDFS. More on this subject is coming to the user
list next week, but with proper sync() support in HDFS, data loss should never
happen unless there are bugs. Bugs that do cause data loss will be treated
with the highest priority.
> Data schema must be redesigned
> Lack of aggregate functions (Max, Min, Avg, ...)
> Performance under concurrent reads/writes from multiple clients
Not sure what you mean about this - there have been some performance bugs in
the past with contention, but we've improved and will continue to improve
performance.
> No commercial support yet
Cloudera is beginning to offer commercial support for HBase with CDH3. Let
me know off-list if I can put you in touch with our sales people (I don't
want to make the community list a sales forum!)
--
Todd Lipcon
Software Engineer, Cloudera
Alex Kamil at Jun 22, 2010 at 4:01 am
Fleming,
I'd add integration with Hadoop and MapReduce support for data-mining tasks
as one of the major pros.
On the other hand, the size and complexity of the code base, API, and
configuration is something to consider. Look at the size of the source code
in KLOC, the number of Hadoop/HBase/ZooKeeper configuration parameters, etc.
The Hadoop/HBase project is a bit of a heavyweight in this regard. See if you
can make sense of these parameters before deploying to production, and compare
it with similar or more lightweight systems which might do the job; you will
have to support it, after all.
I'd also recommend running this benchmark on your hardware before making
any decisions: http://wiki.github.com/brianfrankcooper/YCSB/
Cheers
Alex
http://www.columbia.edu/~ak2834/
Andrew Purtell at Jun 24, 2010 at 5:52 am
Just want to add that I think this is a really important point:
> From: alex kamil <alex.kamil@gmail.com>
> Subject: Re: HBase's pros and cons
> To: user@hbase.apache.org
> Cc: kevin_hung@tsmc.com
> Date: Monday, June 21, 2010, 9:00 PM
> Fleming,
> compare to similar or more lightweight systems which might do
> the job, you will have to support it after all.
I presume you are looking at Hadoop and HBase because you have a Big Data problem (or a Medium Data problem: http://techcrunch.com/2010/03/16/big-data-freedom/). If that is not the case, there are other tools with a smaller footprint and more familiar semantics you should consider, such as MySQL or some of the document-oriented "NoSQL" offerings.
If you have a search-intensive application, you might want to look at something like Solr. Along those lines, for an architecture that seems to merge Solr and HBase profitably, I encourage you to check out Outerthought's LilyCMS: http://lilycms.org/lily/, http://lilycms.org/lily/381-OTC.html
- Andy
Michael Segel at Jun 22, 2010 at 4:01 pm
The real con to HBase is its maturity, or rather its lack of maturity.
It's still pretty much "bleeding edge". The issue is that the alternatives you would compare it with, your RDBMS and hierarchical database technologies, are 20 to 30+ years old. (Revelation, Pick, and U2/UniVerse are all examples of hierarchical databases.)
So you have a limited pool of resources contributing to this.
And while this is the major negative, it's important to point out that HBase has evolved dramatically over the past 3 years, and as more companies adopt HBase, the evolution should accelerate.
Andrew Purtell at Jun 24, 2010 at 5:42 am
> Cons: No security control
Trend Micro will provide a solution for this by the end of 2010 Q3.
- Andy
Discussion Overview
| group | user |
| categories | hbase, hadoop |
| posted | Jun 22, '10 at 3:03a |
| active | Jun 24, '10 at 5:52a |
| posts | 6 |
| users | 5 |
| website | hbase.apache.org |
