Should I use HBASE?

stable29 at Sep 14, 2011 at 10:02 am
Currently I am using an RDBMS in my project. My project basically monitors
servers. It has to collect information from all the servers (the number of
servers could be very large) every 5 minutes and store it in the database.
Storing all the server information (around 10,000 rows will be inserted,
with logical comparison) within 5 minutes is itself challenging for an
RDBMS. We have to keep around 6 months of data in the database, which is
why the amount of data becomes very large. This is the primary
requirement of our project, and if it works well it could be used
widely. Basically, I would like to know whether HBase could improve the
write and read times of the database and whether it could be used to
scale the database substantially.
--
View this message in context: http://old.nabble.com/Should-I-use-HBASE--tp32462213p32462213.html
Sent from the HBase User mailing list archive at Nabble.com.


  • Joey Echeverria at Sep 14, 2011 at 12:12 pm
    What do you mean by "server information"? If you're talking about
    performance metrics, check out OpenTSDB [1]. It's built on top of HBase
    and designed for this exact use case.

    -Joey

    [1] http://opentsdb.net/

    --
    Joseph Echeverria
    Cloudera, Inc.
    443.305.9434
  • Otis Gospodnetic at Sep 14, 2011 at 3:09 pm
    Hi,

    I'd guess that you could fairly easily write something that writes that much data into your RDBMS, then see how writes behave over time and how fast reads are once all the writes are done.
    Over at Sematext we have a service called Scalable Performance Monitoring [1], and we chose HBase to store all performance metrics, but we keep a LOT of data (points).

    [1] http://sematext.com/spm/index.html


    Not coincidentally, we also have HBase-specific monitoring and reports there.
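
    A minimal sketch of the kind of write/read test suggested above, assuming Python's built-in sqlite3 module as a stand-in for the real RDBMS. The table name `metrics`, its columns, and the batch sizes are hypothetical, and timings from an in-memory SQLite database say little about a production server; the point is only the shape of the test.

```python
import sqlite3
import time


def run_benchmark(batches=5, rows_per_batch=10_000, db_path=":memory:"):
    """Insert `batches` polling cycles of rows and time each one.

    Returns (per-batch write times, read time, total row count, history of
    server 42). The schema is a hypothetical stand-in for server metrics.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE metrics ("
        " server_id INTEGER NOT NULL,"
        " ts INTEGER NOT NULL,"
        " value REAL NOT NULL,"
        " PRIMARY KEY (server_id, ts))"
    )

    write_times = []
    for batch in range(batches):
        ts = batch * 300  # one polling cycle every 5 minutes (300 s)
        rows = [(server_id, ts, float(server_id % 100))
                for server_id in range(rows_per_batch)]
        start = time.perf_counter()
        with conn:  # commit the whole batch as one transaction
            conn.executemany(
                "INSERT INTO metrics (server_id, ts, value) VALUES (?, ?, ?)",
                rows,
            )
        write_times.append(time.perf_counter() - start)

    # Time a typical read: the full history of one server, in time order.
    start = time.perf_counter()
    history = conn.execute(
        "SELECT ts, value FROM metrics WHERE server_id = ? ORDER BY ts", (42,)
    ).fetchall()
    read_time = time.perf_counter() - start

    total = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
    conn.close()
    return write_times, read_time, total, history


if __name__ == "__main__":
    write_times, read_time, total, _ = run_benchmark()
    for i, seconds in enumerate(write_times):
        print(f"batch {i}: {seconds:.3f}s")
    print(f"read one server's history: {read_time:.6f}s, {total:,} rows in table")
```

    Running it with many more batches, against the actual database and schema, would show whether write times drift upward as the table grows.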

    Otis

  • Michael Segel at Sep 14, 2011 at 6:18 pm
    I realize that this is an HBase group; however, nothing in the stated problem suggests that an RDBMS couldn't handle it.
    Inserting 10K rows every 5 minutes poses a challenge to the database?

    I guess it could be a challenge depending on the size and type of the data, along with the database, schema, hardware, etc. Essentially, YMMV.

    I'm not sure that switching to HBase would solve their problem.

  • Ian Varley at Sep 14, 2011 at 7:02 pm
    That's an important point to make, Michael. Jumping to HBase (or any NoSQL store) from an RDBMS has pros and cons. The pros are generally that you can scale linearly on cheap(er) hardware as your data and usage grow; the cons are that many things you take for granted in an RDBMS (like transactions, joins, indexes) aren't built in. You shouldn't assume that just because it's "a lot" of data, an RDBMS won't handle it well. Benchmarking is key.

    In this case, 6 months' worth of data at a rate of 10K inserts per 5 minutes comes out to a steady state of about 500M rows (is that what you mean, @stable29?). Even with skinny rows, that's not "trivial" for a relational database, especially if that database is MySQL. It can work, but you'll need someone who really understands the DB at a low level and can administer it, troubleshoot it, deal with physical deletion after the 6 months are up, etc. If you ever need to change your schema while keeping the system online, that could also be a challenge. These things are all TOTALLY doable on a relational DB, but you are at least edging towards the territory where there's a reasonable case to be made for HBase.
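
    The arithmetic behind the "about 500M rows" figure can be checked with a few lines. This is only a sketch: the 10,000 rows and 5-minute cycle come from the original post, while treating "around 6 months" as 183 days is an assumption.

```python
ROWS_PER_CYCLE = 10_000   # from the original post
CYCLE_SECONDS = 5 * 60    # one collection every 5 minutes
RETENTION_DAYS = 183      # assumption: "around 6 months" taken as ~183 days


def rows_per_second(rows_per_cycle=ROWS_PER_CYCLE, cycle_seconds=CYCLE_SECONDS):
    """Average insert rate if the writes were spread evenly over the cycle."""
    return rows_per_cycle / cycle_seconds


def steady_state_rows(rows_per_cycle=ROWS_PER_CYCLE,
                      cycle_seconds=CYCLE_SECONDS,
                      retention_days=RETENTION_DAYS):
    """Rows held once the retention window is full."""
    cycles_per_day = 24 * 60 * 60 // cycle_seconds  # 288 cycles per day
    return rows_per_cycle * cycles_per_day * retention_days


if __name__ == "__main__":
    print(f"{rows_per_second():.1f} rows/s on average")
    print(f"{steady_state_rows():,} rows at steady state")
```

    With these inputs the steady state is 527,040,000 rows, at an average of roughly 33 inserts per second.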

    Also, since you (probably) don't have much to worry about in terms of complex transactions, joins, etc., it does sound like a situation where a small HBase cluster might do a nice job of storing this data for you. If you can design in terms of one (or a small number of) access patterns, for both reads and writes, that will always be used, you can really optimize it to the point that you know pretty much exactly how every write goes onto the disk and how it gets read back from the disk.
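
    One way to picture designing around a single access pattern: HBase keeps rows sorted by row key, so a composite key of server id followed by timestamp places each server's readings next to each other, and "all readings for server X between t1 and t2" becomes one contiguous range. The sketch below only simulates that ordering with a sorted Python list; `make_row_key`, `scan`, and the 4-byte/8-byte field widths are hypothetical choices for illustration, not HBase API calls.

```python
import bisect
import struct


def make_row_key(server_id: int, ts: int) -> bytes:
    """Pack (server_id, timestamp) as fixed-width big-endian bytes.

    Big-endian unsigned integers compare bytewise in the same order as the
    numbers themselves, so keys sort by server first, then by time.
    """
    return struct.pack(">IQ", server_id, ts)


def scan(sorted_keys, server_id, start_ts, stop_ts):
    """Return the keys for one server with start_ts <= ts < stop_ts."""
    lo = bisect.bisect_left(sorted_keys, make_row_key(server_id, start_ts))
    hi = bisect.bisect_left(sorted_keys, make_row_key(server_id, stop_ts))
    return sorted_keys[lo:hi]


if __name__ == "__main__":
    # Three servers, four 5-minute samples each, stored in key order.
    keys = sorted(make_row_key(s, t)
                  for s in (1, 2, 300)
                  for t in (0, 300, 600, 900))
    for key in scan(keys, server_id=2, start_ts=300, stop_ts=900):
        print(struct.unpack(">IQ", key))
```

    Leading the key with the server id rather than the timestamp is a deliberate choice in this sketch: it serves the per-server read pattern and avoids sending every new write to the same end of the key space. A different dominant read pattern would call for a different key.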

    Even with HBase, though, you'll still need someone who really understands the architecture, etc. The difference might just be that HBase is fundamentally simpler than a relational DB; if that simplicity provides what you need without complex workarounds, cool. HBase puts you closer to the metal than a relational database; sometimes that's good (at scale) and sometimes it's not (say, if you don't really need that power and a higher-level, more abstract tool set like a relational database would suffice).

    Ian

  • Michael Segel at Sep 14, 2011 at 9:23 pm
    Ian,

    I think you misunderstood my point.

    The initial author asks a question about using HBase, yet doesn't really provide enough detailed information as to what he wants to achieve and why he is failing.

    My point was that, based on the information he presented, he didn't show how or why his RDBMS solution was failing (or what he meant by failing).
    There are many reasons why an RDBMS could fail, and it could come down to which RDBMS is being used.
    I saw 50K ticks a second being ingested into Informix's Financial Foundation offering 10 years ago. That relied on a specific setup of the servers and configuration of IDS.
    But that's 50K records inserted in a second, not 10K every 5 minutes.

    Is it trivial? Probably not trivial, but still not really rocket science.

    But I digress. Again, the point is that we have a person coming here and asking us 'is this a good fit', and it would be better to say 'it depends' or 'you haven't provided enough information...'

    To your point, yes, there are other databases out there, like Informix and Oracle, that scale better than MySQL. If the issue is that his RDBMS can't keep up, then one question I have to ask is whether he's thought about changing to a different RDBMS platform. What happens if you say sure, we can do this in HBase, and then he pulls out his 'must be ACID compliant' card?

    -Mike

  • Ian Varley at Sep 14, 2011 at 10:34 pm
    Point well taken, Mike. :) It's a bad idea to assume we know the original poster's requirements well enough to suggest a direction, based on such a brief sketch.

    Original poster, let me be clear: a data set of your size may (or may not) be a good fit for HBase; relational databases regularly handle that volume of transactions happily, and offer advanced features and ACID guarantees that HBase does not. If you'd like more targeted advice from the community, perhaps answer the following questions:

    1. Is the 500M rows estimated above a max target, or just the initial volume? Is there some other multiplier you didn't mention? (You said, "if it works good, this could be used widely")

    2. What kind of read access patterns will you have? I.e., do you always get the data by a specific key, or scan across ordered rows? Or would you need to gather data in real time based on filters on other attributes (the kind of thing you'd use an index for in a relational DB)?

    3. How big is the content of a typical row, in bytes?

    4. Is using a more "industrial strength" DB like Oracle an option? Or would you be doing it on a free offering like MySQL or Postgres? Would you have a DBA to help administer the solution?
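
    To see why the row-size and multiplier questions matter, here is a rough sizing sketch. The row sizes are hypothetical placeholders (the original post gives none), `growth_factor` is an illustrative name for the unknown multiplier, and the result is raw payload only, ignoring indexes, replication, and storage overhead.

```python
def estimated_size_gib(rows: int, bytes_per_row: int,
                       growth_factor: float = 1.0) -> float:
    """Raw payload size in GiB for a given row count and average row size."""
    return rows * bytes_per_row * growth_factor / 2**30


if __name__ == "__main__":
    rows = 500_000_000  # the steady-state estimate quoted in this thread
    for bytes_per_row in (100, 1_000, 10_000):  # hypothetical row sizes
        size = estimated_size_gib(rows, bytes_per_row)
        print(f"{bytes_per_row:>6} bytes/row -> {size:,.1f} GiB")
```

    Two orders of magnitude in row size, or a large multiplier on the number of servers, moves the answer from "fits comfortably on one machine" to something quite different, which is why the community would need those numbers.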

    Ian

Discussion Overview
group: user @
categories: hbase, hadoop
posted: Sep 14, '11 at 10:02a
active: Sep 14, '11 at 10:34p
posts: 7
users: 5
website: hbase.apache.org
