Grokbase Groups HBase user June 2011
Hello list,

I was at SIGMOD a few days ago and was happy to attend Facebook's talk
on HBase.

As far as I could understand, their workflow makes heavy use of incremental
counters for analytics, and so does mine. As I understand it, the cost of
incrementing a counter is 2 * N + 1 IOPS, where N is the number of
sequence files over which my dataset is spread: 2 because you have a seek
AND a read per file, and the final 1 comes from the write to the append log.
As that looks like an expensive operation, I was wondering whether I am
missing something, and what the strategies are to alleviate such a cost
(apart from bloom filters).
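
A minimal sketch of the worst-case estimate above, assuming every one of the N files has to be touched on disk. The function name and the sample values of N are illustrative only and are not part of any HBase API.

```python
def worst_case_increment_iops(n_files: int) -> int:
    """Worst-case disk operations for one increment under the 2 * N + 1 model."""
    seek_and_read = 2 * n_files  # one seek plus one read per file
    log_write = 1                # one append to the write-ahead log
    return seek_and_read + log_write


for n in (1, 4, 10):
    print(n, worst_case_increment_iops(n))
```

The estimate grows linearly with the number of files, which is why anything that avoids touching a file (caching, early termination, bloom filters) matters.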

Thanks!

Claudio

--

Claudio Martella
Digital Technologies
Unit Research & Development - Analyst

TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123
Fax +39 0471 068 129
claudio.martella@tis.bz.it http://www.tis.bz.it

Short information regarding use of personal data. According to Section 13 of Italian Legislative Decree no. 196 of 30 June 2003, we inform you that we process your personal data in order to fulfil contractual and fiscal obligations and also to send you information regarding our services and events. Your personal data are processed with and without electronic means and by respecting data subjects' rights, fundamental freedoms and dignity, particularly with regard to confidentiality, personal identity and the right to personal data protection. At any time and without formalities you can write an e-mail to privacy@tis.bz.it in order to object the processing of your personal data for the purpose of sending advertising materials and also to exercise the right to access personal data and other rights referred to in Section 7 of Decree 196/2003. The data controller is TIS Techno Innovation Alto Adige, Siemens Street n. 19, Bolzano. You can find the complete information on the web site www.tis.bz.it.

  • Andrew Purtell at Jun 18, 2011 at 7:25 pm
    This is from memory, but I expect someone will chime in if any detail is inaccurate. :-)

    If the blocks containing the values you are updating fit into blockcache then read IOPS are avoided, satisfied from cache, not disk. Evictions from blockcache are done on an LRU basis. (Packing related frequently accessed items into a block via multiple columns and/or appropriate row key construction is therefore advisable.)
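
The effect of an LRU cache on read IOPS can be sketched with a toy model. This is not HBase's blockcache implementation; the class name, the capacity, and the block ids are all assumptions made for illustration.

```python
from collections import OrderedDict


class LruBlockCache:
    """Toy LRU cache keyed by block id; counts simulated disk reads on misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.disk_reads = 0

    def get(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return self.blocks[block_id]
        self.disk_reads += 1                   # miss: simulated read from disk
        data = f"data-for-{block_id}"
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used block
        return data


cache = LruBlockCache(capacity=2)
for block_id in ["hot", "hot", "hot", "cold-1", "hot", "cold-2", "hot"]:
    cache.get(block_id)
print(cache.disk_reads)
```

In this run the frequently touched block stays resident while the rarely touched ones displace each other, so only the first access to each block costs a disk read.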

    When seeking a value HBase examines storefiles from the most recent backwards and stops as soon as the required number of versions, including delete markers, has been found. Updates only care about the most recent version. The most recent versions of frequently updated values are likely to be found early in the search; the more frequently a value is updated, the earlier it is found.
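
A rough sketch of the newest-first search with early termination, under simplifying assumptions: each storefile is modeled as a plain dict, delete markers are ignored, and `find_latest` is a hypothetical helper rather than an HBase function.

```python
def find_latest(storefiles, key, max_versions=1):
    """Search storefiles ordered newest first; return (values, files_examined)."""
    found, examined = [], 0
    for storefile in storefiles:
        examined += 1
        if key in storefile:
            found.append(storefile[key])
            if len(found) >= max_versions:
                break  # enough versions found: older files are never examined
    return found, examined


# Newest file first. The hot counter was rewritten recently, the cold one was not.
storefiles = [
    {"counter:hot": 42},
    {"counter:hot": 40, "counter:warm": 7},
    {"counter:cold": 1, "counter:hot": 12},
]

print(find_latest(storefiles, "counter:hot"))
print(find_latest(storefiles, "counter:cold"))
```

The hot counter is resolved after one file, while the cold counter forces a walk through all three.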

    By enabling bloom filters you can also avoid reads of storefiles known not to contain the value(s) as HBase searches backwards in time. This is because the bloom filters are held in blockcache, and will be frequently accessed for various queries, so are likely to remain resident even if the index or data blocks of the storefiles in question were evicted.
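
How a bloom filter lets a lookup skip files can be sketched as follows. This is a generic bloom filter, not HBase's implementation; the bit-array size, the number of hashes, and the `StoreFile` and `read_latest` names are assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    """Generic bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class StoreFile:
    """Toy storefile: a dict of cells plus a bloom filter over its keys."""

    def __init__(self, cells):
        self.cells = dict(cells)
        self.bloom = BloomFilter()
        for key in self.cells:
            self.bloom.add(key)


def read_latest(storefiles, key):
    """Search newest first; return (value, number of files actually read)."""
    files_read = 0
    for storefile in storefiles:
        if not storefile.bloom.might_contain(key):
            continue         # definitely absent: skip without touching the file
        files_read += 1      # possible hit or false positive: simulated disk read
        if key in storefile.cells:
            return storefile.cells[key], files_read
    return None, files_read


storefiles = [StoreFile({"a": 1}), StoreFile({"b": 2}), StoreFile({"c": 3})]
value, files_read = read_latest(storefiles, "c")
print(value, files_read)
```

The filter check itself is an in-memory operation, so a file is only read when the filter cannot rule it out.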

    Once you update a value then the most recent version will be held in MemStore until flush. If the value is updated with an Increment again before the flush, read-update-replace then happens in memory. Increments do not add to the size of the MemStore so if you are only incrementing existing values then you will not trigger flushing. (Here however we can make improvements to HBase to reduce unnecessary overheads in the current code. There are a couple of jiras open to this effect. For example to my understanding Facebook runs with a local patch that introduces mutable KeyValues for the MemStore, eliminating some unnecessary data structure management overheads.)
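
The in-memory read-update-replace can be illustrated with a toy model. It assumes the current value is already in memory and ignores the lookup in storefiles; `ToyMemStore` is a made-up name, not an HBase class.

```python
class ToyMemStore:
    """Toy in-memory store holding one cell per (row, column)."""

    def __init__(self):
        self.cells = {}

    def increment(self, row, column, amount=1):
        key = (row, column)
        new_value = self.cells.get(key, 0) + amount  # read and update
        self.cells[key] = new_value                  # replace in place
        return new_value

    def size(self):
        return len(self.cells)


memstore = ToyMemStore()
for _ in range(1000):
    latest = memstore.increment("page:/index", "hits")
print(latest, memstore.size())
```

A thousand increments of the same counter leave a single cell behind, which is the sense in which repeated increments do not grow the store.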

    Unless you explicitly ask HBase to do otherwise -- by .setWriteToWAL(false) -- then there will be a write to and sync of the write ahead log (WAL) for every commit, every Increment. The RegionServer's RPC handler for the current client operation will block until HDFS acknowledges the write at all DataNodes in the write pipeline, typically configured for 3 replicas. Note that here HBase will do per-row group commit, so you can amortize this cost over multiple updates if you can group them. Put values that are often updated together into the same row.
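
The amortization from grouping updates into one row can be sketched with a toy write-ahead log that counts syncs. The names `ToyWal` and `increment_row` are hypothetical, and the model assumes one sync per row operation as described above.

```python
class ToyWal:
    """Toy write-ahead log that counts how many syncs were issued."""

    def __init__(self):
        self.entries = []
        self.syncs = 0

    def append_and_sync(self, row, edits):
        self.entries.append((row, dict(edits)))
        self.syncs += 1


def increment_row(wal, table, row, amounts):
    """Apply several column increments to one row with a single WAL sync."""
    cells = table.setdefault(row, {})
    for column, amount in amounts.items():
        cells[column] = cells.get(column, 0) + amount
    wal.append_and_sync(row, amounts)
    return dict(cells)


# Layout A: three counters in three different rows, so three syncs.
wal_a, table_a = ToyWal(), {}
for row in ("page1", "page2", "page3"):
    increment_row(wal_a, table_a, row, {"hits": 1})

# Layout B: the same three counters as columns of one row, so one sync.
wal_b, table_b = ToyWal(), {}
increment_row(wal_b, table_b, "site", {"page1": 1, "page2": 1, "page3": 1})

print(wal_a.syncs, wal_b.syncs)
```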

    Also note that HBase users typically run on top of an HDFS that has been patched with HDFS-895, so HDFS can concurrently service new WAL writes while syncs of others are in progress.

    <wizard>
    Furthermore, the WAL is by default configured to sync at every commit but can also be configured to sync after N commits, e.g. 100 or 1000; or after S seconds, e.g. 1 or 10; whichever comes first, according to the loss window (upon RegionServer failure) your use case can tolerate. So in addition to taking advantage of group commit you can amortize sync overhead further with the tradeoff that under failure conditions your counters (or other data) may become imprecise. For some use cases that is fine.
    </wizard>
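
The "sync after N commits or S seconds, whichever comes first" policy can be sketched as below. This is a simplified model, not HBase code: the class and parameter names are assumptions, and the time check happens only when a commit arrives, whereas a real system would more likely use a background timer.

```python
import time


class DeferredSyncWal:
    """Toy WAL that syncs after max_pending commits or max_delay_s seconds."""

    def __init__(self, max_pending=100, max_delay_s=1.0, clock=None):
        self.max_pending = max_pending
        self.max_delay_s = max_delay_s
        self.clock = clock or time.monotonic
        self.pending = 0   # commits that would be lost if the server failed now
        self.syncs = 0
        self.last_sync = self.clock()

    def commit(self, edit):
        self.pending += 1
        overdue = self.clock() - self.last_sync >= self.max_delay_s
        if self.pending >= self.max_pending or overdue:
            self.sync()

    def sync(self):
        self.syncs += 1
        self.pending = 0
        self.last_sync = self.clock()


# A fake clock keeps the example deterministic.
now = [0.0]
wal = DeferredSyncWal(max_pending=100, max_delay_s=10.0, clock=lambda: now[0])
for i in range(250):
    wal.commit(("counter", 1))
print(wal.syncs, wal.pending)
```

Here 250 commits cost two syncs instead of 250, and the 50 commits still pending are the loss window if the server fails before the next sync.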

    - Andy


    --- On Sat, 6/18/11, Claudio Martella wrote:
    From: Claudio Martella <claudio.martella@tis.bz.it>
    Subject: on the impact of incremental counters
    To: user@hbase.apache.org
    Date: Saturday, June 18, 2011, 9:00 AM
  • Claudio Martella at Jun 20, 2011 at 12:59 pm
    This all very much makes sense and matches my current understanding of
    things.

    So, basically it's expensive to increment old data.

    Thanks for the time and the detailed answer.


  • Andrew Purtell at Jun 20, 2011 at 3:14 pm

    From: Claudio Martella <claudio.martella@tis.bz.it>
    > So, basically it's expensive to increment old data.

    HBase employs a buffer hierarchy to make updating a working set that can fit in RAM reasonably efficient. (But like I said there are some things remaining we can improve in terms of internal data structure management.)

    If you are updating a working set that does not fit in RAM, or updating values so infrequently that they are not kept in cache, then HBase has to go to disk and we move from the order of memory access to the order of disk access.

    It will obviously be more expensive to increment old data than newer, but I'm not sure I understand what you are getting at. Any data management system with a buffer hierarchy has this behavior.

    Compared to what?

    - Andy
  • Joey Echeverria at Jun 20, 2011 at 3:24 pm
    Is there any reason why the increment has to actually happen on
    insert? Couldn't an "increment record" be kept, and then the actual
    increment operation be performed lazily, on reads and compactions?

    -Joey

    --
    Joseph Echeverria
    Cloudera, Inc.
    443.305.9434
  • Ted Yu at Jun 20, 2011 at 3:37 pm
    I think Dhruba did try the approach Joey mentioned.
  • Ted Dunning at Jun 20, 2011 at 3:51 pm
    Lazy increment on read causes the read to be expensive. That might be a win
    if the work load has lots of data that is never read.

    This could be a good idea on average because my impression is that increment
    is usually used for metric sorts of data which are often only read in detail
    in diagnostic post mortem use cases.
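
The tradeoff of a lazy increment can be sketched with a toy counter that stores "increment records" and folds them in on read or compaction. This is a hypothetical design for illustration only; it does not describe any existing HBase operation, and `LazyCounter` is a made-up name.

```python
class LazyCounter:
    """Toy counter: writes append deltas, reads and compactions sum them."""

    def __init__(self):
        self.base = 0
        self.deltas = []

    def increment(self, amount=1):
        self.deltas.append(amount)  # blind write: the old value is never read

    def read(self):
        return self.base + sum(self.deltas)  # cost grows with pending deltas

    def compact(self):
        self.base += sum(self.deltas)  # fold the deltas into one value
        self.deltas.clear()


counter = LazyCounter()
for _ in range(5):
    counter.increment()
pending_before = len(counter.deltas)
value = counter.read()
counter.compact()
print(pending_before, value, len(counter.deltas))
```

Writes become cheap appends, while a read has to merge every delta recorded since the last compaction.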
    On Mon, Jun 20, 2011 at 3:23 PM, Joey Echeverria wrote:

    Is there any reason why the increment has to actually happen on
    insert? Couldn't an "increment record" be kept, and then the actual
    increment operation be performed lazily, on reads and compactions?

    -Joey
    On Mon, Jun 20, 2011 at 11:14 AM, Andrew Purtell wrote:
    From: Claudio Martella <claudio.martella@tis.bz.it>
    So, basically it's expensive to increment old data.
    HBase employs a buffer hierarchy to make updating a working set that can
    fit in RAM reasonably efficient. (But like I said there are some things
    remaining we can improve in terms of internal data structure management.)
    If you are updating a working set that does not fit in RAM or
    infrequently such that the value is not maintained in cache, then HBase has
    to go to disk and we move from the order of memory access to the order of
    disk access.
    It will obviously be more expensive to increment old data than newer, but
    I'm not sure I understand what you are getting at. Any data management
    system with a buffer hierarchy has this behavior.
    Compared to what?

    - Andy


    --
    Joseph Echeverria
    Cloudera, Inc.
    443.305.9434
  • Joe Pallas at Jun 20, 2011 at 5:51 pm

    Just so we're clear, we'd be talking about a new operation, right? Because today's increment returns the incremented value, and some uses (like generating unique values) do require that.
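
Why unique-value generation needs the incremented value back can be sketched with a toy counter. This is an in-process stand-in with a lock, not HBase's increment; the class and method names are assumptions.

```python
import threading


class ToyCounter:
    """Toy counter whose increment returns the new value atomically."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment_and_get(self, amount=1):
        with self._lock:  # read-modify-write as one step
            self._value += amount
            return self._value


ids = ToyCounter()
issued = []


def worker():
    for _ in range(100):
        issued.append(ids.increment_and_get())


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(issued), len(set(issued)))
```

Each caller receives a distinct value only because the read and the update happen together, which a write that merely records a delta cannot provide.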

    joe
  • Joey Echeverria at Jun 20, 2011 at 6:03 pm
    Ah, I didn't realize that increment returns the value. Yes, the
    current behavior is required in that case. I was thinking of a use
    case more like the one Ted described, where you're keeping metrics,
    but don't read the values that frequently. Maybe this should be a new
    API call.

    If no one objects, I'll file a JIRA.

    -Joey
  • Jeff Whiting at Jun 20, 2011 at 6:29 pm
    I think it is really split on how people are using it. I agree that for some it is increment
    and forget until they run an infrequent analysis, while others increment and read the value very
    often. We do both, but our most frequent use is reading the value very often. If changes
    are made to the API, let's make sure both use cases are considered and not just "increment and
    forget."

    ~Jeff
    --
    Jeff Whiting
    Qualtrics Senior Software Engineer
    jeffw@qualtrics.com
