Lars Hofhansl created HBASE-13082:
-------------------------------------

              Summary: Coarsen StoreScanner locks to RegionScanner
                  Key: HBASE-13082
                  URL: https://issues.apache.org/jira/browse/HBASE-13082
              Project: HBase
           Issue Type: Bug
             Reporter: Lars Hofhansl


Continuing where HBASE-10015 left off.
We can avoid locking (and memory fencing) inside StoreScanner by deferring to the lock already held by the RegionScanner.
In tests this shows quite a scan improvement and reduced CPU (the fences make the cores wait for memory fetches).

There are some drawbacks too:
* All calls to RegionScanner need to remain synchronized
* Implementors of coprocessors need to be diligent in following the locking contract. For example, Phoenix does not lock RegionScanner.nextRaw() as required by the documentation (not picking on Phoenix, this one is my fault as I told them it's OK)
* Possible starvation of flushes and compactions under heavy read load. RegionScanner operations would keep getting the locks, and the flushes/compactions would not be able to finalize the set of files.

I'll have a patch soon.
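[Editor's note] The deferral described above can be sketched roughly as follows. These are hypothetical stand-in classes, not the actual HBase implementation: the inner scanner drops its internal synchronization and instead trusts a lock object handed to it by its owner, which holds that lock around every operation.

```java
import java.util.ArrayList;
import java.util.List;

class InnerScanner {                       // stand-in for StoreScanner
    private final Object sync;             // lock supplied by the owner
    private List<String> readers = new ArrayList<>(List.of("file-1"));

    InnerScanner(Object sync) { this.sync = sync; }

    // Called by flush/compaction to swap the file set; must wait until no
    // read is in flight, which the shared lock guarantees.
    void updateReaders(List<String> newReaders) {
        synchronized (sync) { readers = new ArrayList<>(newReaders); }
    }

    String peek() { return readers.get(0); }   // no per-call lock anymore
}

class OuterScanner {                       // stand-in for RegionScannerImpl
    private final InnerScanner inner = new InnerScanner(this);

    // Every public operation synchronizes on 'this' -- the same object the
    // inner scanner uses in updateReaders -- so a reader swap can only
    // happen between outer-level calls.
    synchronized String next() { return inner.peek(); }

    InnerScanner getInner() { return inner; }
}
```

With this shape, the per-call fences inside the inner scanner disappear; the cost moves to the single coarse monitor the outer scanner already takes.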



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  • stack (JIRA) at Feb 21, 2015 at 7:06 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330048#comment-14330048 ]

    stack commented on HBASE-13082:
    -------------------------------

    Looking forward to the ride [~larsh]

  • Lars Hofhansl (JIRA) at Feb 22, 2015 at 12:50 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Lars Hofhansl updated HBASE-13082:
    ----------------------------------
         Attachment: 13082.txt

    Simple patch (much like the patch on HBASE-10015).

    A bit ugly, because I did not want to change the coprocessor APIs.

    Also thought of another drawback:
    * Nobody (no coprocessor) can use StoreScanner alone anymore (there'd be nothing to lock), i.e. all access *must* be via RegionScannerImpl, flushes, or compactions. I think that's OK.

    Maybe we can make explicit the lock that the compactions/flushes and StoreScanner have to agree upon.

    Coarsen StoreScanner locks to RegionScanner
    -------------------------------------------

    Key: HBASE-13082
    URL: https://issues.apache.org/jira/browse/HBASE-13082
    Project: HBase
    Issue Type: Bug
    Reporter: Lars Hofhansl
    Attachments: 13082.txt


    Continuing where HBASE-10015 left of.
    We can avoid locking (and memory fencing) inside StoreScanner by deferring to the lock already held by the RegionScanner.
    In tests this shows quite a scan improvement and reduced CPU (the fences make the cores wait for memory fetches).
    There are some drawbacks too:
    * All calls to RegionScanner need to be remain synchronized
    * Implementors of coprocessors need to be diligent in following the locking contract. For example Phoenix does not lock RegionScanner.nextRaw() and required in the documentation (not picking on Phoenix, this one is my fault as I told them it's OK)
    * possible starving of flushes and compaction with heavy read load. RegionScanner operations would keep getting the locks and the flushes/compactions would not be able finalize the set of files.
    I'll have a patch soon.


    --
    This message was sent by Atlassian JIRA
    (v6.3.4#6332)
  • stack (JIRA) at Feb 22, 2015 at 2:13 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14331989#comment-14331989 ]

    stack commented on HBASE-13082:
    -------------------------------

    How does it work?

    We create a RegionScanner per Scan. If we want to change the files that make up a scan, we need to synchronize on 'this'. Calls to next, close, etc., have synchronized on 'this' so the change of files can only be done when not nexting or closing?

    There may be many outstanding region scans going on. For a file update to complete, all outstanding scanners will need to reach a 'safe point', i.e. post a next or close call (I suppose close don't count ... so just next).... A big row could take a while to return...

    Sounds good to me.

    It's kinda dirty passing the lock down from the RegionScanner at high level for use at StoreScanner scope, but hey, whatever works. Why were we not able to do this at the StoreScanner scope again?

    On compaction and flush being delayed: for flush, we will be in the commit phase when we are trying to swap in the new file -- post flush from memory, but the snapshot of the memstore is still around and being read from -- so we could hold on to memory pressure a little longer. For compaction, yeah, we could be reading from many files for a while longer even though the compaction has finished.

    You done any testing? I like the bit where we remove the locks at Store level.
  • Lars Hofhansl (JIRA) at Feb 22, 2015 at 5:10 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332025#comment-14332025 ]

    Lars Hofhansl commented on HBASE-13082:
    ---------------------------------------

    Quick note on why this works:
    # StoreScanner is passed an explicit object to sync on in updateReaders (it does not care what this object is, just that it needs to sync on it).
    # We pass the RegionScannerImpl object down as the "sync" object.
    # All operations that call any StoreScanner method are synchronized already on RegionScannerImpl (except for nextRaw, but that requires the caller to do the locking himself).
    # Now any region scanner operation will prevent the readers from being updated.

    #4 is much coarser than locking at the StoreScanner object - StoreScanner.peek is by far the worst, as it is called all over the place. There is no way in StoreScanner (that I see) to avoid locking every single operation (causing a memory fence, a read and write barrier in this case). As said above, the lock is almost never contended; the problem is the memory fences, which *kill* multi-core performance.
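    [Editor's note] The nextRaw exception mentioned in the list above could be sketched like this (hypothetical stand-in class, invented names): the method is unsynchronized by design, so the caller must hold the scanner's monitor around the whole batch, per the documented contract.

```java
class RawScanner {                 // stand-in for RegionScanner
    private final int[] cells = {10, 20, 30};
    private int pos = 0;

    int nextRaw() { return cells[pos++]; }   // unsynchronized: caller locks

    static int scanAll(RawScanner s, int n) {
        int sum = 0;
        synchronized (s) {         // caller-side locking, per the contract
            for (int i = 0; i < n; i++) sum += s.nextRaw();
        }
        return sum;
    }
}
```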

    It leads to the caveat listed above: very heavy read load can essentially prevent flushes or compactions from finishing.
    But note that this is *already* the case; it is just currently more likely that the flush/compaction will get through, because the locks are more fine-grained. Check out StoreScanner.next(List<Cell>): it already holds a lock for the entire duration of the row fetch. This patch coarsens that to the Scan's batch, and up to the region, so reads on other stores can lock out flushes/compactions of a store.
    Also note that compactions usually run a long time and only need the lock once, to switch the readers around; same for flushes. Need to do testing, but I doubt it's an issue.

    Fair locking can help here, but comes with other issues.

    I've done local node testing (local single-node HDFS cluster, running single-node HBase on top).

    Let me know if the patch is clear. If not, what do I need to change? Worth doing?

  • zhangduo (JIRA) at Feb 22, 2015 at 1:00 pm
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332156#comment-14332156 ]

    zhangduo commented on HBASE-13082:
    ----------------------------------

    I think the patch should work, but I wonder about the actual performance improvement.
    Since the lock is rarely contended, it will not affect concurrency; the only problem is the large memory latency caused by fencing. But a scanner usually does I/O (network or local disk), so is this latency really worth caring about? This patch is a little hacky, I think...
    Let's see the test results.
  • Lars Hofhansl (JIRA) at Feb 22, 2015 at 9:44 pm
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332385#comment-14332385 ]

    Lars Hofhansl commented on HBASE-13082:
    ---------------------------------------

    It's not about contention. We take the lock multiple times per row, and every time the CPU takes out a memory barrier. When the data is in the block cache the saving can be as much as 2x.
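    [Editor's note] The point can be illustrated with a rough single-threaded sketch (invented names; absolute numbers are machine- and JIT-dependent, and these are not HBase's figures): the lock below is never contended, yet every acquisition still carries memory-fence semantics, which is exactly what a per-call microbenchmark would surface.

```java
class FenceCost {
    private long counter;
    private final Object lock = new Object();

    long plainLoop(int n) {
        for (int i = 0; i < n; i++) counter++;    // no fences
        return counter;
    }

    long lockedLoop(int n) {
        for (int i = 0; i < n; i++) {
            synchronized (lock) { counter++; }    // uncontended, still fenced
        }
        return counter;
    }

    static long timeNs(Runnable r) {
        long start = System.nanoTime();
        r.run();
        return System.nanoTime() - start;
    }
}
```

Comparing timeNs of the two loops over a large n shows the per-acquisition overhead even with zero contention.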
  • Lars Hofhansl (JIRA) at Feb 22, 2015 at 11:05 pm
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332416#comment-14332416 ]

    Lars Hofhansl commented on HBASE-13082:
    ---------------------------------------

    OK... Actual current numbers: 50m-row table, 1 col, 1 version, 2gb on disk. Local HDFS with local HBase. All fits into the block cache. All filtered at the server.
    With patch: 9.45s
    Without patch: 12.9s

    (so this measures HBase's internal friction, and would be a scenario where we scan to calculate an aggregate via Phoenix, etc.)

    So not quite 2x, but not bad either.
  • Lars Hofhansl (JIRA) at Feb 23, 2015 at 12:25 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332447#comment-14332447 ]

    Lars Hofhansl commented on HBASE-13082:
    ---------------------------------------

    Same with FAST_DIFF encoding (768mb on disk). All data in cache:

    With patch: 13.2s
    Without patch: 16.4s

    Data not in cache (read from SSD, OS buffer cache also cleared, short-circuit reads (SCR) enabled):
    With patch: 14.2s (reading about 51mb/s from disk)
    Without patch: 17.5s (reading about 42mb/s from disk)

    So assuming data locality such that we can do SCR, this definitely improves things.

    In all cases we save about 3.3s, so about 66ns per row.
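    [Editor's note] A quick sanity check of that per-row figure (hypothetical helper, not from the patch): 3.3 seconds spread over the 50 million rows mentioned earlier works out to 66 ns per row.

```java
class PerRowSaving {
    // Convert a total saving in seconds over `rows` rows to ns/row.
    static long nsPerRow(double savedSeconds, long rows) {
        return Math.round(savedSeconds * 1e9 / rows);
    }
}
```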

    Is it worth the hack? Not sure.

  • Lars Hofhansl (JIRA) at Feb 23, 2015 at 1:03 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332462#comment-14332462 ]

    Lars Hofhansl commented on HBASE-13082:
    ---------------------------------------

    Tests with multiple client threads (50m rows, FAST_DIFF, all in cache; times in seconds):
    ||threads||w/ patch||w/o patch||
    |2|15|23|
    |3|18|30|
    |4|22|35|
    |5|28|40|
    |6|34|45|
    |7|38|56|
    |8|44|*|
    * some scanners timed out (over 60s)

  • stack (JIRA) at Feb 23, 2015 at 1:31 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332469#comment-14332469 ]

    stack commented on HBASE-13082:
    -------------------------------

    Benefit looks great to me. Let me try and add some numbers; perf numbers.
  • Lars Hofhansl (JIRA) at Feb 23, 2015 at 1:37 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332472#comment-14332472 ]

    Lars Hofhansl commented on HBASE-13082:
    ---------------------------------------

    Same without FAST_DIFF (times in seconds):
    ||threads||w/ patch||w/o patch||
    |2|11|18|
    |3|14|24|
    |4|16|30|
    |5|21|34|
    |6|24|35|
    |7|30|45|
    |8|31|*|
    * some scanners timed out

    The numbers were quite variable in both cases.

    (The machine has 4 cores with hyper-threads, hence tests up to 8 active handlers.)

  • Lars Hofhansl (JIRA) at Feb 23, 2015 at 1:48 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332476#comment-14332476 ]

    Lars Hofhansl commented on HBASE-13082:
    ---------------------------------------

    So at least we have established that locking in the StoreScanner is bad :)

    I now remember issues we have seen with timerange scans, where in unlucky circumstances it takes almost 20 minutes to finish scanning a single region (and all that time is spent inside a *single* RegionScanner.next() call, as in this case no Cells matched the timerange).
    So that would be 20 minutes(!) during which we would not be able to commit a flush or finish a compaction.

    Now I do not think that is acceptable. The RegionScanner lock is too coarse. We need something in between. Hmmm....
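    [Editor's note] One hypothetical "something in between" (invented names, a sketch rather than what the patch does): hold the coarse lock only for a bounded batch of cells, releasing it between batches so a waiting flush/compaction gets a chance to swap readers.

```java
import java.util.Iterator;

class YieldingScanner {
    static int scan(Object regionLock, Iterator<Integer> cells, int batchSize) {
        int total = 0;
        while (cells.hasNext()) {
            synchronized (regionLock) {      // reacquired per batch
                for (int i = 0; i < batchSize && cells.hasNext(); i++) {
                    total += cells.next();
                }
            }
            // Lock released here: a flush/compaction may run between batches.
        }
        return total;
    }
}
```

This trades a few extra fences (one pair per batch instead of per cell) for a bounded worst-case wait for writers.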

  • Lars Hofhansl (JIRA) at Feb 24, 2015 at 6:24 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334481#comment-14334481 ]

    Lars Hofhansl commented on HBASE-13082:
    ---------------------------------------

    How often do I have to try to fix this?! This is the third time :(
  • stack (JIRA) at Feb 24, 2015 at 6:36 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334488#comment-14334488 ]

    stack commented on HBASE-13082:
    -------------------------------

    bq. So that would be 20 minutes during which we would not be able to commit a flush or finish a compaction.

1. In the above case, how many column families, and if > 1, how much of the 20 minutes was spent in each CF? If CF == 1, then there were probably no flushes nor compactions going on anyways. If CF > 1, were there even any flushes/compactions going on (were they needed)? I'd argue the patch proposed here probably makes the situation no worse when we have a scanner stuck down deep inside an HRegion for 20 minutes at a time.
2. Ain't a scanner stuck for 20 minutes a different issue altogether than the one being solved here? If a scan disappears for 20 minutes trying to pull out a row, can't we do something like the [~jonathan.lawlor] chunking patch, only we have it time based? We return a partial -- even if empty -- if scanning for a full minute, say?

The region was probably really big. In hbase 2.0 we want to move to a realm where regions are small. This patch is therefore good for 2.0 anyways?

It would be cool if we could do the lock on a Store-basis, especially given we now can flush at the Store level.
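The time-based chunking suggested above could look roughly like the following sketch (plain Java with stand-in types; this is a hypothetical illustration, not the actual HBase scanner API). The idea: the next() loop gives up after a time budget and returns an empty "partial" while still signaling that more rows may follow, so the caller keeps the scanner alive instead of timing out.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class TimeChunkedScanSketch {
    /**
     * Scan until a matching cell is found, the source is exhausted, or the
     * time budget runs out. Returns true if more data may follow; when it
     * returns true with an empty 'results', the caller received a time-based
     * partial and should simply call again.
     */
    static boolean next(Iterator<String> cells, Predicate<String> filter,
                        List<String> results, long budgetNanos) {
        long start = System.nanoTime();
        while (cells.hasNext()) {
            String cell = cells.next();
            if (filter.test(cell)) {
                results.add(cell);
                return true;                 // found data for this row
            }
            if (System.nanoTime() - start > budgetNanos) {
                return true;                 // budget spent: empty partial, scan continues
            }
        }
        return false;                        // store exhausted
    }
}
```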


  • Lars Hofhansl (JIRA) at Feb 24, 2015 at 6:39 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334491#comment-14334491 ]

    Lars Hofhansl commented on HBASE-13082:
    ---------------------------------------

Wait. A. Minute... The scenario that I describe above already happens with the current code. StoreScanner.next(List<Cell>) would loop - with the lock held - until either we found a row's worth of data or exhausted the entire store. If the region has only one CF that could mean the entire region is scanned.

So my change would not make it much worse, but it would cause any other stores of the region to not be able to flush/compact during that time.

(I also see what the issue with the comparison is: if the ts of the cell falls before the minStamp of the TimeRange, we seek to the next column... We'll do this over and over again. But that's for a different jira.)

So back to this.
* Advantage: Much better scan performance, which can even be measured as a 25% higher disk read rate. (will try with rotating disks tomorrow)
* Disadvantage: a slow scan that does not match any cell in *any* store (CF) can prevent *other* stores in the region from flushing/compacting until the slow scan finishes.

Worth doing?

  • stack (JIRA) at Feb 24, 2015 at 6:44 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334495#comment-14334495 ]

    stack commented on HBASE-13082:
    -------------------------------

See my post just before yours [~larsh]. Yes, it is worth doing. As you have argued elsewhere, let's not check millions of times a second for an event that only happens once an hour, if that. It would be better if the lock were Store-scoped, but we can do that after this.
  • stack (JIRA) at Feb 24, 2015 at 6:45 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334496#comment-14334496 ]

    stack commented on HBASE-13082:
    -------------------------------

    Oh, you have a test? I can try it over here. Would like to see diff in perf counters.
  • Lars Hofhansl (JIRA) at Feb 24, 2015 at 6:56 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334504#comment-14334504 ]

    Lars Hofhansl commented on HBASE-13082:
    ---------------------------------------

    bq. how many column families
    Just one.

bq. when we have a scanner stuck down deep inside an HRegion for 20 minutes at a time.
I agree. The scanner must have been stuck at the StoreScanner level anyway (per my analysis above).

    bq. Ain't a scanner stuck for 20minutes a different issue altogether than the one being solved here? If a scan disappears for 20 minutes trying to pull out a row, can't we do something like the Jonathan Lawlor chunking patch only we have it time based? We return a partial – even if empty – if scanning for a full minute say?

Problem is that in this case the scanner produced no result at all. I suppose we can time it in the StoreScanner and return an empty result (just as we would when we exhausted the region). RegionScanner will do the right thing. But then when I do the patch, RegionScannerImpl also needs to periodically release the lock to give other threads a chance to continue with a flush.

bq. It would be cool if we could do the lock on a Store-basis, especially given we now can flush at the Store level.
Where we place the lock is not so much the issue. It's how often we lock and unlock it.
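That point can be illustrated with a toy example (plain Java, deliberately unrelated to the actual HBase code paths): both methods do identical work, but one acquires the monitor once per row while the other acquires it once per batch, which is the difference the coarsening exploits.

```java
public class LockFrequencySketch {
    private final Object lock = new Object();
    private long counter;

    /** One lock acquisition per row: pays a fence on every iteration. */
    long perRow(int rows) {
        for (int i = 0; i < rows; i++) {
            synchronized (lock) { counter++; }
        }
        return counter;
    }

    /** One lock acquisition per batch: same work, far fewer fences. */
    long perBatch(int rows) {
        synchronized (lock) {
            for (int i = 0; i < rows; i++) { counter++; }
        }
        return counter;
    }
}
```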

  • Lars Hofhansl (JIRA) at Feb 24, 2015 at 7:14 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Lars Hofhansl updated HBASE-13082:
    ----------------------------------
         Attachment: 13082-test.txt

I have some pretty hand-rigged tests.
Here's a simple test in the form of a unit test that you can run (same as the one in HBASE-10015). It fails in the end and reports runtime and std deviation as the error message.
  • Andrew Purtell (JIRA) at Feb 24, 2015 at 9:39 pm
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335502#comment-14335502 ]

    Andrew Purtell commented on HBASE-13082:
    ----------------------------------------

bq. For example Phoenix does not lock RegionScanner.nextRaw() as required in the documentation
    Note, we just fixed that in PHOENIX-1672
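For reference, the contract being discussed is that callers of RegionScanner.nextRaw() must synchronize on the scanner themselves. A minimal sketch of a caller honoring that contract (the RegionScannerLike interface here is a stand-in for illustration, not the real org.apache.hadoop.hbase.regionserver.RegionScanner):

```java
import java.util.ArrayList;
import java.util.List;

public class NextRawContractSketch {
    /** Stand-in for a RegionScanner; nextRaw returns true while more rows follow. */
    interface RegionScannerLike {
        boolean nextRaw(List<String> out);
    }

    /** Drains a scanner while honoring the caller-must-lock contract. */
    static List<String> drain(RegionScannerLike scanner) {
        List<String> all = new ArrayList<>();
        List<String> batch = new ArrayList<>();
        boolean more;
        do {
            synchronized (scanner) {   // the caller, not the scanner, takes the lock
                more = scanner.nextRaw(batch);
            }
            all.addAll(batch);
            batch.clear();
        } while (more);
        return all;
    }
}
```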
  • Andrew Purtell (JIRA) at Feb 24, 2015 at 9:47 pm
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335515#comment-14335515 ]

    Andrew Purtell commented on HBASE-13082:
    ----------------------------------------

    bq. Problem is that in this case the scanner produced no result at all. I suppose we can time in the StoreScanner and return an empty result (just as we would when we exhausted the region). RegionScanner will do the right thing.

    I proposed something like this on HBASE-13090
  • stack (JIRA) at Feb 24, 2015 at 11:13 pm
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335636#comment-14335636 ]

    stack commented on HBASE-13082:
    -------------------------------

[~jonathan.lawlor] What do you think of the above? What do you think of keeping a timer? What happens if we return a Result w/ nothing in it? Will the scan still proceed?

  • Jonathan Lawlor (JIRA) at Feb 25, 2015 at 12:11 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335701#comment-14335701 ]

    Jonathan Lawlor commented on HBASE-13082:
    -----------------------------------------

    As it stands, whenever a client scanner receives results back from an RPC it will check its size and caching limits. If neither of those limits have been reached, the scanner assumes that the current region has been exhausted and it will try to change regions. Thus, if we return an empty Result (i.e. a Result with no cells in it) back to the scanner, it will see that neither limit has been reached and will try to change the region. Also, the client side Scanner will blindly add that result to the client side cache. The problem there is that at some point, the user would call ClientScanner#next() and receive a blank Result.

In the case of HBASE-11544 (rpc chunking), I prevent the region change in the event that partial results are returned because it means that there are still Results left in the region. As [~apurtell] has pointed out in HBASE-13090, a similar concept could be borrowed here -- if the Result that is returned is flagged as a timer_heartbeat Result (or something along those lines), skip the region change and continue to make RPCs until we have exhausted the region (when regions are completely exhausted, the RPC will return 0 Results to the client). This would likely require some sort of flag in the Result data structure so that the ClientScanner understands that it should continue to scan against the current region.
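That client-side handling could be sketched as follows (hypothetical names like isHeartbeat; this is an illustration of the idea, not the eventual HBase API): heartbeat results are dropped without reaching the user-visible cache and without triggering a region change, while a genuinely empty RPC response still signals region exhaustion.

```java
import java.util.List;
import java.util.Queue;

public class HeartbeatScanSketch {
    /** Stand-in Result: a flagged heartbeat carries no user-visible data. */
    record Result(List<String> cells, boolean isHeartbeat) {}

    /**
     * Returns true if the client should move on to the next region.
     * Only non-heartbeat Results reach the cache served by ClientScanner#next().
     */
    static boolean handleRpcResponse(List<Result> rpcResults, Queue<Result> cache) {
        if (rpcResults.isEmpty()) {
            return true;              // region truly exhausted: change regions
        }
        for (Result r : rpcResults) {
            if (!r.isHeartbeat()) {
                cache.add(r);         // real data for the user
            }                          // heartbeat: cache nothing, keep scanning
        }
        return false;                 // stay on this region, issue another RPC
    }
}
```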


  • Lars Hofhansl (JIRA) at Feb 26, 2015 at 6:47 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338000#comment-14338000 ]

    Lars Hofhansl commented on HBASE-13082:
    ---------------------------------------

The StoreScanner.next() loop we can simply exit after some time limit with an empty result, but returning true (i.e. more rows expected). That would be the same as what happens when we exhaust the region; the region scanner will continue.

In RegionScanner we could do the same and return a special indicator that the client just ignores (as described above). I guess what's tricky is coprocessors that wrap a region scanner (such as Phoenix does). They'd have to honor the protocol and pass the marker results to the client (or at the very least ignore them).

Let's do that in another jira, though.

This patch will not make things worse in principle. A store scanner can already be stuck exhausting the entire store in a single next(...) call while holding the lock, preventing flushes from finishing. The extremely long scan times we've seen have other reasons too - see HBASE-13109.
The only detriment this patch can cause is that when one store scanner is stuck this way, it now prevents other stores in the region from flushing/compacting. (And note that that is only the case when no Cells in the store are returned by the store scanner.)

  • Lars Hofhansl (JIRA) at Feb 27, 2015 at 6:25 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339788#comment-14339788 ]

    Lars Hofhansl commented on HBASE-13082:
    ---------------------------------------

We have three problems:
# StoreScanner is locked too often
# If StoreScanner.next(List<Cell>) does not find any Cells (for example if they do not match the timerange or filter) it will exhaust the entire store while holding the lock, preventing flushes/compactions from finishing
# The client can time out even though the server is still working, because the server does not currently indicate that it is working; it just does not return anything.

This patch is for #1. We can fix #2 in many cases by just returning an empty result after some number of iterations - but we *only* do that if we have not found any Cells for the current row; otherwise we need to finish the row, i.e. find the next row (which of course could then exhaust the region if we're unlucky).
But note that the solution for #2 would *clash* with this patch. With this patch it is no longer the lock on StoreScanner that protects it from concurrent flushes, but the synchronized on RegionScannerImpl, and that we cannot easily let go of without actually returning something back to the client.
#3 would only work with HBASE-11544, since we still need to be able to guarantee entire rows to the client, but if we break out of the loop because we did not find any Cell after some time, we do not know whether we have a whole row or not.

So in reality all these things look like they need to be fixed together. Given that neither #2 nor #3 can be satisfactorily fixed without HBASE-11544, I propose doing a bit more testing on the patch, and then committing this here. Then we fix #3 (which would incidentally also fix #2 after HBASE-11544 is in).

  • Lars Hofhansl (JIRA) at Feb 27, 2015 at 6:26 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Lars Hofhansl updated HBASE-13082:
    ----------------------------------
         Attachment: 13082.txt

    Reattaching patch for testing.
  • Lars Hofhansl (JIRA) at Feb 27, 2015 at 6:26 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Lars Hofhansl updated HBASE-13082:
    ----------------------------------
         Status: Patch Available (was: Open)
  • Lars Hofhansl (JIRA) at Feb 27, 2015 at 6:48 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339802#comment-14339802 ]

    Lars Hofhansl commented on HBASE-13082:
    ---------------------------------------

In other tests I find the gain to be about 41ns per row and per core (you can see how the results drift more with more cores in use above).
Above I measured 66ns. So scanning 10m rows we'd save 0.4-0.6s, for 100m rows that'd be 4-6s, and with 1bn rows it'd add up to 40-60s.
(Note that in the tests above I scanned about 11m rows/s when using all 8 cores/HTs on my box.)

  • Hadoop QA (JIRA) at Feb 27, 2015 at 8:30 am
    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339878#comment-14339878 ]

    Hadoop QA commented on HBASE-13082:
    -----------------------------------

    {color:red}-1 overall{color}. Here are the results of testing the latest attachment
       http://issues.apache.org/jira/secure/attachment/12701281/13082.txt
       against master branch at commit 458846ef7b0528cb7952c413694eaf55c5d94342.
       ATTACHMENT ID: 12701281

         {color:green}+1 @author{color}. The patch does not contain any @author tags.

         {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
                             Please justify why no new tests are needed for this patch.
                             Also please list what manual steps were performed to verify this patch.
         {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0)

         {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

         {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

         {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

         {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors

         {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

         {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

         {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

         {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail.

          {color:red}-1 core tests{color}. The patch failed these unit tests:
                            org.apache.hadoop.hbase.regionserver.TestStoreScanner
                       org.apache.hadoop.hbase.util.TestProcessBasedCluster
                       org.apache.hadoop.hbase.mapreduce.TestImportExport
                       org.apache.hadoop.hbase.regionserver.TestAtomicOperation

          {color:red}-1 core zombie tests{color}. There are 2 zombie test(s): at org.apache.hadoop.hbase.namespace.TestNamespaceAuditor.testRegionMerge(TestNamespaceAuditor.java:308)
      at org.apache.hadoop.hbase.TestAcidGuarantees.testMixedAtomicity(TestAcidGuarantees.java:364)

    Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/12994//testReport/
    Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12994//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
    Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12994//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
    Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12994//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
    Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12994//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
    Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12994//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
    Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12994//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
    Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12994//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
    Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12994//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
    Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12994//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
    Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12994//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
    Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12994//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
    Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/12994//artifact/patchprocess/checkstyle-aggregate.html

       Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/12994//console

    This message is automatically generated.

Discussion Overview
group: issues
categories: hbase, hadoop
posted: Feb 21, '15 at 6:37a
active: Feb 27, '15 at 8:30a
posts: 31
users: 1
website: hbase.apache.org
