Grokbase Groups HBase dev August 2009
ICV has a subtle race condition only visible under high load
------------------------------------------------------------

Key: HBASE-1740
URL: https://issues.apache.org/jira/browse/HBASE-1740
Project: Hadoop HBase
Issue Type: Bug
Affects Versions: 0.20.0
Reporter: ryan rawson
Fix For: 0.20.0, 0.20.1


ICV (incrementColumnValue) exhibits a race condition under high load. The result is two KeyValues with the same timestamp: at first one sits in the memcache and the other in an HFile, and later both end up in HFiles. The get/scan code doesn't know which one to read and picks one arbitrarily. One of the KeyValues is correct; the other is incorrect.
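To make the read-side ambiguity concrete, here is a minimal Java sketch. It is not HBase code: `Kv`, `KEY_ORDER` and `latest` are hypothetical names. As a simplifying assumption, cells are ordered by row, column and timestamp only, so two cells that differ only in value compare as equal, and which one wins depends on which source (memstore or HFile) is consulted first.

```java
import java.util.Comparator;
import java.util.List;

public class DuplicateTimestampRead {

    // Hypothetical, simplified cell; not HBase's KeyValue class.
    record Kv(String row, String column, long timestamp, long value) {}

    // Orders by row, column, then newest timestamp first. The value is not part
    // of the ordering, so two cells that differ only in value compare as equal.
    static final Comparator<Kv> KEY_ORDER = Comparator
            .comparing(Kv::row)
            .thenComparing(Kv::column)
            .thenComparing(Comparator.comparingLong(Kv::timestamp).reversed());

    // Returns the newest cell for row/column across the sources, scanning them
    // in the order given. On a tie, the cell seen first is kept.
    static Kv latest(List<List<Kv>> sources, String row, String column) {
        Kv best = null;
        for (List<Kv> source : sources) {
            for (Kv kv : source) {
                if (!kv.row().equals(row) || !kv.column().equals(column)) {
                    continue;
                }
                if (best == null || KEY_ORDER.compare(kv, best) < 0) {
                    best = kv;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Kv> memstore = List.of(new Kv("r1", "f:count", 1000L, 6L)); // correct value
        List<Kv> hfile = List.of(new Kv("r1", "f:count", 1000L, 5L));    // stale value

        System.out.println(latest(List.of(memstore, hfile), "r1", "f:count").value()); // 6
        System.out.println(latest(List.of(hfile, memstore), "r1", "f:count").value()); // 5
    }
}
```

Swapping the order of the two sources flips the answer from 6 to 5, which is the "picks one arbitrarily" behavior described above.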

What happens at a deeper level:
- we start an ICV
- a snapshot happens and moves the memstore to the snapshot
- the ICV code then puts a KeyValue into the new memstore with the same timestamp as a KeyValue in the snapshot.
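The three steps can be replayed deterministically in a small Java sketch. The memstore and snapshot are plain lists standing in for the real structures, and `IcvSnapshotRace`, `read`, `writeBack` and `countAt` are hypothetical names. It also assumes, for illustration only, that the pre-fix ICV wrote the incremented value back with the timestamp of the cell it had read; the report describes the timestamp collision but not the exact code path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class IcvSnapshotRace {

    // Hypothetical, simplified cell; the key folds row and column together.
    record Kv(String key, long timestamp, long value) {}

    List<Kv> memstore = new ArrayList<>();
    List<Kv> snapshot = new ArrayList<>();

    // Step 2: the flush moves the whole memstore aside and starts a fresh one.
    void snapshotMemstore() {
        snapshot = memstore;
        memstore = new ArrayList<>();
    }

    // Step 1: the ICV reads the current cell, checking the memstore, then the snapshot.
    Kv read(String key) {
        for (Kv kv : memstore) {
            if (kv.key().equals(key)) return kv;
        }
        for (Kv kv : snapshot) {
            if (kv.key().equals(key)) return kv;
        }
        return null;
    }

    // Step 3 (assumed pre-fix behavior): write the new value back with the
    // timestamp that was read, replacing a matching cell only if it is still
    // in the memstore.
    void writeBack(Kv old, long amount) {
        memstore.removeIf(kv -> kv.key().equals(old.key()) && kv.timestamp() == old.timestamp());
        memstore.add(new Kv(old.key(), old.timestamp(), old.value() + amount));
    }

    // Number of cells for the key carrying the given timestamp, across both stores.
    long countAt(String key, long ts) {
        return Stream.concat(memstore.stream(), snapshot.stream())
                .filter(kv -> kv.key().equals(key) && kv.timestamp() == ts)
                .count();
    }

    public static void main(String[] args) {
        IcvSnapshotRace region = new IcvSnapshotRace();
        region.memstore.add(new Kv("r1/f:count", 1000L, 5L));

        Kv seen = region.read("r1/f:count"); // 1. ICV starts, reads 5 @ ts 1000
        region.snapshotMemstore();           // 2. snapshot moves the memstore aside
        region.writeBack(seen, 1);           // 3. ICV puts 6 @ ts 1000 into the new memstore

        // Two cells now share ts 1000: 5 in the snapshot and 6 in the memstore.
        System.out.println(region.countAt("r1/f:count", 1000L)); // 2
    }
}
```

Without the `snapshotMemstore()` call between the read and the write-back, `writeBack` replaces the old cell in place and only one cell remains at that timestamp; the snapshot is what lets the stale copy survive.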

This is a deep race condition, and several attempts to fix it have failed in production here at SU. This issue tracks a more permanent fix.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • ryan rawson (JIRA) at Aug 4, 2009 at 12:03 am
    [ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    ryan rawson updated HBASE-1740:
    -------------------------------

    Attachment: HBASE-1740.patch

    Here is a preliminary potential fix. It has no test updates and probably doesn't compile against the wider codebase (the snippet itself has no errors).
  • stack (JIRA) at Aug 13, 2009 at 3:52 am
    [ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HBASE-1740:
    -------------------------

    Fix Version/s: (was: 0.20.0)

    Move to 0.20.1
  • ryan rawson (JIRA) at Sep 10, 2009 at 11:10 pm
    [ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    ryan rawson updated HBASE-1740:
    -------------------------------

    Attachment: HBASE-icv.patch

    Here is a patch that works for us. Highly recommended, but also very intrusive. It does ICV "the right way":
    - log to the HLog first
    - then make the in-RAM changes
    - don't end up with duplicate timestamps in the memstore and HFiles
    - don't create too many versions

    Enjoy.
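The ordering in that list can be sketched in Java as follows. This is not the HBASE-icv.patch code: `SafeIcv`, the list-based `wal`, `memstore` and `snapshot`, and the single `synchronized` lock are hypothetical stand-ins. Choosing a timestamp strictly newer than any stored version of the cell, and replacing the previous memstore version instead of adding another, is one assumed way to meet the last two points.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

public class SafeIcv {

    // Hypothetical, simplified cell; the key folds row and column together.
    record Kv(String key, long timestamp, long value) {}

    final List<String> wal = new ArrayList<>(); // stand-in for the HLog
    List<Kv> memstore = new ArrayList<>();
    List<Kv> snapshot = new ArrayList<>();

    // One lock (assumed) serializes snapshots with increments in this sketch.
    synchronized void snapshotMemstore() {
        snapshot = memstore;
        memstore = new ArrayList<>();
    }

    // Newest cell for the key across the memstore and the snapshot, or null.
    synchronized Kv latest(String key) {
        return Stream.concat(memstore.stream(), snapshot.stream())
                .filter(kv -> kv.key().equals(key))
                .max(Comparator.comparingLong(Kv::timestamp))
                .orElse(null);
    }

    synchronized long increment(String key, long amount, long now) {
        Kv current = latest(key);
        long base = (current == null) ? 0L : current.value();
        // Never reuse a stored timestamp: go strictly past the newest version.
        long ts = (current == null) ? now : Math.max(now, current.timestamp() + 1);
        Kv updated = new Kv(key, ts, base + amount);

        wal.add(key + "@" + ts + "=" + updated.value()); // 1. log the edit first
        memstore.removeIf(kv -> kv.key().equals(key));   // 2. then change RAM, keeping
        memstore.add(updated);                           //    one memstore version per cell
        return updated.value();
    }

    public static void main(String[] args) {
        SafeIcv region = new SafeIcv();
        region.increment("r1/f:count", 5, 1000L);          // 5 @ 1000
        region.snapshotMemstore();                          // the race window from the report
        long v = region.increment("r1/f:count", 1, 1000L); // clock has not advanced
        System.out.println(v);                                       // 6
        System.out.println(region.latest("r1/f:count").timestamp()); // 1001
    }
}
```

Because the second increment sees the snapshotted cell at ts 1000, it writes at 1001 even though the clock has not advanced, so the memstore and the snapshot never hold two versions of the cell with the same timestamp.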
  • ryan rawson (JIRA) at Sep 10, 2009 at 11:38 pm
    [ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    ryan rawson updated HBASE-1740:
    -------------------------------

    Attachment: HBASE-1740-test.patch

    I forgot the tests! Oops!
  • ryan rawson (JIRA) at Sep 10, 2009 at 11:38 pm
    [ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    ryan rawson reassigned HBASE-1740:
    ----------------------------------

    Assignee: ryan rawson
  • stack (JIRA) at Sep 11, 2009 at 9:29 pm
    [ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754357#action_12754357 ]

    stack commented on HBASE-1740:
    ------------------------------

    Reviewed and ran the unit tests. All pass except the broken ITHBase test. Committed to branch and trunk.
  • stack (JIRA) at Sep 11, 2009 at 9:48 pm
    [ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack resolved HBASE-1740.
    --------------------------

    Resolution: Fixed
    Hadoop Flags: [Reviewed]

    Committed to branch and trunk. Ran tests. All passed except the contrib tests that are currently failing on Hudson.
