[hbase] Data loss if Exception happens between snapshot and flush to disk.
--------------------------------------------------------------------------

Key: HADOOP-1903
URL: https://issues.apache.org/jira/browse/HADOOP-1903
Project: Hadoop
Issue Type: Bug
Reporter: stack


There exists a little window during which we can lose data. During a memcache flush, we make an in-memory copy, a 'snapshot'. The memcache is then zeroed and off we go again taking updates. Meanwhile, in the background, we are supposed to flush the snapshot to disk. If this process is interrupted -- e.g. the HDFS is yanked out from under us, or an OOME occurs in this thread -- then the content of the snapshot is lost.
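
Below is a minimal Java sketch of that window, assuming a simplified map-backed memcache; the class and method names are illustrative stand-ins, not the real HMemcache API.

    import java.io.IOException;
    import java.util.SortedMap;
    import java.util.TreeMap;

    // Minimal sketch of the flush window; names are illustrative,
    // not the real HMemcache API.
    public class FlushWindowSketch {
        private SortedMap<String, String> memcache = new TreeMap<>();
        private SortedMap<String, String> snapshot = new TreeMap<>();

        void put(String key, String value) {
            memcache.put(key, value);
        }

        // Step 1: set the memcache aside as a snapshot and zero it so
        // updates can keep arriving.
        synchronized void snapshotMemcache() {
            snapshot = memcache;
            memcache = new TreeMap<>();
        }

        // Step 2: persist the snapshot in the background. If this throws
        // (HDFS gone, OOME in this thread), the in-memory snapshot is the
        // only copy of those edits outside the hlog.
        void flushSnapshotToDisk() throws IOException {
            throw new IOException("HDFS yanked out from under us");
        }

        public static void main(String[] args) {
            FlushWindowSketch s = new FlushWindowSketch();
            s.put("row1", "value1");
            s.snapshotMemcache();             // memcache is now empty
            try {
                s.flushSnapshotToDisk();
            } catch (IOException e) {
                // Swallowing the failure and carrying on is the data-loss
                // window: the snapshot's edits never reach an HStoreFile.
                s.snapshot = new TreeMap<>();
            }
        }
    }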

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • stack (JIRA) at Sep 15, 2007 at 3:31 am
    [ https://issues.apache.org/jira/browse/HADOOP-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527687 ]

    stack commented on HADOOP-1903:
    -------------------------------

    On second take, if the interruption causes a regionserver crash, there should be no data loss; the memcache is backed up in the hlog. But if we manage to keep going, then we'll be in a strange situation where the snapshot has been 'lost' -- though backed up in the hlog -- because the regionserver serves out of the memcache+snapshot histories and out of the flushed store files, the HStoreFiles.
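
    A hedged sketch of why the keep-going case is the dangerous one, assuming reads are answered by consulting the live memcache, then the snapshot, then the flushed HStoreFiles (the names below are illustrative, not the real HRegion read path): the hlog is only replayed on restart, so edits that survive solely in the hlog stay invisible to readers while the server runs on.

        import java.util.Map;
        import java.util.TreeMap;

        // Illustrative read path, not the actual HRegion code.
        public class ReadPathSketch {
            private final Map<String, String> memcache = new TreeMap<>();   // live edits
            private final Map<String, String> snapshot = new TreeMap<>();   // awaiting flush
            private final Map<String, String> storeFiles = new TreeMap<>(); // stand-in for HStoreFiles

            String get(String key) {
                String v = memcache.get(key);           // newest edits first
                if (v == null) v = snapshot.get(key);   // then the pending snapshot
                if (v == null) v = storeFiles.get(key); // then flushed HStoreFiles
                return v;                               // the hlog is never consulted here
            }
        }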
  • stack (JIRA) at Sep 15, 2007 at 3:33 am
    [ https://issues.apache.org/jira/browse/HADOOP-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HADOOP-1903:
    --------------------------

    Assignee: stack
    Priority: Minor (was: Major)
    Summary: [hbase] Possible data loss if Exception happens between snapshot and flush to disk. (was: [hbase] Data loss if Exception happens between snapshot and flush to disk.)
  • stack (JIRA) at Sep 15, 2007 at 7:31 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HADOOP-1903:
    --------------------------

    Attachment: 1903.patch

    If the flush is interrupted, force a restart of the HRS so the hlog is replayed -- currently this only happens on restarts -- and the memcache is refilled with what was in the aborted/dropped snapshot. Otherwise, there is an opportunity for the HRS to misrepresent the content of its stores. (A sketch of this recovery pattern follows the change list below.)

    HADOOP-1903 Possible data loss if Exception happens between snapshot and flush
    to disk.

    M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegionServer.java
    javadoc additions.
    (Flusher.chore): Restart HRS if we get a DroppedSnapshotException.
    M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HLog.java
    (abort): Renamed abortCacheFlush
    (cleanup): Added.
    M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMaster.java
    Add TODO to comment.
    M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMemcache.java
    Add comment on state.
    M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegion.java
    Add throws javadoc to a couple of methods. Add comments to
    catch of IOException in cache flush.
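
    As referenced above, here is a sketch of the recovery pattern the Flusher.chore change describes. It is a simplified, hypothetical rendering under the assumption that a failed flush surfaces as a DroppedSnapshotException, not the actual HRegionServer code.

        // Sketch of restart-on-DroppedSnapshotException; simplified and
        // hypothetical, not the actual Flusher.chore implementation.
        class FlusherSketch {
            static class DroppedSnapshotException extends Exception {}

            interface Region {
                void flushCache() throws DroppedSnapshotException;
            }

            private final Region region;

            FlusherSketch(Region region) {
                this.region = region;
            }

            void chore() {
                try {
                    region.flushCache();
                } catch (DroppedSnapshotException e) {
                    // The snapshot is gone from memory, but its edits are
                    // still in the hlog. Rather than keep serving stores that
                    // silently miss those edits, abort so the log is replayed
                    // and the memcache is refilled on restart.
                    restartRegionServer();
                }
            }

            private void restartRegionServer() {
                // Placeholder for the abort-and-restart the patch forces.
                Runtime.getRuntime().halt(1);
            }
        }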
  • stack (JIRA) at Sep 15, 2007 at 7:31 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HADOOP-1903:
    --------------------------

    Fix Version/s: 0.15.0
    Status: Patch Available (was: Open)

    Builds and passes all tests locally. Trying against Hudson.
  • Hadoop QA (JIRA) at Sep 15, 2007 at 8:24 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527778 ]

    Hadoop QA commented on HADOOP-1903:
    -----------------------------------

    +1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12365914/1903.patch
    against trunk revision r575950.

    @author +1. The patch does not contain any @author tags.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new compiler warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/772/testReport/
    Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/772/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/772/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/772/console

    This message is automatically generated.
  • stack (JIRA) at Sep 15, 2007 at 9:13 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HADOOP-1903:
    --------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    Committed. Resolving issue.
  • stack (JIRA) at Sep 15, 2007 at 10:54 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HADOOP-1903:
    --------------------------

    Component/s: contrib/hbase
  • Hudson (JIRA) at Sep 16, 2007 at 12:13 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527858 ]

    Hudson commented on HADOOP-1903:
    --------------------------------

    Integrated in Hadoop-Nightly #239 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/239/])
