David Arthur created ZOOKEEPER-1621:
---------------------------------------

Summary: ZooKeeper does not recover from crash when disk was full
Key: ZOOKEEPER-1621
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.3
Environment: Ubuntu 12.04, Amazon EC2 instance
Reporter: David Arthur


The disk that ZooKeeper was using filled up. During a snapshot write, I got the following exception:

2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:282)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)

Then many subsequent exceptions like:

2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was partial.
2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)


It seems to me that writing the transaction log should be fully atomic to avoid such situations. Is this not the case?
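
For readers unfamiliar with what an "atomic" file write usually means, here is a minimal sketch of the common write-to-temp-then-rename pattern. It is not ZooKeeper's code; the class name `AtomicFileWriteSketch` and the method `writeAtomically` are hypothetical and exist only for illustration. The pattern suits whole-file writes such as a snapshot. An append-only log cannot be replaced wholesale on every commit, so it would need a different technique.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration, not part of ZooKeeper.
final class AtomicFileWriteSketch {

    /**
     * Writes data to a temporary file, forces it to disk, then renames it over
     * the target. If the disk fills up mid-write, only the temporary file is
     * incomplete and the target is left untouched.
     */
    static void writeAtomically(Path target, byte[] data) throws IOException {
        // Create the temp file in the target's directory so the rename stays
        // on one filesystem (a cross-filesystem move cannot be atomic).
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.wrap(data);
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                ch.force(true); // flush file content and metadata to the device
            }
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // do not leave a partial file behind
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("atomic-sketch");
        Path target = dir.resolve("snapshot.demo");
        writeAtomically(target, "example payload".getBytes(StandardCharsets.UTF_8));
        System.out.println("wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

Whether an existing target is replaced by `ATOMIC_MOVE` is implementation specific according to the `Files.move` documentation, so a production version would need to check the behavior on its platform.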



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


  • David Arthur (JIRA) at Jan 16, 2013 at 3:26 pm
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555110#comment-13555110 ]

    David Arthur commented on ZOOKEEPER-1621:
    -----------------------------------------

    I was able to work around the issue by deleting the partially written snapshot file.
  • Flavio Junqueira (JIRA) at Jan 16, 2013 at 4:04 pm
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555158#comment-13555158 ]

    Flavio Junqueira commented on ZOOKEEPER-1621:
    ---------------------------------------------

    I believe the exception is being thrown while reading the snapshot, and the partial transaction message does not indicate what is causing the crash. Trying a different snapshot sounds right, but according to the log messages you posted, the problem seems to be that we are not catching EOFException.
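
To make the point about the uncaught EOFException concrete, here is a minimal sketch of a reader that treats a truncated file header as "skip this file" instead of letting the exception end startup. It is not the ZooKeeper implementation or the eventual fix. The class `TruncatedHeaderSketch`, the method `readHeaderVersion`, the two-int header layout, and the `MAGIC` value are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.OptionalInt;

// Hypothetical illustration, not part of ZooKeeper.
final class TruncatedHeaderSketch {

    // Made-up magic number for this sketch only.
    static final int MAGIC = 0x12345678;

    /**
     * Reads an assumed header of two ints (magic, version).
     * Returns the version when the header is complete, or an empty result
     * when the stream ends before the header does (a truncated file).
     */
    static OptionalInt readHeaderVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        try {
            int magic = data.readInt();
            int version = data.readInt();
            if (magic != MAGIC) {
                // A wrong magic number is real corruption, so still fail loudly.
                throw new IOException("unexpected magic number: " + Integer.toHexString(magic));
            }
            return OptionalInt.of(version);
        } catch (EOFException e) {
            // The file was cut short, for example by a full disk.
            // Signal "skip this file" so the caller can try another one.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] truncated = {0x12, 0x34}; // only two bytes of an eight-byte header
        System.out.println(readHeaderVersion(new ByteArrayInputStream(truncated)));
    }
}
```

A caller iterating over candidate files could then move on to the next file whenever the result is empty.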
  • Mahadev konar (JIRA) at Jan 16, 2013 at 4:18 pm
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Mahadev konar updated ZOOKEEPER-1621:
    -------------------------------------

    Priority: Critical (was: Major)
  • Mahadev konar (JIRA) at Jan 16, 2013 at 4:18 pm
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Mahadev konar updated ZOOKEEPER-1621:
    -------------------------------------

    Fix Version/s: 3.4.6
  • Mahadev konar (JIRA) at Jan 16, 2013 at 4:20 pm
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Mahadev konar updated ZOOKEEPER-1621:
    -------------------------------------

    Priority: Major (was: Critical)
  • Mahadev konar (JIRA) at Jan 16, 2013 at 4:20 pm
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Mahadev konar updated ZOOKEEPER-1621:
    -------------------------------------

    Fix Version/s: 3.5.0 (was: 3.4.6)
  • Mahadev konar (JIRA) at Jan 16, 2013 at 4:22 pm
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555169#comment-13555169 ]

    Mahadev konar commented on ZOOKEEPER-1621:
    ------------------------------------------

    David,
    So these exceptions are thrown when ZooKeeper is running? I am not sure why it's exiting so many times. Do you guys restart the ZK server if it dies?
  • David Arthur (JIRA) at Jan 16, 2013 at 4:42 pm
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555189#comment-13555189 ]

    David Arthur commented on ZOOKEEPER-1621:
    -----------------------------------------

    We run ZooKeeper with runit, so yes it is restarted when it dies. It ends up in a loop of:

    * No space left on device
    * Starting server
    * Last transaction was partial
    * Snapshotting: 0x19a3d to /opt/zookeeper-3.4.3/data/version-2/snapshot.19a3d
    * No space left on device
    ZooKeeper does not recover from crash when disk was full
    --------------------------------------------------------

    Key: ZOOKEEPER-1621
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
    Project: ZooKeeper
    Issue Type: Bug
    Components: server
    Affects Versions: 3.4.3
    Environment: Ubuntu 12.04, Amazon EC2 instance
    Reporter: David Arthur
    Fix For: 3.5.0


  • Mahadev konar (JIRA) at Jan 16, 2013 at 4:46 pm
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555192#comment-13555192 ]

    Mahadev konar commented on ZOOKEEPER-1621:
    ------------------------------------------

    David,
    I thought you said it does not recover after the disk was full, but it looks like the disk is still full. No?
  • David Arthur (JIRA) at Jan 16, 2013 at 5:08 pm
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555215#comment-13555215 ]

    David Arthur commented on ZOOKEEPER-1621:
    -----------------------------------------

    Here is the full sequence of events (sorry for the confusion):

    * Noticed disk was full
    * Cleaned up disk space
    * Tried zkCli.sh, got errors
    * Checked ZK log, loop of:

    2013-01-16 15:01:35,194 - ERROR [main:Util@239] - Last transaction was partial.
    2013-01-16 15:01:35,196 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
    java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
    at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
    at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
    at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
    at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
    at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
    at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)

    * Stopped ZK
    * Listed ZK data directory

    ubuntu@ip-10-78-19-254:/opt/zookeeper-3.4.3/data/version-2$ ls -lat
    total 18096
    drwxr-xr-x 2 zookeeper zookeeper 4096 Jan 16 06:41 .
    -rw-r--r-- 1 zookeeper zookeeper 0 Jan 16 06:41 log.19a3e
    -rw-r--r-- 1 zookeeper zookeeper 585377 Jan 16 06:41 snapshot.19a3d
    -rw-r--r-- 1 zookeeper zookeeper 67108880 Jan 16 03:11 log.19a2a
    -rw-r--r-- 1 zookeeper zookeeper 585911 Jan 16 03:11 snapshot.19a29
    -rw-r--r-- 1 zookeeper zookeeper 67108880 Jan 16 03:11 log.11549
    -rw-r--r-- 1 zookeeper zookeeper 585190 Jan 15 17:28 snapshot.11547
    -rw-r--r-- 1 zookeeper zookeeper 67108880 Jan 15 17:28 log.1
    -rw-r--r-- 1 zookeeper zookeeper 296 Jan 14 16:44 snapshot.0
    drwxr-xr-x 3 zookeeper zookeeper 4096 Jan 14 16:44 ..

    * Removed log.19a3e and snapshot.19a3d

    ubuntu@ip-10-78-19-254:/opt/zookeeper-3.4.3/data/version-2$ sudo rm log.19a3e
    ubuntu@ip-10-78-19-254:/opt/zookeeper-3.4.3/data/version-2$ sudo rm snapshot.19a3d

    * Started ZK
    * Back to normal
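
The manual check in the steps above (spotting the 0-byte log.19a3e in the data directory) could be scripted roughly as follows. This is only a sketch, not part of ZooKeeper: the class name EmptyTxnLogFinder and the method findEmptyLogs are made up for illustration, and it assumes transaction logs are the files named log.* directly under the version-2 directory, as in the listing above. It only reports zero-length logs; it does not delete anything, and it does not look at snapshots (snapshot.19a3d was also removed by hand above).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not a ZooKeeper class.
final class EmptyTxnLogFinder {

    /** Returns the zero-length regular files named log.* directly under dataDir, sorted by path. */
    static List<Path> findEmptyLogs(Path dataDir) throws IOException {
        List<Path> empty = new ArrayList<>();
        // The glob "log.*" matches names such as log.19a3e but not snapshot.19a3d.
        try (DirectoryStream<Path> logs = Files.newDirectoryStream(dataDir, "log.*")) {
            for (Path p : logs) {
                if (Files.isRegularFile(p) && Files.size(p) == 0) {
                    empty.add(p);
                }
            }
        }
        Collections.sort(empty);
        return empty;
    }

    public static void main(String[] args) throws IOException {
        // Only reports candidates; removing them is left to the operator.
        for (Path p : findEmptyLogs(Paths.get(args[0]))) {
            System.out.println("zero-length txn log: " + p);
        }
    }
}
```

It would be run with the data directory as its only argument, for example /opt/zookeeper-3.4.3/data/version-2 from the listing above.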
  • David Arthur (JIRA) at Jan 16, 2013 at 5:08 pm
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    David Arthur updated ZOOKEEPER-1621:
    ------------------------------------

    Attachment: zookeeper.log.gz

    Attaching zookeeper.log
  • Edward Ribeiro (JIRA) at Jan 16, 2013 at 5:36 pm
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555243#comment-13555243 ]

    Edward Ribeiro commented on ZOOKEEPER-1621:
    -------------------------------------------

    Hi folks,

    FYI, this issue is a duplicate of ZOOKEEPER-1612 (curiously, a permutation of the last two digits, heh). I'd suggest closing 1612 as the dup instead, if possible.
  • Mahadev konar (JIRA) at Jan 16, 2013 at 6:38 pm
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555318#comment-13555318 ]

    Mahadev konar commented on ZOOKEEPER-1621:
    ------------------------------------------

    I'll mark 1612 as a dup. Thanks for pointing that out, Edward.


  • Mahadev konar (JIRA) at Jan 18, 2013 at 7:38 am
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557022#comment-13557022 ]

    Mahadev konar commented on ZOOKEEPER-1621:
    ------------------------------------------

    Looks like the header was incomplete. Unfortunately, we do not handle a corrupt header, although we do handle corrupt txns later in the file. I am surprised that this happened twice in a row for 2 users. I'll upload a patch and test case.
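
To illustrate the idea (this is not the actual patch), a reader could check that a log file holds a complete header before trying to iterate over it, and treat a short file as empty instead of failing with EOFException. The sketch below is standalone and does not use ZooKeeper classes. The class name TxnLogHeaderCheck is hypothetical, and the header layout it reads (an int magic, an int version and a long dbid, 16 bytes in total) is an assumption about the log format, not something stated in this thread. It only checks that the header is complete; it does not validate the magic value.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not a ZooKeeper class.
final class TxnLogHeaderCheck {

    /**
     * Returns true if a full header could be read from logFile.
     * Assumed layout: int magic, int version, long dbid (16 bytes).
     */
    static boolean hasCompleteHeader(Path logFile) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(logFile))) {
            in.readInt();   // magic (assumed field)
            in.readInt();   // version (assumed field)
            in.readLong();  // dbid (assumed field)
            return true;
        } catch (EOFException e) {
            // The file ended inside the header, e.g. the 0-byte log.19a3e above.
            return false;
        }
    }
}
```

A caller that gets false for the newest log file could skip that file and continue restoring from the earlier logs, which is what removing log.19a3e by hand achieved in this report.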
    ZooKeeper does not recover from crash when disk was full
    --------------------------------------------------------

    Key: ZOOKEEPER-1621
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
    Project: ZooKeeper
    Issue Type: Bug
    Components: server
    Affects Versions: 3.4.3
    Environment: Ubuntu 12.04, Amazon EC2 instance
    Reporter: David Arthur
    Fix For: 3.5.0

    Attachments: zookeeper.log.gz


  • Mahadev konar (JIRA) at Jan 18, 2013 at 7:38 am
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Mahadev konar updated ZOOKEEPER-1621:
    -------------------------------------

    Assignee: Mahadev konar
