Hi,

When the master shuts down or crashes, there seems to be
a case where walreceiver exits without flushing WAL that
has already been written. This might lead the startup process to
replay unflushed WAL and break the write-ahead logging rule.

walreceiver.c

    /* Wait a while for data to arrive */
    if (walrcv_receive(NAPTIME_PER_CYCLE, &type, &buf, &len))
    {
        /* Accept the received data, and process it */
        XLogWalRcvProcessMsg(type, buf, len);

        /* Receive any more data we can without sleeping */
        while (walrcv_receive(0, &type, &buf, &len))
            XLogWalRcvProcessMsg(type, buf, len);

        /*
         * If we've written some records, flush them to disk and let the
         * startup process know about them.
         */
        XLogWalRcvFlush();
    }

The problematic case happens when the latter walrcv_receive
(the one in the while loop) raises an ERROR. In this case, the WAL
received by the former walrcv_receive is not guaranteed to have
been flushed yet.

The attached patch ensures that all WAL received is flushed to
disk before walreceiver exits. This patch should be backported
to 9.0, I think.
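
The failure mode above can be illustrated with a toy model in C. This is only a sketch: every name in it (ToyWalState, toy_receive, toy_cycle, toy_run, toy_flush) is hypothetical, the simulated ERROR is a plain longjmp, and none of it is taken from the attached patch or from PostgreSQL itself. It just mimics the control flow of the excerpt to show how written-but-unflushed data can be left behind, and how a flush on the exit path would close that gap.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for walreceiver state; not a PostgreSQL structure. */
typedef struct
{
    size_t written;   /* bytes written to the WAL file, possibly not on disk */
    size_t flushed;   /* bytes known to have been flushed */
    int    pending;   /* messages the simulated master will still send */
} ToyWalState;

/* File-scope so the values stay well-defined across longjmp. */
static ToyWalState toy_state;
static jmp_buf     toy_error_ctx;

/* Simulated flush: everything written so far becomes durable. */
static void toy_flush(void)
{
    assert(toy_state.flushed <= toy_state.written);
    toy_state.flushed = toy_state.written;
}

/* Simulated receive. When the master is gone and nothing is pending,
 * it raises the stand-in for ERROR by jumping out of the cycle. */
static bool toy_receive(bool master_dies, size_t *len)
{
    if (toy_state.pending > 0)
    {
        toy_state.pending--;
        *len = 100;
        return true;
    }
    if (master_dies)
        longjmp(toy_error_ctx, 1);
    return false;
}

/* Same shape as the excerpt: receive, write, drain, then flush once. */
static void toy_cycle(bool master_dies)
{
    size_t len;

    if (toy_receive(master_dies, &len))
    {
        toy_state.written += len;
        while (toy_receive(master_dies, &len))
            toy_state.written += len;
        toy_flush();    /* skipped when the inner receive errors out */
    }
}

/* Runs one cycle and returns the number of written-but-unflushed bytes
 * left behind when the simulated walreceiver exits. */
size_t toy_run(int messages, bool master_dies, bool flush_on_exit)
{
    toy_state = (ToyWalState){0, 0, messages};

    if (setjmp(toy_error_ctx) == 0)
        toy_cycle(master_dies);
    else if (flush_on_exit)
        toy_flush();    /* assumed exit-path flush, as the patch is described */

    return toy_state.written - toy_state.flushed;
}
```

In this model, toy_run(2, true, false) leaves both received messages unflushed, while toy_run(2, true, true) leaves nothing behind.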

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

  • Heikki Linnakangas at Jan 13, 2011 at 8:59 am

    On 13.01.2011 10:28, Fujii Masao wrote:
    > When the master shuts down or crashes, there seems to be
    > a case where walreceiver exits without flushing WAL that
    > has already been written. This might lead the startup process to
    > replay unflushed WAL and break the write-ahead logging rule.

    Hmm, that can happen at a crash even with no replication involved. If
    you "kill -9 postmaster", and some WAL had been written but not fsync'd,
    on crash recovery we will happily recover the unsynced WAL. We could
    prevent that by fsyncing all WAL before applying it - presumably
    fsyncing a file that has already been flushed is quick. But is it worth
    the trouble?
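
The "fsync before applying" idea can be sketched with plain POSIX calls. This is a minimal sketch under stated assumptions: toy_sync_before_replay is a hypothetical helper, not a PostgreSQL function, and it assumes the WAL segment is an ordinary file that the caller can open read-write.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: force a WAL segment to disk before it is replayed.
 * Returns 0 on success, -1 on failure. */
int toy_sync_before_replay(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Whether the extra fsync is really cheap for an already-flushed file depends on the filesystem; the message above only presumes that it is.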

    > walreceiver.c
    >
    >     /* Wait a while for data to arrive */
    >     if (walrcv_receive(NAPTIME_PER_CYCLE, &type, &buf, &len))
    >     {
    >         /* Accept the received data, and process it */
    >         XLogWalRcvProcessMsg(type, buf, len);
    >
    >         /* Receive any more data we can without sleeping */
    >         while (walrcv_receive(0, &type, &buf, &len))
    >             XLogWalRcvProcessMsg(type, buf, len);
    >
    >         /*
    >          * If we've written some records, flush them to disk and let the
    >          * startup process know about them.
    >          */
    >         XLogWalRcvFlush();
    >     }
    >
    > The problematic case happens when the latter walrcv_receive
    > (the one in the while loop) raises an ERROR. In this case, the WAL
    > received by the former walrcv_receive is not guaranteed to have
    > been flushed yet.
    >
    > The attached patch ensures that all WAL received is flushed to
    > disk before walreceiver exits. This patch should be backported
    > to 9.0, I think.

    Yeah, we probably should do that, even though it doesn't completely
    close the window in which unsynced WAL can be replayed.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Fujii Masao at Jan 13, 2011 at 10:35 am

    On Thu, Jan 13, 2011 at 5:59 PM, Heikki Linnakangas wrote:
    > On 13.01.2011 10:28, Fujii Masao wrote:
    >> When the master shuts down or crashes, there seems to be
    >> a case where walreceiver exits without flushing WAL that
    >> has already been written. This might lead the startup process to
    >> replay unflushed WAL and break the write-ahead logging rule.
    >
    > Hmm, that can happen at a crash even with no replication involved. If you
    > "kill -9 postmaster", and some WAL had been written but not fsync'd, on
    > crash recovery we will happily recover the unsynced WAL.

    Right. If postmaster restarts immediately after kill -9, WAL which has not
    reached the disk might be replayed. Then if the server crashes while the
    minimum recovery point points to such unsynced WAL, the database would
    get corrupted.

    As you say, that is not just about replication. But that is more likely to
    happen in the standby because unsynced WAL appears while recovery
    is in progress. This is one of the reasons why walreceiver doesn't let the
    startup process know that new WAL has arrived before flushing it, I think.

    So I believe that the patch is somewhat worth applying.

    BTW, another good point of the patch is that we can track the last WAL
    receive location correctly. Since WalRcv->receivedUpto is updated
    after WAL flush, if the patch is not applied, the location of WAL received
    just before walreceiver exits might not be saved in WalRcv->receivedUpto.

    > We could prevent
    > that by fsyncing all WAL before applying it - presumably fsyncing a file
    > that has already been flushed is quick. But is it worth the trouble?

    No. It looks like overkill, though it would completely prevent the problem.

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Heikki Linnakangas at Jan 17, 2011 at 10:29 am

    On 13.01.2011 12:35, Fujii Masao wrote:
    > On Thu, Jan 13, 2011 at 5:59 PM, Heikki Linnakangas wrote:
    >> On 13.01.2011 10:28, Fujii Masao wrote:
    >>> When the master shuts down or crashes, there seems to be
    >>> a case where walreceiver exits without flushing WAL that
    >>> has already been written. This might lead the startup process to
    >>> replay unflushed WAL and break the write-ahead logging rule.
    >>
    >> Hmm, that can happen at a crash even with no replication involved. If you
    >> "kill -9 postmaster", and some WAL had been written but not fsync'd, on
    >> crash recovery we will happily recover the unsynced WAL.
    >
    > Right. If postmaster restarts immediately after kill -9, WAL which has not
    > reached the disk might be replayed. Then if the server crashes while the
    > minimum recovery point points to such unsynced WAL, the database would
    > get corrupted.
    >
    > As you say, that is not just about replication. But that is more likely to
    > happen in the standby because unsynced WAL appears while recovery
    > is in progress. This is one of the reasons why walreceiver doesn't let the
    > startup process know that new WAL has arrived before flushing it, I think.
    >
    > So I believe that the patch is somewhat worth applying.

    Agreed. Committed.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com

Discussion Overview
group: pgsql-hackers @
category: postgresql
posted: Jan 13, '11 at 8:28a
active: Jan 17, '11 at 10:29a
posts: 4
users: 2
website: postgresql.org...
irc: #postgresql
