I believe I've figured out why synchronous replication has such
terrible performance with fsync=off: it has a nasty race condition.
It may happen - if the standby responds very quickly - that the
standby acks the commit record and awakens waiters before the
committing backend actually begins to wait. There's no cross-check
for this: the committing backend waits unconditionally, with no regard
to whether the necessary ACK has already arrived. At this point we
may be in for a very long wait: another ACK will be required to
release waiters, and that may not be immediately forthcoming. I had
thought that the next ACK (after at most wal_receiver_status_interval)
would do the trick, but it appears to be even worse than that: by
making the standby win the race, I was easily able to get the master
to hang for over a minute, and it only got released when I committed
another transaction. Had I been sufficiently patient, the next
checkpoint probably would have done the trick.

Of course, with fsync=off on the standby, it's much easier for the
standby to win the race.
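
To make the shape of the bug concrete, here's a rough sketch of the
lost-wakeup pattern and the cross-check that closes it. This is
illustrative C, not the actual syncrep.c code: sync_rep_wait,
walsndctl->lsn, add_self_to_wait_queue, and wait_for_wakeup are
made-up names standing in for the real machinery.

    /* Illustrative only; not the real SyncRepWaitForLSN(). */
    void
    sync_rep_wait(XLogRecPtr commit_lsn)
    {
        LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);

        /*
         * The buggy version queued itself unconditionally.  If the
         * standby's ACK covering commit_lsn arrived before we got
         * here, no further wakeup is coming until the *next* ACK.
         */
        if (XLByteLE(commit_lsn, walsndctl->lsn))
        {
            LWLockRelease(SyncRepLock);
            return;             /* already acked; don't sleep */
        }

        add_self_to_wait_queue(commit_lsn);
        LWLockRelease(SyncRepLock);
        wait_for_wakeup();      /* walsender wakes us on a later ACK */
    }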

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

  • Simon Riggs at Mar 26, 2011 at 10:16 am

    On Sat, Mar 26, 2011 at 1:11 AM, Robert Haas wrote:

    I believe I've figured out why synchronous replication has such
    terrible performance with fsync=off: it has a nasty race condition.
    It may happen - if the standby responds very quickly - that the
    standby acks the commit record and awakens waiters before the
    committing backend actually begins to wait.  There's no cross-check
    for this: the committing backend waits unconditionally, with no regard
    to whether the necessary ACK has already arrived.  At this point we
    may be in for a very long wait: another ACK will be required to
    release waiters, and that may not be immediately forthcoming.  I had
    thought that the next ACK (after at most wal_receiver_status_interval)
    would do the trick, but it appears to be even worse than that: by
    making the standby win the race, I was easily able to get the master
    to hang for over a minute, and it only got released when I committed
    another transaction.  Had I been sufficiently patient, the next
    checkpoint probably would have done the trick.

    Of course, with fsync=off on the standby, it's much easier for the
    standby to win the race.
    That seems very unlikely even with fsync=off in a real config where we
    have a network path to consider.

    Your definition of a "nasty" race condition seems off.

    I've added code for you.

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Robert Haas at Mar 26, 2011 at 3:00 pm

    On Mar 26, 2011, at 6:16 AM, Simon Riggs wrote:
    That seems very unlikely even with fsync=off in a real config where we
    have a network path to consider.

    Your definition of a "nasty" race condition seems off.

    I've added code for you.
    Your skepticism seems out of place. I actually hit this problem in testing. We could debate how realistic my test setup was, but why? It is not our policy - and bad practice in general - to ship code with race conditions. The code you added is not "for me"; it's "to fix a race condition".

    Thanks for the quick fix.

    ...Robert
  • Tom Lane at Mar 26, 2011 at 4:05 pm

    Robert Haas writes:
    I've added code for you.
    Your skepticism seems out of place. I actually hit this problem in testing. We could debate how realistic my test setup was, but why? It is not our policy - and bad practice in general - to ship code with race conditions. The code you added is not "for me"; it's "to fix a race condition".
    In particular, in view of today's fix, shouldn't this commit be reverted?

    http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=6e8e7cc580665ddd43c8ca2acc6d60f345570a57

    I thought at the time that that was nothing more than documenting a
    known bug, and now it is documenting a dead bug.

    regards, tom lane
  • Simon Riggs at Mar 26, 2011 at 4:16 pm

    On Sat, Mar 26, 2011 at 4:04 PM, Tom Lane wrote:
    Robert Haas <robertmhaas@gmail.com> writes:
    I've added code for you.
    Your skepticism seems out of place. I actually hit this problem in testing. We could debate how realistic my test setup was, but why?  It is not our policy - and bad practice in general - to ship code with race conditions.  The code you added is not "for me"; it's "to fix a race condition".
    In particular, in view of today's fix, shouldn't this commit be reverted?

    http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=6e8e7cc580665ddd43c8ca2acc6d60f345570a57

    I thought at the time that that was nothing more than documenting a
    known bug, and now it is documenting a dead bug.
    No, that doc change is still accurate.

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Tom Lane at Mar 26, 2011 at 4:24 pm

    Simon Riggs writes:
    On Sat, Mar 26, 2011 at 4:04 PM, Tom Lane wrote:
    In particular, in view of today's fix, shouldn't this commit be reverted?

    http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=6e8e7cc580665ddd43c8ca2acc6d60f345570a57

    I thought at the time that that was nothing more than documenting a
    known bug, and now it is documenting a dead bug.
    No, that doc change is still accurate.
    Well, in that case, it should be on the open-items list. If the system
    is still behaving that way, it's a bug.

    regards, tom lane
  • Robert Haas at Mar 26, 2011 at 4:39 pm

    On Mar 26, 2011, at 12:24 PM, Tom Lane wrote:
    Simon Riggs <simon@2ndQuadrant.com> writes:
    On Sat, Mar 26, 2011 at 4:04 PM, Tom Lane wrote:
    In particular, in view of today's fix, shouldn't this commit be reverted?

    http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=6e8e7cc580665ddd43c8ca2acc6d60f345570a57

    I thought at the time that that was nothing more than documenting a
    known bug, and now it is documenting a dead bug.
    No, that doc change is still accurate.
    Well, in that case, it should be on the open-items list. If the system
    is still behaving that way, it's a bug.
    I suspect it's fixed, but I haven't checked yet.

    And contrary to Simon's contention in a nearby email, it was not a "slight" performance degradation. It was running at something like 1 tps.

    ...Robert
  • Simon Riggs at Mar 26, 2011 at 4:41 pm

    On Sat, Mar 26, 2011 at 4:24 PM, Tom Lane wrote:
    Simon Riggs <simon@2ndQuadrant.com> writes:
    On Sat, Mar 26, 2011 at 4:04 PM, Tom Lane wrote:
    In particular, in view of today's fix, shouldn't this commit be reverted?

    http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=6e8e7cc580665ddd43c8ca2acc6d60f345570a57

    I thought at the time that that was nothing more than documenting a
    known bug, and now it is documenting a dead bug.
    No, that doc change is still accurate.
    Well, in that case, it should be on the open-items list.  If the system
    is still behaving that way, it's a bug.
    Is it? Sync rep requires fsync on the standby. If you then explicitly
    turn off fsync on the standby, it has a performance impact, as
    documented.

    Setting shared_buffers very low also reduces performance.

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Robert Haas at Mar 26, 2011 at 5:51 pm

    On Sat, Mar 26, 2011 at 12:41 PM, Simon Riggs wrote:
    Is it? Sync rep requires fsync on the standby. If you then explicitly
    turn off fsync on the standby, it has a performance impact, as
    documented.
    Actually, it doesn't, now that you fixed this. Compare:

    [rhaas@office ~]$ pgbench -T 10
    starting vacuum...end.
    transaction type: TPC-B (sort of)
    scaling factor: 25
    query mode: simple
    number of clients: 1
    number of threads: 1
    duration: 10 s
    number of transactions actually processed: 27
    tps = 0.099386 (including connections establishing)
    tps = 0.099389 (excluding connections establishing)
    [rhaas@office ~]$ pgbench -T 10
    starting vacuum...end.
    transaction type: TPC-B (sort of)
    scaling factor: 25
    query mode: simple
    number of clients: 1
    number of threads: 1
    duration: 10 s
    number of transactions actually processed: 425
    tps = 42.442185 (including connections establishing)
    tps = 42.468972 (excluding connections establishing)

    The first one - run with code from a few weeks ago - hung up on the
    27th transaction and was stuck there until the next checkpoint
    completed. The second one was run with the latest code and no longer
    hangs - and in fact it's now faster than running with fsync=on,
    exactly as one would expect. Or at least as *I* expected.
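
    For anyone trying to reproduce this, the setup is roughly of this
    shape; the port and application_name below are illustrative, not
    the ones from the runs above:

        # master postgresql.conf
        synchronous_standby_names = 'standby1'
        synchronous_commit = on

        # standby postgresql.conf; fsync=off lets the standby ack
        # fast enough to win the race
        fsync = off

        # standby recovery.conf
        standby_mode = 'on'
        primary_conninfo = 'host=localhost port=5432 application_name=standby1'

    Then run pgbench -T 10 against the master, as above.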

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Simon Riggs at Mar 27, 2011 at 11:46 am

    On Sat, Mar 26, 2011 at 5:51 PM, Robert Haas wrote:
    On Sat, Mar 26, 2011 at 12:41 PM, Simon Riggs wrote:
    Is it? Sync rep requires fsync on the standby. If you then explicitly
    turn off fsync on the standby, it has a performance impact, as
    documented.
    Actually, it doesn't, now that you fixed this.  Compare:

    [rhaas@office ~]$ pgbench -T 10
    starting vacuum...end.
    transaction type: TPC-B (sort of)
    scaling factor: 25
    query mode: simple
    number of clients: 1
    number of threads: 1
    duration: 10 s
    number of transactions actually processed: 27
    tps = 0.099386 (including connections establishing)
    tps = 0.099389 (excluding connections establishing)
    [rhaas@office ~]$ pgbench -T 10
    starting vacuum...end.
    transaction type: TPC-B (sort of)
    scaling factor: 25
    query mode: simple
    number of clients: 1
    number of threads: 1
    duration: 10 s
    number of transactions actually processed: 425
    tps = 42.442185 (including connections establishing)
    tps = 42.468972 (excluding connections establishing)

    The first one - run with code from a few weeks ago - hung up on the
    27th transaction and was stuck there until the next checkpoint
    completed.  The second one was run with the latest code and no longer
    hangs - and in fact it's now faster than running with fsync=on,
    exactly as one would expect.  Or at least as *I* expected.
    Are the master and standby on same system or are they separated by a network?

    I'm surprised that a network roundtrip takes less time than the
    backend takes to mark clog and then queue for the SyncRepLock.

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Robert Haas at Mar 28, 2011 at 1:27 am

    On Sun, Mar 27, 2011 at 7:46 AM, Simon Riggs wrote:
    Are the master and standby on same system or are they separated by a network?

    I'm surprised that a network roundtrip takes less time than the
    backend takes to mark clog and then queue for the SyncRepLock.
    When I first noticed that it was slow (really hanging, though I failed
    to realize it) with fsync=off, I had two clusters on the system. I
    didn't know what was going on at that point, and wasn't specifically
    looking for this bug - I was actually testing some other aspect of the
    behavior and hit upon it by accident. Then while I was at PG East I
    realized there was a race condition. (I think I actually realized it
    while I was dreaming about PostgreSQL; if you think dreaming about
    PostgreSQL is a sign that something is seriously wrong with me, you
    are likely correct.) Just to convince myself that I wasn't making
    things up I then stuck a sleep(1) in right before the sync rep wait,
    for testing purposes, which of course made it trivial to demonstrate
    the hang; I again ran both clusters on a single system (a different
    machine this time), but with the sleep in there it wouldn't have
    mattered anyway. Then later
    I realized that the race condition and the fsync=off were probably the
    same problem, so I wrote up the email that way.
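
    For the record, the hack amounted to something of this shape in
    RecordTransactionCommit(), just before the sync rep wait (the call
    below is as I recall the 9.1 code; details may differ):

        /* delay just before the sync rep wait, purely to widen
         * the race window for testing */
        pg_usleep(1000000L);        /* the sleep(1) mentioned above */
        SyncRepWaitForLSN(XactLastRecEnd);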

    If your point is that I never demonstrated this with sync rep between
    two different systems, I agree. I suspect it could be done, but you'd
    probably have to load the master down pretty heavily while keeping the
    load on the standby very light - or possibly it would work to just run
    a single-threaded test for a really long time, but I don't know
    because I haven't tried it. I'm actually not that interested in
    quantifying the exact probability of this happening under any given
    set of circumstances; it seems like enough that it's been found and
    fixed. If something in the phrasing of my original email gave
    offence, it wasn't intended to: in particular, the use of the word
    "nasty" was intended to convey "difficult to find; tricky". I think
    my fear that it would prove difficult to fix also may have affected
    that word choice; I didn't anticipate it being resolved so quickly and
    with such a small patch.

    I am doing my best to help fix the things that I believe to be bugs in
    the code without pissing anybody off. Clearly, at least in your case,
    that doesn't seem to have been entirely successful, but in all honesty
    it's not for lack of trying. I really, really want to get this
    release out the door and get back to writing code and doing
    CommitFests; but I also want it to be good (as I'm sure you do as
    well) and I think we're not quite there yet.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Mar 26, 2011 at 6:04 pm

    Simon Riggs writes:
    On Sat, Mar 26, 2011 at 4:24 PM, Tom Lane wrote:
    Well, in that case, it should be on the open-items list.  If the system
    is still behaving that way, it's a bug.
    Is it? Sync rep requires fsync on the standby. If you then explicitly
    turn off fsync on the standby, it has a performance impact, as
    documented.
    No, that's utter nonsense. If the system is behaving that way, then
    it's a bug. If you don't think it's a bug, then you misunderstand what
    the fsync GUC is supposed to do. What fsync=off is supposed to do is
    cause every attempted fsync to succeed instantly. It is *not possible*
    for that to result in a performance slowdown compared to fsyncs that
    take nonzero time. Unless someone's broken it. The documented behavior
    is broken. Period.
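
    That is essentially all the GUC does. The following is a paraphrase
    of the fd.c behavior, not the verbatim source:

        /* with fsync=off (enableFsync false), every "fsync" reports
         * success immediately */
        int
        pg_fsync(int fd)
        {
            if (enableFsync)
                return fsync(fd);   /* real sync; method may vary */
            return 0;               /* fsync=off: instant success */
        }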

    regards, tom lane
  • Simon Riggs at Mar 26, 2011 at 4:26 pm

    On Sat, Mar 26, 2011 at 3:00 PM, Robert Haas wrote:
    On Mar 26, 2011, at 6:16 AM, Simon Riggs wrote:
    That seems very unlikely even with fsync=off in a real config where we
    have a network path to consider.

    Your definition of a "nasty" race condition seems off.

    I've added code for you.
    Your skepticism seems out of place. I actually hit this problem in testing. We could debate how realistic my test setup was, but why?  It is not our policy - and bad practice in general - to ship code with race conditions.  The code you added is not "for me"; it's "to fix a race condition".
    A race that will not be lost with normal and sensible settings, and
    one that results in a slight performance degradation only in a
    contrived worst-case scenario. There is no question of it causing any
    other problem.
    Thanks for the quick fix.
    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Kevin Grittner at Mar 27, 2011 at 5:07 pm

    Simon Riggs wrote:

    I'm surprised that a network roundtrip takes less time than the
    backend takes to mark clog and then queue for the SyncRepLock.
    I'm not surprised by that at all. Some of our replication involves
    Gb or faster connections on the same LAN segment (sometimes on the
    same switch).

    That's rather beside the point though -- on a busy system a process
    can get starved long enough to make some seemingly improbable timings
    occur. PostgreSQL shouldn't fall over just because the load gets
    heavy enough to cause such timings. A race condition is a race
    condition.

    -Kevin
  • Simon Riggs at Mar 27, 2011 at 5:36 pm

    On Sun, Mar 27, 2011 at 6:07 PM, Kevin Grittner wrote:
    Simon Riggs  wrote:
    I'm surprised that a network roundtrip takes less time than the
    backend takes to mark clog and then queue for the SyncRepLock.
    I'm not surprised by that at all.  Some of our replication involves
    Gb or faster connections on the same LAN segment (sometimes on the
    same switch).

    That's rather beside the point though -- on a busy system a process
    can get starved long enough to make some seemingly improbable timings
    occur.  PostgreSQL shouldn't fall over just because the load gets
    heavy enough to cause such timings.  A race condition is a race
    condition.
    That has been fixed, without discussion. The discussion was about...
    well, something else.

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
