There is an open item for synchronous replication and smart shutdown,
with a link to here:

http://archives.postgresql.org/pgsql-hackers/2011-03/msg01391.php

The issue is not straightforward, however, so I want to get some
broader input before proceeding. In short, the problem is that if
synchronous replication is in use, no standbys are connected, and a
smart shutdown is requested, any future commits will wait for a
wake-up that will never come, because by that point the postmaster is no
longer accepting connections; thus no standby can reconnect to
release waiters. Or, if there is a standby connected when the smart
shutdown is requested, but it subsequently gets disconnected, it won't
be able to reconnect, and again all waiters will get stuck.

There are a couple of plausible ways to proceed here:

1. Do nothing. If this happens to you, you will need to request fast
or immediate shutdown to get the system unstuck. Since it's already
pretty easy for this to happen (all you need is one connection
to sit open doing nothing), most people probably already have
provision for this and likely wouldn't be terribly inconvenienced by
one more corner case. On the flip side, I would rather that we were
moving in the direction of making it more likely for smart shutdown to
actually shut down the system, rather than less likely.

2. When a smart shutdown is initiated, shut off synchronous
replication. This definitely makes sure you won't get stuck waiting
for sync rep, but on the other hand you probably configured sync rep
because you wanted, uh, sync rep. Or alternatively, continue to allow
sync rep for as long as there is a sync standby connected, but if the
last sync standby drops off then shut it off.
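Both variants of option 2 can be sketched as one predicate that decides whether a committing backend should wait for a standby acknowledgement. This is a minimal sketch under stated assumptions: `Option2Variant` and `commit_waits_for_sync_rep` are hypothetical names and do not correspond to PostgreSQL's actual sync rep code.

```python
from enum import Enum


class Option2Variant(Enum):
    # Turn sync rep off as soon as a smart shutdown is initiated.
    DISABLE_ON_SHUTDOWN = "disable_on_shutdown"
    # Keep sync rep while a sync standby is still connected; turn it off
    # once the last one drops off.
    DISABLE_WHEN_LAST_STANDBY_LEAVES = "disable_when_last_standby_leaves"


def commit_waits_for_sync_rep(sync_rep_configured: bool,
                              smart_shutdown_requested: bool,
                              sync_standbys_connected: int,
                              variant: Option2Variant) -> bool:
    """Hypothetical policy: should a committing backend wait for a standby ack?"""
    if not sync_rep_configured:
        return False
    if not smart_shutdown_requested:
        # Normal operation: wait even with no standby attached, because one
        # may still connect and acknowledge the commit.
        return True
    if variant is Option2Variant.DISABLE_ON_SHUTDOWN:
        return False
    # During smart shutdown no standby can reconnect, so waiting only makes
    # sense while at least one sync standby is still attached.
    return sync_standbys_connected > 0
```

Because this predicate is evaluated only at commit time, backends already blocked in a sync rep wait when the policy flips would still have to be woken separately.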

3. Accept new replication connections even when the system is
undergoing a smart shutdown. This is the approach that the
above-linked patch tries to take, and it seems superficially sensible,
but it doesn't really work. Currently, once a shutdown has been
initiated and any on-line backup has been stopped, we stop creating
regular backends; we instead only create dead-end backends that just
return an error message and exit. Once no regular backends remain, we
then stop accepting connections AT ALL and wait for the dead-end
backends to drain out. What this patch proposes to do (though it
isn't really clear from the way it's written) is continue creating
regular backends but boot out all but superuser and replication
connections as soon as possible. However, that misses the reason why
the current code works the way that it does: to make sure that even in
the face of a continuing stream of connection requests, we actually
eventually manage to stop talking and shut down. Basically, this
patch would fix the smart-shutdown-sync-rep interaction at the expense
of making smart shutdown considerably more fragile in other cases,
which does not seem like a good trade-off. AFAICT, this whole
approach is doomed to failure.
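The current smart-shutdown behavior described above can be modeled as a small state machine. This is a simplified model of that description, not the postmaster's actual code; `ShutdownPhase`, `Disposition`, `handle_connection`, `next_phase`, and `simulate` are hypothetical names. It assumes, as the description implies, that a reconnecting standby is handled like any other new connection, which shows why the sync rep case gets stuck: the standby only ever gets a dead-end backend, while the backend waiting on sync rep keeps the regular-backend count above zero.

```python
from enum import Enum, auto


class ShutdownPhase(Enum):
    RUNNING = auto()
    # Smart shutdown requested (and any online backup stopped), but
    # regular backends are still running.
    SMART_DRAINING = auto()
    # No regular backends remain; only dead-end backends are left.
    WAIT_DEAD_END = auto()


class Disposition(Enum):
    REGULAR_BACKEND = auto()
    DEAD_END_BACKEND = auto()  # returns an error message, then exits
    REFUSED = auto()           # connection is not accepted at all


def handle_connection(phase: ShutdownPhase) -> Disposition:
    """Simplified model of how a new connection request is treated."""
    if phase is ShutdownPhase.RUNNING:
        return Disposition.REGULAR_BACKEND
    if phase is ShutdownPhase.SMART_DRAINING:
        return Disposition.DEAD_END_BACKEND
    return Disposition.REFUSED


def next_phase(phase: ShutdownPhase, regular_backends: int) -> ShutdownPhase:
    """Stop accepting connections once the last regular backend has exited."""
    if phase is ShutdownPhase.SMART_DRAINING and regular_backends == 0:
        return ShutdownPhase.WAIT_DEAD_END
    return phase


def simulate(regular_backends: int, rounds: int = 3):
    """A standby repeatedly tries to reconnect during a smart shutdown."""
    phase = ShutdownPhase.SMART_DRAINING
    standby_attempts = []
    for _ in range(rounds):
        standby_attempts.append(handle_connection(phase))
        # If regular backends never exit (e.g. stuck waiting for sync rep),
        # the phase never advances and the standby never gets back in.
        phase = next_phase(phase, regular_backends)
    return phase, standby_attempts
```

In this model, option 3 would amount to returning `REGULAR_BACKEND` for replication (and superuser) connections during `SMART_DRAINING`; those new regular backends could then keep `next_phase` from ever advancing, which is the fragility described above.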

Anyone else have an idea or opinion?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

  • Tom Lane at Apr 8, 2011 at 6:42 pm

    Robert Haas writes:
    > There is an open item for synchronous replication and smart shutdown,
    > with a link to here:
    > http://archives.postgresql.org/pgsql-hackers/2011-03/msg01391.php
    > There are a couple of plausible ways to proceed here:
    > 1. Do nothing.
    > 2. When a smart shutdown is initiated, shut off synchronous
    > replication.
    > 3. Accept new replication connections even when the system is
    > undergoing a smart shutdown.
    I agree that #3 is impractical and #2 is a bad idea, which seems to
    leave us with #1 (unless anyone has a #4)? This is probably just
    something we should figure is going to be one of the rough edges
    in the first release of sync rep.

    A #4 idea did just come to mind: once we realize that there are no
    working replication connections, automatically do a fast shutdown
    instead, ie, forcibly roll back those transactions that are never
    gonna complete. Or at least have the postmaster bleat about it.
    But I'm not sure what it'd take to code that, and am also unsure
    that it's something to undertake at this stage of the cycle.

    regards, tom lane
  • Robert Haas at Apr 8, 2011 at 6:54 pm

    On Fri, Apr 8, 2011 at 2:38 PM, Tom Lane wrote:
    > Robert Haas <robertmhaas@gmail.com> writes:
    >> There is an open item for synchronous replication and smart shutdown,
    >> with a link to here:
    >> http://archives.postgresql.org/pgsql-hackers/2011-03/msg01391.php
    >> There are a couple of plausible ways to proceed here:
    >> 1. Do nothing.
    >> 2. When a smart shutdown is initiated, shut off synchronous
    >> replication.
    >> 3. Accept new replication connections even when the system is
    >> undergoing a smart shutdown.
    > I agree that #3 is impractical and #2 is a bad idea, which seems to
    > leave us with #1 (unless anyone has a #4)?  This is probably just
    > something we should figure is going to be one of the rough edges
    > in the first release of sync rep.
    That's kind of where my mind was headed too, although I was (probably
    vainly) hoping for a better option.
    > A #4 idea did just come to mind: once we realize that there are no
    > working replication connections, automatically do a fast shutdown
    > instead, ie, forcibly roll back those transactions that are never
    > gonna complete.  Or at least have the postmaster bleat about it.
    > But I'm not sure what it'd take to code that, and am also unsure
    > that it's something to undertake at this stage of the cycle.
    Well, you certainly can't do that. By the time a transaction is
    waiting for sync rep, it's too late to roll back; the commit record is
    already, and necessarily, on disk. But in theory we could notice that
    all of the remaining backends are waiting for sync rep, and switch to
    a fast shutdown.
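The check described here might look roughly like the following. This is a hypothetical sketch: every name is invented, and the extra condition that no sync standby is connected is an assumption added for illustration, since a connected standby could still release the waiters.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class BackendState(Enum):
    IDLE = auto()
    IN_TRANSACTION = auto()
    # Commit record already flushed locally; waiting for a standby ack.
    WAITING_FOR_SYNC_REP = auto()


class ShutdownMode(Enum):
    SMART = auto()
    FAST = auto()


@dataclass
class Backend:
    pid: int
    state: BackendState


def maybe_escalate(mode: ShutdownMode, backends: List[Backend],
                   sync_standbys_connected: int) -> ShutdownMode:
    """Hypothetical postmaster check, run periodically during smart shutdown.

    `backends` lists the remaining regular (non-walsender) backends.
    """
    if mode is not ShutdownMode.SMART or not backends:
        return mode
    if sync_standbys_connected > 0:
        # Assumption: a connected standby may still release the waiters.
        return mode
    if all(b.state is BackendState.WAITING_FOR_SYNC_REP for b in backends):
        # Nothing can wake these backends now. Their commits are already on
        # disk and cannot be rolled back, so stop waiting: go to fast shutdown.
        return ShutdownMode.FAST
    # Some backend is idle or mid-transaction: smart shutdown keeps waiting.
    return mode
```

Note that the escalation does not undo anything: the waiting transactions stay committed locally, and only the wait for the standby acknowledgement is abandoned.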

    Several people have suggested refinements for smart shutdown in
    general, such as switching to fast shutdown after a certain number of
    seconds, or having backends exit at the end of the current transaction
    (or immediately if idle). Such things would both make this problem
    less irksome and increase the overall utility of smart shutdown
    tremendously. So maybe it's not worth expending too much effort on it
    right now.
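The two smart-shutdown refinements mentioned above can be sketched as follows. Both functions and the `smart_timeout_s` parameter are hypothetical illustrations, not existing PostgreSQL settings or code; the assumption that the second check runs only when a backend is between transactions is also ours.

```python
from enum import Enum, auto
from typing import Optional


class ShutdownMode(Enum):
    SMART = auto()
    FAST = auto()


def effective_mode(requested: ShutdownMode, seconds_since_request: float,
                   smart_timeout_s: Optional[float]) -> ShutdownMode:
    """Refinement 1: escalate a smart shutdown to fast after a timeout."""
    if requested is ShutdownMode.SMART and smart_timeout_s is not None:
        if seconds_since_request >= smart_timeout_s:
            return ShutdownMode.FAST
    # No timeout configured (or not yet reached): keep the requested mode.
    return requested


def backend_should_exit(smart_shutdown_pending: bool,
                        in_transaction: bool) -> bool:
    """Refinement 2: checked when a backend is between transactions.

    Idle sessions exit right away; busy ones finish their current
    transaction first and exit at the next check.
    """
    return smart_shutdown_pending and not in_transaction
```

In this model only the timeout rescues the sync rep case: a backend waiting for sync rep is still inside its commit, so it never reaches the point where the second check would let it exit.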

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Fujii Masao at Apr 11, 2011 at 2:56 am

    On Sat, Apr 9, 2011 at 3:53 AM, Robert Haas wrote:
    >>> There are a couple of plausible ways to proceed here:
    >>> 1. Do nothing.
    >>> 2. When a smart shutdown is initiated, shut off synchronous
    >>> replication.
    >>> 3. Accept new replication connections even when the system is
    >>> undergoing a smart shutdown.
    >> I agree that #3 is impractical and #2 is a bad idea, which seems to
    >> leave us with #1 (unless anyone has a #4)?  This is probably just
    >> something we should figure is going to be one of the rough edges
    >> in the first release of sync rep.
    > That's kind of where my mind was headed too, although I was (probably
    > vainly) hoping for a better option.
    Though I proposed #3, I can live with #1 for now. Even if smart shutdown
    gets stuck, we can resolve that by requesting fast shutdown or emptying
    synchronous_standby_names.

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center

Discussion Overview
group: pgsql-hackers
categories: postgresql
posted: Apr 8, '11 at 6:15p
active: Apr 11, '11 at 2:56a
posts: 4
users: 3
website: postgresql.org...
irc: #postgresql
