On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao wrote:
On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs wrote:
The WALSender deliberately does *not* wake waiting users if the standby
disconnects. Doing so would break the whole reason for having sync rep
in the first place. What we do is allow a potential standby to take over
the role of sync standby, if one is available. Or the failing standby
can reconnect and then release waiters.
If there is a potential standby when the synchronous standby has gone, I agree
that it's not a good idea to release the waiting backends immediately. In this
case, those backends should wait for the next synchronous standby.

On the other hand, if there is no potential standby, I think that the waiting
backends should not wait for the timeout and should wake up as soon as the
synchronous standby has gone. Otherwise, those backends are suspended for
a long time (i.e., until the timeout expires), which would reduce
availability, I'm afraid.

Keeping those backends waiting for the failed standby to reconnect is one
option. But this looks like the behavior for "allow_standalone_primary = off".
If allow_standalone_primary = on, it looks more natural to make the
primary work alone without waiting for the timeout.
Also I think that the waiting backends should be released as soon as the
last synchronous standby switches to asynchronous mode. Since there is
no standby which is planning to reconnect, obviously they no longer need
to wait.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
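
The release rules proposed in the message above can be summarized as a small decision function. This is only a sketch of the proposal as discussed, not PostgreSQL code: `decide_waiter_action`, `WaiterAction`, and the argument names are hypothetical, and `allow_standalone_primary` is the setting under discussion in this thread, not a confirmed parameter.

```python
from enum import Enum


class WaiterAction(Enum):
    KEEP_WAITING = "keep waiting"
    RELEASE = "release"


def decide_waiter_action(sync_standby_connected: bool,
                         potential_standbys: int,
                         allow_standalone_primary: bool) -> WaiterAction:
    """Hypothetical summary of the proposal; not actual PostgreSQL logic."""
    if sync_standby_connected:
        # Normal operation: backends wait for the standby's acknowledgement.
        return WaiterAction.KEEP_WAITING
    if potential_standbys > 0:
        # A potential standby can take over the sync role, so keep waiting.
        return WaiterAction.KEEP_WAITING
    if allow_standalone_primary:
        # No standby left to wait for: let the primary continue alone.
        return WaiterAction.RELEASE
    # allow_standalone_primary = off: wait for the failed standby to reconnect.
    return WaiterAction.KEEP_WAITING
```

Under this reading, the only case that releases waiters early is "no synchronous standby, no potential standby, allow_standalone_primary = on"; every other combination keeps the backends waiting.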

  • Simon Riggs at Mar 6, 2011 at 10:36 pm

    On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote:
    Also I think that the waiting backends should be released as soon as the
    last synchronous standby switches to asynchronous mode. Since there is
    no standby which is planning to reconnect, obviously they no longer need
    to wait.
    I've not done this, but we could.

    It can't run in a WALSender, so this code would need to live in either
    WALWriter or BgWriter.

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Robert Haas at Mar 7, 2011 at 6:16 pm

    On Sun, Mar 6, 2011 at 5:36 PM, Simon Riggs wrote:
    On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote:
    Also I think that the waiting backends should be released as soon as the
    last synchronous standby switches to asynchronous mode. Since there is
    no standby which is planning to reconnect, obviously they no longer need
    to wait.
    I've not done this, but we could.

    It can't run in a WALSender, so this code would need to live in either
    WALWriter or BgWriter.
    I would have thought that the last WALSender to switch to async would
    have been responsible for doing this at that time. Why doesn't that
    work?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Simon Riggs at Mar 7, 2011 at 7:17 pm

    On Mon, 2011-03-07 at 13:15 -0500, Robert Haas wrote:
    On Sun, Mar 6, 2011 at 5:36 PM, Simon Riggs wrote:
    On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote:
    On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao wrote:
    On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs wrote:
    Also I think that the waiting backends should be released as soon as the
    last synchronous standby switches to asynchronous mode. Since there is
    no standby which is planning to reconnect, obviously they no longer need
    to wait.
    I've not done this, but we could.

    It can't run in a WALSender, so this code would need to live in either
    WALWriter or BgWriter.
    I would have thought that the last WALSender to switch to async would
    have been responsible for doing this at that time. Why doesn't that
    work?
    The main time we get extended waits is when there are no WALsenders.

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
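
One possible reading of the exchange above: if only an exiting WALSender releases waiters, a backend that starts waiting after the last WALSender has gone has nobody left to wake it, which is why the check might need to live in an always-running process. The toy model below illustrates that reading only; `ToyPrimary` and its methods are hypothetical and do not correspond to PostgreSQL internals.

```python
class ToyPrimary:
    """Toy model only; names and structure are hypothetical."""

    def __init__(self, walsenders=0):
        self.walsenders = walsenders
        self.waiters = []

    def commit(self, name):
        # A backend queues up to wait for a synchronous standby.
        self.waiters.append(name)

    def walsender_exit(self):
        # Option A: the last exiting WALSender releases the current waiters.
        self.walsenders -= 1
        return self._release() if self.walsenders == 0 else []

    def background_tick(self):
        # Option B: an always-running process checks periodically.
        return self._release() if self.walsenders == 0 else []

    def _release(self):
        # Hand back everyone who was waiting and empty the queue.
        released, self.waiters = self.waiters, []
        return released


primary = ToyPrimary(walsenders=1)
primary.commit("backend-a")
released_on_exit = primary.walsender_exit()   # last WALSender leaves
primary.commit("backend-b")                   # arrives with no WALSender left
still_waiting = list(primary.waiters)
released_on_tick = primary.background_tick()  # periodic check picks it up
```

In this model `backend-a` is released by the exiting WALSender, while `backend-b` stays queued until the background check runs.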

Discussion Overview
group: pgsql-hackers
category: postgresql
posted: Mar 4, '11 at 7:57a
active: Mar 7, '11 at 7:17p
posts: 4
users: 3
website: postgresql.org...
irc: #postgresql
