Well, good news all round.

v17 implements what I believe to be the final set of features for sync
rep. This one I'm actually fairly happy with. It can be enjoyed best at
DEBUG3.

The patch is a very light touch on a few areas of code, plus a chunk of
specific code, all on the master side. Pretty straightforward really. I'm sure
problems will be found, it's not long since I completed this; thanks to
Daniel Farina for your help with patch assembly.

Which is just as well, because the other news is that I'm off on holiday
for a few days, which is most inconvenient. I won't be committing this
for at least a week and will be absent from the list. OTOH, I think it's ready
for a final review and commit, so I'm OK if you commit or OK if you
leave it for me.

That's not the end of it. I can see a few things we could/should do in
this release, but this includes all the must-do things. Docs could do
with a little love also. So I expect work for me when I return.




Config Summary
==============

Most parameters are set on the primary. Set

primary: synchronous_standby_names = 'node1, node2, node3'

which means that whichever of those standbys connects first will
become the main synchronous standby. Servers arriving later will
be potential standbys (standby standbys doesn't sound right...).
synchronous_standby_names can change at reload.

Currently, the standby_name is the application_name parameter
set in the primary_conninfo.
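
For example, a standby that should match 'node1' above would set something
like this (illustrative values; only the application_name part matters here):

standby: primary_conninfo = 'host=primary user=rep application_name=node1'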

When we set this for a client, or in postgresql.conf

primary: synchronous_replication = on

then we will wait at commit until the synchronous standby has
reached the WAL location of our commit point.

If the current synchronous standby dies then one of the other standbys
will take over. (I think it would be a great idea to make the
list a priority order, but I haven't had time to code that).

If none of the standbys are available, then we don't wait at all
if allow_standalone_primary is set.
allow_standalone_primary can change at reload.

Whatever happens, if you set sync_replication_timeout_client
then backends will give up waiting if some WALsender doesn't
wake them quickly enough.

You can generally leave these parameters at their default settings

primary: sync_replication_timeout_client = 120s
primary: allow_standalone_primary = on
standby: wal_receiver_status_interval = 10s

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


  • Thom Brown at Feb 19, 2011 at 12:33 am

    On 19 February 2011 00:06, Simon Riggs wrote:
    [...]
    I see the updated patch I provided last time to fix various errata and
    spacing issues got lost in the last round of conversations. Maybe
    it's safer to provide a patch for the patch.

    --
    Thom Brown
    Twitter: @darkixion
    IRC (freenode): dark_ixion
    Registered Linux user: #516935
  • Simon Riggs at Feb 19, 2011 at 12:45 am

    On Sat, 2011-02-19 at 00:32 +0000, Thom Brown wrote:

    I see the updated patch I provided last time to fix various errata and
    spacing issues got lost in the last round of conversations. Maybe
    it's safer to provide a patch for the patch.
    I'm sorry about that Thom, your changes were and are welcome. The docs
    were assembled rather quickly about 2 hours ago and we'll definitely
    need your care and attention to bring them to a good level of quality.

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Alvaro Herrera at Feb 19, 2011 at 12:51 am

    Excerpts from Thom Brown's message of vie feb 18 21:32:17 -0300 2011:

    I see the updated patch I provided last time to fix various errata and
    spacing issues got lost in the last round of conversations. Maybe
    it's safer to provide a patch for the patch.
    interdiff?

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Robert Haas at Feb 19, 2011 at 1:45 am

    On Fri, Feb 18, 2011 at 7:06 PM, Simon Riggs wrote:
    The patch is a very light touch on a few areas of code, plus a chunk of
    specific code, all on the master side. Pretty straightforward really. I'm sure
    problems will be found, it's not long since I completed this; thanks to
    Daniel Farina for your help with patch assembly.
    This looks like it's in far better shape than the previous version.
    Thanks to you and Dan for working on it.

    As you have this coded, enabling synchronous_replication forces all
    transactions to commit synchronously. This means that, when
    synchronous_replication=on, you get synchronous_commit=on behavior
    even if you have synchronous_commit=off. I'd prefer to see us go the
    other way, and say that you can only get synchronous replication when
    you're also getting synchronous commit.

    Even if not, we should at least arrange the test so that we don't
    force synchronous commit for transactions that haven't written XLOG.
    That is, instead of:

    ((wrote_xlog && XactSyncCommit) || forceSyncCommit || nrels > 0 ||
    SyncRepRequested())

    ...we should instead say:

    ((write_xlog && (XactSyncCommit || SyncRepRequested()) ||
    forceSyncCommit || nrels > 0)

    ...but as I say I'd be inclined to instead keep the existing test, and
    just skip SyncRepWaitForLSN() if we aren't committing synchronously.
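
    Roughly, keeping the existing test and only adding the wait afterwards
    would look something like this (just a sketch reusing the names from the
    snippets above, not the exact code in the patch):

    bool    synchronous = (wrote_xlog && XactSyncCommit) ||
                          forceSyncCommit || nrels > 0;

    if (synchronous)
    {
        /* synchronous commit: flush WAL locally, as we do today */
    }
    else
    {
        /* asynchronous commit: no local flush at commit time */
    }

    /* only wait for the standby if we are committing synchronously anyway */
    if (synchronous && SyncRepRequested())
        SyncRepWaitForLSN();    /* arguments omitted in this sketch */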

    I'm unsure of the value of sync_replication_timeout_client (which
    incidentally is spelled replication_timeout_client in one place, and
    elsewhere asynchornus -> asynchronous). I think in practice if you
    start hitting that timeout, your server will slow to such a crawl that
    you might as well be down. On the other hand, I see no particular
    harm in leaving the option in there either, though I definitely think
    the default should be changed to -1.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Simon Riggs at Feb 19, 2011 at 8:28 am

    On Fri, 2011-02-18 at 20:45 -0500, Robert Haas wrote:
    On Fri, Feb 18, 2011 at 7:06 PM, Simon Riggs wrote:
    The patch is a very light touch on a few areas of code, plus a chunk of
    specific code, all on the master side. Pretty straightforward really. I'm sure
    problems will be found, it's not long since I completed this; thanks to
    Daniel Farina for your help with patch assembly.
    This looks like it's in far better shape than the previous version.
    Thanks to you and Dan for working on it.

    As you have this coded, enabling synchronous_replication forces all
    transactions to commit synchronously. This means that, when
    synchronous_replication=on, you get synchronous_commit=on behavior
    even if you have synchronous_commit=off. I'd prefer to see us go the
    other way, and say that you can only get synchronous replication when
    you're also getting synchronous commit.
    First, we should be clear to explain that you are referring to the fact
    that the request
    synchronous_commit = off
    synchronous_replication = on
    makes no sense in the way the replication system is currently designed,
    even though it is a wish-list item to make it work in 9.2+

    Since that combination makes no sense we need to decide how will we
    react when it is requested. I think fail-safe is the best way. We should
    make the default the safe way and allow performance options in
    non-default paths.

    Hence the above combination of options is taken in the patch to be the
    same as
    synchronous_commit = on
    synchronous_replication = on

    If you want performance, you can still get it with
    synchronous_commit = off
    synchronous_replication = off
    which does have meaning

    So the way the patch is coded makes most sense for a safety feature. I
    think it does need to be documented also.

    Must fly now...

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Robert Haas at Feb 20, 2011 at 3:52 am

    On Sat, Feb 19, 2011 at 3:28 AM, Simon Riggs wrote:
    First, we should be clear to explain that you are referring to the fact
    that the request
    synchronous_commit = off
    synchronous_replication = on
    makes no sense in the way the replication system is currently designed,
    even though it is a wish-list item to make it work in 9.2+
    What exactly do you mean by "make it work"? We can either (1) wait
    for the local commit and the remote commit (synchronous_commit=on,
    synchronous_replication=on), (2) wait for the local commit only
    (synchronous_commit=on, synchronous_replication=off), or (3) wait for
    neither (synchronous_commit=off, synchronous_replication=off).
    There's no fourth possible behavior, AFAICS.

    The question is whether synchronous_commit=off,
    synchronous_replication=on should behave like (1) or (3); AFAICS
    there's no fourth possible behavior. You have it as #1; I'm arguing
    it should be #3. I realize it's an arguable point; I'm just arguing
    for what makes most sense to me.
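
    To spell the combinations out (just restating the cases above):

    synchronous_commit   synchronous_replication   commit waits for
    on                   on                        local flush + standby ack   (1)
    on                   off                       local flush only            (2)
    off                  off                       neither                     (3)
    off                  on                        disputed: behave as (1) or (3)?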

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Jaime Casanova at Feb 20, 2011 at 7:12 am

    On Sat, Feb 19, 2011 at 10:52 PM, Robert Haas wrote:
    On Sat, Feb 19, 2011 at 3:28 AM, Simon Riggs wrote:
    First, we should be clear to explain that you are referring to the fact
    that the request
    synchronous_commit = off
    synchronous_replication = on
    makes no sense in the way the replication system is currently designed,
    even though it is a wish-list item to make it work in 9.2+
    What exactly do you mean by "make it work"?  We can either (1) wait
    for the local commit and the remote commit (synchronous_commit=on,
    synchronous_replication=on), (2) wait for the local commit only
    (synchronous_commit=on, synchronous_replication=off), or (3) wait for
    neither (synchronous_commit=off, synchronous_replication=off).
    There's no fourth possible behavior, AFAICS.

    The question is whether synchronous_commit=off,
    synchronous_replication=on should behave like (1) or (3); AFAICS
    there's no fourth possible behavior.  You have it as #1; I'm arguing
    it should be #3.  I realize it's an arguable point; I'm just arguing
    for what makes most sense to me.
    IMHO, we should stick to the safest option.

    Considering that synchronous_replication = on means that we *want*
    durability, and that synchronous_commit = off means we don't *care*
    about durability, the real question here is: in the presence of
    synchronous_replication = on (which means we want durability), are we
    allowed to assume we can lose data?

    One way to manage that is to simply disallow that combination with an
    error; maybe that is a bit strict, but then we don't have to make assumptions.
    The other option is to keep it safe, which means keeping durability, so if you
    want to risk some data then you should disable synchronous_replication
    as well as synchronous_commit... maybe sending a message to the log
    when we detect the conflicting situation.

    --
    Jaime Casanova         www.2ndQuadrant.com
    Professional PostgreSQL: Soporte y capacitación de PostgreSQL
  • Florian Pflug at Feb 20, 2011 at 12:20 pm

    On Feb20, 2011, at 08:12 , Jaime Casanova wrote:
    considering that synchronous_replication to on means that we *want*
    durability, and that synchronous_commit to off means we don't *care*
    about durability. Then the real question here is: in the presence of
    synchronous_replication to on (which means we want durability), are we
    allowed to assume we can lose data?
    From that angle, shouldn't we turn synchronous_replication=on into a third
    possible state of synchronous_commit?

    We'd then have

    synchronous_commit=off      # Same as now
    synchronous_commit=local    # Same as synchronous_commit=on, synchronous_replication=off
    synchronous_commit=standby  # Same as synchronous_commit=on, synchronous_replication=on
    one way to manage that is simply disallow that combination with an
    error, maybe that is a bit strict but we haven't to make assumptions;
    the other option is to keep safe which means keep durability so if you
    want to risk some data then you should disable synchronous_replication
    as well as synchronous_commit... maybe sending a message to the log
    when you detect the conflicting situation.
    The question is where we'd raise the error, though. Doing it on GUC
    assignment makes the behaviour depend on assignment order. That's a
    bad idea I believe, since it possibly breaks ALTER ROLE/DATABASE SET ....

    best regards,
    Florian Pflug
  • Jaime Casanova at Feb 20, 2011 at 11:10 pm

    On Sun, Feb 20, 2011 at 7:20 AM, Florian Pflug wrote:
    On Feb20, 2011, at 08:12 , Jaime Casanova wrote:
    considering that synchronous_replication to on means that we *want*
    durability, and that synchronous_commit to off means we don't *care*
    about durability. Then the real question here is: in the presence of
    synchronous_replication to on (which means we want durability), are we
    allowed to assume we can lose data?
    From the angle, shouldn't we turn synchronous_replication=on into a third
    possible state of synchronous_commit?

    We'd then have

    synchronous_commit=off      # Same as now
    synchronous_commit=local    # Same as synchronous_commit=on, synchronous_replication=off
    synchronous_commit=standby  # Same as synchronous_commit=on, synchronous_replication=on
    that would be a little confusing and difficult to document. At least I
    know that as soon as we say something like "to activate
    synchronous replication, set synchronous_commit to standby", users
    will somehow have the impression that locally the commit is
    asynchronous (ok, I have had bad experiences with Ecuadorian users ;)
    one way to manage that is simply disallow that combination with an
    error, maybe that is a bit strict but we haven't to make assumptions;
    the other option is to keep safe which means keep durability so if you
    want to risk some data then you should disable synchronous_replication
    as well as synchronous_commit... maybe sending a message to the log
    when you detect the conflicting situation.
    The question is where we'd raise the error, though. Doing it on GUC
    assignment makes the behaviour depend on assignment order. That's a
    bad idea I believe, since it possibly breaks ALTER ROLE/DATEBASE SET ....
    well, yeah... maybe it's just too much worthless work... but we can check
    before commit and send a NOTICE message

    --
    Jaime Casanova         www.2ndQuadrant.com
    Professional PostgreSQL: Soporte y capacitación de PostgreSQL
  • Florian Pflug at Feb 21, 2011 at 12:21 am

    On Feb21, 2011, at 00:09 , Jaime Casanova wrote:
    On Sun, Feb 20, 2011 at 7:20 AM, Florian Pflug wrote:
    On Feb20, 2011, at 08:12 , Jaime Casanova wrote:
    considering that synchronous_replication to on means that we *want*
    durability, and that synchronous_commit to off means we don't *care*
    about durability. Then the real question here is: in the presence of
    synchronous_replication to on (which means we want durability), are we
    allowed to assume we can lose data?
    From the angle, shouldn't we turn synchronous_replication=on into a third
    possible state of synchronous_commit?

    We'd then have

    synchronous_commit=off      # Same as now
    synchronous_commit=local    # Same as synchronous_commit=on, synchronous_replication=off
    synchronous_commit=standby  # Same as synchronous_commit=on, synchronous_replication=on
    that would be a little confuse and difficult to document. at least i
    know that as far as we say something like this "to activate
    synchronous replication set synchronous commit to standby" users
    somehow will have the impression that locally the commit is
    asynchronous (ok, a have had bad experiences with Ecuadorian users ;)

    I figured we'd say something like

    'To guarantee durability of transactions except in the fail-over case,
    set synchronous_commit to "local". To additionally guarantee durability
    even in the case of a fail-over to a standby node, set synchronous_commit
    to "standby". This causes a COMMIT to only report success once the commit
    record has been acknowledged by the current synchronous standby node, and
    will therefore induce a higher latency of COMMIT requests.'

    Other possible names for the "standby" options that come to mind are
    "replication", "replica" or "replicate". Or maybe "cluster", but that
    seems confusing since we sometimes call the collection of databases managed
    by one postgres instance a cluster.

    best regards,
    Florian Pflug
  • Simon Riggs at Feb 28, 2011 at 9:13 pm

    On Sat, 2011-02-19 at 22:52 -0500, Robert Haas wrote:
    On Sat, Feb 19, 2011 at 3:28 AM, Simon Riggs wrote:
    First, we should be clear to explain that you are referring to the fact
    that the request
    synchronous_commit = off
    synchronous_replication = on
    makes no sense in the way the replication system is currently designed,
    even though it is a wish-list item to make it work in 9.2+
    What exactly do you mean by "make it work"? We can either (1) wait
    for the local commit and the remote commit (synchronous_commit=on,
    synchronous_replication=on), (2) wait for the local commit only
    (synchronous_commit=on, synchronous_replication=off), or (3) wait for
    neither (synchronous_commit=off, synchronous_replication=off).
    There's no fourth possible behavior, AFAICS.
    Currently, no, since as we discussed earlier we need to fsync
    WAL locally before it gets sent to the standby.
    The question is whether synchronous_commit=off,
    synchronous_replication=on should behave like (1) or (3)
    Yes, that is the right question.
    You have it as #1; I'm arguing
    it should be #3. I realize it's an arguable point; I'm just arguing
    for what makes most sense to me.
    Various comments follow on thread. We can pick this up once we've
    committed the main patch.

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Simon Riggs at Feb 19, 2011 at 8:36 am

    On Fri, 2011-02-18 at 20:45 -0500, Robert Haas wrote:

    On the other hand, I see no particular
    harm in leaving the option in there either, though I definitely think
    the default should be changed to -1.
    The default setting should be to *not* freeze up if you lose the
    standby. That behaviour unexpectedly leads to an effective server down
    situation, rather than 2 minutes of slow running. This is supposed to be
    High Availability software so it will be bad for us if we allow that.

    We've discussed this many times already. The people that wish to wait
    forever have their wait-forever option, but let's not force it on
    everyone.

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Robert Haas at Feb 20, 2011 at 4:26 am

    On Sat, Feb 19, 2011 at 3:35 AM, Simon Riggs wrote:
    On Fri, 2011-02-18 at 20:45 -0500, Robert Haas wrote:
    On the other hand, I see no particular
    harm in leaving the option in there either, though I definitely think
    the default should be changed to -1.
    The default setting should be to *not* freeze up if you lose the
    standby. That behaviour unexpectedly leads to an effective server down
    situation, rather than 2 minutes of slow running.
    My understanding is that the parameter will wait on every commit, not
    just once. There's no mechanism to do anything else. But I did some
    testing this evening and actually it appears to not work at all. I
    hit walreceiver with a SIGSTOP and the commit never completes, even
    after the two minute timeout. Also, when I restarted walreceiver
    after a long time, I got a server crash.

    DEBUG: write 0/3027BC8 flush 0/3014690 apply 0/3014690
    DEBUG: released 0 procs up to 0/3014690
    DEBUG: write 0/3027BC8 flush 0/3027BC8 apply 0/3014690
    DEBUG: released 2 procs up to 0/3027BC8
    WARNING: could not locate ourselves on wait queue
    server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
    The connection to the server was lost. Attempting reset: DEBUG:
    shmem_exit(-1): 0 callbacks to make
    DEBUG: proc_exit(-1): 0 callbacks to make
    FATAL: could not receive data from WAL stream: server closed the
    connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

    Failed.
    !> LOG: record with zero length at 0/3027BC8
    DEBUG: CommitTransaction
    DEBUG: name: unnamed; blockState: STARTED; state: INPROGR,
    xid/subid/cid: 0/1/0, nestlvl: 1, children:
    DEBUG: received replication command: IDENTIFY_SYSTEM
    DEBUG: received replication command: START_REPLICATION 0/3000000
    LOG: streaming replication successfully connected to primary
    DEBUG: standby "standby" is a potential synchronous standby
    DEBUG: write 0/0 flush 0/0 apply 0/3027BC8
    DEBUG: released 0 procs up to 0/0
    DEBUG: standby "standby" has now caught up with primary
    DEBUG: write 0/3027C18 flush 0/0 apply 0/3027BC8
    DEBUG: standby "standby" is now the synchronous replication standby
    DEBUG: released 0 procs up to 0/0
    DEBUG: write 0/3027C18 flush 0/3027C18 apply 0/3027BC8
    DEBUG: released 0 procs up to 0/3027C18
    DEBUG: write 0/3027C18 flush 0/3027C18 apply 0/3027C18
    DEBUG: released 0 procs up to 0/3027C18

    (lots more copies of those last two messages)

    I believe the problem is that the definition of IsOnSyncRepQueue is
    bogus, so that the loop in SyncRepWaitOnQueue always takes the first
    branch.

    It was a little confusing to me setting this up that setting only
    synchronous_replication did nothing; I had to also set
    synchronous_standby_names. We might need a cross-check there. I
    believe the docs for synchronous_replication also need some updating;
    this part appears to be out of date:

    + between primary and standby. The commit wait will last until the
    + first reply from any standby. Multiple standby servers allow
    + increased availability and possibly increase performance as well.

    The words "on the primary" in the next sentence may not be necessary
    any more either, as I believe this parameter now has no effect
    anywhere else.
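
    For reference, the minimal combination needed to see the wait at all is
    roughly this (names purely illustrative):

    primary, postgresql.conf:
        synchronous_standby_names = 'standby1'
        synchronous_replication = on
    standby, recovery.conf:
        standby_mode = 'on'
        primary_conninfo = 'host=primary user=rep application_name=standby1'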

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Jaime Casanova at Feb 22, 2011 at 7:44 pm

    On Sat, Feb 19, 2011 at 11:26 PM, Robert Haas wrote:
    DEBUG:  write 0/3027BC8 flush 0/3014690 apply 0/3014690
    DEBUG:  released 0 procs up to 0/3014690
    DEBUG:  write 0/3027BC8 flush 0/3027BC8 apply 0/3014690
    DEBUG:  released 2 procs up to 0/3027BC8
    WARNING:  could not locate ourselves on wait queue
    server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
    The connection to the server was lost. Attempting reset: DEBUG:
    you can make this happen more easily, i just run "pgbench -n -c10 -j10
    test" and qot that warning and sometimes a segmentation fault and
    sometimes a failed assertion

    The problematic code starts at
    src/backend/replication/syncrep.c:277; here are my suggestions on that
    code.
    I still get a failed assertion because of the second Assert (I think
    we should just remove that one).

    *************** SyncRepRemoveFromQueue(void)
    *** 288,299 ****

            if (proc->lwWaitLink == NULL)
                    elog(WARNING, "could not locate ourselves on wait queue");
    !       proc = proc->lwWaitLink;
        }

        if (proc->lwWaitLink == NULL)   /* At tail */
        {
    !       Assert(proc == MyProc);
            /* Remove ourselves from tail of queue */
            Assert(queue->tail == MyProc);
            queue->tail = proc;
    --- 288,300 ----

            if (proc->lwWaitLink == NULL)
                    elog(WARNING, "could not locate ourselves on wait queue");
    !       else
    !               proc = proc->lwWaitLink;
        }

        if (proc->lwWaitLink == NULL)   /* At tail */
        {
    !       Assert(proc != MyProc);
            /* Remove ourselves from tail of queue */
            Assert(queue->tail == MyProc);
            queue->tail = proc;

    --
    Jaime Casanova         www.2ndQuadrant.com
    Professional PostgreSQL: Soporte y capacitación de PostgreSQL
  • Daniel Farina at Feb 25, 2011 at 2:13 am

    On Tue, Feb 22, 2011 at 11:43 AM, Jaime Casanova wrote:
    On Sat, Feb 19, 2011 at 11:26 PM, Robert Haas wrote:

    DEBUG:  write 0/3027BC8 flush 0/3014690 apply 0/3014690
    DEBUG:  released 0 procs up to 0/3014690
    DEBUG:  write 0/3027BC8 flush 0/3027BC8 apply 0/3014690
    DEBUG:  released 2 procs up to 0/3027BC8
    WARNING:  could not locate ourselves on wait queue
    server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
    The connection to the server was lost. Attempting reset: DEBUG:
    you can make this happen more easily, i just run "pgbench -n -c10 -j10
    test" and qot that warning and sometimes a segmentation fault and
    sometimes a failed assertion
    I have also reproduced this. Notably, things seem fine as long as
    pgbench is confined to one backend, but as soon as two are used (-c 2)
    with the feature I can get segfaults.

    In the UI department, I am finding it somewhat difficult to affirm
    that I am, in fact, synchronously replicating anything in the HA
    scenario (where I never want to block). However, by enjoying the patch
    at DEBUG3 and running what I think to be syncrepped and non-syncrepped
    runs, I believe that I am not committing user error (seeing syncrep
    waiting vs. lack thereof). This is in part hard to confirm because
    the single-backend performance (if DEBUG3 is to be believed; I will
    write a real torture test later) of syncrep is actually very good; I
    was expecting a more perceptible performance dropoff. But then again,
    I imagine the real kicker will happen when we can run concurrent
    backends. Also, Amazon EBS doesn't have the fastest disks, and within
    an availability zone network latency is awfully low.

    I can't quite explain what I was seeing before w.r.t. memory usage,
    and more pressingly, a very slow recovery. I turned off hot standby and
    was messing around and, before I knew it, the server was caught up. I
    do not know if that was just coincidence (probably) or overhead
    imposed by HS. The very high RES number was linux fooling me, as most
    of it was SHR and in SHMEM.

    --
    fdr
  • Daniel Farina at Feb 25, 2011 at 2:26 am
    With some more fooling around, I have also managed to get this elog(WARNING)

    if (proc->lwWaitLink == NULL)
    elog(WARNING, "could not locate ourselves on wait queue");

    --
    fdr
  • Simon Riggs at Feb 28, 2011 at 7:17 pm

    On Thu, 2011-02-24 at 18:13 -0800, Daniel Farina wrote:

    I have also reproduced this. Notably, things seem fine as long as
    pgbench is confined to one backend, but as soon as two are used (-c 2)
    by the feature I can get segfaults.
    Sorry that you all experienced this. I wasn't able to get concurrent
    queue accesses even with -c 8, so I spent about half a day last week
    investigating a possible spinlock locking flaw. That meant the code in
    that area was untested, which is most obvious now. I guess that means I
    should test on different hardware in future.

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Yeb Havinga at Feb 25, 2011 at 3:42 pm

    On 2011-02-22 20:43, Jaime Casanova wrote:
    you can make this happen more easily, i just run "pgbench -n -c10 -j10
    test" and qot that warning and sometimes a segmentation fault and
    sometimes a failed assertion

    and the problematic code starts at
    src/backend/replication/syncrep.c:277, here my suggestions on that
    code.
    still i get a failed assertion because of the second Assert (i think
    we should just remove that one)

    *************** SyncRepRemoveFromQueue(void)
    *** 288,299 ****

            if (proc->lwWaitLink == NULL)
                    elog(WARNING, "could not locate ourselves on wait queue");
    !       proc = proc->lwWaitLink;
        }

        if (proc->lwWaitLink == NULL)   /* At tail */
        {
    !       Assert(proc == MyProc);
            /* Remove ourselves from tail of queue */
            Assert(queue->tail == MyProc);
            queue->tail = proc;
    --- 288,300 ----

            if (proc->lwWaitLink == NULL)
                    elog(WARNING, "could not locate ourselves on wait queue");
    !       else
    !               proc = proc->lwWaitLink;
        }

        if (proc->lwWaitLink == NULL)   /* At tail */
        {
    !       Assert(proc != MyProc);
            /* Remove ourselves from tail of queue */
            Assert(queue->tail == MyProc);
            queue->tail = proc;
    I also did some initial testing on this patch and got the queue-related
    errors with > 1 client. With the code change from Jaime above I still
    got a lot of 'not on queue' warnings.

    I tried to understand how the queue was supposed to work - resulting in
    the changes below, which also incorporate a suggestion from Fujii
    upthread to exit early when MyProc is found.

    With the changes below all seems to work without warnings. I now see
    that the note about the list invariant is too short; better would be: "if
    queue length = 1 then head = tail".

    --- a/src/backend/replication/syncrep.c
    +++ b/src/backend/replication/syncrep.c
    @@ -274,6 +274,8 @@ SyncRepRemoveFromQueue(void)
            }
            else
            {
    +               bool    found = false;
    +
                    while (proc->lwWaitLink != NULL)
                    {
                            /* Are we the next proc in our traversal of the queue? */
    @@ -284,17 +286,19 @@ SyncRepRemoveFromQueue(void)
                                     * No need to touch head or tail.
                                     */
                                    proc->lwWaitLink = MyProc->lwWaitLink;
    +                               found = true;
    +                               break;
                            }

    -                       if (proc->lwWaitLink == NULL)
    -                               elog(WARNING, "could not locate ourselves on wait queue");
                            proc = proc->lwWaitLink;
                    }
    +               if (!found)
    +                       elog(WARNING, "could not locate ourselves on wait queue");

    -               if (proc->lwWaitLink == NULL)   /* At tail */
    +               /* If MyProc was removed from the tail, maintain list invariant head==tail */
    +               if (proc->lwWaitLink == NULL)
                    {
    -                       Assert(proc == MyProc);
    -                       /* Remove ourselves from tail of queue */
    +                       Assert(proc != MyProc); /* impossible since that is the head=MyProc branch above */
                            Assert(queue->tail == MyProc);
                            queue->tail = proc;
                            proc->lwWaitLink = NULL;

    I needed to add this to make the documentation compile

    --- a/doc/src/sgml/config.sgml
    +++ b/doc/src/sgml/config.sgml
    @@ -2010,6 +2010,9 @@ SET ENABLE_SEQSCAN TO OFF;
    You should also consider setting <varname>hot_standby_feedback</>
    as an alternative to using this parameter.
    </para>
    + </listitem>
    + </varlistentry>
    + </variablelist></sect2>

    <sect2 id="runtime-config-sync-rep">

    regards,
    Yeb Havinga
  • Jaime Casanova at Feb 25, 2011 at 7:41 pm

    On Fri, Feb 25, 2011 at 10:41 AM, Yeb Havinga wrote:
    I also did some initial testing on this patch and got the queue related
    errors with > 1 clients. With the code change from Jaime above I still got a
    lot of 'not on queue warnings'.

    I tried to understand how the queue was supposed to work - resulting in the
    changes below that also incorporates a suggestion from Fujii upthread, to
    early exit when myproc was found.
    Yes, looking at the code, the warning and your patch... it seems yours
    is the right solution...
    I'm compiling right now to test again and see the effects. Robert,
    maybe you can test your failure case again? I'm really sure it's
    related to this...

    --
    Jaime Casanova         www.2ndQuadrant.com
    Professional PostgreSQL: Soporte y capacitación de PostgreSQL
  • Yeb Havinga at Feb 28, 2011 at 9:32 am

    On 2011-02-25 20:40, Jaime Casanova wrote:
    On Fri, Feb 25, 2011 at 10:41 AM, Yeb Havinga wrote:
    I also did some initial testing on this patch and got the queue related
    errors with > 1 clients. With the code change from Jaime above I still got a
    lot of 'not on queue warnings'.

    I tried to understand how the queue was supposed to work - resulting in the
    changes below that also incorporates a suggestion from Fujii upthread, to
    early exit when myproc was found
    yes, looking at the code, the warning and your patch... it seems yours
    is the right solution...
    I'm compiling right now to test again and see the effects, Robert
    maybe you can test your failure case again? i'm really sure it's
    related to this...
    I did some more testing over the weekend with this patched v17 patch.
    Since you've posted a v18 patch, let me write some findings with the v17
    patch before continuing with the v18 patch.

    The tests were done on an x86_64 platform, 1Gbit network interfaces, 3
    servers. Non-default configuration changes are copy-pasted at the end of
    this mail.

    1) no automatic switch to other synchronous standby
    - start master server, add synchronous standby 1
    - change allow_standalone_primary to off
    - add second synchronous standby
    - wait until pg_stat_replication shows both standbys are in STREAMING state
    - stop standby 1
    what happens is that the master stalls, where I expected that it
    would've switched to standby 2 to acknowledge commits.

    The following thing was pilot error, but since I was test-piloting a new
    plane, I still think it might be useful feedback. In my opinion, any
    number and order of pg_ctl stops and starts on both the master and
    standby servers, as long as they are not with -m immediate, should never
    cause the state I reached.

    2) reaching some sort of shutdown deadlock state
    - start master server, add synchronous standby
    - change allow_standalone_primary to off
    then I did all sorts of test things, everything still ok. Then I wanted
    to shut down everything, and maybe because of some symmetry (stack-like)
    I did the following because I didn't think it through
    - pg_ctl stop on standby (didn't actually wait until done, but
    immediately in other terminal)
    - pg_ctl stop on master
    Oh wait.. master needs to sync transactions
    - start standby again. but now: FATAL: the database system is shutting down

    There is no clean way to get out of this situation.
    allow_standalone_primary in the face of shutdowns might be tricky. Maybe
    shutdown must be prohibited from entering the shutting-down phase when
    allow_standalone_primary = off and there is no sync standby; that would
    allow the sync standby to attach again.

    3) PANIC on standby server
    At some point a standby suddenly disconnected after I started a new
    pgbench run on a existing master/standby pair, with the following error
    in the logfile.

    LOCATION: libpqrcv_connect, libpqwalreceiver.c:171
    PANIC: XX000: heap_update_redo: failed to add tuple
    CONTEXT: xlog redo hot_update: rel 1663/16411/16424; tid 305453/15; new
    305453/102
    LOCATION: heap_xlog_update, heapam.c:4724
    LOG: 00000: startup process (PID 32597) was terminated by signal 6: Aborted

    This might be due to pilot error as well; I did several tests over the
    weekend, and after this error I was more careful about remembering immediate
    shutdowns/starting with a clean backup after that, and didn't see
    similar errors since.

    4) The performance of the syncrep seems to be quite an improvement over
    the previous syncrep patches; I've seen tps figures of O(650) where the
    others were more like O(20). The O(650) tps is limited by the speed of
    the standby server I used - at several times the master would halt only
    because of heavy disk activity at the standby. A warning in the docs
    might be right: be sure to use good IO hardware for your synchronous
    replicas! With that bottleneck gone, I suspect the current syncrep
    version can go beyond 1000 tps over 1 Gbit.

    regards,
    Yeb Havinga

    recovery.conf:
    standby_mode = 'on'
    primary_conninfo = 'host=mg73 user=repuser password=pwd
    application_name=standby1'
    trigger_file = '/tmp/postgresql.trigger.5432'

    postgresql.conf nondefault parameters:
    log_error_verbosity = verbose
    log_min_messages = warning
    log_min_error_statement = warning
    listen_addresses = '*' # what IP address(es) to listen on;
    search_path='\"$user\", public, hl7'
    archive_mode = on
    archive_command = 'test ! -f /data/backup_in_progress || cp -i %p
    /archive/%f < /dev/null'
    checkpoint_completion_target = 0.9
    checkpoint_segments = 16
    default_statistics_target = 500
    constraint_exclusion = on
    max_connections = 120
    maintenance_work_mem = 128MB
    effective_cache_size = 1GB
    work_mem = 44MB
    wal_buffers = 8MB
    shared_buffers = 128MB
    wal_level = 'archive'
    max_wal_senders = 4
    wal_keep_segments = 1000 # 16000MB (for production increase this)
    synchronous_standby_names = 'standby1,standby2,standby3'
    synchronous_replication = on
    allow_standalone_primary = off
  • Simon Riggs at Feb 28, 2011 at 6:40 pm

    On Mon, 2011-02-28 at 10:31 +0100, Yeb Havinga wrote:

    1) no automatic switch to other synchronous standby
    - start master server, add synchronous standby 1
    - change allow_standalone_primary to off
    - add second synchronous standby
    - wait until pg_stat_replication shows both standby's are in STREAMING state
    - stop standby 1
    what happens is that the master stalls, where I expected that it
    would've switched to standby 2 acknowledge commits.

    The following thing was pilot error, but since I was test-piloting a new
    plane, I still think it might be usual feedback. In my opinion, any
    number and order of pg_ctl stops and starts on both the master and
    standby servers, as long as they are not with -m immediate, should never
    cause the state I reached.
    The behaviour of "allow_synchronous_standby = off" is pretty much
    untested and does seem to have various gotchas in there.
    2) reaching some sort of shutdown deadlock state
    - start master server, add synchronous standby
    - change allow_standalone_primary to off
    then I did all sorts of test things, everything still ok. Then I wanted
    to shutdown everything, and maybe because of some symmetry (stack like)
    I did the following because I didn't think it through
    - pg_ctl stop on standby (didn't actualy wait until done, but
    immediately in other terminal)
    - pg_ctl stop on master
    O wait.. master needs to sync transactions
    - start standby again. but now: FATAL: the database system is shutting down

    There is no clean way to get out of this situation.
    allow_standalone_primary in the face of shutdowns might be tricky. Maybe
    shutdown must be prohibited to enter the shutting down phase in
    allow_standalone_primary = off together with no sync standby, that would
    allow for the sync standby to attach again.
    The behaviour of "allow_synchronous_standby = off" is not something I'm
    worried about personally and I've argued all along it sounds pretty
    silly to me. If someone wants to spend some time defining how it
    *should* work that might help matters. I'm inclined to remove it before
    commit if it can't work cleanly, to be re-added at a later date if it
    makes sense.
    3) PANIC on standby server
    At some point a standby suddenly disconnected after I started a new
    pgbench run on a existing master/standby pair, with the following error
    in the logfile.

    LOCATION: libpqrcv_connect, libpqwalreceiver.c:171
    PANIC: XX000: heap_update_redo: failed to add tuple
    CONTEXT: xlog redo hot_update: rel 1663/16411/16424; tid 305453/15; new
    305453/102
    LOCATION: heap_xlog_update, heapam.c:4724
    LOG: 00000: startup process (PID 32597) was terminated by signal 6: Aborted

    This might be due to pilot error as well; I did a several tests over the
    weekend and after this error I was more alert on remembering immediate
    shutdowns/starting with a clean backup after that, and didn't see
    similar errors since.
    Good. There are no changes in the patch for that section of code.
    4) The performance of the syncrep seems to be quite an improvement over
    the previous syncrep patches, I've seen tps-ses of O(650) where the
    others were more like O(20). The O(650) tps is limited by the speed of
    the standby server I used-at several times the master would halt only
    because of heavy disk activity at the standby. A warning in the docs
    might be right: be sure to use good IO hardware for your synchronous
    replicas! With that bottleneck gone, I suspect the current syncrep
    version can go beyond 1000tps over 1 Gbit.
    Good, thanks.

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Simon Riggs at Feb 28, 2011 at 6:41 pm

    On Fri, 2011-02-25 at 16:41 +0100, Yeb Havinga wrote:

    --- a/src/backend/replication/syncrep.c
    +++ b/src/backend/replication/syncrep.c
    @@ -274,6 +274,8 @@ SyncRepRemoveFromQueue(void)
            }
            else
            {
    +               bool    found = false;
    +
                    while (proc->lwWaitLink != NULL)
                    {
                            /* Are we the next proc in our traversal of the queue? */
    @@ -284,17 +286,19 @@ SyncRepRemoveFromQueue(void)
                                     * No need to touch head or tail.
                                     */
                                    proc->lwWaitLink = MyProc->lwWaitLink;
    +                               found = true;
    +                               break;
                            }

    -                       if (proc->lwWaitLink == NULL)
    -                               elog(WARNING, "could not locate ourselves on wait queue");
                            proc = proc->lwWaitLink;
                    }
    +               if (!found)
    +                       elog(WARNING, "could not locate ourselves on wait queue");

    -               if (proc->lwWaitLink == NULL)   /* At tail */
    +               /* If MyProc was removed from the tail, maintain list invariant head==tail */
    +               if (proc->lwWaitLink == NULL)
                    {
    -                       Assert(proc == MyProc);
    -                       /* Remove ourselves from tail of queue */
    +                       Assert(proc != MyProc); /* impossible since that is the head=MyProc branch above */
                            Assert(queue->tail == MyProc);
                            queue->tail = proc;
                            proc->lwWaitLink = NULL;
    Used your suggested fix
    Code available at git://github.com/simon2ndQuadrant/postgres.git
    I needed to add this to make the documentation compile

    --- a/doc/src/sgml/config.sgml
    +++ b/doc/src/sgml/config.sgml
    @@ -2010,6 +2010,9 @@ SET ENABLE_SEQSCAN TO OFF;
    You should also consider setting <varname>hot_standby_feedback</>
    as an alternative to using this parameter.
    </para>
    + </listitem>
    + </varlistentry>
    + </variablelist></sect2>

    <sect2 id="runtime-config-sync-rep">
    Separate bug, will fix

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Simon Riggs at Feb 28, 2011 at 7:57 pm

    On Fri, 2011-02-25 at 16:41 +0100, Yeb Havinga wrote:

    I needed to add this to make the documentation compile

    --- a/doc/src/sgml/config.sgml
    +++ b/doc/src/sgml/config.sgml
    @@ -2010,6 +2010,9 @@ SET ENABLE_SEQSCAN TO OFF;
    You should also consider setting
    <varname>hot_standby_feedback</>
    as an alternative to using this parameter.
    </para>
    + </listitem>
    + </varlistentry>
    + </variablelist></sect2>

    <sect2 id="runtime-config-sync-rep">
    Corrected, thanks.

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Simon Riggs at Feb 28, 2011 at 9:14 pm

    On Sat, 2011-02-19 at 23:26 -0500, Robert Haas wrote:

    I believe the problem is that the definition of IsOnSyncRepQueue is
    bogus, so that the loop in SyncRepWaitOnQueue always takes the first
    branch.
    Sorry, don't see that. Jaime/Yeb fix applied.
    It was a little confusing to me setting this up that setting only
    synchronous_replication did nothing; I had to also set
    synchronous_standby_names. We might need a cross-check there.
    I'm inclined to make an empty "synchronous_standby_names" mean that any
    standby can become the sync standby. That seems more useful behaviour
    and avoids the need for a cross-check (what exactly would we check??).
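
    That is, something like this on the primary would then be enough (a sketch
    of the proposed behaviour, not what the patch does today):

    synchronous_standby_names = ''   # empty: any connected standby may become the sync standby
    synchronous_replication = on
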
    I believe the docs for synchronous_replication also need some updating;
    this part appears to be out of date:

    + between primary and standby. The commit wait will last until the
    + first reply from any standby. Multiple standby servers allow
    + increased availability and possibly increase performance as well.
    Agreed
    The words "on the primary" in the next sentence may not be necessary
    any more either, as I believe this parameter now has no effect
    anywhere else.
    Agreed

    Docs changed: git://github.com/simon2ndQuadrant/postgres.git

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Robert Haas at Feb 28, 2011 at 9:22 pm

    On Mon, Feb 28, 2011 at 4:13 PM, Simon Riggs wrote:
    On Sat, 2011-02-19 at 23:26 -0500, Robert Haas wrote:

    I believe the problem is that the definition of IsOnSyncRepQueue is
    bogus, so that the loop in SyncRepWaitOnQueue always takes the first
    branch.
    Sorry, don't see that. Jaime/Yeb fix applied.
    It was a little confusing to me setting this up that setting only
    synchronous_replication did nothing; I had to also set
    synchronous_standby_names.  We might need a cross-check there.
    I'm inclined to make an empty "synchronous_standby_names" mean that any
    standby can become the sync standby. That seems more useful behaviour
    and avoids the need for a cross-check (what exactly would we check??).
    Hmm, that is a little surprising but might be reasonable. My thought
    was that we would check that if synchronous_replication=on then
    synchronous_standbys must be non-empty. I think there ought to be
    some way for the admin to turn synchronous replication *off* though,
    in a way that an individual user cannot override. How will we do
    that?
    Docs changed: git://github.com/simon2ndQuadrant/postgres.git
    I'm hoping you're going to post an updated patch once the current rash
    of updates is all done.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Simon Riggs at Feb 28, 2011 at 9:36 pm

    On Mon, 2011-02-28 at 16:22 -0500, Robert Haas wrote:

    Docs changed: git://github.com/simon2ndQuadrant/postgres.git
    I'm hoping you're going to post an updated patch once the current rash
    of updates is all done.
    Immediately prior to commit, yes.

    Everybody else has been nudging me towards developing in public view,
    commit by commit on a public repo. So that's what I'm doing now, as
    promised earlier. That should help people object to specific commits if
    they no likey.

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Robert Haas at Feb 28, 2011 at 9:55 pm

    On Mon, Feb 28, 2011 at 4:36 PM, Simon Riggs wrote:
    On Mon, 2011-02-28 at 16:22 -0500, Robert Haas wrote:

    Docs changed: git://github.com/simon2ndQuadrant/postgres.git
    I'm hoping you're going to post an updated patch once the current rash
    of updates is all done.
    Immediately prior to commit, yes.

    Everybody else has been nudging me towards developing in public view,
    commit by commit on a public repo. So that's what I'm doing now, as
    promised earlier. That should help people object to specific commits if
    they no likey.
    It took a few days for the problems with the last version to shake
    out. I think you should give people about that much time again. It's
    not realistic to suppose that everyone will follow your git repo in
    detail.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Simon Riggs at Feb 28, 2011 at 10:41 pm

    On Mon, 2011-02-28 at 16:55 -0500, Robert Haas wrote:
    On Mon, Feb 28, 2011 at 4:36 PM, Simon Riggs wrote:
    On Mon, 2011-02-28 at 16:22 -0500, Robert Haas wrote:

    Docs changed: git://github.com/simon2ndQuadrant/postgres.git
    I'm hoping you're going to post an updated patch once the current rash
    of updates is all done.
    Immediately prior to commit, yes.

    Everybody else has been nudging me towards developing in public view,
    commit by commit on a public repo. So that's what I'm doing now, as
    promised earlier. That should help people object to specific commits if
    they no likey.
    It took a few days for the problems with the last version to shake
    out. I think you should give people about that much time again. It's
    not realistic to suppose that everyone will follow your git repo in
    detail.
    Yeh, I'm not rushing to commit. And even afterwards I expect comments
    that will mean I'm editing this for next month at least.

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Bruce Momjian at Feb 19, 2011 at 2:39 am

    Simon Riggs wrote:
    Most parameters are set on the primary. Set

    primary: synchronous_standby_names = 'node1, node2, node3'

    which means that whichever of those standbys connect first will
    become the main synchronous standby. Servers arriving later will
    be potential standbys (standby standbys doesn't sound right...).
    synchronous_standby_names can change at reload.

    Currently, the standby_name is the application_name parameter
    set in the primary_conninfo.

    When we set this for a client, or in postgresql.conf

    primary: synchronous_replication = on

    then we will wait at commit until the synchronous standby has
    reached the WAL location of our commit point.

    If the current synchronous standby dies then one of the other standbys
    will take over. (I think it would be a great idea to make the
    list a priority order, but I haven't had time to code that).

    If none of the standbys are available, then we don't wait at all
    if allow_standalone_primary is set.
    allow_standalone_primary can change at reload.
    Are we going to allow a command to be run when these things happen, like
    archive_command does, or is that something for 9.2?

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Simon Riggs at Feb 19, 2011 at 8:29 am

    On Fri, 2011-02-18 at 21:39 -0500, Bruce Momjian wrote:
    Simon Riggs wrote:
    [...]
    Are we going to allow a command to be run when these things happen, like
    archive_command does, or is that something for 9.2?
    Not really sure which events you're speaking of, sorry. As I write, I
    don't see a place or a reason to run a command, but that might change
    with a longer explanation of what you're thinking. I wouldn't rule out
    adding something like that in this release, but I'm not around for the
    next week to add it.

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Bruce Momjian at Feb 19, 2011 at 1:23 pm

    Simon Riggs wrote:
    On Fri, 2011-02-18 at 21:39 -0500, Bruce Momjian wrote:
    Simon Riggs wrote:
    [...]
    Are we going to allow a command to be run when these things happen, like
    archive_command does, or is that something for 9.2?
    Not really sure which events you're speaking of, sorry. As I write, I
    don't see a place or a reason to run a command, but that might change
    with a longer explanation of what you're thinking. I wouldn't rule out
    adding something like that in this release, but I'm not around for the
    next week to add it.
    Sorry. I was thinking of allowing a command to alert an administrator
    when we switch standby machines, or if we can't do synchronous standby
    any longer. I assume we put a message in the logs, and the admin could
    have a script that monitors that, but I thought a pager-like command
    would be helpful, though I don't think we do that in any other case, so
    maybe it is overkill. These are all optional ideas.

    Also, I like that you are using application name here.

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Marti Raudsepp at Feb 19, 2011 at 9:06 pm

    On Sat, Feb 19, 2011 at 15:23, Bruce Momjian wrote:
    Sorry.  I was thinking of allowing a command to alert an administrator
    when we switch standby machines, or if we can't do synchronous standby
    any longer.  I assume we put a message in the logs, and the admin could
    have a script that monitors that, but I thought a pager-like command
    would be helpful
    For asynchronous notifications we already have LISTEN/NOTIFY. Perhaps
    that could be used?

    Regards,
    Marti
  • Bruce Momjian at Feb 19, 2011 at 11:05 pm

    Marti Raudsepp wrote:
    On Sat, Feb 19, 2011 at 15:23, Bruce Momjian wrote:
    Sorry. I was thinking of allowing a command to alert an administrator
    when we switch standby machines, or if we can't do synchronous standby
    any longer. I assume we put a message in the logs, and the admin could
    have a script that monitors that, but I thought a pager-like command
    would be helpful
    For asynchronous notifications we already have LISTEN/NOTIFY. Perhaps
    that could be used?
    Well, those are going to require work to notify someone externally, like
    send an email alert.

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Jaime Casanova at Feb 19, 2011 at 11:34 pm

    On Sat, Feb 19, 2011 at 6:05 PM, Bruce Momjian wrote:
    Marti Raudsepp wrote:
    On Sat, Feb 19, 2011 at 15:23, Bruce Momjian wrote:
    Sorry. I was thinking of allowing a command to alert an administrator
    when we switch standby machines, or if we can't do synchronous standby
    any longer. I assume we put a message in the logs, and the admin could
    have a script that monitors that, but I thought a pager-like command
    would be helpful
    For asynchronous notifications we already have LISTEN/NOTIFY. Perhaps
    that could be used?
    Well, those are going to require work to notify someone externally, like
    send an email alert.
    although that seems like a job for a monitoring tool.

    --
    Jaime Casanova         www.2ndQuadrant.com
    Professional PostgreSQL: PostgreSQL support and training
  • Josh Berkus at Feb 19, 2011 at 8:09 pm

    On 2/18/11 4:06 PM, Simon Riggs wrote:
    v17 implements what I believe to be the final set of features for sync
    rep. This one I'm actually fairly happy with. It can be enjoyed best at
    DEBUG3.
    Yes, but what wine do I serve with it? ;-)


    --
    -- Josh Berkus
    PostgreSQL Experts Inc.
    http://www.pgexperts.com
  • Fujii Masao at Feb 21, 2011 at 9:06 am

    On Sat, Feb 19, 2011 at 9:06 AM, Simon Riggs wrote:
    Well, good news all round.

    v17 implements what I believe to be the final set of features for sync
    rep. This one I'm actually fairly happy with. It can be enjoyed best at
    DEBUG3.

    The patch is very lite touch on a few areas of code, plus a chunk of
    specific code, all on master-side. Pretty straight really. I'm sure
    problems will be found, its not long since I completed this; thanks to
    Daniel Farina for your help with patch assembly.

    Which is just as well, because the other news is that I'm off on holiday
    for a few days, which is most inconvenient. I won't be committing this
    for at least a week and absent from the list. OTOH, I think its ready
    for a final review and commit, so I'm OK if you commit or OK if you
    leave it for me.
    Thanks for the patch!

    Here are the comments:


    PREPARE TRANSACTION and ROLLBACK PREPARED should wait for
    replication as well as COMMIT PREPARED?

    What if fast shutdown is requested while RecordTransactionCommit
    is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete
    until replication has been successfully done (i.e., until at least one
    synchronous standby has connected to the master especially if
    allow_standalone_primary is disabled). Is this OK?

    We should emit a WARNING when a synchronous standby with
    wal_receiver_status_interval = 0 connects to the master, because in
    that case a transaction would unexpectedly wait for replication
    forever.

    + /* Need a modifiable copy of string */
    + rawstring = pstrdup(SyncRepStandbyNames);
    +
    + /* Parse string into list of identifiers */
    + if (!SplitIdentifierString(rawstring, ',', &elemlist))

    pfree(rawstring) and list_free(elemlist) should also be called when
    SplitIdentifierString returns true; otherwise the memory is leaked.
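    For illustration, a minimal sketch of a check hook that frees on both
    paths; the function name, signature and headers are illustrative, not
    the patch's exact code:

        #include "postgres.h"

        #include "nodes/pg_list.h"
        #include "utils/builtins.h"
        #include "utils/guc.h"

        static bool
        check_synchronous_standby_names(char **newval, void **extra, GucSource source)
        {
            char       *rawstring;
            List       *elemlist;

            /* Need a modifiable copy of the string */
            rawstring = pstrdup(*newval);

            /* Parse the string into a list of identifiers */
            if (!SplitIdentifierString(rawstring, ',', &elemlist))
            {
                /* syntax error in list */
                pfree(rawstring);
                list_free(elemlist);
                return false;
            }

            /* ... per-name validation could go here ... */

            /* Free on the success path too, so nothing is leaked */
            pfree(rawstring);
            list_free(elemlist);
            return true;
        }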

    + ereport(FATAL,
    + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
    + errmsg("invalid list syntax for parameter
    \"synchronous_standby_names\"")));
    + return false;

    "return false" is not required here though that might be harmless.


    I've read about a tenth of the patch, so I'll submit more comments
    on the rest later.

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Fujii Masao at Feb 22, 2011 at 5:38 am

    On Mon, Feb 21, 2011 at 6:06 PM, Fujii Masao wrote:
    I've read about a tenth of the patch, so I'll submit more comments
    on the rest later.
    Here are some more comments:

    SyncRepReleaseWaiters should be called when a walsender exits. Otherwise,
    if the standby crashes while a transaction is waiting for replication,
    the transaction will wait forever.

    sync_rep_service and potential_sync_standby are not required to be in the
    WalSnd shmem because only walsender accesses them.

    +static bool
    +SyncRepServiceAvailable(void)
    +{
    + bool result = false;
    +
    + SpinLockAcquire(&WalSndCtl->ctlmutex);
    + result = WalSndCtl->sync_rep_service_available;
    + SpinLockRelease(&WalSndCtl->ctlmutex);
    +
    + return result;
    +}

    volatile pointer needs to be used to prevent code rearrangement.

    + slock_t ctlmutex; /* locks shared variables shown above */

    ctlmutex should be initialized by calling SpinLockInit.

    + /*
    + * Stop providing the sync rep service, even if there are
    + * waiting backends.
    + */
    + {
    + SpinLockAcquire(&WalSndCtl->ctlmutex);
    + WalSndCtl->sync_rep_service_available = false;
    + SpinLockRelease(&WalSndCtl->ctlmutex);
    + }

    sync_rep_service_available should be set to false only when
    there is no synchronous walsender.

    + /*
    + * When we first start replication the standby will be behind the primary.
    + * For some applications, for example, synchronous replication, it is
    + * important to have a clear state for this initial catchup mode, so we
    + * can trigger actions when we change streaming state later. We may stay
    + * in this state for a long time, which is exactly why we want to be
    + * able to monitor whether or not we are still here.
    + */
    + WalSndSetState(WALSNDSTATE_CATCHUP);
    +

    The above has already been committed. Please remove that from the patch.

    I don't like calling SyncRepReleaseWaiters for each feedback message because
    I suspect that's too frequent. How about receiving all the feedback messages
    available from the socket, and then calling SyncRepReleaseWaiters once, as
    walreceiver does?
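    For illustration, a sketch of that batching shape; every helper name
    below is hypothetical, and only SyncRepReleaseWaiters comes from the
    patch:

        /*
         * Sketch: drain every feedback message currently readable on the
         * socket, updating the remembered standby LSNs, and only then wake
         * waiting backends once.
         */
        bool        got_feedback = false;

        while (standby_feedback_available())    /* hypothetical helper */
        {
            consume_one_feedback_message();     /* hypothetical: records flush/apply LSN */
            got_feedback = true;
        }

        if (got_feedback)
            SyncRepReleaseWaiters();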

    + bool ownLatch; /* do we own the above latch? */

    We can just remove the ownLatch flag.


    I've read about two-tenths of the patch, so I'll submit more comments
    on the rest later. Sorry for the slow review...

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Marti Raudsepp at Feb 22, 2011 at 8:59 am

    On Tue, Feb 22, 2011 at 07:38, Fujii Masao wrote:
    +       SpinLockAcquire(&WalSndCtl->ctlmutex);
    +       result = WalSndCtl->sync_rep_service_available;
    +       SpinLockRelease(&WalSndCtl->ctlmutex);
    volatile pointer needs to be used to prevent code rearrangement.
    I don't think that's necessary. Spinlock functions already prevent
    reordering using __asm__ __volatile__

    Otherwise, surely they would be utterly broken?

    Regards,
    Marti
  • Andres Freund at Feb 22, 2011 at 10:09 am

    On Tuesday 22 February 2011 09:59:21 Marti Raudsepp wrote:
    On Tue, Feb 22, 2011 at 07:38, Fujii Masao wrote:
    + SpinLockAcquire(&WalSndCtl->ctlmutex);
    + result = WalSndCtl->sync_rep_service_available;
    + SpinLockRelease(&WalSndCtl->ctlmutex);

    volatile pointer needs to be used to prevent code rearrangement.
    I don't think that's necessary. Spinlock functions already prevent
    reordering using __asm__ __volatile__

    Otherwise, surely they would be utterly broken?
    It's not the spinlock that's the problem, but that "result" may already be
    loaded into a register. That's not prohibited by __asm__ __volatile__.

    Andres
  • Tom Lane at Feb 22, 2011 at 3:41 pm

    Marti Raudsepp writes:
    On Tue, Feb 22, 2011 at 07:38, Fujii Masao wrote:
    + SpinLockAcquire(&WalSndCtl->ctlmutex);
    + result = WalSndCtl->sync_rep_service_available;
    + SpinLockRelease(&WalSndCtl->ctlmutex);
    volatile pointer needs to be used to prevent code rearrangement.
    I don't think that's necessary. Spinlock functions already prevent
    reordering using __asm__ __volatile__
    You're mistaken. We started using that volatile-pointer convention
    after noting that some compilers would misoptimize the code otherwise.

    It's not a problem with LWLock-protected stuff because the LWLock calls
    are actual out-of-line function calls, and the compiler knows it doesn't
    know what those functions might do. But gcc is a lot more willing to
    reorder stuff around asm operations, so you can't assume that
    SpinLockAcquire/SpinLockRelease are equally safe. The way to prevent
    optimization bugs is to make sure that the fetches/stores protected by a
    spinlock are done through volatile pointers.
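    As a sketch of that convention applied to the function quoted upthread
    (WalSndCtlData, ctlmutex and sync_rep_service_available are taken from
    the patch excerpt; the rest is illustrative):

        static bool
        SyncRepServiceAvailable(void)
        {
            /* use volatile pointer to prevent code rearrangement */
            volatile WalSndCtlData *walsndctl = WalSndCtl;
            bool        result;

            SpinLockAcquire(&walsndctl->ctlmutex);
            result = walsndctl->sync_rep_service_available;
            SpinLockRelease(&walsndctl->ctlmutex);

            return result;
        }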

    regards, tom lane
  • Fujii Masao at Feb 24, 2011 at 1:08 pm

    On Tue, Feb 22, 2011 at 2:38 PM, Fujii Masao wrote:
    I've read about two-tenths of the patch, so I'll submit more comments
    on the rest later. Sorry for the slow review...
    Here are some more comments:

    + {"synchronous_standby_names", PGC_SIGHUP, WAL_REPLICATION,
    + gettext_noop("List of potential standby names to synchronise with."),
    + NULL,
    + GUC_LIST_INPUT | GUC_IS_NAME

    Why did you add GUC_IS_NAME here? I don't think it's reasonable to
    limit the length of this parameter to 63 characters, because dozens of
    standby names might be specified in it.

    SyncRepQueue->qlock should be initialized by calling SpinLockInit?

    + * Portions Copyright (c) 2010-2010, PostgreSQL Global Development Group

    Typo: s/2010/2011

    sync_replication_timeout_client would mess up the "wait-forever" option.
    So, when allow_standalone_primary is disabled, ISTM that
    sync_replication_timeout_client should have no effect.

    Please check max_wal_senders before calling SyncRepWaitForLSN for
    non-replication case.

    SyncRepRemoveFromQueue does not seem short enough to run under a
    spinlock. Instead, an LWLock should be used there.

    + old_status = get_ps_display(&len);
    + new_status = (char *) palloc(len + 21 + 1);
    + memcpy(new_status, old_status, len);
    + strcpy(new_status + len, " waiting for sync rep");
    + set_ps_display(new_status, false);
    + new_status[len] = '\0'; /* truncate off " waiting" */

    Updating the PS display should be skipped if update_process_title is false.
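    A sketch of how that guard might look around the quoted fragment;
    update_process_title is the existing GUC, and the rest mirrors the
    patch excerpt above:

        const char *old_status;
        char       *new_status = NULL;
        int         len;

        if (update_process_title)
        {
            /* Announce that we are waiting, but only if ps updates are enabled */
            old_status = get_ps_display(&len);
            new_status = (char *) palloc(len + 21 + 1);
            memcpy(new_status, old_status, len);
            strcpy(new_status + len, " waiting for sync rep");
            set_ps_display(new_status, false);
            new_status[len] = '\0';     /* truncate off " waiting ..." for the reset */
        }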

    + /*
    + * XXX extra code needed here to maintain sorted invariant.

    Yeah, such code is required. I think we can shorten the time it takes
    to find an insert position by searching the list backwards, because the
    given LSN is expected to be relatively new in the queue.
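    To illustrate the backward search, a small self-contained sketch using a
    hypothetical doubly-linked wait queue keyed by LSN (not the patch's
    actual queue structures):

        #include <stdint.h>
        #include <stddef.h>

        typedef uint64_t LsnSketch;         /* stand-in for XLogRecPtr */

        typedef struct WaitElem
        {
            LsnSketch   waitLSN;
            struct WaitElem *prev;
            struct WaitElem *next;
        } WaitElem;

        typedef struct WaitQueue
        {
            WaitElem   *head;               /* oldest (smallest) LSN */
            WaitElem   *tail;               /* newest (largest) LSN */
        } WaitQueue;

        /*
         * Insert keeping the queue sorted by LSN.  Scan from the tail, because
         * a newly committing backend's LSN is almost always newer than
         * everything already queued, so the scan usually stops immediately.
         */
        static void
        queue_insert_sorted(WaitQueue *q, WaitElem *elem)
        {
            WaitElem   *cur = q->tail;

            while (cur != NULL && cur->waitLSN > elem->waitLSN)
                cur = cur->prev;

            if (cur == NULL)
            {
                /* smallest LSN so far: becomes the new head */
                elem->prev = NULL;
                elem->next = q->head;
                if (q->head)
                    q->head->prev = elem;
                q->head = elem;
                if (q->tail == NULL)
                    q->tail = elem;
            }
            else
            {
                /* insert immediately after 'cur' */
                elem->prev = cur;
                elem->next = cur->next;
                if (cur->next)
                    cur->next->prev = elem;
                else
                    q->tail = elem;
                cur->next = elem;
            }
        }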

    + * Our approach should be same as racing car - slow in, fast out.
    + */

    Really? Even when removing the entry from the queue, we need
    to search the queue as well as we do in the add-entry case.
    Why don't you make walsenders remove the entry from the queue,
    instead?

    + long timeout = SyncRepGetWaitTimeout();
    <snip>
    + bool timeout = false;
    <snip>
    + else if (timeout > 0 &&
    + TimestampDifferenceExceeds(GetCurrentTransactionStopTimestamp(),
    + now, timeout))
    + {
    + release = true;
    + timeout = true;
    + }

    You seem to mix up two "timeout" variables.

    + if (proc->lwWaitLink == MyProc)
    + {
    + /*
    + * Remove ourselves from middle of queue.
    + * No need to touch head or tail.
    + */
    + proc->lwWaitLink = MyProc->lwWaitLink;
    + }

    When we find ourselves, we should break out of the loop soon,
    instead of continuing the loop to the end?
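    For what it's worth, a sketch of that early exit on the quoted list walk;
    the declaration of "proc" and the loop framing are illustrative, and the
    unlink line is from the patch excerpt:

        PGPROC     *proc;       /* 'head' is the queue head from surrounding code */

        /* Walk the wait queue looking for our predecessor; stop once unlinked */
        for (proc = head; proc != NULL; proc = proc->lwWaitLink)
        {
            if (proc->lwWaitLink == MyProc)
            {
                /*
                 * Remove ourselves from middle of queue.
                 * No need to touch head or tail.
                 */
                proc->lwWaitLink = MyProc->lwWaitLink;
                break;          /* found and unlinked: no need to scan further */
            }
        }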

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Simon Riggs at Feb 28, 2011 at 6:41 pm

    On Thu, 2011-02-24 at 22:08 +0900, Fujii Masao wrote:
    On Tue, Feb 22, 2011 at 2:38 PM, Fujii Masao wrote:
    I've read about two-tenths of the patch, so I'll submit more comments
    on the rest later. Sorry for the slow review...
    Here are some more comments:
    Thanks for your comments
    Code available at git://github.com/simon2ndQuadrant/postgres.git
    + {"synchronous_standby_names", PGC_SIGHUP, WAL_REPLICATION,
    + gettext_noop("List of potential standby names to synchronise with."),
    + NULL,
    + GUC_LIST_INPUT | GUC_IS_NAME

    Why did you add GUC_IS_NAME here? I don't think it's reasonable to
    limit the length of this parameter to 63 characters, because dozens of
    standby names might be specified in it.
    OK, a misunderstanding on my part caused a bug there. Fixed
    SyncRepQueue->qlock should be initialized by calling SpinLockInit?
    Fixed
    + * Portions Copyright (c) 2010-2010, PostgreSQL Global Development Group

    Typo: s/2010/2011
    Fixed
    sync_replication_timeout_client would mess up the "wait-forever" option.
    So, when allow_standalone_primary is disabled, ISTM that
    sync_replication_timeout_client should have no effect.
    Agreed, done.
    Please check max_wal_senders before calling SyncRepWaitForLSN for
    non-replication case.
    SyncRepWaitForLSN() handles this
    SyncRepRemoveFromQueue does not seem short enough to run under a
    spinlock. Instead, an LWLock should be used there.

    + old_status = get_ps_display(&len);
    + new_status = (char *) palloc(len + 21 + 1);
    + memcpy(new_status, old_status, len);
    + strcpy(new_status + len, " waiting for sync rep");
    + set_ps_display(new_status, false);
    + new_status[len] = '\0'; /* truncate off " waiting" */

    Updating the PS display should be skipped if update_process_title is false.
    Fixed.
    + /*
    + * XXX extra code needed here to maintain sorted invariant.

    Yeah, such code is required. I think we can shorten the time it takes
    to find an insert position by searching the list backwards, because the
    given LSN is expected to be relatively new in the queue.
    Sure, just skipped that because of time pressure. Will add.
    + * Our approach should be same as racing car - slow in, fast out.
    + */

    Really? Even when removing the entry from the queue, we need
    to search the queue as well as we do in the add-entry case.
    Why don't you make walsenders remove the entry from the queue,
    instead?
    This models the wakeup behaviour of LWLocks
    + long timeout = SyncRepGetWaitTimeout();
    <snip>
    + bool timeout = false;
    <snip>
    + else if (timeout > 0 &&
    + TimestampDifferenceExceeds(GetCurrentTransactionStopTimestamp(),
    + now, timeout))
    + {
    + release = true;
    + timeout = true;
    + }

    You seem to mix up two "timeout" variables.
    Yes, good catch. Fixed.
    + if (proc->lwWaitLink == MyProc)
    + {
    + /*
    + * Remove ourselves from middle of queue.
    + * No need to touch head or tail.
    + */
    + proc->lwWaitLink = MyProc->lwWaitLink;
    + }

    When we find ourselves, we should break out of the loop soon,
    instead of continuing the loop to the end?
    Incorporated in Yeb's patch

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Fujii Masao at Mar 1, 2011 at 6:52 am
    Thanks for the updated patch!
    On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs wrote:
    SyncRepRemoveFromQueue does not seem short enough to run under a
    spinlock. Instead, an LWLock should be used there.
    You seem to have forgotten to fix the above-mentioned issue.
    A spinlock can be used only for very short-term operation like
    read/write of some shared-variables. The operation on the queue
    is not short, so should be protected by LWLock, I think.
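    A sketch of the shape being requested; SyncRepQueueLock is a
    hypothetical LWLock id, and the point is simply that the queue
    manipulation runs under an LWLock, which is released automatically
    on error:

        LWLockAcquire(SyncRepQueueLock, LW_EXCLUSIVE);  /* hypothetical lock */

        /* ... walk the wait queue and unlink ourselves here ... */

        LWLockRelease(SyncRepQueueLock);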

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Simon Riggs at Mar 1, 2011 at 8:21 am

    On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote:
    Thanks for the updated patch!
    On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs wrote:
    SyncRepRemoveFromQueue does not seem short enough to run under a
    spinlock. Instead, an LWLock should be used there.
    You seem to have forgotten to fix the above-mentioned issue.
    Not forgotten.
    A spinlock can be used only for very short-term operation like
    read/write of some shared-variables. The operation on the queue
    is not short, so should be protected by LWLock, I think.
    There's no need to sleep while holding locks and the operations are very
    short in most cases. The code around it isn't trivial, but that's no
    reason to use LWlocks.

    LWlocks are just spinlocks plus sem sleeps, so I don't see the need for
    that in the current code. Other views welcome.

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Fujii Masao at Mar 1, 2011 at 9:13 am

    On Tue, Mar 1, 2011 at 5:21 PM, Simon Riggs wrote:
    A spinlock can be used only for very short-term operation like
    read/write of some shared-variables. The operation on the queue
    is not short, so should be protected by LWLock, I think.
    There's no need to sleep while holding locks and the operations are very
    short in most cases. The code around it isn't trivial, but that's no
    reason to use LWlocks.

    LWlocks are just spinlocks plus sem sleeps, so I don't see the need for
    that in the current code. Other views welcome.
    The following description in src/backend/storage/lmgr/README
    further convinces me that a spinlock should not be used there.
    In the patch, while the spinlock is held, the whole waiting list
    (which might be large if there are many concurrently running
    transactions) can be searched, SetLatch can be called, and
    elog(WARNING) can be called (though the latter should not happen).

    ----------------
    * Spinlocks. These are intended for *very* short-term locks. If a lock
    is to be held more than a few dozen instructions, or across any sort of
    kernel call (or even a call to a nontrivial subroutine), don't use a
    spinlock. Spinlocks are primarily used as infrastructure for lightweight
    locks. They are implemented using a hardware atomic-test-and-set
    instruction, if available. Waiting processes busy-loop until they can
    get the lock. There is no provision for deadlock detection, automatic
    release on error, or any other nicety. There is a timeout if the lock
    cannot be gotten after a minute or so (which is approximately forever in
    comparison to the intended lock hold time, so this is certainly an error
    condition).
    ----------------

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Robert Haas at Mar 1, 2011 at 1:20 pm

    On Tue, Mar 1, 2011 at 3:21 AM, Simon Riggs wrote:
    On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote:
    Thanks for update of the patch!
    On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs wrote:
    SyncRepRemoveFromQueue does not seem short enough to run under a
    spinlock. Instead, an LWLock should be used there.
    You seem to have forgotten to fix the above-mentioned issue.
    Not forgotten.
    A spinlock can be used only for very short-term operation like
    read/write of some shared-variables. The operation on the queue
    is not short, so should be protected by LWLock, I think.
    There's no need to sleep while holding locks and the operations are very
    short in most cases. The code around it isn't trivial, but that's no
    reason to use LWlocks.

    LWlocks are just spinlocks plus sem sleeps, so I don't see the need for
    that in the current code. Other views welcome.
    An LWLock is a lot safer, in general, than a spinlock. A spinlock
    mustn't do anything that could emit an error or abort (among other
    things). I doubt that the performance cost of using an LWLock rather
    than a spin lock here is enough to matter, and the spin lock seems
    more likely to result in hard-to-find bugs.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Mar 1, 2011 at 3:16 pm

    Robert Haas writes:
    On Tue, Mar 1, 2011 at 3:21 AM, Simon Riggs wrote:
    LWlocks are just spinlocks plus sem sleeps, so I don't see the need for
    that in the current code. Other views welcome.
    An LWLock is a lot safer, in general, than a spinlock. A spinlock
    mustn't do anything that could emit an error or abort (among other
    things). I doubt that the performance cost of using an LWLock rather
    than a spin lock here is enough to matter, and the spin lock seems
    more likely to result in hard-to-find bugs.
    Well, stuck spinlocks aren't exactly hard to identify. But I agree that
    the lack of any release-on-error infrastructure is a killer reason not
    to use a spinlock for anything but short straight-line code segments.

    regards, tom lane
  • Tom Lane at Mar 1, 2011 at 3:02 pm

    Simon Riggs writes:
    On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote:
    A spinlock can be used only for very short-term operation like
    read/write of some shared-variables. The operation on the queue
    is not short, so should be protected by LWLock, I think.
    There's no need to sleep while holding locks and the operations are very
    short in most cases. The code around it isn't trivial, but that's no
    reason to use LWlocks.
    What does "in most cases" mean?
    LWlocks are just spinlocks plus sem sleeps, so I don't see the need for
    that in the current code. Other views welcome.
    Simon, that is absolutely NOT acceptable. Spinlocks are to be used only
    for short straight-line code segments. If the lock has any potential to
    be held for more than nanoseconds, use an LWLock. The contention costs
    of the shortcut you propose are too high.

    regards, tom lane
  • Simon Riggs at Mar 1, 2011 at 6:18 pm

    On Tue, 2011-03-01 at 10:02 -0500, Tom Lane wrote:
    Simon Riggs <simon@2ndQuadrant.com> writes:
    On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote:
    A spinlock can be used only for very short-term operation like
    read/write of some shared-variables. The operation on the queue
    is not short, so should be protected by LWLock, I think.
    There's no need to sleep while holding locks and the operations are very
    short in most cases. The code around it isn't trivial, but that's no
    reason to use LWlocks.
    What does "in most cases" mean?
    LWlocks are just spinlocks plus sem sleeps, so I don't see the need for
    that in the current code. Other views welcome.
    Simon, that is absolutely NOT acceptable. Spinlocks are to be used only
    for short straight-line code segments. If the lock has any potential to
    be held for more than nanoseconds, use an LWLock. The contention costs
    of the shortcut you propose are too high.
    No problem to change.

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Simon Riggs at Feb 28, 2011 at 6:41 pm

    On Tue, 2011-02-22 at 14:38 +0900, Fujii Masao wrote:
    On Mon, Feb 21, 2011 at 6:06 PM, Fujii Masao wrote:
    I've read about a tenth of the patch, so I'll submit more comments
    on the rest later.
    Here are some more comments:
    Thanks for your comments
    Code available at git://github.com/simon2ndQuadrant/postgres.git
    SyncRepReleaseWaiters should be called when a walsender exits. Otherwise,
    if the standby crashes while a transaction is waiting for replication,
    the transaction will wait forever.
    Will think on this.
    sync_rep_service and potential_sync_standby are not required to be in the
    WalSnd shmem because only walsender accesses them.
    For use in debugging, if not in later monitoring
    +static bool
    +SyncRepServiceAvailable(void)
    +{
    + bool result = false;
    +
    + SpinLockAcquire(&WalSndCtl->ctlmutex);
    + result = WalSndCtl->sync_rep_service_available;
    + SpinLockRelease(&WalSndCtl->ctlmutex);
    +
    + return result;
    +} Fixed
    volatile pointer needs to be used to prevent code rearrangement.

    + slock_t ctlmutex; /* locks shared variables shown above */

    ctlmutex should be initialized by calling SpinLockInit.
    Fixed
    + /*
    + * Stop providing the sync rep service, even if there are
    + * waiting backends.
    + */
    + {
    + SpinLockAcquire(&WalSndCtl->ctlmutex);
    + WalSndCtl->sync_rep_service_available = false;
    + SpinLockRelease(&WalSndCtl->ctlmutex);
    + }

    sync_rep_service_available should be set to false only when
    there is no synchronous walsender.
    The way I had it is "correct": "if (MyWalSnd->sync_rep_service)" is true
    only if we're the sync walsender, so if we stop being it, then there
    isn't one. A potential walsender might then become the sync walsender.

    I think you'd like there to be no gap at the point where the potential
    walsender takes over? Just not sure how to do that robustly at present.
    Will think some more.
    + /*
    + * When we first start replication the standby will be behind the primary.
    + * For some applications, for example, synchronous replication, it is
    + * important to have a clear state for this initial catchup mode, so we
    + * can trigger actions when we change streaming state later. We may stay
    + * in this state for a long time, which is exactly why we want to be
    + * able to monitor whether or not we are still here.
    + */
    + WalSndSetState(WALSNDSTATE_CATCHUP);
    +

    The above has already been committed. Please remove that from the patch.
    Removed
    I don't like calling SyncRepReleaseWaiters for each feedback message because
    I suspect that's too frequent. How about receiving all the feedback messages
    available from the socket, and then calling SyncRepReleaseWaiters once, as
    walreceiver does?
    Possibly, but that's an optimisation for later, once the behaviour is correct.
    + bool ownLatch; /* do we own the above latch? */

    We can just remove the ownLatch flag.
    Agreed, removed

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
