So we've got two patches that implement synchronous replication, and
no agreement on which one, if either, should be committed. We have no
agreement on how synchronous replication should be configured, and at
most a tenuous agreement that it should involve standby registration.

This is bad.

This feature is important, and we need to get it done. How do we get
the ball rolling again?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

  • Fujii Masao at Sep 29, 2010 at 7:57 am

    On Wed, Sep 29, 2010 at 11:47 AM, Robert Haas wrote:
    So we've got two patches that implement synchronous replication, and
    no agreement on which one, if either, should be committed.  We have no
    agreement on how synchronous replication should be configured, and at
    most a tenuous agreement that it should involve standby registration.

    This is bad.

    This feature is important, and we need to get it done.  How do we get
    the ball rolling again?
    ISTM that it will still take a long time to reach consensus on standby
    registration. So, how about putting the per-standby parameters in
    recovery.conf and focusing on the basic features of synchronous
    replication at first? During that time, we can deepen the discussion on
    standby registration, and then implement it.

    The basic features that I mean are for the most basic use case, that is,
    one master and one synchronous standby. In detail,
    * Support multiple standbys with various synchronization levels.
    Not required for that case.
    * What happens if a synchronous standby isn't connected at the moment? Return immediately vs. wait forever.
    The wait-forever option is not required for that case. Let's implement
    the return-immediately option first.
    * Per-transaction control. Some transactions are important, others are not.
    Not required for that case.
    * Quorum commit. Wait until n standbys acknowledge. n=1 and n=all servers can be seen as important special cases of this.
    Not required for that case.
    * async, recv, fsync and replay levels of synchronization.
    At least one of the three synchronous levels should be included in the
    first commit. I think that either recv or fsync is suitable for a first
    try because those don't require wake-up signaling from the startup
    process to walreceiver and are relatively easy to implement.
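
    For example, each standby's recovery.conf might look something like this
    (standby_mode and primary_conninfo already exist; the synchronization-level
    parameter name and its values below are only placeholders, since we haven't
    agreed on them yet):

        standby_mode = 'on'
        primary_conninfo = 'host=master port=5432 user=replicator'
        # placeholder per-standby parameter: async, recv, fsync or replay
        synchronization_level = 'recv'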

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Robert Haas at Sep 30, 2010 at 12:48 am

    On Wed, Sep 29, 2010 at 3:56 AM, Fujii Masao wrote:
    On Wed, Sep 29, 2010 at 11:47 AM, Robert Haas wrote:
    So we've got two patches that implement synchronous replication, and
    no agreement on which one, if either, should be committed.  We have no
    agreement on how synchronous replication should be configured, and at
    most a tenuous agreement that it should involve standby registration.

    This is bad.

    This feature is important, and we need to get it done.  How do we get
    the ball rolling again?
    ISTM that it will still take a long time to reach consensus on standby
    registration. So, how about putting the per-standby parameters in
    recovery.conf and focusing on the basic features of synchronous
    replication at first? During that time, we can deepen the discussion on
    standby registration, and then implement it.

    The basic features that I mean are for the most basic use case, that is,
    one master and one synchronous standby. In detail,
    * Support multiple standbys with various synchronization levels.
    Not required for that case.
    * What happens if a synchronous standby isn't connected at the moment? Return immediately vs. wait forever.
    The wait-forever option is not required for that case. Let's implement
    the return-immediately option first.
    * Per-transaction control. Some transactions are important, others are not.
    Not required for that case.
    * Quorum commit. Wait until n standbys acknowledge. n=1 and n=all servers can be seen as important special cases of this.
    Not required for that case.
    * async, recv, fsync and replay levels of synchronization.
    At least one of the three synchronous levels should be included in the
    first commit. I think that either recv or fsync is suitable for a first
    try because those don't require wake-up signaling from the startup
    process to walreceiver and are relatively easy to implement.
    I'm not sure this really gets us anywhere. We already have two
    patches; writing a third one won't fix anything. We need to decide
    which patch can be the basis for future work. According to my
    understanding, the most significant difference between the patches is
    the way that ACKs get sent from standby to master. Whose idea is
    better, yours or Simon's? And why? Are there other reasons to prefer
    one patch to the other?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Heikki Linnakangas at Sep 30, 2010 at 6:10 am

    On 29.09.2010 10:56, Fujii Masao wrote:
    On Wed, Sep 29, 2010 at 11:47 AM, Robert Haas wrote:
    So we've got two patches that implement synchronous replication, and
    no agreement on which one, if either, should be committed. We have no
    agreement on how synchronous replication should be configured, and at
    most a tenuous agreement that it should involve standby registration.

    This is bad.

    This feature is important, and we need to get it done. How do we get
    the ball rolling again?
    Agreed. Actually, given the lack of people jumping in and telling us
    what they'd like to do with the feature, maybe it's not that important
    after all.
    ISTM that it will still take a long time to reach consensus on standby
    registration. So, how about putting the per-standby parameters in
    recovery.conf and focusing on the basic features of synchronous
    replication at first? During that time, we can deepen the discussion on
    standby registration, and then implement it.

    The basic features that I mean are for the most basic use case, that is,
    one master and one synchronous standby. In detail,
    ISTM the problem is exactly that there is no consensus on what the basic
    use case is. I'm sure there's several things you can accomplish with
    synchronous replication, perhaps you could describe what the important
    use case for you is?
    * Support multiple standbys with various synchronization levels.
    Not required for that case.
    IMHO at least we'll still need to support asynchronous standbys in the
    same mix, that's an existing feature.
    * What happens if a synchronous standby isn't connected at the moment? Return immediately vs. wait forever.
    The wait-forever option is not required for that case. Let's implement
    the return-immediately option first.

    [...]
    * async, recv, fsync and replay levels of synchronization.
    At least one of the three synchronous levels should be included in the
    first commit. I think that either recv or fsync is suitable for a first
    try because those don't require wake-up signaling from the startup
    process to walreceiver and are relatively easy to implement.
    What is the use case for that combination? For zero data loss, you
    *must* wait forever if a standby isn't connected. For keeping a hot
    standby server up-to-date so that you can freely query the standby
    instead of the master, you need replay level synchronization.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Simon Riggs at Sep 30, 2010 at 8:14 am

    On Thu, 2010-09-30 at 09:09 +0300, Heikki Linnakangas wrote:
    On 29.09.2010 10:56, Fujii Masao wrote:
    On Wed, Sep 29, 2010 at 11:47 AM, Robert Haas wrote:
    This feature is important, and we need to get it done. How do we get
    the ball rolling again?
    Agreed. Actually, given the lack of people jumping in and telling us
    what they'd like to do with the feature, maybe it's not that important
    after all.
    I don't see anything has stalled. I've been busy for a few days, so
    haven't had a chance to follow up on the use cases, as suggested. I'm
    busy again today, so cannot reply further. Anyway, taking a few days to
    let us think some more about the technical comments is no bad thing.

    I think we need to relax about this feature some more because trying to
    get something actually done when basic issues need analysis is hard and
    that creates tension. Between us we can work out the code in a few days,
    once we know which code to write.

    What we actually need to do is talk and listen. I'd like to suggest that
    we have an online "focus day" (onlist) on Sync Rep on Oct 5, and maybe 6
    as well? Meeting in person is possible, but probably impractical. But a
    design sprint, not a code sprint.

    This is important and I'm sure we'll work something out.

    --
    Simon Riggs www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Training and Services
  • David Fetter at Sep 30, 2010 at 1:35 pm

    On Thu, Sep 30, 2010 at 09:14:42AM +0100, Simon Riggs wrote:
    On Thu, 2010-09-30 at 09:09 +0300, Heikki Linnakangas wrote:
    On 29.09.2010 10:56, Fujii Masao wrote:
    On Wed, Sep 29, 2010 at 11:47 AM, Robert Haas wrote:
    This feature is important, and we need to get it done. How do
    we get the ball rolling again?
    Agreed. Actually, given the lack of people jumping in and telling
    us what they'd like to do with the feature, maybe it's not that
    important after all.
    I don't see anything has stalled.
    I do. We're half way through this commitfest, so if no one's actually
    ready to commit one of the patches, I kinda have to bounce them both,
    at least to the next CF.

    The very likely outcome of that, given that it's a pretty enormous
    feature that involves even more enormous amounts of testing on various
    hardware, networks, etc., is that we don't get SR in 9.1, and you
    among others will be very unhappy.

    So yes, it is stalled, and yes, there's a real urgency to actually
    getting a baseline something in there in the next couple of weeks.

    Cheers,
    David.
    --
    David Fetter <david@fetter.org> http://fetter.org/
    Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
    Skype: davidfetter XMPP: david.fetter@gmail.com
    iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

    Remember to vote!
    Consider donating to Postgres: http://www.postgresql.org/about/donate
  • Tom Lane at Sep 30, 2010 at 1:53 pm

    David Fetter writes:
    On Thu, Sep 30, 2010 at 09:14:42AM +0100, Simon Riggs wrote:
    I don't see anything has stalled.
    I do. We're half way through this commitfest, so if no one's actually
    ready to commit one of the patches, I kinda have to bounce them both,
    at least to the next CF.
    [ raised eyebrow ] You seem to be in an awfully big hurry to bounce
    stuff. The CF end is still two weeks away.

    But while I'm thinking about that...

    The actual facts on the ground are that practically no CF work has
    gotten done yet (at least not in my house) due to the git move and the
    9.0.0 release and the upcoming back-branch releases. Maybe we shouldn't
    have started the CF while all that was going on, but that's water over
    the dam now. What we can do is rethink the scheduled end date. IMHO
    we should push out the end date by at least a week to reflect the lack
    of time spent on the CF so far.

    regards, tom lane
  • David Fetter at Sep 30, 2010 at 2:07 pm

    On Thu, Sep 30, 2010 at 09:52:46AM -0400, Tom Lane wrote:
    David Fetter <david@fetter.org> writes:
    On Thu, Sep 30, 2010 at 09:14:42AM +0100, Simon Riggs wrote:
    I don't see anything has stalled.
    I do. We're half way through this commitfest, so if no one's
    actually ready to commit one of the patches, I kinda have to
    bounce them both, at least to the next CF.
    [ raised eyebrow ] You seem to be in an awfully big hurry to bounce
    stuff. The CF end is still two weeks away.
    If people are still wrangling over the design, I'd say two weeks is
    a ludicrously short time, not a long one.
    But while I'm thinking about that...

    The actual facts on the ground are that practically no CF work has
    gotten done yet (at least not in my house)
    Your non-involvement in the first half or more--I'd say maybe 3 weeks
    or so--is precisely what commitfests are for. The point is that
    people who are *not* committers need to do a bunch of QA on patches,
    review them, get or create new patches as needed. Only then should a
    committer get involved.

    Cheers,
    David.
    --
    David Fetter <david@fetter.org> http://fetter.org/
    Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
    Skype: davidfetter XMPP: david.fetter@gmail.com
    iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

    Remember to vote!
    Consider donating to Postgres: http://www.postgresql.org/about/donate
  • Simon Riggs at Sep 30, 2010 at 4:53 pm

    On Thu, 2010-09-30 at 07:06 -0700, David Fetter wrote:
    On Thu, Sep 30, 2010 at 09:52:46AM -0400, Tom Lane wrote:
    David Fetter <david@fetter.org> writes:
    On Thu, Sep 30, 2010 at 09:14:42AM +0100, Simon Riggs wrote:
    I don't see anything has stalled.
    I do. We're half way through this commitfest, so if no one's
    actually ready to commit one of the patches, I kinda have to
    bounce them both, at least to the next CF.
    [ raised eyebrow ] You seem to be in an awfully big hurry to bounce
    stuff. The CF end is still two weeks away.
    If people are still wrangling over the design, I'd say two weeks is
    a ludicrously short time, not a long one.
    Yes, there is design work still to do.

    What purpose would be served by "bouncing" these patches?

    --
    Simon Riggs www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Training and Services
  • Robert Haas at Sep 30, 2010 at 6:14 pm

    On Thu, Sep 30, 2010 at 12:52 PM, Simon Riggs wrote:
    On Thu, 2010-09-30 at 07:06 -0700, David Fetter wrote:
    On Thu, Sep 30, 2010 at 09:52:46AM -0400, Tom Lane wrote:
    David Fetter <david@fetter.org> writes:
    On Thu, Sep 30, 2010 at 09:14:42AM +0100, Simon Riggs wrote:
    I don't see anything has stalled.
    I do.  We're half way through this commitfest, so if no one's
    actually ready to commit one of the patches, I kinda have to
    bounce them both, at least to the next CF.
    [ raised eyebrow ]  You seem to be in an awfully big hurry to bounce
    stuff.  The CF end is still two weeks away.
    If people are still wrangling over the design, I'd say two weeks is
    a ludicrously short time, not a long one.
    Yes, there is design work still to do.

    What purpose would be served by "bouncing" these patches?
    None whatsoever, IMHO. That having been said, I would like to see us
    make some forward progress. I'm open to your ideas expressed
    up-thread, but I'm not sure whether they'll be sufficient to resolve
    the problem. Seems worth a try, though.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Josh Berkus at Oct 3, 2010 at 3:49 am

    On 09/30/2010 10:52 PM, Tom Lane wrote:
    IMHO
    we should push out the end date by at least a week to reflect the lack
    of time spent on the CF so far.
    I agree that we should postpone the end of the CF by one week to deal
    with the distractions people have had.

    --
    -- Josh Berkus
    PostgreSQL Experts Inc.
    http://www.pgexperts.com
  • Josh Berkus at Oct 3, 2010 at 3:52 am

    What we actually need to do is talk and listen. I'd like to suggest that
    we have an online "focus day" (onlist) on Sync Rep on Oct 5, and maybe 6
    as well? Meeting in person is possible, but probably impractical. But a
    design sprint, not a code sprint.
    I'd suggest something even simpler:

    (1) Create a wiki page which lists all of the design decisions we need
    to make in order to finish the specification for synch rep.

    (2) Link each item to any prior discussion we've had about the item.

    (3) Invite people to comment on the wiki by leaving per-item comments
    and suggestions with their own names.

    I believe that right now only a handful of people (Simon, Heikki, Fujii,
    Zoltan) are really acquainted with all of the decisions which need to be
    made. No wonder the rest of us fly off on minutiae like file formats; we
    really have no sense of scope.


    --
    -- Josh Berkus
    PostgreSQL Experts Inc.
    http://www.pgexperts.com
  • Markus Wanner at Oct 4, 2010 at 6:48 am
    Hi,
    On 10/03/2010 05:52 AM, Josh Berkus wrote:
    (3) Invite people to comment on the wiki by leaving per-item comments
    and suggestions with their own names.
    Please keep discussions on the mailing list. On wikis, those are very
    hard to follow (Date or From missing, no offline capabilities, indirect
    notification, etc.).

    I like Simon's suggestion, but thought of something *more* direct (maybe
    IRC), not less (like Wikis).
    I believe that right now only a handful of people (Simon, Heikki, Fujii,
    Zoltan) are really acquainted with all of the decisions which need to be
    made.
    I at least try to follow. And I actually think we had quite a bit of DBA
    input as well.

    Regards

    Markus
  • Aidan Van Dyk at Sep 30, 2010 at 2:01 pm

    On Thu, Sep 30, 2010 at 2:09 AM, Heikki Linnakangas wrote:

    Agreed. Actually, given the lack of people jumping in and telling us what
    they'd like to do with the feature, maybe it's not that important after all.
    The basic features that I mean is for most basic use case, that is, one
    master and one synchronous standby case. In detail,
    ISTM the problem is exactly that there is no consensus on what the basic use
    case is. I'm sure there's several things you can accomplish with synchronous
    replication, perhaps you could describe what the important use case for you
    is?
    OK, so I'll throw in my ideal use case. I'm starting to play with
    Magnus's "streaming -> archive".

    *that's* what I want, with synchronous. Yes, again, I'm looking for
    "data durability", not "server query-ability", and I'd like to rely
    on the PG user-space side of things instead of praying that replicated
    block-devices hold together....

    If my master flips out, I'm quite happy to do a normal archive
    restore. Except I don't want that last 16MB (or archive timeout) of
    transactions lost. The streaming -> archive in its current state
    gets me pretty close, but I'd love to be able to guarantee that my
    recovery from that archive has *every* transaction that the master
    committed...

    a.
  • Kevin Grittner at Sep 30, 2010 at 2:10 pm
    Aidan Van Dyk wrote:
    Heikki Linnakangas wrote:
    I'm sure there's several things you can accomplish with
    synchronous replication, perhaps you could describe what the
    important use case for you is?
    I'm looking for "data durability", not "server query-ability"
    Same here. If we used synchronous replication, the important thing
    for us would be to hold up the master for the minimum time required
    to ensure remote persistence -- not actual application to the remote
    database. We could tolerate some WAL replay time on recovery better
    than poor commit performance on the master.

    -Kevin
  • Heikki Linnakangas at Sep 30, 2010 at 2:24 pm

    On 30.09.2010 17:09, Kevin Grittner wrote:
    Aidan Van Dyk wrote:
    Heikki Linnakangas wrote:
    I'm sure there's several things you can accomplish with
    synchronous replication, perhaps you could describe what the
    important use case for you is?
    I'm looking for "data durability", not "server query-ability"
    Same here. If we used synchronous replication, the important thing
    for us would be to hold up the master for the minimum time required
    to ensure remote persistence -- not actual application to the remote
    database. We could tolerate some WAL replay time on recovery better
    than poor commit performance on the master.
    You do realize that to be able to guarantee zero data loss, the master
    will have to stop committing new transactions if the streaming stops for
    any reason, like a network glitch. Maybe that's a tradeoff you want, but
    I'm asking because that point isn't clear to many people.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Yeb Havinga at Sep 30, 2010 at 2:50 pm

    Heikki Linnakangas wrote:
    On 30.09.2010 17:09, Kevin Grittner wrote:
    Aidan Van Dyk wrote:
    Heikki Linnakangas wrote:
    I'm sure there's several things you can accomplish with
    synchronous replication, perhaps you could describe what the
    important use case for you is?
    I'm looking for "data durability", not "server query-ability"
    Same here. If we used synchronous replication, the important thing
    for us would be to hold up the master for the minimum time required
    to ensure remote persistence -- not actual application to the remote
    database. We could tolerate some WAL replay time on recovery better
    than poor commit performance on the master.
    You do realize that to be able to guarantee zero data loss, the master
    will have to stop committing new transactions if the streaming stops
    for any reason, like a network glitch. Maybe that's a tradeoff you
    want, but I'm asking because that point isn't clear to many people.
    If there's a network glitch, it'd probably affect networked client
    connections as well, so it would mean no extra degradation of service.

    -- Yeb
  • Markus Wanner at Oct 4, 2010 at 7:03 am

    On 09/30/2010 04:54 PM, Yeb Havinga wrote:
    Heikki Linnakangas wrote:
    You do realize that to be able to guarantee zero data loss, the master
    will have to stop committing new transactions if the streaming stops
    for any reason, like a network glitch. Maybe that's a tradeoff you
    want, but I'm asking because that point isn't clear to many people.
    If there's a network glitch, it'd probably affect networked client
    connections as well, so it would mean no extra degradation of service.
    Agreed.

    I think the network glitch example is too general; it could affect any
    part of the network, or just the connection between the master and the
    standby, in which case all client connections would stay up.

    Let's quickly think about that scenario. AFAIU in such a case, the
    standby would continue to answer read-only queries, independent of what
    the master does, right? Or does the standby stop processing read-only
    queries in case it loses connection to the master?

    It seems to me the latter is required, if we let the master continue to
    commit transactions. Otherwise the standby would serve stale data to its
    clients without knowing.

    Given that scenario, I'd clearly favor a master that stops committing
    new transactions, but allows both (i.e. master and standbys) to continue
    answering read-only queries.

    Regards

    Markus Wanner
  • Heikki Linnakangas at Oct 4, 2010 at 7:18 am

    On 04.10.2010 10:03, Markus Wanner wrote:
    On 09/30/2010 04:54 PM, Yeb Havinga wrote:
    Heikki Linnakangas wrote:
    You do realize that to be able to guarantee zero data loss, the master
    will have to stop committing new transactions if the streaming stops
    for any reason, like a network glitch. Maybe that's a tradeoff you
    want, but I'm asking because that point isn't clear to many people.
    If there's a network glitch, it'd probably affect networked client
    connections as well, so it would mean no extra degradation of service.
    Agreed.

    I think the network glitch example is too general; it could affect any
    part of the network, or just the connection between the master and the
    standby, in which case all client connections would stay up.

    Let's quickly think about that scenario. AFAIU in such a case, the
    standby would continue to answer read-only queries, independent of what
    the master does, right?
    Right.
    Or does the standby stop processing read-only
    queries in case it loses connection to the master?
    As far as the current proposals go, no.
    It seems to me the latter is required, if we let the master continue to
    commit transactions. Otherwise the standby would serve stale data to its
    clients without knowing.
    Yep. If you want to guarantee that a hot standby doesn't return stale
    data, if the connection is lost you need to either stop processing
    read-only queries in the standby, or stop processing commits in the master.

    Note that this assumes that you use the 'replay' synchronization level.
    In the weaker levels, read-only queries can always return stale data.

    With the 'replay' and hot standby combination, you'll want to set
    max_standby_archive_delay to a very low value, or a read-only query can
    cause the master to stop processing commits (or the standby to stop
    accepting new queries, if that's preferred).
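
    For illustration, the conflict-delay settings in question would look
    something like this on the standby (both GUCs already exist; the values
    are just an example of what "very low" might mean in practice):

        # postgresql.conf on the hot standby
        max_standby_archive_delay = 100ms     # cancel conflicting queries quickly
        max_standby_streaming_delay = 100ms   # companion setting for streamed WAL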

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Markus Wanner at Oct 4, 2010 at 7:49 am

    On 10/04/2010 09:18 AM, Heikki Linnakangas wrote:
    Note that this assumes that you use the 'replay' synchronization level.
    In the weaker levels, read-only queries can always return stale data.
    I'm not too fond of those various synchronization levels, but IIUC all
    other levels only allow a rather limited staleness. But with a master that
    continues to commit new transactions while a disconnected standby happily
    continues to answer read-only queries, the age of the standby's snapshot
    can grow without limit.
    With 'replay' and hot standby combination, you'll want to set
    max_standby_archive_delay to a very low value, or a read-only query can
    cause master to stop processing commits (or the standby to stop
    accepting new queries, if that's preferred).
    Well, given that DML-only transactions aren't prone to such conflicts, I
    think of this as a corner case.

    Also note that this requirement seems to apply whether we wait forever
    on standby failure or not. (Because even if we don't, there must be some
    kind of timeout on the master from the first suspicion until it actually
    declares the standby dead - anything else is called async.)

    Regards

    Markus Wanner
  • Heikki Linnakangas at Oct 4, 2010 at 7:56 am

    On 04.10.2010 10:49, Markus Wanner wrote:
    On 10/04/2010 09:18 AM, Heikki Linnakangas wrote:
    With 'replay' and hot standby combination, you'll want to set
    max_standby_archive_delay to a very low value, or a read-only query can
    cause master to stop processing commits (or the standby to stop
    accepting new queries, if that's preferred).
    Well, given that DML-only transactions aren't prone to such conflicts, I
    think of this as a corner case.
    Yes they are. Any DML operation, and even read-only queries IIRC, can
    trigger HOT pruning, which can conflict with a read-only query in a hot
    standby. And then there's autovacuum which can cause conflicts in the
    standby, even if no user transactions are running in the master.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Bruce Momjian at Oct 14, 2010 at 10:28 pm

    Heikki Linnakangas wrote:
    On 04.10.2010 10:49, Markus Wanner wrote:
    On 10/04/2010 09:18 AM, Heikki Linnakangas wrote:
    With 'replay' and hot standby combination, you'll want to set
    max_standby_archive_delay to a very low value, or a read-only query can
    cause master to stop processing commits (or the standby to stop
    accepting new queries, if that's preferred).
    Well, given that DML-only transactions aren't prone to such conflicts, I
    think of this as a corner case.
    Yes they are. Any DML operation, and even read-only queries IIRC, can
    trigger HOT pruning, which can conflict with a read-only query in a hot
    standby. And then there's autovacuum which can cause conflicts in the
    standby, even if no user transactions are running in the master.
    I can confirm that SELECT can trigger HOT pruning, based on research for
    my PG West MVCC talk. Anything that does a tuple lookup can cause it
    --- INSERT VALUES does not.

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Kevin Grittner at Sep 30, 2010 at 2:53 pm

    Heikki Linnakangas wrote:

    You do realize that to be able to guarantee zero data loss, the
    master will have to stop committing new transactions if the
    streaming stops for any reason, like a network glitch. Maybe
    that's a tradeoff you want, but I'm asking because that point
    isn't clear to many people.
    Yeah, I get that. I do think the quorum approach or some simplified
    special case of it would be important for us -- possibly even a
    requirement -- for that reason.

    -Kevin
  • Fujii Masao at Oct 1, 2010 at 10:48 am

    On Thu, Sep 30, 2010 at 3:09 PM, Heikki Linnakangas wrote:
    * Support multiple standbys with various synchronization levels.
    Not required for that case.
    IMHO at least we'll still need to support asynchronous standbys in the same
    mix, that's an existing feature.
    My intention is to commit the core part of synchronous replication (which
    would be used for every use case) at first. Then we can implement the
    feature for each use case.

    I agree that 9.1 should support asynchronous standbys in the same mix, but
    this seems to be an extended feature rather than the very core.
    * What happens if a synchronous standby isn't connected at the moment?
    Return immediately vs. wait forever.
    The wait-forever option is not required for that case. Let's implement
    the return-immediately option first.

    [...]
    * async, recv, fsync and replay levels of synchronization.
    At least one of the three synchronous levels should be included in the
    first commit. I think that either recv or fsync is suitable for a first
    try because those don't require wake-up signaling from the startup
    process to walreceiver and are relatively easy to implement.
    What is the use case for that combination? For zero data loss, you *must*
    wait forever if a standby isn't connected. For keeping a hot standby server
    up-to-date so that you can freely query the standby instead of the master,
    you need replay level synchronization.
    For high availability, and zero data loss unless the disk on one of the
    master and the standby gets corrupted after the other goes down. It's the
    same use case that a cluster with a shared disk covers.

    I proposed to implement the "return-immediately" at first because it doesn't
    require standby registration. But if many people think that the "wait-forever"
    is the core rather than the "return-immediately", I'll follow them. We can
    implement the "return-immediately" after that.

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • David Fetter at Oct 1, 2010 at 2:16 pm

    On Fri, Oct 01, 2010 at 07:48:25PM +0900, Fujii Masao wrote:
    I proposed to implement the "return-immediately" at first because it
    doesn't require standby registration. But if many people think that
    the "wait-forever" is the core rather than the "return-immediately",
    I'll follow them. We can implement the "return-immediately" after
    that.
    In my experience, most people who want "synchronous" behavior are
    willing to put up with "wait forever," especially when asynchronous
    behavior is already available.

    In short, +1 for "push 'wait forever' soonest."

    Anybody who's got a Secret Base, Hidden in a Hollowed-Out Mountain,
    Making Grand Plans While Stroking a Long-Haired Cat[1], should please
    to update their public repository, or create a public repository if it
    doesn't already exist, and in either case keep it current.

    Cheers,
    David

    [1] While the Hollowed-Out Mountain trick worked back in the 60s,
    it's gotten a little trite. The cool kids are keeping things pretty
    public these days when they plan to go public.
    --
    David Fetter <david@fetter.org> http://fetter.org/
    Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
    Skype: davidfetter XMPP: david.fetter@gmail.com
    iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

    Remember to vote!
    Consider donating to Postgres: http://www.postgresql.org/about/donate
  • Fujii Masao at Oct 4, 2010 at 2:22 pm

    On Fri, Oct 1, 2010 at 11:16 PM, David Fetter wrote:
    On Fri, Oct 01, 2010 at 07:48:25PM +0900, Fujii Masao wrote:
    I proposed to implement the "return-immediately" at first because it
    doesn't require standby registration. But if many people think that
    the "wait-forever" is the core rather than the "return-immediately",
    I'll follow them.  We can implement the "return-immediately" after
    that.
    In my experience, most people who want "synchronous" behavior are
    willing to put up with "wait forever," especially when asynchronous
    behavior is already available.

    In short, +1 for "push 'wait forever' soonest."
    I have one question for clarity:

    If we make all the transactions wait until specified standbys have
    connected to the master, how do we take a base backup from the
    master for those standbys? We seem to be unable to do that because
    pg_start_backup also waits forever. Is this right?

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Aidan Van Dyk at Oct 4, 2010 at 5:07 pm

    On Mon, Oct 4, 2010 at 10:22 AM, Fujii Masao wrote:

    I have one question for clarity:

    If we make all the transactions wait until specified standbys have
    connected to the master, how do we take a base backup from the
    master for those standbys? We seem to be unable to do that because
    pg_start_backup also waits forever. Is this right?
    Well, in my *opinion*, if you've told the master to not "commit to"
    *anything* unless it's synchronously replicated, you should already
    have a synchronously replicating slave up and running.

    I'm happy with the docs saying (maybe somewhat more politely):
    Before configuring your master to be completely wait-fully-synchronous,
    make sure you have a slave capable of being synchronous ready. Because
    if you've told it to never be un-synchronous, it won't be.
  • Fujii Masao at Oct 5, 2010 at 3:59 am

    On Tue, Oct 5, 2010 at 2:06 AM, Aidan Van Dyk wrote:
    On Mon, Oct 4, 2010 at 10:22 AM, Fujii Masao wrote:

    I have one question for clarity:

    If we make all the transactions wait until specified standbys have
    connected to the master, how do we take a base backup from the
    master for those standbys? We seem to be unable to do that because
    pg_start_backup also waits forever. Is this right?
    Well, in my *opinion*, if you've told the master to not "commit to"
    *anything* unless it's synchronously replicated, you should already
    have a synchronously replicating slave up and running.

    I'm happy with the docs saying (maybe somewhat more politely):
    Before configuring your master to be completely wait-fully-synchronous,
    make sure you have a slave capable of being synchronous ready. Because
    if you've told it to never be un-synchronous, it won't be.
    How can we take a base backup for that synchronous standby? You mean
    that we should disable the wait-forever option, start the master, take
    a base backup, shut down the master, enable the wait-forever option,
    start the master, and start the standby from that base backup?

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Tom Lane at Oct 5, 2010 at 4:19 am

    Fujii Masao writes:
    On Tue, Oct 5, 2010 at 2:06 AM, Aidan Van Dyk wrote:
    I'm happy with the docs saying (maybe somewhat more politely):
    Before configuring your master to be completely wait-fully-synchronous,
    make sure you have a slave capable of being synchronous ready. Because
    if you've told it to never be un-synchronous, it won't be.
    How can we take a base backup for that synchronous standby? You mean
    that we should disable the wait-forever option, start the master, take
    a base backup, shut down the master, enable the wait-forever option,
    start the master, and start the standby from that base backup?
    I think the point here is that it's possible to have sync-rep
    configurations in which it's impossible to take a base backup. That
    doesn't seem to me to be unacceptable in itself. What *is* unacceptable
    is to be unable to change the configuration to another state in which
    you could take a base backup. Which is why "keep the config in a system
    catalog" doesn't work.

    regards, tom lane
  • Dimitri Fontaine at Oct 6, 2010 at 9:39 am

    Tom Lane writes:
    I think the point here is that it's possible to have sync-rep
    configurations in which it's impossible to take a base backup.
    Sorry to be slow. I still don't understand that problem.

    I can understand why people want "wait forever", but I can't understand
    when the following strange idea applies: consider my non-ready standby
    there as a full member of the distributed setup already.

    I've been making plenty of noise about this topic in the past, at the
    beginning of plans for SR in 9.0 IIRC, pushing Heikki into having a
    worked-out state machine to figure out what the known states of a
    standby are and what we can do with each. We cancelled that and said it
    would maybe be necessary for Synchronous Replication. Here we go, right?

    So, first things first: when is it a good idea to treat a standby
    that hasn't yet had its base backup (let alone validated that, after
    taking it, the master still has enough WAL for the backup to be usable
    for initialising the slave) as someone we wait forever on?

    I say a standby is registered when it's currently "attached" and already
    able to keep up in async. That's the time when you can slow down the
    master until this new member catches up to full sync or whatever you've
    set up.

    Regards,
    --
    Dimitri Fontaine
    http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

    Lack of google and archives-fu today means no link to those mails. Yet…
  • Aidan Van Dyk at Oct 5, 2010 at 1:34 pm

    On Mon, Oct 4, 2010 at 11:48 PM, Fujii Masao wrote:

    How can we take a base backup for that synchronous standby? You mean
    that we should disable the wait-forever option, start the master, take
    a base backup, shut down the master, enable the wait-forever option,
    start the master, and start the standby from that base backup?
    All I'm saying is that *after* you've configured that everything must
    be synchronous is *not* the time to start trying to figure out whether
    your PITR backups/archive are working, and to start trying to get a
    slave replicating synchronously.

    Yes, High-Durability sync rep has caveats. One of them is that you
    must have a working synchronous slave before you can enforce
    synchrony.

    a.
  • Heikki Linnakangas at Oct 5, 2010 at 8:50 am

    On 04.10.2010 17:22, Fujii Masao wrote:
    If we make all the transactions wait until specified standbys have
    connected to the master, how do we take a base backup from the
    master for those standbys? We seem to be unable to do that because
    pg_start_backup also waits forever. Is this right?
    Hmm, pg_start_backup() writes WAL, but it doesn't commit. Only a commit
    needs to wait for acknowledgment from the standby, so 'wait forever'
    behavior doesn't necessarily mean that you can't take a base backup. If
    you run it outside a transaction you get an implicit commit, though,
    which will wait, so you might need to do something odd like "begin;
    select pg_start_backup(); rollback".

    But I agree with Tom that as long as it's possible to change the
    configuration on the fly, it's not a show-stopper if you can't take a
    new base backup while the standby is disconnected.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Fujii Masao at Oct 5, 2010 at 9:47 am

    On Tue, Oct 5, 2010 at 5:49 PM, Heikki Linnakangas wrote:
    On 04.10.2010 17:22, Fujii Masao wrote:

    If we make all the transactions wait until specified standbys have
    connected to the master, how do we take a base backup from the
    master for those standbys? We seem to be unable to do that because
    pg_start_backup also waits forever. Is this right?
    Hmm, pg_start_backup() writes WAL, but it doesn't commit. Only a commit
    needs to wait for acknowledgment from the standby, so 'wait forever'
    behavior doesn't necessarily mean that you can't take a base backup. If you
    run it outside a transaction you get an implicit commit, though, which will
    wait, so you might need to do something odd like "begin; select
    pg_start_backup(); rollback".
    Yep. Similarly, we would need to enclose pg_stop_backup with begin and
    rollback as well.
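
    In other words, the whole base backup would look something like this
    (just a sketch; the rollbacks only avoid the implicit commit that would
    wait for the standby, they don't undo what the functions themselves do):

        BEGIN;
        SELECT pg_start_backup('base');
        ROLLBACK;
        -- copy the data directory here, e.g. with rsync or tar
        BEGIN;
        SELECT pg_stop_backup();
        ROLLBACK;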

    I have another question: when should the waiting transactions resume?
    Is it the moment the standby has connected to the master, or the moment
    the standby has caught up with the master? For no data loss, the
    latter seems to be required. Right?

    The third question: if the WAL file is unfortunately recycled when a
    transaction waits for that WAL file to be shipped forever, how should
    that transaction behave? Still waiting? Cause PANIC? Give up waiting?
    For no data loss, ISTM that the second should be chosen. Right?

    This can happen because we can write WAL on the master without waiting
    for replication by enclosing a query with begin and rollback, even if
    all the transaction *commits* are waiting for replication forever.
    But I agree with Tom that as long as it's possible to change the
    configuration on the fly, it's not a show-stopper if you can't take a new
    base backup while the standby is disconnected.
    Yep. If people who want the "wait-forever" can live with such an odd
    backup procedure, I have no objection to implementing it.

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Heikki Linnakangas at Oct 5, 2010 at 9:50 am

    On 05.10.2010 12:47, Fujii Masao wrote:
    I have another question: when should the waiting transactions resume?
    Is it the moment the standby has connected to the master, or the moment
    the standby has caught up with the master? For no data loss, the
    latter seems to be required. Right?
    Yep.
    The third question: if the WAL file is unfortunately recycled when a
    transaction waits for that WAL file to be shipped forever, how should
    that transaction behave? Still waiting? Cause PANIC? Give up waiting?
    For no data loss, ISTM that the second should be chosen. Right?
    Right, it should keep waiting.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Simon Riggs at Oct 5, 2010 at 11:25 am

    On Tue, 2010-10-05 at 18:47 +0900, Fujii Masao wrote:
    On Tue, Oct 5, 2010 at 5:49 PM, Heikki Linnakangas
    wrote:
    On 04.10.2010 17:22, Fujii Masao wrote:

    If we make all the transactions wait until specified standbys have
    connected to the master, how do we take a base backup from the
    master for those standbys? We seem to be unable to do that because
    pg_start_backup also waits forever. Is this right?
    Hmm, pg_start_backup() writes WAL, but it doesn't commit. Only a commit
    needs to wait for acknowledgment from the standby, so 'wait forever'
    behavior doesn't necessarily mean that you can't take a base backup. If you
    run it outside a transaction you get an implicit commit, though, which will
    wait, so you might need to do something odd like "begin; select
    pg_start_backup(); rollback".
    Yep. Similarly, we would need to enclose pg_stop_backup with begin and
    rollback as well.
    Presumably we will have an option to *not* wait forever? So we would be
    able to set the option prior to running the base backup? So there isn't
    any need for the rollback trick suggested.

    pg_start_backup() and pg_stop_backup() have two use cases:

    1) ensuring both are sent through to the standby would make it very easy
    to allow backups from the standby.

    2) make sure we don't wait, so we can take a base backup at any time

    So there's no argument here to prevent it being in a table.

    --
    Simon Riggs www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Training and Services
  • Fujii Masao at Oct 5, 2010 at 2:07 pm

    On Tue, Oct 5, 2010 at 8:25 PM, Simon Riggs wrote:
    Presumably we will have an option to *not* wait forever? So we would be
    able to set the option prior to running the base backup? So there isn't
    any need for the rollback trick suggested.
    At the initial setup of the standby, we can easily disable the
    wait-forever option and take a base backup. I'm concerned about the case
    where the standby goes down while replication is working. ISTM that we
    cannot easily disable the wait-forever option for a backup because that
    disablement resumes the waiting transactions.

    In this case, we would need to issue rollback. Or we seem to need to
    shut down the master, take a cold backup, start the master and start
    the standby from that cold backup. Though I'm not sure if this is really
    the right procedure...

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Simon Riggs at Oct 5, 2010 at 11:42 am

    On Fri, 2010-10-01 at 07:16 -0700, David Fetter wrote:
    On Fri, Oct 01, 2010 at 07:48:25PM +0900, Fujii Masao wrote:
    I proposed to implement the "return-immediately" at first because it
    doesn't require standby registration. But if many people think that
    the "wait-forever" is the core rather than the "return-immediately",
    I'll follow them. We can implement the "return-immediately" after
    that.
    In my experience, most people who want "synchronous" behavior are
    willing to put up with "wait forever," especially when asynchronous
    behavior is already available.

    In short, +1 for "push 'wait forever' soonest."

    Anybody who's got a Secret Base, Hidden in a Hollowed-Out Mountain,
    Making Grand Plans While Stroking a Long-Haired Cat[1], should please
    to update their public repository, or create a public repository if it
    doesn't already exist, and in either case keep it current.
    You've long held the belief that I code in secret and don't reveal my
    code to people. Not really sure why, since I've contributed so much, so
    openly. Strange.

    I am trying to establish a sensible design based upon public discussion.
    I'm not working on any code currently; my understanding was that we
    would discuss what we were going to do and only then do it.

    I *could* add automatic registration or many other features to my patch.
    Doing so would take hours or days. How would that help us decide what to
    do? I'm not treating this as a race between people's patches; is it a
    race? Or is it a discussion, moving forward by mutual agreement
    towards something sensible?

    --
    Simon Riggs www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Training and Services
  • Dimitri Fontaine at Oct 1, 2010 at 3:06 pm

    Fujii Masao writes:
    I proposed to implement the "return-immediately" at first because it doesn't
    require standby registration. But if many people think that the "wait-forever"
    is the core rather than the "return-immediately", I'll follow them. We can
    implement the "return-immediately" after that.
    Wait forever can be done without standby registration, with quorum commit.

    --
    dim
  • Markus Wanner at Oct 4, 2010 at 7:08 am

    On 10/01/2010 05:06 PM, Dimitri Fontaine wrote:
    Wait forever can be done without standby registration, with quorum commit.
    Yeah, I also think the only reason for standby registration is ease of
    configuration (if at all). There's no technical requirement for standby
    registration, AFAICS. Or does anybody know of a realistic use case
    that's possible with standby registration, but not with quorum commit?

    Regards

    Markus Wanner
  • Simon Riggs at Oct 5, 2010 at 11:45 am

    On Fri, 2010-10-01 at 19:48 +0900, Fujii Masao wrote:

    My intention is to commit the core part of synchronous replication (which
    would be used for every use case) at first. Then we can implement the
    feature for each use case.
    I completely agree that we should commit the core part of sync rep, but
    the question is: what is that? We both have equally valid "cores".
    I agree that 9.1 should support asynchronous standbys in the same mix, but
    this seems to be an extended feature rather than the very core.
    That is trivial, so no need to exclude that.
    I proposed to implement the "return-immediately" at first because it doesn't
    require standby registration. But if many people think that the "wait-forever"
    is the core rather than the "return-immediately", I'll follow them. We can
    implement the "return-immediately" after that.
    I think it's fair to say that many people don't like the specific form of
    standby registration that has been proposed. I really don't mind if it
    exists as an option, but it looks way too complex to me to manage for
    realistic systems.

    Wait-forever needs to be an option. Nobody actually will wait forever,
    so if people select it, they will need some form of clusterware to
    control it and I don't want to see people forced to use clusterware.

    If people do choose wait-forever, then we could also do standby
    registration automatically, to give them something to wait for.

    --
    Simon Riggs www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Training and Services
