I've thought of another nasty problem for the sync-snapshots patch.
Consider the following sequence of events:

1. Transaction A, which is about to export a snapshot, is running in
database X.
2. Transaction B is making some changes in database Y.
3. A takes and exports a snapshot showing B's xid as running.
4. Transaction B ends.
5. Autovacuum launches in database Y. It sees nothing running in Y,
so it decides it can vacuum dead rows right up to nextXid, including
anything B deleted.
6. Transaction C starts in database Y, and imports the snapshot from A.
Now it thinks it can see rows deleted by B ... but vacuum is busy
removing them, or maybe already finished doing so.

The problem here is that A's xmin is ignored by GetOldestXmin when
calculating cutoff XIDs for non-shared tables in database Y, so it
doesn't protect would-be adoptees of the exported snapshot.
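
For concreteness, here's roughly how that plays out at the SQL level,
assuming the interface in the current patch (pg_export_snapshot() plus
SET TRANSACTION SNAPSHOT); the snapshot identifier and table name below
are just placeholders:

-- Session A, database X (steps 1 and 3):
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();            -- returns, say, '000003A1-1';
                                        -- B's xid is listed as in-progress

-- Transaction B commits in database Y (step 4), and autovacuum in Y
-- removes the rows B deleted (step 5), because A's xmin is in a
-- different database.

-- Session C, database Y (step 6):
BEGIN ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '000003A1-1';  -- adopts A's snapshot
SELECT * FROM some_table;               -- ought to see the rows B deleted,
                                        -- but vacuum may already have
                                        -- reclaimed them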

I can see a few alternatives, none of them very pleasant:

1. Restrict exported snapshots to be loaded only by transactions running
in the same database as the exporter. This would fix the problem, but
it cuts out one of the main use-cases for sync snapshots, namely getting
cluster-wide-consistent dumps in pg_dumpall.

2. Allow a snapshot exported from another database to be loaded so long
as this doesn't cause the DB-local value of GetOldestXmin to go
backwards. However, in scenarios such as the above, C is certain to
fail such a test. To make it work, pg_dumpall would have to start
"advance guard" transactions in each database before it takes the
intended-to-be-shared snapshot, and probably even wait for these to be
oldest. Ick.

3. Remove the optimization that lets GetOldestXmin ignore XIDs outside
the current database. This sounds bad, but OTOH I don't think there's
ever been any proof that this optimization is worth much in real-world
usage. We've already had to lobotomize that optimization for walsender
processes, anyway.

4. Somehow mark the xmin of a process that has exported a snapshot so
that it will be honored in all DBs not just the current one. The
difficulty here is that we'd need to know *at the time the snap is
taken* that it's going to be exported. (Consider the scenario above,
except that A doesn't get around to exporting the snapshot it took in
step 3 until between steps 5 and 6. If the xmin wasn't already marked
as globally applicable when vacuum looked at it in step 5, we lose.)
This is do-able but it will contort the user-visible API of the sync
snapshots feature. One way we could do it is to require that
transactions that want to export snapshots set a transaction mode
before they take their first snapshot.
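
For illustration only, and with entirely made-up syntax, the
transaction-mode idea might look like this; the key requirement is just
that the declaration precede the transaction's first snapshot:

-- Hypothetical; no such mode exists in the current patch.
BEGIN;
SET TRANSACTION EXPORTABLE;     -- publish this xact's xmin as globally applicable
SELECT pg_export_snapshot();    -- the exported snapshot would then be safe
                                -- to import from any database
-- keep this transaction open while importers attach to the snapshot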

Thoughts, better ideas?

regards, tom lane


  • Florian Pflug at Oct 21, 2011 at 4:05 pm

    On Oct21, 2011, at 17:36 , Tom Lane wrote:
    1. Restrict exported snapshots to be loaded only by transactions running
    in the same database as the exporter. This would fix the problem, but
    it cuts out one of the main use-cases for sync snapshots, namely getting
    cluster-wide-consistent dumps in pg_dumpall.
    Isn't the use-case getting consistent *parallel* dumps of a single database
    rather than a consistent dump of multiple databases? Since we don't have atomic
    cross-database commits, what does using the same snapshot to dump multiple
    databases buy us?

    On those grounds, +1 for option 1 here.
    3. Remove the optimization that lets GetOldestXmin ignore XIDs outside
    the current database. This sounds bad, but OTOH I don't think there's
    ever been any proof that this optimization is worth much in real-world
    usage. We've already had to lobotomize that optimization for walsender
    processes, anyway.
    Hm, we've told people who wanted cross-database access to tables in the
    past to either

    * use dblink or

    * not split their tables over multiple databases in the first place,
    and to use schemas instead

    If we remove the GetOldestXmin optimization, we're essentially reversing
    course on this. Do we really wanna go there?

    best regards,
    Florian Pflug
  • Andrew Dunstan at Oct 21, 2011 at 4:37 pm

    On 10/21/2011 12:05 PM, Florian Pflug wrote:
    On Oct21, 2011, at 17:36 , Tom Lane wrote:
    1. Restrict exported snapshots to be loaded only by transactions running
    in the same database as the exporter. This would fix the problem, but
    it cuts out one of the main use-cases for sync snapshots, namely getting
    cluster-wide-consistent dumps in pg_dumpall.
    Isn't the use-case getting consistent *parallel* dumps of a single database
    rather than a consistent dump of multiple databases? Since we don't have atomic
    cross-database commits, what does using the same snapshot to dump multiple
    databases buy us?
    That was my understanding of the use case.

    cheers

    andrew
  • Tom Lane at Oct 21, 2011 at 5:06 pm

    Andrew Dunstan writes:
    On 10/21/2011 12:05 PM, Florian Pflug wrote:
    On Oct21, 2011, at 17:36 , Tom Lane wrote:
    1. Restrict exported snapshots to be loaded only by transactions running
    in the same database as the exporter. This would fix the problem, but
    it cuts out one of the main use-cases for sync snapshots, namely getting
    cluster-wide-consistent dumps in pg_dumpall.
    Isn't the use-case getting consistent *parallel* dumps of a single database
    rather than a consistent dump of multiple databases? Since we don't have atomic
    cross-database commits, what does using the same snapshot to dump multiple
    databases buy us?
    That was my understanding of the use case.
    Um, which one are you supporting?

    Anyway, the value of using the same snapshot across all of a pg_dumpall
    run would be that you could be sure that what you'd dumped concerning
    role and tablespace objects was consistent with what you then dump about
    database-local objects. (In principle, anyway --- I'm not sure how
    much of that happens under SnapshotNow rules because of use of backend
    functions. But you'll most certainly never be able to guarantee it if
    pg_dumpall can't export its snapshot to each subsidiary pg_dump run.)

    regards, tom lane
  • Andrew Dunstan at Oct 21, 2011 at 5:59 pm

    On 10/21/2011 01:06 PM, Tom Lane wrote:
    Andrew Dunstan <andrew@dunslane.net> writes:
    On 10/21/2011 12:05 PM, Florian Pflug wrote:
    On Oct21, 2011, at 17:36 , Tom Lane wrote:
    1. Restrict exported snapshots to be loaded only by transactions running
    in the same database as the exporter. This would fix the problem, but
    it cuts out one of the main use-cases for sync snapshots, namely getting
    cluster-wide-consistent dumps in pg_dumpall.
    Isn't the use-case getting consistent *parallel* dumps of a single database
    rather than a consistent dump of multiple databases? Since we don't have atomic
    cross-database commits, what does using the same snapshot to dump multiple
    databases buy us?
    That was my understanding of the use case.
    Um, which one are you supporting?

    #1 seemed OK from this POV. Everything else looks ickier and/or more
    fragile, at first glance anyway.
    Anyway, the value of using the same snapshot across all of a pg_dumpall
    run would be that you could be sure that what you'd dumped concerning
    role and tablespace objects was consistent with what you then dump about
    database-local objects. (In principle, anyway --- I'm not sure how
    much of that happens under SnapshotNow rules because of use of backend
    functions. But you'll most certainly never be able to guarantee it if
    pg_dumpall can't export its snapshot to each subsidiary pg_dump run.)
    For someone who is concerned with that, maybe pg_dumpall could have an
    option to take an EXCLUSIVE lock on the shared catalogs?
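
    Something like this, say (a sketch; whether an EXCLUSIVE lock on the
    shared catalogs is really acceptable or sufficient is of course the
    question):

    -- Run by pg_dumpall, as superuser, for the duration of the globals dump.
    -- EXCLUSIVE still permits concurrent reads but blocks role/tablespace changes.
    BEGIN;
    LOCK TABLE pg_catalog.pg_authid IN EXCLUSIVE MODE;
    LOCK TABLE pg_catalog.pg_auth_members IN EXCLUSIVE MODE;
    LOCK TABLE pg_catalog.pg_tablespace IN EXCLUSIVE MODE;
    -- ... dump roles and tablespaces ...
    COMMIT;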

    cheers

    andrew
  • Tom Lane at Oct 21, 2011 at 5:09 pm

    Florian Pflug writes:
    On Oct21, 2011, at 17:36 , Tom Lane wrote:
    3. Remove the optimization that lets GetOldestXmin ignore XIDs outside
    the current database. This sounds bad, but OTOH I don't think there's
    ever been any proof that this optimization is worth much in real-world
    usage. We've already had to lobotomize that optimization for walsender
    processes, anyway.
    Hm, we've told people who wanted cross-database access to tables in the
    past to either
    * use dblink or
    * not split their tables over multiple databases in the first place,
    and to use schemas instead
    If we remove the GetOldestXmin optimization, we're essentially reversing
    course on this. Do we really wanna go there?
    Huh? The behavior of GetOldestXmin is purely a backend-internal matter.
    I don't see how it's related to cross-database access --- or at least,
    changing this would not represent a significant move towards supporting
    that.

    regards, tom lane
  • Florian Pflug at Oct 21, 2011 at 5:40 pm

    On Oct21, 2011, at 19:09 , Tom Lane wrote:
    Florian Pflug <fgp@phlo.org> writes:
    On Oct21, 2011, at 17:36 , Tom Lane wrote:
    3. Remove the optimization that lets GetOldestXmin ignore XIDs outside
    the current database. This sounds bad, but OTOH I don't think there's
    ever been any proof that this optimization is worth much in real-world
    usage. We've already had to lobotomize that optimization for walsender
    processes, anyway.
    Hm, we've told people who wanted cross-database access to tables in the
    past to either
    * use dblink or
    * not split their tables over multiple databases in the first place,
    and to use schemas instead
    If we remove the GetOldestXmin optimization, we're essentially reversing
    course on this. Do we really wanna go there?
    Huh? The behavior of GetOldestXmin is purely a backend-internal matter.
    I don't see how it's related to cross-database access --- or at least,
    changing this would not represent a significant move towards supporting
    that.
    AFAIR, the performance hit we'd take by making the vacuum cutoff point
    (i.e. GetOldestXmin()) global instead of database-local has been repeatedly
    used in the past as an argument against cross-database queries. I have to
    admit that I currently cannot seem to find an entry in the archives to
    back that up, though.

    best regards,
    Florian Pflug
  • Robert Haas at Oct 21, 2011 at 5:47 pm

    On Fri, Oct 21, 2011 at 1:40 PM, Florian Pflug wrote:
    AFAIR, the performance hit we'd take by making the vacuum cutoff point
    (i.e. GetOldestXmin()) global instead of database-local has been repeatedly
    used in the past as an argument against cross-database queries. I have to
    admit that I currently cannot seem to find an entry in the archives to
    back that up, though.
    I think the main argument against cross-database queries is that every
    place in the backend that, for example, uses an OID to identify a
    table would need to be modified to use a database OID and a table OID.
    Even if the distributed performance penalty of such a change doesn't
    bother you, the amount of code churn that it would take to make such a
    change is mind-boggling.

    I haven't seen anyone explain why they really need this feature
    anyway, and I think it's going in the wrong direction. IMHO, anyone
    who wants to be doing cross-database queries should be using schemas
    instead, and if that's not workable for some reason, then we should
    improve the schema implementation until it becomes workable. I think
    that the target use case for separate databases ought to be
    multi-tenancy, but what is needed there is actually more isolation
    (e.g. w.r.t. role names, cluster-wide visibility of pg_database contents,
    etc.), not less.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Florian Pflug at Oct 21, 2011 at 6:06 pm

    On Oct21, 2011, at 19:47 , Robert Haas wrote:
    On Fri, Oct 21, 2011 at 1:40 PM, Florian Pflug wrote:
    AFAIR, the performance hit we'd take by making the vacuum cutoff point
    (i.e. GetOldestXmin()) global instead of database-local has been repeatedly
    used in the past as an argument against cross-database queries. I have to
    admit that I currently cannot seem to find an entry in the archives to
    back that up, though.
    I haven't seen anyone explain why they really need this feature
    anyway, and I think it's going in the wrong direction. IMHO, anyone
    who wants to be doing cross-database queries should be using schemas
    instead, and if that's not workable for some reason, then we should
    improve the schema implementation until it becomes workable. I think
    that the target use case for separate databases ought to be
    multi-tenancy, but what is needed there is actually more isolation
    (e.g. w.r.t. role names, cluster-wide visibility of pg_database contents,
    etc.), not less.
    Agreed. I wasn't trying to argue for cross-database queries - quite the opposite,
    actually. My point was more that since we've used database isolation as an
    argument against cross-database queries in the past, we shouldn't sacrifice
    it now for synchronized snapshots.

    best regards,
    Florian Pflug
  • Robert Haas at Oct 21, 2011 at 6:12 pm

    On Fri, Oct 21, 2011 at 2:06 PM, Florian Pflug wrote:
    On Oct21, 2011, at 19:47 , Robert Haas wrote:
    On Fri, Oct 21, 2011 at 1:40 PM, Florian Pflug wrote:
    AFAIR, the performance hit we'd take by making the vacuum cutoff point
    (i.e. GetOldestXmin()) global instead of database-local has been repeatedly
    used in the past as an argument against cross-database queries. I have to
    admit that I currently cannot seem to find an entry in the archives to
    back that up, though.
    I haven't seen anyone explain why they really need this feature
    anyway, and I think it's going in the wrong direction.  IMHO, anyone
    who wants to be doing cross-database queries should be using schemas
    instead, and if that's not workable for some reason, then we should
    improve the schema implementation until it becomes workable.  I think
    that the target use case for separate databases ought to be
    multi-tenancy, but what is needed there is actually more isolation
    (e.g. w.r.t. role names, cluster-wide visibility of pg_database contents,
    etc.), not less.
    Agreed. I wasn't trying to argue for cross-database queries - quite the opposite,
    actually. My point was more that since we've used database isolation as an
    argument against cross-database queries in the past, we shouldn't sacrifice
    it now for synchronized snapshots.
    Right, I agree. It might be nice to take a cluster-wide dump that is
    guaranteed to be transactionally consistent, but I bet a lot of people
    would actually be happier to see us go the opposite direction - e.g.
    give each database its own XID space, so that activity in one database
    doesn't accelerate the need for anti-wraparound vacuums in another
    database. Not sure that could ever actually happen, but the point is
    that people probably should not be relying on serializability across
    databases too much, because the whole point of the multiple databases
    feature is to have multiple, independent databases in one cluster that
    are thoroughly isolated from each other, and any future changes we
    make should probably lean in that direction.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Oct 21, 2011 at 6:10 pm

    Florian Pflug writes:
    AFAIR, the performance hit we'd take by making the vacuum cutoff point
    (i.e. GetOldestXmin()) global instead of database-local has been repeatedly
    used in the past as an argument against cross-database queries. I have to
    admit that I currently cannot seem to find an entry in the archives to
    back that up, though.
    To my mind, the main problem with cross-database queries is that none of
    the backend is set up to deal with more than one set of system catalogs.

    regards, tom lane
  • Robert Haas at Oct 21, 2011 at 5:18 pm

    On Fri, Oct 21, 2011 at 11:36 AM, Tom Lane wrote:
    I've thought of another nasty problem for the sync-snapshots patch.

    1. Restrict exported snapshots to be loaded only by transactions running
    in the same database as the exporter.  This would fix the problem, but
    it cuts out one of the main use-cases for sync snapshots, namely getting
    cluster-wide-consistent dumps in pg_dumpall.

    2. Allow a snapshot exported from another database to be loaded so long
    as this doesn't cause the DB-local value of GetOldestXmin to go
    backwards.  However, in scenarios such as the above, C is certain to
    fail such a test.  To make it work, pg_dumpall would have to start
    "advance guard" transactions in each database before it takes the
    intended-to-be-shared snapshot, and probably even wait for these to be
    oldest.  Ick.

    3. Remove the optimization that lets GetOldestXmin ignore XIDs outside
    the current database.  This sounds bad, but OTOH I don't think there's
    ever been any proof that this optimization is worth much in real-world
    usage.  We've already had to lobotomize that optimization for walsender
    processes, anyway.

    4. Somehow mark the xmin of a process that has exported a snapshot so
    that it will be honored in all DBs not just the current one.  The
    difficulty here is that we'd need to know *at the time the snap is
    taken* that it's going to be exported.  (Consider the scenario above,
    except that A doesn't get around to exporting the snapshot it took in
    step 3 until between steps 5 and 6.  If the xmin wasn't already marked
    as globally applicable when vacuum looked at it in step 5, we lose.)
    This is do-able but it will contort the user-visible API of the sync
    snapshots feature.  One way we could do it is to require that
    transactions that want to export snapshots set a transaction mode
    before they take their first snapshot.
    I am unexcited by #2 on usability grounds. I agree with you that #3
    might end up being a fairly small pessimization in practice, but I'd
    be inclined to just do #1 for now and revisit the issue when and if
    someone shows an interest in revamping pg_dumpall to do what you're
    proposing (and hopefully a bunch of other cleanup too).

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Oct 21, 2011 at 6:30 pm

    Robert Haas writes:
    On Fri, Oct 21, 2011 at 11:36 AM, Tom Lane wrote:
    1. Restrict exported snapshots to be loaded only by transactions running
    in the same database as the exporter.  This would fix the problem, but
    it cuts out one of the main use-cases for sync snapshots, namely getting
    cluster-wide-consistent dumps in pg_dumpall.
    I am unexcited by #2 on usability grounds. I agree with you that #3
    might end up being a fairly small pessimization in practice, but I'd
    be inclined to just do #1 for now and revisit the issue when and if
    someone shows an interest in revamping pg_dumpall to do what you're
    proposing (and hopefully a bunch of other cleanup too).
    Seems like that is the consensus view, so that's what I'll do.

    regards, tom lane
  • Simon Riggs at Oct 22, 2011 at 12:25 pm

    On Fri, Oct 21, 2011 at 4:36 PM, Tom Lane wrote:

    I can see a few alternatives, none of them very pleasant:

    1. Restrict exported snapshots to be loaded only by transactions running
    in the same database as the exporter.  This would fix the problem, but
    it cuts out one of the main use-cases for sync snapshots, namely getting
    cluster-wide-consistent dumps in pg_dumpall.
    4. Somehow mark the xmin of a process that has exported a snapshot so
    that it will be honored in all DBs not just the current one.  The
    difficulty here is that we'd need to know *at the time the snap is
    taken* that it's going to be exported.  (Consider the scenario above,
    except that A doesn't get around to exporting the snapshot it took in
    step 3 until between steps 5 and 6.  If the xmin wasn't already marked
    as globally applicable when vacuum looked at it in step 5, we lose.)
    This is do-able but it will contort the user-visible API of the sync
    snapshots feature.  One way we could do it is to require that
    transactions that want to export snapshots set a transaction mode
    before they take their first snapshot.
    1 *and* 4 please.

    So, unless explicitly requested, an exported snapshot is limited to
    just one database. If explicitly requested to be transportable, we can
    use the snapshot in other databases.

    This allows us to do parallel pg_dump of one or more individual databases,
    as well as allowing pg_dumpall to be fully consistent across all databases.
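
    So by default the import side would simply be rejected, e.g. (error
    text made up):

    -- Importer in database Y, using a snapshot id exported from database X:
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT '000003A1-1';
    -- ERROR:  snapshot was exported from a different database
    -- ...unless the exporter had explicitly asked for a transportable
    -- snapshot, along the lines of the hypothetical SET TRANSACTION
    -- EXPORTABLE sketched upthread.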

    --
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
  • Tom Lane at Oct 22, 2011 at 3:28 pm

    Simon Riggs writes:
    1 *and* 4 please.
    Given the lack of enthusiasm I'm not going to do anything about #4 now.
    Somebody else can add it later.
    So, unless explicitly requested, an exported snapshot is limited to
    just one database. If explicitly requested to be transportable, we can
    use the snapshot in other databases.
    Yeah, we could make it work like that when it gets added.

    regards, tom lane
