I wrote:
Alvaro Herrera <alvherre@commandprompt.com> writes:
Shall I work on a fix? I expect you are plenty busy with commitfest
stuff, but please let me know otherwise.
I have what-I-think-is-the-fix pretty clear in my own mind, so let me
give it a try. If it doesn't work I'll bounce it back to you.
Well, I soon ran into the issue that delaying the snapshot release makes
TopTransactionResourceOwner spit up. After some reflection I decided
that the real problem is a circular dependency: snapshot management must
be considered lower-level than ResourceOwners because ResourceOwners
tell snapshot management what to do, but here we have
GetTransactionSnapshot trying to use TopTransactionResourceOwner to
manage its internal reference to the transaction snapshot.

Accordingly, the attached proposed patch gets rid of the circularity
by removing snapmgr.c's dependency on TopTransactionResourceOwner,
in favor of having it track the refcount "manually". This was messier
than I'd hoped because the bogus design had propagated into the SSI
manager meanwhile, but removing the TopTransactionResourceOwner
dependency from that too seems like a good idea.

This passes the regular and isolation regression tests, and it's also
okay with Yamamoto-san's prepared-ROLLBACK test case even without the
band-aid fix in plancache.c. I can't immediately think of any other
test cases to throw at it.

Comments?

regards, tom lane

Search Discussions

  • Alvaro Herrera at Sep 27, 2011 at 12:33 am

    Excerpts from Tom Lane's message of lun sep 26 20:59:45 -0300 2011:

    Well, I soon ran into the issue that delaying the snapshot release makes
    TopTransactionResourceOwner spit up. After some reflection I decided
    that the real problem is a circular dependency: snapshot management must
    be considered lower-level than ResourceOwners because ResourceOwners
    tell snapshot management what to do, but here we have
    GetTransactionSnapshot trying to use TopTransactionResourceOwner to
    manage its internal reference to the transaction snapshot.
    Great. I noticed the circularity but didn't reflect that it was bogus
    in itself.
    Accordingly, the attached proposed patch gets rid of the circularity
    by removing snapmgr.c's dependency on TopTransactionResourceOwner,
    in favor of having it track the refcount "manually". This was messier
    than I'd hoped because the bogus design had propagated into the SSI
    manager meanwhile, but removing the TopTransactionResourceOwner
    dependency from that too seems like a good idea.
    To be honest, I panicked for a second when I saw the new
    SnapshotResetXmin call, before I realized that it wasn't necessary
    before. The serializable case makes more sense the patched way, I
    think.
    This passes the regular and isolation regression tests, and it's also
    okay with Yamamoto-san's prepared-ROLLBACK test case even without the
    band-aid fix in plancache.c. I can't immediately think of any other
    test cases to throw at it.
    I added tests for all the problematic cases discovered after snapmgr was
    introduced, so at least most known weird spots are covered.

    Thanks

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Tom Lane at Sep 27, 2011 at 12:44 am

    Alvaro Herrera writes:
    To be honest, I panicked for a second when I saw the new
    SnapshotResetXmin call, before I realized that it wasn't necessary
    before. The serializable case makes more sense the patched way, I
    think.
    Yeah, in the old coding, SnapshotResetXmin would have happened
    implicitly at the last UnregisterSnapshot call. In this arrangement,
    the "last UnregisterSnapshot" is going to be the manual one in
    AtEOXact_Snapshot, so we need a manual SnapshotResetXmin there too.
    I chose to put it at the bottom of the routine so that it's guaranteed
    to fire even if the RegisteredSnapshots count was corrupted, but that's
    just paranoia.

    regards, tom lane

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedSep 26, '11 at 11:59p
activeSep 27, '11 at 12:44a
posts3
users2
websitepostgresql.org...
irc#postgresql

2 users in discussion

Tom Lane: 2 posts Alvaro Herrera: 1 post

People

Translate

site design / logo © 2022 Grokbase