As background, for most of SSI development I had a TODO comment in the
source which basically asked whether a predicate lock on a visible heap
tuple should propagate to later versions of the row as long as the
original lock remained material. In early February Heikki
(quite reasonably) insisted that I resolve that and either add code to
do so or a comment about why it wasn't necessary. After looking at it
for many hours without getting to a logical proof, I thought I had a
proof by example:

http://archives.postgresql.org/pgsql-hackers/2011-02/msg00325.php

We added code for this, but it has been problematic; probably over half
the SSI bugs which have been found since SSI went into the code tree
have been related to this, and Robert Haas has just found a couple more.
In discussing how to address this with Jeff Davis (over beers at
PGCon), he provided good advice about how to properly fix these issues,
but posed some very good questions about whether it was really
necessary. Rather than fix this aspect of the code, we might be able to
eliminate it, which would reduce the code size and some overhead. Since
this code is more fragile than the rest of SSI, this is particularly
appealing -- my favorite way to deal with fragile code is to remove it.

I went back to the example which persuaded me and took another look. On
review I see that this didn't prove the point because there was a
dangerous structure with T1 as a pivot which should have caused SSI to
break the cycle. Since it didn't, either I got careless in my testing
methodology (which I will recheck when I get back to Wisconsin) or there
was a bug -- but either way, this example can't prove that the predicate
locks need to follow the row to new versions.

I haven't been able to come up with an example where it actually *is*
needed, but failure to come up with an example where it is needed
doesn't prove that it isn't needed. Here's my attempt at a logical
proof of whether it is needed. It would be great if anyone with a grasp
of SSI concepts could confirm its validity or shoot it down. (Hopefully
Friday's presentation expanded the number of those who can do so.)

The issue can be rephrased to this: If transaction T1 reads a row (thus
acquiring a predicate lock on it) and a second transaction T2 updates
that row, must a third transaction T3 which updates the new version of
the row have a rw-conflict in from T1?

Now, the first thing which jumps out is the timing requirement -- T3
must overlap T1 or there is no conflict (and therefore no need to
replicate the predicate locks), but T3 must *not* overlap T2 or there
would be a ww-conflict and one of them would roll back. If T2 committed
before T1 acquired its snapshot, T1 would have gotten its predicate lock
on the second version of the row, so for a situation where this could
possibly matter, the following actual timings must exist (read "starts"
as meaning "acquires a snapshot"):

T1 and T2 start in either order.
T1 reads the row and T2 updates the row in either order.
T2 commits.
T3 starts.
T3 updates the row.

So far, using broken lines for rw-conflicts and solid for
wr-dependencies, we have this for apparent order of execution:

T1 - - -> T2 -----> T3
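The timing constraints above can be made concrete with a small sketch
(a hypothetical model, not PostgreSQL code): treat each transaction as
a (start, commit) interval, and note that a rw-conflict requires the
two transactions to overlap.

```python
# Hypothetical model: transactions as (start, commit) intervals.
# A rw-conflict requires the two transactions to overlap in time.
def overlaps(a, b):
    """Two transactions overlap if neither commits before the other starts."""
    return a["start"] < b["commit"] and b["start"] < a["commit"]

# One concrete schedule satisfying the listed timings:
# T1 and T2 start, T2 commits, T3 starts after T2's commit, T1 still open.
t1 = {"start": 1, "commit": 9}
t2 = {"start": 2, "commit": 4}
t3 = {"start": 5, "commit": 8}   # T3 starts only after T2 commits

assert not overlaps(t2, t3)  # no overlap, so no ww-conflict T2/T3
assert overlaps(t1, t3)      # but T3 can still rw-conflict with T1
```

Any assignment of timestamps satisfying the listed order gives the same
two results; the specific numbers are only for illustration.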

Not on the slides for the presentation, but briefly mentioned, is that
Alan Fekete and others have proven that serialization anomalies can only
occur if the transaction which appears on the rw-conflict *out* side of
the pivot (the transaction which appears to have executed third)
actually commits first. T2 committed before T1, so there can't be a
cycle involving a rw-conflict *in* to T1, because that would complete
the dangerous structure -- unless the conflicting transaction commits
before T2, or is read-only and gets its snapshot before the commit of
T2 (another special case which we left out of the presentation in the
interest of time).

Since T3 must get its snapshot after the commit of T2, if it develops a
rw-conflict with T1 there will be an SSI serialization failure without
needing the lock propagation. So now the question breaks down to
whether we can have other transactions in the mix which, given the
above, cause T3 to appear to have executed before T1.

Since T3 cannot have committed before T1 (given what we know about the
actual timings required), that has to involve a rw-conflict somewhere in
the graph. Since a wr-dependency is caused by the writer actually
committing before its work is read by another transaction, causing
apparent order of execution at that point to match actual serial order
of execution, such transactions would be benign in the transaction graph
-- they might exist in a problem graph but would not help to cause a
later-starting transaction to appear to have executed first. We can
look at just rw-conflicts to cause the apparent order of execution to
loop back in time.

Since the time-arrow for a rw-conflict points in the direction of the
write, we can ignore read-only transactions. So to complete the cycle
back to T1, the transaction T0 with the rw-conflict to T1 must have
committed before T2 committed. This means that T0 can't overlap with
T3, because T3 started after the commit of T2.

So the question is now whether T3 can have a rw-conflict (or chain of
conflicts) to T0.

The timings required to make it necessary to replicate predicate locks
to later versions of the row now are:

T0, T1, and T2 start in any order.
T1 reads the row and T2 updates the row in either order; also T1 updates
some other data which conflicts with a T0 read, in any order.
T0 commits.
T2 commits.
T3 starts.
T3 updates the row.

Apparent order of execution is now:

T0 - - -> T1 - - -> T2 -----> T3

We need at least one more transaction to complete the cycle here. For
only one to do it, it must overlap both T0 and T3, and it must do some
read which conflicts with a write of T0 and some write which conflicts
with a read of T3.

Listing the timings at this point would be pretty chaotic -- T4 starts
and develops a rw-conflict with T0 anytime before T0 commits, and
develops a rw-conflict with T3 at any point after T3 starts.

Apparent order of execution is now:

T0 - - -> T1 - - -> T2 -----> T3 - - -> T4 - - -> T0

But wait -- T4 is now a pivot with T0 committing first, and T3 is not
read-only. That part of the cycle is a dangerous structure and one of
those transactions will be rolled back.
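The pivot test applied in that last step can be sketched as follows (a
simplified, hypothetical model of the rule as summarized in this
thread, not the actual SSI implementation): a dangerous structure needs
a pivot with both a rw-conflict in and a rw-conflict out, and the
transaction on the out side must commit before the other two.

```python
# Hypothetical sketch of the dangerous-structure rule, not PostgreSQL code.
# A pivot has a rw-conflict in and a rw-conflict out; per the Fekete et al.
# result cited above, an anomaly further requires the out-side transaction
# to commit before both the pivot and the in-side transaction.
def is_dangerous(rw_in_from, pivot, rw_out_to, commit_order):
    """commit_order lists transaction ids in actual commit order."""
    out_pos = commit_order.index(rw_out_to)
    return (out_pos < commit_order.index(pivot)
            and out_pos < commit_order.index(rw_in_from))

# In the cycle T0 -> T1 -> T2 -> T3 -> T4 -> T0 above, T4 is a pivot
# (rw-in from T3, rw-out to T0) and T0 commits before both:
commit_order = ["T0", "T2", "T3", "T4", "T1"]  # one order consistent with the timings
assert is_dangerous("T3", "T4", "T0", commit_order)  # so one of them rolls back
```

The commit order shown is just one assignment consistent with the
timings listed earlier; the check fires for any such assignment because
T0 must commit before T3 even starts.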

So far we have now proven that you can't cause a problem with only five
transactions.

The question is now whether T4 can be replaced with multiple
transactions forming a rw-conflict without one of them committing at a
time which would create a dangerous structure and break the cycle.

[Anyone who has followed along this far has earned my undying
gratitude.]

Since the chain of transactions has to overlap T0 and T3, I don't see
how that can happen. We established that T0 has to commit before T3 can
start, so the chain will ultimately have to get to that T0 commit.

I don't want to start ripping out the apparently useless code without
someone checking my logic here. One big gaffe in this area is quite
enough for me. :-/ Anyone?

-Kevin


  • Pavan Deolasee at May 21, 2011 at 8:45 pm

    On Sat, May 21, 2011 at 4:09 PM, Kevin Grittner wrote:
    It would be great if anyone with a grasp
    of SSI concepts could confirm its validity or shoot it down. (Hopefully
    Friday's presentation expanded the number of those who can do so.)
    As a first step, it would be great if you can upload the slides on the
    conference website. To expect that the attendees would have understood the
    nitty-gritties of SSI just listening to the presentation is so unhuman :-)

    Thanks,
    Pavan

    --
    Pavan Deolasee
    EnterpriseDB http://www.enterprisedb.com
  • Dan Ports at May 22, 2011 at 12:10 am

    On Sat, May 21, 2011 at 04:45:15PM -0400, Pavan Deolasee wrote:
    As a first step, it would be great if you can upload the slides on the
    conference website. To expect that the attendees would have understood the
    nitty-gritties of SSI just listening to the presentation is so unhuman :-)
    I just posted them at
    http://drkp.net/drkp/papers/ssi-pgcon11-slides.pdf

    ...and they'll be linked from the Serializable wiki page as soon as I
    remember how to edit it. :-)

    Dan

    --
    Dan R. K. Ports MIT CSAIL http://drkp.net/
  • Kevin Grittner at May 21, 2011 at 9:07 pm

    Pavan Deolasee wrote:

    As a first step, it would be great if you can upload the slides on
    the conference website. To expect that the attendees would have
    understood the nitty-gritties of SSI just listening to the
    presentation is so unhuman :-)
    I'm sure Dan will be doing that soon; meanwhile, maybe this page
    will help:

    http://wiki.postgresql.org/wiki/Serializable

    -Kevin
  • Robert Haas at May 22, 2011 at 12:36 am

    On Sat, May 21, 2011 at 4:09 PM, Kevin Grittner wrote:
    [Anyone who has followed along this far has earned my undying
    gratitude.]

    Since the chain of transactions has to overlap T0 and T3, I don't see
    how that can happen.  We established that T0 has to commit before T3 can
    start, so the chain will ultimately have to get to that T0 commit.

    I don't want to start ripping out the apparently useless code without
    someone checking my logic here.  One big gaff in this area is quite
    enough for me.  :-/  Anyone?
    How is an UPDATE different from a DELETE and an INSERT?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Kevin Grittner at May 22, 2011 at 12:50 am

    Robert Haas wrote:

    How is an UPDATE different from a DELETE and an INSERT?
    That's the question Jeff asked which caused us to revisit this.

    There are two answers -- conceptual and implementation.

    Conceptually, an updated row is the same row, and a row inserted after a
    delete is a new row. Note that READ COMMITTED doesn't treat them the
    same on a write conflict. To give a practical example, police
    departments in Wisconsin typically reassign a badge number to a new
    officer after an existing officer leaves. Updating an officer record
    keyed by badge number (say, with a new address or a name change) would
    be qualitatively different from deleting an old officer record and
    inserting a new one for a different person now getting the badge number.
    (OK, so this is somewhat artificial, because they track who had the
    number in what temporal ranges, but you get the idea.)

    In the implementation the only difference between an UPDATE in a table
    and a DELETE and INSERT in the same table in the same transaction
    (besides concurrency handling) is the ctid linkage, at least as far as I
    can see.

    So I think that you can't just treat the two things the same in SSI just
    because the PostgreSQL implementation details make them similar; but I
    think that you can treat the two things the same for the reasons I
    worked out at the start of the thread.

    -Kevin
  • Robert Haas at May 22, 2011 at 1:58 am

    On Sat, May 21, 2011 at 8:50 PM, Kevin Grittner wrote:
    Robert Haas  wrote:
    How is an UPDATE different from a DELETE and an INSERT?
    That's the question Jeff asked which caused us to revisit this.

    There are two answers -- conceptual and implementation.

    Conceptually, an updated row is the same row, and a row inserted after a
    delete is a new row.  Note that READ COMMITTED doesn't treat them the
    same on a write conflict.  To give a practical example, police
    departments in Wisconsin typically reassign a badge number to a new
    officer after an existing officer leaves.  Updating an officer record
    keyed by badge number (say, with a new address or a name change) would
    be qualitatively different from deleting an old officer record and
    inserting a new one for a different person now getting the badge number.
    (OK, so this is somewhat artificial, because they track who had the
    number in what temporal ranges, but you get the idea.)

    In the implementation the only difference between an UPDATE in a table
    and a DELETE and INSERT in the same table in the same transaction
    (besides concurrency handling) is the ctid linkage, at least as far as I
    can see.
    I think the implementation is what matters here. I understand that
    there are certain situations in which the user might choose to UPDATE
    a row and other situations in which they might choose to DELETE and
    then INSERT: but the user's intent is basically irrelevant. If the
    system treats those operations in basically the same way, then it
    shouldn't be necessary to follow the CTID chain in one case given that
    there is no CTID chain in the other case. Or to put that another way,
    if it is necessary to follow the CTID chain, then we should be able to
    articulate a reason for that necessity -- something that is materially
    different in the UPDATE case. Otherwise, we're just following the
    chain "because it's there".

    It seems to me that we can actually state with some degree of
    precision what that "material difference" would need to be. The goal
    of SSI is to prevent serialization anomalies that would not be
    prevented by snapshot isolation. Let's suppose that it successfully
    does that in the DELETE/INSERT case. Suppose further that we change
    SSI so that it handles the UPDATE case in the same way that it handles
    the DELETE/INSERT case. This change will be incorrect only if there
    is a serialization anomaly that snapshot isolation *would have
    prevented* in the DELETE/INSERT case that *it does not prevent* in the
    update case. In other words, if SSI needs to be more rigorous in the
    UPDATE case, it can only be because snapshot isolation is less
    rigorous in that case, and the additional rigor that SSI must apply
    there must be exactly equal to whatever snapshot isolation isn't
    picking up (as compared with the DELETE/INSERT case).

    Does that make any sense? It seems logical to me, but IJWH.
    So I think that you can't just treat the two things the same in SSI just
    because the PostgreSQL implementation details make them similar; but I
    think that you can treat the two things the same for the reasons I
    worked out at the start of the thread.
    Your argument seems reasonable to me; but it would be nice if we could
    find a simpler one, because simpler arguments are less likely to be
    incorrect. :-)

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Kevin Grittner at May 23, 2011 at 2:26 am

    Robert Haas wrote:

    I think the implementation is what matters here. I understand that
    there are certain situations in which the user might choose to
    UPDATE a row and other situations in which they might choose to
    DELETE and then INSERT: but the user's intent is basically
    irrelevant.
    We don't consider it irrelevant when we decide which triggers to
    fire. We do have update triggers distinct from the insert and delete
    triggers. We also consider it relevant when dealing with a write
    conflict in READ COMMITTED mode. Those facts make me very reluctant
    to move based on a simple assertion that it doesn't matter.
    If the system treats those operations in basically the same way,
    then it shouldn't be necessary to follow the CTID chain in one case
    given that there is no CTID chain in the other case. Or to put that
    another way, if it is necessary to follow the CTID chain, then we
    should be able to articulate a reason for that necessity --
    something that is materially different in the UPDATE case.
    There is a wealth of research on which SSI is based. I've read many
    (although by no means *all*) of the papers on the topic, and all of
    the ones I've read have been based around the concept of a row which
    can be updated and retain its identity. I trust the research, but I
    think it is incumbent on us to prove, rather than just assert, that
    it can be applied just as well to a row-version tuple. I sure hope
    it can, because we can have faster, leaner, less fragile code that
    way. I've attempted to write out a proof, although I won't trust it
    without further review -- by me and by others.
    Otherwise, we're just following the chain "because it's there".
    Why would you say it *is* there?
    It seems to me that we can actually state with some degree of
    precision what that "material difference" would need to be. The
    goal of SSI is to prevent serialization anomalies that would not be
    prevented by snapshot isolation. Let's suppose that it successfully
    does that in the DELETE/INSERT case. Suppose further that we change
    SSI so that it handles the UPDATE case in the same way that it
    handles the DELETE/INSERT case. This change will be incorrect only
    if there is a serialization anomaly that snapshot isolation *would
    have prevented* in the DELETE/INSERT case that *it does not
    prevent* in the update case.
    I don't see that -- it could be correct because of the conceptual
    difference between an UPDATE and a DELETE/INSERT pair.
    In other words, if SSI needs to be more rigorous in the UPDATE
    case, it can only be because snapshot isolation is less rigorous in
    that case, and the additional rigor that SSI must apply there must
    be exactly equal to whatever snapshot isolation isn't picking up
    (as compared with the DELETE/INSERT case).

    Does that make any sense? It seems logical to me, but IJWH.
    I've always loved logic, but one of the most intriguing aspects is
    identifying the unproven assumptions in an argument. You have a
    built-in premise that there is no significant difference between an
    UPDATE and a DELETE/INSERT pair, in which case the logic is flawless
    and leads you to the conclusion that a lock on the visible
    tuple is enough. I'm not confident in that premise, so the simple
    argument doesn't persuade me.
    Your argument seems reasonable to me;
    Thanks much for fighting through it. It is heartening that you
    couldn't punch any holes in it.
    but it would be nice if we could find a simpler one, because
    simpler arguments are less likely to be incorrect. :-)
    All generalizations are false. :-)

    -Kevin
  • Aidan Van Dyk at May 23, 2011 at 1:59 pm

    On Mon, May 23, 2011 at 2:26 AM, Kevin Grittner wrote:


    I don't see that -- it could be correct because of the conceptual
    difference between an UPDATE and a DELETE/INSERT pair.
    In other words, if SSI needs to be more rigorous in the UPDATE
    case, it can only be because snapshot isolation is less rigorous in
    that case, and the additional rigor that SSI must apply there must
    be exactly equal to whatever snapshot isolation isn't picking up
    (as compared with the DELETE/INSERT case).

    Does that make any sense? It seems logical to me, but IJWH.
    I've always loved logic, but one of the most intriguing aspects is
    identifying the unproven assumptions in an argument.  You have a
    built-in premise that there is no significant difference between an
    UPDATE and a DELETE/INSERT pair, in which case the logic is flawless
    and leads you to the conclusion that a lock on the visible
    tuple is enough.  I'm not confident in that premise, so the simple
    argument doesn't persuade me.
    I *think* (though I am by no means familiar with SSI, or an expert
    on the problems it's trying to solve) that Robert was only arguing
    that SSI is only "relevant" for problems that non-SSI snapshot
    isolation wouldn't catch. And the "sameness" of UPDATE vs
    DELETE+INSERT is simply because if you can only see the data as it
    was *completely before* or *completely after* a transaction (not as
    it was after the delete, before the insert), then to you it doesn't
    matter whether the transaction did an UPDATE or a DELETE+INSERT.
    All you see is either $OLDROW or $NEWROW, depending on whether you
    look before or after, not the transformation from $OLDROW to
    $NEWROW.

    So, if SSI conflicts something on the UPDATE case, it would necessarily
    have to conflict the DELETE+INSERT case as well, and vice-versa.
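Aidan's visibility argument can be sketched with a toy model
(hypothetical throughout; nothing like the actual PostgreSQL tuple
format): snapshot visibility depends only on the lifespan of each
version, so the ctid-style linkage that distinguishes an UPDATE from a
DELETE+INSERT never changes what a snapshot sees.

```python
# Toy MVCC visibility sketch, not PostgreSQL internals. A snapshot sees
# either the old row or the new row; the version linkage that makes an
# UPDATE different from DELETE+INSERT is invisible to snapshot reads.
INF = float("inf")

def visible(versions, snap):
    """Rows whose [created, deleted) lifespan covers the snapshot time."""
    return sorted(v["data"] for v in versions
                  if v["created"] <= snap < v["deleted"])

# UPDATE at t=5: the new version is linked to the old one ("next" chain).
update = [{"data": "OLDROW", "created": 0, "deleted": 5, "next": "NEWROW"},
          {"data": "NEWROW", "created": 5, "deleted": INF, "next": None}]
# DELETE+INSERT at t=5: same lifespans, but no linkage between versions.
del_ins = [{"data": "OLDROW", "created": 0, "deleted": 5, "next": None},
           {"data": "NEWROW", "created": 5, "deleted": INF, "next": None}]

for snap in (3, 7):  # snapshots before and after the commit
    assert visible(update, snap) == visible(del_ins, snap)
```

Note that visible() never inspects the "next" field: that is exactly
the sense in which the two cases look the same to a pure snapshot read.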

    a.

    --
    Aidan Van Dyk                                             Create like a god,
    aidan@highrise.ca                                       command like a king,
    http://www.highrise.ca/                                   work like a slave.
  • Tom Lane at May 23, 2011 at 2:21 pm

    Aidan Van Dyk writes:
    So, if SSI conflicts something on the UPDATE case, it would necessarily
    have to conflict the DELETE+INSERT case as well, and vice-versa.
    This argument is fundamentally bogus, because an UPDATE is not the same
    thing as DELETE followed by INSERT. In the former case the new tuple
    will have a ctid link from the old one; in the latter case not. And
    that will affect the behavior of subsequent operations. Another
    user-visible difference is that if the table has OIDs, the latter case
    will (usually) give the new tuple a different OID. Or if you don't like
    OIDs, think about a serial column. Or an ON INSERT trigger with
    side-effects.

    It may be that the result Kevin seeks to prove is in fact true, but an
    argument that requires the assumption that UPDATE is indistinguishable
    from DELETE+INSERT isn't going to persuade me, because I don't have any
    trouble whatsoever in distinguishing them.

    regards, tom lane
  • Robert Haas at May 23, 2011 at 2:36 pm

    On Mon, May 23, 2011 at 10:20 AM, Tom Lane wrote:
    Aidan Van Dyk <aidan@highrise.ca> writes:
    So, if SSI conflicts something on the UPDATE case, it would necessarily
    have to conflict the DELETE+INSERT case as well, and vice-versa.
    This argument is fundamentally bogus, because an UPDATE is not the same
    thing as DELETE followed by INSERT.  In the former case the new tuple
    will have a ctid link from the old one; in the latter case not.  And
    that will affect the behavior of subsequent operations.
    Right. The point I was driving at is - in what way will that affect
    the behavior of subsequent operations?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Kevin Grittner at May 23, 2011 at 3:38 pm

    Robert Haas wrote:

    The point I was driving at is - in what way will that affect the
    behavior of subsequent operations?
    You haven't answered why you think it should matter for OID or for
    write conflict on READ COMMITTED but not here. The logical concept
    of a row which is modified is a meaningful one regardless of any
    particular database's internal implementation. The fact that the
    proofs of SSI techniques use row-oriented terminology shouldn't be
    simply ignored. The fact that the two prototype implementations in
    the papers on the topic used databases with "update in place with a
    rollback log" for their MVCC implementations, and took their
    predicate locks on the "in place" row, reinforces that.

    No flaws jumped out at me in a review of the longer logical argument
    after sleeping on it, Robert said it looked good to him, and Dan
    said he would look at it soon. If it looks good to Dan, and nobody
    else pokes a hole in the logic, I'll have enough confidence to
    proceed.

    -Kevin
  • Dan Ports at May 23, 2011 at 6:32 pm

    On Sat, May 21, 2011 at 03:09:12PM -0500, Kevin Grittner wrote:
    I went back to the example which persuaded me and took another look. On
    review I see that this didn't prove the point because there was a
    dangerous structure with T1 as a pivot which should have caused SSI to
    break the cycle. Since it didn't, either I got careless in my testing
    methodology (which I will recheck when I get back to Wisconsin) or there
    was a bug -- but either way, this example can't prove that the predicate
    locks need to follow the row to new versions.
    I'm still working through the more general question, but I wanted to
    see what was going on with this example. It appears there's a bug,
    because this doesn't cause a rollback without the version linking.

    Specifically, the problem is a missing check in
    OnConflict_CheckForSerializationFailure. We check whether the conflict
    has caused the writer to become a pivot, but only if neither the reader
    nor the writer is committed. Why is that last condition there? In this case,
    the reader (T4) has committed but the writer (T1) hasn't.

    It would be worth making sure there aren't any other cases we're
    missing here, although I don't see any right now.
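The condition Dan describes can be sketched like this (a simplified
stand-in, not the actual OnConflict_CheckForSerializationFailure
source; the function name and flags are illustrative only):

```python
# Simplified, hypothetical stand-in for the guard Dan describes; not the
# actual PostgreSQL source. The pivot check ran only when neither side
# of the conflict had committed.
def pivot_check_runs_buggy(reader_committed, writer_committed):
    return not reader_committed and not writer_committed

# The example's case: the reader (T4) has committed but the writer (T1)
# has not -- the guard skips the pivot check, so the anomaly escapes.
assert not pivot_check_runs_buggy(reader_committed=True, writer_committed=False)
```

This only illustrates the case the guard misses; the correct
replacement condition is what the rest of the thread works out.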

    Dan

    --
    Dan R. K. Ports MIT CSAIL http://drkp.net/
  • Kevin Grittner at May 23, 2011 at 8:38 pm

    Dan Ports wrote:
    On Sat, May 21, 2011 at 03:09:12PM -0500, Kevin Grittner wrote:
    I went back to the example which persuaded me and took another
    look. On review I see that this didn't prove the point because
    there was a dangerous structure with T1 as a pivot which should
    have caused SSI to break the cycle. Since it didn't, either I
    got careless in my testing methodology (which I will recheck when
    I get back to Wisconsin) or there was a bug -- but either way,
    this example can't prove that the predicate locks need to follow
    the row to new versions.
    I'm still working through the more general question, but I wanted
    to see what was going on with this example. It appears there's a
    bug, because this doesn't cause a rollback without the version
    linking.

    Specifically, the problem is a missing check in
    OnConflict_CheckForSerializationFailure. We check whether the
    conflict has caused the writer to become a pivot, but only if
    neither the reader nor the writer is committed. Why is that last
    condition there?
    Because Fekete et al proved that the pivot doesn't cause an anomaly
    unless the transaction read by the pivot commits before the pivot or
    the transaction reading the pivot. The problem has to be somewhere
    that fails to detect the T1 pivot.
    In this case, the reader (T4) has committed but the writer (T1)
    hasn't.
    The T4 commit-first makes this safe. Everything should be OK until
    the session 1 activity at the end, which makes T1 a pivot with T2
    committing first. I believe we should get the error on this
    statement:

    update t set txt = 'a' where id = 1;

    I was too wiped out from travel to look at it last night, and can't
    spend any time on it during business hours today, but after 18:00
    CDT today this has my attention.

    -Kevin
  • Kevin Grittner at May 23, 2011 at 9:41 pm

    Dan Ports wrote:

    Specifically, the problem is a missing check in
    OnConflict_CheckForSerializationFailure. We check whether the
    conflict has caused the writer to become a pivot, but only if
    neither the reader nor the writer is committed. Why is that last
    condition there? In this case, the reader (T4) has committed but
    the writer (T1) hasn't.
    OK, I misread your post -- you are looking at T1 as the pivot, and
    that test *is* the problem. When T1 becomes the pivot the reader
    (T4) is committed, but it committed *after* T2. I can submit a
    patch for that this evening, after testing to confirm that it finds
    the T1 pivot, unless you want to get it.

    Sorry for the misunderstanding. I'm sneaking peeks at this during
    compiles of other stuff....

    -Kevin
  • Dan Ports at May 23, 2011 at 11:11 pm
    I have thought about this quite a bit and am fairly certain we do not need
    to track this linkage between row versions. One strong hint to this
    is that all the work I've seen on multiversion serializability
    theory defines a rw-conflict to be one transaction reading an object
    and the other writing *the next version* of the same object.

    But maybe that doesn't answer the question conclusively -- after all,
    the differences between an object, a tuple, and a row are subtle. So
    see the argument below:
    [I think it's similar to the argument you were making.]
    On Sat, May 21, 2011 at 03:09:12PM -0500, Kevin Grittner wrote:
    The issue can be rephrased to this: If transaction T1 reads a row (thus
    acquiring a predicate lock on it) and a second transaction T2 updates
    that row, must a third transaction T3 which updates the new version of
    the row have a rw-conflict in from T1?
    OK, that's a good way to phrase the question. Does it matter whether
    this edge T1 -> T3 is there?

    If T1 has a conflict in, it certainly doesn't. Adding the edge T1 -> T3
    would create a dangerous structure, but we already had one from the
    edge T1 -> T2, so we would have aborted something anyway.

    Now let's consider the case where T1 doesn't have a conflict in. If
    that's the case, for this edge T1 -> T3 to make a difference, T3 must
    have a rw-conflict out that induces a cycle in the dependency graph,
    i.e. a conflict out to some transaction preceding T1 in the serial
    order. (A conflict out to T1 would work too, but that would mean T1 has
    a conflict in and we would have rolled back.)

    So now we're trying to figure out if there can be an rw-conflict edge
    T3 -> T0, where T0 is some transaction that precedes T1. For T0 to
    precede T1, there has to be some edge, or sequence of edges,
    from T0 to T1. At least the last edge has to be a wr-dependency or
    ww-dependency rather than a rw-conflict, because T1 doesn't have a
    rw-conflict in. And that gives us enough information about the order of
    transactions to see that T3 can't have a rw-dependency to T0:
    - T0 committed before T1 started (the wr/ww-dependency implies this)
    - T1 started before T2 committed (the T1->T2 rw-conflict implies this)
    - T2 committed before T3 started (otherwise, T3 would be aborted
    because of an update conflict)

    That means T0 committed before T3 started, and therefore there can't be
    a rw-conflict from T3 to T0.

    In both cases, we didn't need the T1 -> T3 edge.
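The three ordering facts above chain together mechanically; here is a
sketch with hypothetical timestamps, chosen only to satisfy the stated
facts (any assignment satisfying them yields the same conclusion):

```python
# Hypothetical timestamps consistent with the three facts in the
# argument above; the conclusion follows by transitivity alone.
t0_commit = 1   # fact 1: T0 committed before T1 started (wr/ww-dependency)
t1_start  = 2
t2_commit = 3   # fact 2: T1 started before T2 committed (T1 -> T2 rw-conflict)
t3_start  = 4   # fact 3: T2 committed before T3 started (update-conflict rule)

assert t0_commit < t1_start < t2_commit < t3_start
# T0 committed before T3 started, so the two never overlap, and a
# rw-conflict from T3 to T0 is impossible.
assert t0_commit < t3_start
```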


    Does that make sense to you? Unless anyone can poke a hole in that
    reasoning, I think we can get rid of the code to handle later versions,
    which would make me happy for many reasons. It will not be missed.

    Dan

    --
    Dan R. K. Ports MIT CSAIL http://drkp.net/
  • Kevin Grittner at May 23, 2011 at 11:50 pm

    Dan Ports wrote:

    [I think it's similar to the argument you were making.]
    Similar, but you found a simpler, shorter way to the end, which
    should make Robert happy. ;-)
    On Sat, May 21, 2011 at 03:09:12PM -0500, Kevin Grittner wrote:
    The issue can be rephrased to this: If transaction T1 reads a row
    (thus acquiring a predicate lock on it) and a second transaction
    T2 updates that row, must a third transaction T3 which updates
    the new version of the row have a rw-conflict in from T1?
    OK, that's a good way to phrase the question. Does it matter
    whether this edge T1 -> T3 is there?

    If T1 has a conflict in, it certainly doesn't. Adding the edge T1
    -> T3 would create a dangerous structure, but we already had one
    from the edge T1 -> T2, so we would have aborted something anyway.
    I had to pause a moment on that because you didn't mention the
    "early enough commit" aspects; but certainly if T3 commits early
    enough to cause a problem then T2 does, so this is solid.
    Now let's consider the case where T1 doesn't have a conflict in.
    If that's the case, for this edge T1 -> T3 to make a difference,
    T3 must have a rw-conflict out that induces a cycle in the
    dependency graph, i.e. a conflict out to some transaction
    preceding T1 in the serial order. (A conflict out to T1 would work
    too, but that would mean T1 has a conflict in and we would have
    rolled back.) Agreed.
    So now we're trying to figure out if there can be an rw-conflict
    edge T3 -> T0, where T0 is some transaction that precedes T1. For
    T0 to precede T1, there has to be some edge, or sequence
    of edges, from T0 to T1. At least the last edge has to be a
    wr-dependency or ww-dependency rather than a rw-conflict, because
    T1 doesn't have a rw-conflict in.

    And that gives us enough information about the order of
    transactions to see that T3 can't have a rw-dependency to T0:
    - T0 committed before T1 started (the wr/ww-dependency implies
    this)
    - T1 started before T2 committed (the T1->T2 rw-conflict implies
    this)
    - T2 committed before T3 started (otherwise, T3 would be aborted
    because of an update conflict)

    That means T0 committed before T3 started,
    [scribbles diagrams] Yep.
    and therefore there can't be a rw-conflict from T3 to T0.
    No overlap means no rw-conflict; so that has to be true.
    In both cases, we didn't need the T1 -> T3 edge.


    Does that make sense to you?
    Makes sense to me. Like the proof I offered, you have shown that
    there is no cycle which can develop with the locks copied which
    isn't there anyway if we don't copy the locks.
    Unless anyone can poke a hole in that reasoning, I think we can
    get rid of the code to handle later versions,
    Actually, we now have two independent proofs which don't have much
    in common beyond the initial statement of the problem. Someone
    would need to show a flaw in that premise or punch holes in both
    proofs. I think we can get by with just the shorter proof (yours)
    in the README file, though.

    -Kevin
  • Kevin Grittner at May 24, 2011 at 9:19 am

    "Kevin Grittner" wrote:
    Dan Ports wrote:
    Does that make sense to you?
    Makes sense to me. Like the proof I offered, you have shown that
    there is no cycle which can develop with the locks copied which
    isn't there anyway if we don't copy the locks.
    I woke up with the nagging thought that while the above is completely
    accurate, it deserves some slight elaboration. These proofs show that
    there is no legitimate cycle which could cause an anomaly which the
    move from row-based to tuple-based logic will miss. They don't prove
    that the change will generate all the same serialization failures;
    and in fact, some false positives are eliminated by the change.
    That's a good thing. In addition to the benefits mentioned in prior
    posts, there will be a reduction in the rate of rollbacks (in
    particular corner cases) from what people see in beta1 without a loss
    of correctness.

    -Kevin
  • Dan Ports at May 25, 2011 at 12:26 am

    On Tue, May 24, 2011 at 04:18:37AM -0500, Kevin Grittner wrote:
    These proofs show that
    there is no legitimate cycle which could cause an anomaly which the
    move from row-based to tuple-based logic will miss. They don't prove
    that the change will generate all the same serialization failures;
    and in fact, some false positives are eliminated by the change.
    Yes, that's correct. That's related to the part in the proof where I
    claimed T3 couldn't have a conflict out *to some transaction T0 that
    precedes T1*.

    I originally tried to show that T3 couldn't have any conflicts out that
    T2 didn't have, which would mean we got the same set of serialization
    failures, but that's not true. In fact, it's not too hard to come up
    with an example where there would be a serialization failure with the
    row version links, but not without. However, because the rw-conflict
    can't be pointing to a transaction that precedes T1 in the serial
    order, it won't create a cycle. In other words, there are serialization
    failures that won't happen anymore, but they were false positives.

    Dan

    --
    Dan R. K. Ports MIT CSAIL http://drkp.net/
  • Kevin Grittner at May 26, 2011 at 3:19 am

    Dan Ports wrote:
    On Tue, May 24, 2011 at 04:18:37AM -0500, Kevin Grittner wrote:

    These proofs show that there is no legitimate cycle which could
    cause an anomaly which the move from row-based to tuple-based
    logic will miss. They don't prove that the change will generate
    all the same serialization failures; and in fact, some false
    positives are eliminated by the change.
    Yes, that's correct. That's related to the part in the proof where
    I claimed T3 couldn't have a conflict out *to some transaction T0
    that precedes T1*.

    I originally tried to show that T3 couldn't have any conflicts out
    that T2 didn't have, which would mean we got the same set of
    serialization failures, but that's not true. In fact, it's not too
    hard to come up with an example where there would be a
    serialization failure with the row version links, but not without.
    However, because the rw-conflict can't be pointing to a transaction
    that precedes T1 in the serial order, it won't create a cycle. In
    other words, there are serialization failures that won't happen
    anymore, but they were false positives.
    Dan and I went around a couple times chasing down all code, comment,
    and patch changes needed, resulting in the attached patch. We found
    and fixed the bug which originally manifested in a way which I
    confused with a need for row locks, as well as another which was
    nearby in the code. We backed out the changes which were causing
    merge problems for Robert, as those were part of the attempt at the
    row locking (versus tuple locking). We removed a function which is
    no longer needed. We adjusted the comments and an affected isolation
    test.

    As might be expected from removing an unnecessary feature, the lines
    of code went down -- a net decrease of 93 lines.

    This patch applies against HEAD, `make check-world` passes as does
    `make -C src/test/isolation installcheck` and `make
    installcheck-world` against a database with
    default_transaction_isolation = 'serializable'. Dan is running
    stress testing against the patched version tonight with DBT-2.

    These changes generate merge conflicts with the work I've done on
    handling CLUSTER, DROP INDEX, etc. It seems to me that the best
    course would be to commit this, then I can rebase the other work and
    post it. Since these issues are orthogonal, it didn't seem like a
    good idea to combine them in one patch, and this one seems more
    urgent.

    -Kevin
  • Heikki Linnakangas at May 26, 2011 at 8:07 am

    On 26.05.2011 06:19, Kevin Grittner wrote:
    Dan and I went around a couple times chasing down all code, comment,
    and patch changes needed, resulting in the attached patch. We found
    and fixed the bug which originally manifested in a way which I
    confused with a need for row locks, as well as another which was
    nearby in the code. We backed out the changes which were causing
    merge problems for Robert, as those were part of the attempt at the
    row locking (versus tuple locking). We removed a function which is
    no longer needed. We adjusted the comments and an affected isolation
    test.
    Could you explain in the README, why it is safe to only take the lock on
    the visible row version, please? It's not quite obvious, as we've seen
    from this discussion, and if I understood correctly the academic papers
    don't touch that subject either.
    As might be expected from removing an unnecessary feature, the lines
    of code went down -- a net decrease of 93 lines.
    That's the kind of patch I like :-).
    These changes generate merge conflicts with the work I've done on
    handling CLUSTER, DROP INDEX, etc. It seems to me that the best
    course would be to commit this, then I can rebase the other work and
    post it. Since these issues are orthogonal, it didn't seem like a
    good idea to combine them in one patch, and this one seems more
    urgent.
    Agreed.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Kevin Grittner at May 26, 2011 at 7:11 pm

    Heikki Linnakangas wrote:

    Could you explain in the README, why it is safe to only take the
    lock on the visible row version, please?
    Sure. I actually intended to do this last night but ran out of
    steam and posted what I had, planning on following up with that.

    The place it seemed to fit best was in the "Innovations" section,
    since the SSI papers and their prototype implementations seemed
    oriented toward "rows" -- certainly the SIREAD locks were at the row
    level, versus a row version level.

    Since this doesn't touch any of the files in yesterday's patch, and
    it seems entirely within the realm of possibility that people will
    want to argue about how best to document this more than the actual
    fix, I'm posting it as a separate patch -- README-SSI only.

    I mostly just copied from Dan's posted proof verbatim.

    -Kevin
  • Heikki Linnakangas at May 30, 2011 at 8:30 pm

    On 26.05.2011 06:19, Kevin Grittner wrote:
    /*
     * Check whether the writer has become a pivot with an out-conflict
     * committed transaction, while neither reader nor writer is committed. If
     * the reader is a READ ONLY transaction, there is only a serialization
     * failure if an out-conflict transaction causing the pivot committed
     * before the reader acquired its snapshot. (That is, the reader must not
     * have been concurrent with the out-conflict transaction.)
     */
    if (!failure)
    {
        if (SxactHasSummaryConflictOut(writer))
        {
            failure = true;
            conflict = NULL;
        }
        else
            conflict = (RWConflict)
                SHMQueueNext(&writer->outConflicts,
                             &writer->outConflicts,
                             offsetof(RWConflictData, outLink));
        while (conflict)
        {
            if (SxactIsCommitted(conflict->sxactIn)
                && (!SxactIsCommitted(reader)
                    || conflict->sxactIn->commitSeqNo <= reader->commitSeqNo)
                && (!SxactIsCommitted(writer)
                    || conflict->sxactIn->commitSeqNo <= writer->commitSeqNo)
                && (!SxactIsReadOnly(reader)
                    || conflict->sxactIn->commitSeqNo <= reader->SeqNo.lastCommitBeforeSnapshot))
            {
                failure = true;
                break;
            }
            conflict = (RWConflict)
                SHMQueueNext(&writer->outConflicts,
                             &conflict->outLink,
                             offsetof(RWConflictData, outLink));
        }
    }
    The comment is not in sync with the code. The code is not checking that
    "neither reader or writer has committed", but something more complicated.

    Looking at OnConflict_CheckForSerializationFailure(), it's really hard
    to see how it works, and why the conditions it checks are sufficient. I
    find it helps tremendously to draw the dangerous structures being
    checked, in addition to just explaining them in text. Ascii art is a bit
    clunky, but I think we have to do it here.

    I did some of that in the comments, and I think I understand it now. See
    attached patch. Does that look right to you? (note that I swapped the
    second and third check in the function, I think it's more
    straightforward that way). I also added a couple of questions about the
    conditions, marked with XXX comments. Can you answer those, please?

    PS. Should we say "Cancelled on identification as pivot, during ...", or
    "Cancelled on identification as a pivot, during ..." ? We seem to use
    both in the error messages.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Kevin Grittner at May 30, 2011 at 8:02 pm
    [The first attempt to send this shows as undeliverable to the list.
    Resending for completeness and coherency of archives. Apologies to
    those getting duplicates.]
    Heikki Linnakangas wrote:
    On 26.05.2011 06:19, Kevin Grittner wrote:

    /*
    * Check whether the writer has become a pivot with an
    * out-conflict committed transaction, while neither reader
    * nor writer is committed. If the reader is a READ ONLY
    * transaction, there is only a serialization failure if an
    * out-conflict transaction causing the pivot committed
    * before the reader acquired its snapshot. (That is, the
    * reader must not have been concurrent with the out-conflict
    * transaction.)
    */
    The comment is not in sync with the code. The code is not checking
    that "neither reader or writer has committed", but something more
    complicated.
    True. We changed the code but not the comment. Sorry for missing
    that. Basically the quoted clause would be correct by changing it to
    "neither reader nor writer committed first." (Your rewrite,
    discussed below, is better, though.)
    I find it helps tremendously to draw the dangerous structures being
    checked, in addition to just explaining them in text.
    Agreed -- I spent many an hour with the "dangerous structure" diagram
    in front of me when thinking through the mechanics of implementation.
    Ascii art is a bit clunky, but I think we have to do it here. OK.
    I did some of that in the comments, and I think I understand it
    now. See attached patch. Does that look right to you?
    ! * A serialization failure can only occur if there is a dangerous
    ! * structure in the dependency graph:
    ! *
    ! * Tin ------> Tpivot ------> Tout
    ! *        rw             rw
    ! *
    ! * Furthermore, Tout must commit first.

    I agree that this is easier to read and understand than just straight
    text; but you dropped one other condition, which we use rather
    heavily. We should probably add back something like:

    * If Tin is declared READ ONLY (or commits without writing), we can
    * only have a problem if Tout committed before Tin acquired its
    * snapshot.
    * XXX: Where does that last condition come from?
    This optimization is an original one, not yet appearing in any
    academic papers; Dan and I are both convinced it is safe, and in
    off-list correspondence with Michael Cahill after he left Oracle, he
    said that he discussed this with Alan Fekete and they both concur
    that it is a safe and good optimization.

    This optimization is mentioned in the README-SSI file in the
    "Innovations" section. Do you think that file needs to have more on
    the topic?
    * XXX: for an anomaly to occur, T2 must've committed before W.
    * Couldn't we easily check that here, or does the fact that W
    * committed already somehow imply it?
    The flags and macros should probably be renamed to make it more
    obvious that this is covered. The flag which SxactHasConflictOut is
    based on is only set when the conflict out is to a transaction which
    committed ahead of the flagged transaction. Regarding the other
    test -- since we have two transactions (Tin and Tout) which have not
    been summarized, and transactions are summarized in order of commit,
    SxactHasSummaryConflictOut implies that the transaction with the flag
    set has not committed before the transaction to which it has a
    rw-conflict out.

    The problem is coming up with a more accurate name which isn't really
    long. The best I can think of is SxactHasConflictOutToEarlierCommit,
    which is a little long. If nobody has a better idea, I guess it's
    better to have a long name than a misleading one. Do you think we
    need to rename SxactHasSummaryConflictOut or just add a comment on
    that use that a summarized transaction will always be committed ahead
    of any transactions which are not summarized?
    (note that I swapped the second and third check in the function, I
    think it's more straightforward that way).
    It doesn't matter to me either way, so if it seems clearer to you,
    that's a win.
    Should we say "Cancelled on identification as pivot, during ...",
    or "Cancelled on identification as a pivot, during ..." ? We seem
    to use both in the error messages.
    Consistency is good. I think it sounds better with the indefinite
    article in there.

    Do you want me to post another patch based on this, or are these
    changes from what you posted small enough that you would prefer to
    just do it as part of the commit?

    Thanks for taking the time to work through this. As always, your
    ideas improve things.

    -Kevin
  • Heikki Linnakangas at May 30, 2011 at 8:53 pm

    On 30.05.2011 17:10, Kevin Grittner wrote:
    Heikki Linnakangas wrote:
    On 26.05.2011 06:19, Kevin Grittner wrote:
    I did some of that in the comments, and I think I understand it
    now. See attached patch. Does that look right to you?
    ! * A serialization failure can only occur if there is a dangerous
    ! * structure in the dependency graph:
    ! *
    ! * Tin ------> Tpivot ------> Tout
    ! *        rw             rw
    ! *
    ! * Furthermore, Tout must commit first.

    I agree that this is easier to read and understand than just straight
    text; but you dropped one other condition, which we use rather
    heavily. We should probably add back something like:

    * If Tin is declared READ ONLY (or commits without writing), we can
    * only have a problem if Tout committed before Tin acquired its
    * snapshot.
    * XXX: Where does that last condition come from?
    This optimization is an original one, not yet appearing in any
    academic papers; Dan and I are both convinced it is safe, and in
    off-list correspondence with Michael Cahill after he left Oracle, he
    said that he discussed this with Alan Fekete and they both concur
    that it is a safe and good optimization.

    This optimization is mentioned in the README-SSI file in the
    "Innovations" section. Do you think that file needs to have more on
    the topic?
    Oh, I see this:
    o We can avoid ever rolling back a transaction when the
    transaction on the conflict *in* side of the pivot is explicitly or
    implicitly READ ONLY unless the transaction on the conflict *out*
    side of the pivot committed before the READ ONLY transaction acquired
    its snapshot. (An implicit READ ONLY transaction is one which
    committed without writing, even though it was not explicitly declared
    to be READ ONLY.)
    Since this isn't coming from the papers, it would be nice to explain why
    that is safe.
    * XXX: for an anomaly to occur, T2 must've committed before W.
    * Couldn't we easily check that here, or does the fact that W
    * committed already somehow imply it?
    The flags and macros should probably be renamed to make it more
    obvious that this is covered. The flag which SxactHasConflictOut is
    based on is only set when the conflict out is to a transaction which
    committed ahead of the flagged transaction. Regarding the other
    test -- since we have two transactions (Tin and Tout) which have not
    been summarized, and transactions are summarized in order of commit,
    SxactHasSummaryConflictOut implies that the transaction with the flag
    set has not committed before the transaction to which it has a
    rw-conflict out.
    Ah, got it.
    The problem is coming up with a more accurate name which isn't really
    long. The best I can think of is SxactHasConflictOutToEarlierCommit,
    which is a little long. If nobody has a better idea, I guess it's
    better to have a long name than a misleading one. Do you think we
    need to rename SxactHasSummaryConflictOut or just add a comment on
    that use that a summarized transaction will always be committed ahead
    of any transactions which are not summarized?
    Dunno. I guess a comment would do. Can you write a separate patch for
    that, please?
    (note that I swapped the second and third check in the function, I
    think it's more straightforward that way).
    It doesn't matter to me either way, so if it seems clearer to you,
    that's a win.
    FWIW, the reason I prefer it that way is that the function is checking
    three scenarios:

    R ----> W ----> T2, where W committed after T2
    R ----> W ----> T2, "else"
    T0 ----> R ----> W

    It seems more straightforward to keep both "R ----> W ----> T2" cases
    together.
    Should we say "Cancelled on identification as pivot, during ...",
    or "Cancelled on identification as a pivot, during ..." ? We seem
    to use both in the error messages.
    Consistency is good. I think it sounds better with the indefinite
    article in there.
    Ok, will do.
    Do you want me to post another patch based on this, or are these
    changes from what you posted small enough that you would prefer to
    just do it as part of the commit?
    I've committed with the discussed changes.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Kevin Grittner at Jun 1, 2011 at 10:09 pm

    Heikki Linnakangas wrote:
    On 30.05.2011 17:10, Kevin Grittner wrote:

    This optimization is an original one, not yet appearing in any
    academic papers; Dan and I are both convinced it is safe, and in
    off-list correspondence with Michael Cahill after he left Oracle,
    he said that he discussed this with Alan Fekete and they both
    concur that it is a safe and good optimization.

    This optimization is mentioned in the README-SSI file in the
    "Innovations" section. Do you think that file needs to have more
    on
    the topic?
    Oh, I see this:
    o We can avoid ever rolling back a transaction when the
    transaction on the conflict *in* side of the pivot is explicitly
    or
    implicitly READ ONLY unless the transaction on the conflict *out*
    side of the pivot committed before the READ ONLY transaction
    acquired its snapshot. (An implicit READ ONLY transaction is one
    which committed without writing, even though it was not
    explicitly declared to be READ ONLY.)
    Since this isn't coming from the papers, it would be nice to
    explain why that is safe.
    I see that this issue first surfaced on the Wiki page 2 April, 2010,
    and was never really described in detail on the -hackers list. To
    ensure that it has some documentation here (in case of any possible
    IP controversy), I will describe a proof now. For earlier
    references one could dig around in Wiki history, posted patches
    during CFs, and the git repository history. I have kept a copy of
    the repo before the official conversion from CVS.

    From many academic papers, there is well-established proof that
    serialization anomalies can only occur under snapshot isolation when
    there is a cycle in the graph of apparent order of execution of the
    transactions, and that in such a cycle the following pattern of
    rw-dependencies always occurs:

    Tin - - -> Tpivot - - -> Tout

    A rw-dependency (also called a rw-conflict) exists when a read by
    one transaction doesn't see the write of another transaction because
    the two transactions overlap, regardless of whether the read or the
    write actually happens first. Since the reader doesn't see the work
    of the writer, the reader appears to have executed first, regardless
    of the actual order of snapshot acquisition or commits. The arrows
    show the apparent order of execution of the transactions -- Tin
    first, Tout last. Published papers have further proven that the
    transaction which appears to have executed last of these three must
    actually commit before either of the others for an anomaly to occur.
    Tin and Tout may be the same transaction (as in the case of simple
    write skew), or two distinct transactions.

    SSI relies on recognition of this "dangerous structure" to decide
    when to cancel a transaction to preserve data integrity. No attempt
    is made to complete the cycle from Tout back to Tin. Partly this is
    because of the expense of doing so -- there could be a long chain of
    rw-dependencies, wr-dependencies (where the reader *can* see the
    work of another transaction because it *did* commit first), and
    ww-dependencies (where the writer is updating a row version left by
    another transaction, which must have committed first). There is
    also the uncomfortable possibility that a client application
    *outside* the database ran a transaction which made some change and,
    after observing the successful commit of this transaction, is
    proceeding with the knowledge that it ran before the next transaction
    is submitted.

    The novel optimization we've used in the PostgreSQL implementation
    of SSI is based on the fact that of these four ways of determining
    which transaction appears to have executed first, all but the
    rw-dependency are based on the transaction which appears to have
    executed first committing before the apparent later transaction
    acquires its snapshot. When you combine this with the fact that the
    transaction which appears to execute later in a rw-dependency is the
    one which performed the write, you have the basis of an interesting
    optimization for READ ONLY transactions.

    In the above diagram, if Tin is READ ONLY, it cannot have a
    rw-dependency *in* from some other transaction. The only way Tout
    can directly appear to have executed before Tin is if Tout committed
    before Tin acquired its snapshot, so that Tin can read something
    Tout wrote, or an external application can know that it successfully
    committed Tout before beginning Tin. The published conditions must
    all still hold -- Tin and Tout must both overlap Tpivot, the same
    rw-dependencies must exist, and Tout must still commit first of the
    three; but when Tin doesn't write, Tout must actually commit *even
    earlier* -- before Tin gets started -- to have an anomaly.

    For a proof the question becomes: If Tin does not write and Tin and
    Tout overlap, can a dependency or chain of dependencies develop
    which makes it appear that Tout executed before Tin? If this can't
    happen without creating a dangerous structure around a different
    pivot, then no cycle can develop and the optimization is safe.

    Since Tin doesn't write, no transaction can appear to come before it
    because of failure to read its writes -- no rw-dependency in to this
    transaction is possible. The only transactions Tin can appear to
    follow are transactions which committed in time to make their work
    visible to Tin. Let's assume such a transaction exists, called T0:

    T0 -----> Tin - - -> Tpivot - - -> Tout

    It would be possible for Tout to overlap T0 and develop a
    rw-dependency out to it, but T0 must commit before Tin starts, so
    for Tout to overlap Tin, T0 must commit before Tout, so a
    rw-dependency from Tout to T0 would make Tout a pivot and cause a
    rollback which would break the cycle. Any other transaction to
    which Tout developed a rw-dependency out would have the same
    problem.

    Any *other* type of dependency out from Tout would require the
    transaction to acquire its snapshot after the commit of Tout. Since
    T0 commits before Tin begins and Tout overlaps Tin, the commit of
    Tout must come after the commit of T0, so no transaction which
    begins after Tout commits can overlap T0. This leaves no possible
    way to form a cycle when Tin is READ ONLY and Tout overlaps Tin.

    I won't be shocked if Dan can come up with a shorter proof, but I'm
    confident this one is solid.

    Patch to come soon, but I thought it best to post the proof here
    first to allow a chance to refine it based on discussion -- or
    shorter proofs.

    -Kevin
  • Dan Ports at Jun 2, 2011 at 7:14 am

    On Wed, Jun 01, 2011 at 05:09:09PM -0500, Kevin Grittner wrote:
    I won't be shocked if Dan can come up with a shorter proof, but I'm
    confident this one is solid.
    Well, so happens I wrote a proof on the airplane today, before I saw
    your mail. It's actually quite straightforward... (well, at least more
    so than I was expecting)
    From many academic papers, there is well-established proof that
    serialization anomalies can only occur under snapshot isolation when
    there is a cycle in the graph of apparent order of execution of the
    transactions, and that in such a cycle the following pattern of
    rw-dependencies always occurs:

    Tin - - -> Tpivot - - -> Tout

    A rw-dependency (also called a rw-conflict) exists when a read by
    one transaction doesn't see the write of another transaction because
    the two transactions overlap, regardless of whether the read or the
    write actually happens first. Since the reader doesn't see the work
    of the writer, the reader appears to have executed first, regardless
    of the actual order of snapshot acquisition or commits. The arrows
    show the apparent order of execution of the transactions -- Tin
    first, Tout last. Published papers have further proven that the
    transaction which appears to have executed last of these three must
    actually commit before either of the others for an anomaly to occur.
    We can actually say something slightly stronger than that last
    sentence: Tout has to commit before *any* other transaction in the
    cycle. That doesn't help us implement SSI, because we never try to look
    at an entire cycle, but it's still true and useful for proofs like this.

    Now, supposing Tin is read-only...

    Since there's a cycle, there must also be a transaction that precedes
    Tin in the serial order. Call it T0. (T0 might be the same transaction
    as Tout, but that doesn't matter.) There's an edge in the graph from
    T0 to Tin. It can't be a rw-conflict, because Tin was read-only, so it
    must be a ww- or wr-dependency. Either means T0 committed before Tin
    started.

    Because Tout committed before any other transaction in the cycle, Tout
    has to commit before T0 commits -- and thus before Tin starts.

    Dan

    --
    Dan R. K. Ports MIT CSAIL http://drkp.net/
  • Kevin Grittner at Jun 2, 2011 at 6:01 pm

    Dan Ports wrote:
    On Wed, Jun 01, 2011 at 05:09:09PM -0500, Kevin Grittner wrote:

    Published papers have further proven that the transaction which
    appears to have executed last of these three must actually commit
    before either of the others for an anomaly to occur.
    We can actually say something slightly stronger than that last
    sentence: Tout has to commit before *any* other transaction in the
    cycle. That doesn't help us implement SSI, because we never try to
    look at an entire cycle, but it's still true and useful for proofs
    like this.
    I didn't know that, although it doesn't seem too surprising. With
    that as a given, the proof can be quite short and straightforward.
    Now, supposing Tin is read-only...

    Since there's a cycle, there must also be a transaction that
    precedes Tin in the serial order. Call it T0. (T0 might be the
    same transaction as Tout, but that doesn't matter.) There's an
    edge in the graph from T0 to Tin. It can't be a rw-conflict,
    because Tin was read-only, so it must be a ww- or wr-dependency.
    Either means T0 committed before Tin started.

    Because Tout committed before any other transaction in the cycle,
    Tout has to commit before T0 commits -- and thus before Tin
    starts.
    If we're going to put this into the README-SSI as the proof of the
    validity of this optimization, I'd like to have a footnote pointing
    to a paper describing the "first commit in the cycle" aspect of a
    dangerous structure. Got any favorites, or should I fall back on a
    google search?

    -Kevin
  • Dan Ports at Jun 2, 2011 at 9:37 pm

    On Thu, Jun 02, 2011 at 01:01:05PM -0500, Kevin Grittner wrote:
    If we're going to put this into the README-SSI as the proof of the
    validity of this optimization, I'd like to have a footnote pointing
    to a paper describing the "first commit in the cycle" aspect of a
    dangerous structure. Got any favorites, or should I fall back on a
    google search?
    Hmm. I don't see any that state that in so many words, but it's an
    obvious consequence of the proof of Theorem 2.1 in "Making Snapshot
    Isolation Serializable" -- note that T3 is chosen to be the transaction
    in the cycle with the earliest commit time.

    Dan

    --
    Dan R. K. Ports MIT CSAIL http://drkp.net/
  • Kevin Grittner at Jun 2, 2011 at 9:14 pm

    Heikki Linnakangas wrote:
    On 30.05.2011 17:10, Kevin Grittner wrote:
    Heikki Linnakangas wrote:
    * XXX: for an anomaly to occur, T2 must've committed
    * before W. Couldn't we easily check that here, or does
    * the fact that W committed already somehow imply it?
    The flags and macros should probably be renamed to make it more
    obvious that this is covered. The flag which SxactHasConflictOut
    is based on is only set when the conflict out is to a transaction
    which committed ahead of the flagged transaction. Regarding the
    other test -- since we have two transactions (Tin and Tout) which
    have not been summarized, and transactions are summarized in
    order of commit, SxactHasSummaryConflictOut implies that the
    transaction with the flag set has not committed before the
    transaction to which it has a rw-conflict out.
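[Editorial note: the invariant described here can be sketched as a toy model. This is hypothetical Python, not the actual predicate.c structures or macros: because transactions are summarized strictly in commit order, a summarized transaction always committed before any transaction that has not yet been summarized, which is why SxactHasSummaryConflictOut needs no separate commit-order check.]

```python
# Hypothetical, simplified model -- not actual PostgreSQL code.
# Invariant illustrated: summarization happens in commit order, so every
# summarized transaction committed before every unsummarized one.

from dataclasses import dataclass

@dataclass
class SerTxn:
    commit_seq: int   # position in commit order; lower = earlier
    summarized: bool  # already moved to the summary, in commit order

def committed_before(a: SerTxn, b: SerTxn) -> bool:
    # If a is in the summary and b is not, a must have committed first;
    # no explicit commit-order comparison is needed in that case.
    if a.summarized and not b.summarized:
        return True
    return a.commit_seq < b.commit_seq

old_txn = SerTxn(commit_seq=10, summarized=True)
new_txn = SerTxn(commit_seq=42, summarized=False)

# Why a summary conflict-out always points at an earlier commit:
assert committed_before(old_txn, new_txn)
```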
    Ah, got it.
    The problem is coming up with a more accurate name which isn't
    really long. The best I can think of is
    SxactHasConflictOutToEarlierCommit, which is a little long. If
    nobody has a better idea, I guess it's better to have a long name
    than a misleading one. Do you think we need to rename
    SxactHasSummaryConflictOut or just add a comment on that use that
    a summarized transaction will always be committed ahead of any
    transactions which are not summarized?
    Dunno. I guess a comment would do. Can you write a separate patch
    for that, please?
    Attached is a comments-only patch for this, along with one
    correction to the comments you added and a couple other typos.

    I'll submit a patch for the README-SSI file once I find a reference
    I like to a paper describing what Dan's proof uses as a premise --
    that the transaction on the rw-conflict *out* side of the pivot must
    not only be the first of the three transactions in the dangerous
    structure to commit, but the first in the entire cycle of which the
    dangerous structure is a part. With that premise as a given, the
    proof is short, clear, and unassailable; but I think we need a cite
    to use that argument convincingly.

    -Kevin
  • Heikki Linnakangas at Jun 3, 2011 at 10:04 am

    On 03.06.2011 00:14, Kevin Grittner wrote:
    Attached is a comments-only patch for this, along with one
    correction to the comments you added and a couple other typos.
    Ok, committed.
    I'll submit a patch for the README-SSI file once I find a reference
    I like to a paper describing what Dan's proof uses as a premise --
    that the transaction on the rw-conflict *out* side of the pivot must
    not only be the first of the three transactions in the dangerous
    structure to commit, but the first in the entire cycle of which the
    dangerous structure is a part. With that premise as a given, the
    proof is short, clear, and unassailable; but I think we need a cite
    to use that argument convincingly.
    Agreed.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com

Discussion Overview
Group: pgsql-hackers
Category: postgresql
Posted: May 21, 2011 at 8:09 PM
Active: Jun 3, 2011 at 10:04 AM
Posts: 31
Users: 7
Website: postgresql.org...
IRC: #postgresql
