Various people, including at least Heikki, Andres, and myself, have
proposed various schemes for avoiding freezing that amount to doing it
sooner, when we're already writing WAL anyway, or at least when the
buffer is already dirty anyway, or at least while the buffer is
already in shared_buffers anyway. Various people, including at least
Tom and Andres, have raised the point that this would lose
possibly-useful forensic information that they have in the past found
to be of tangible value in previous debugging of databases that have
somehow gotten messed up. I don't know who originally proposed it,
but I've had many conversations about how we could address this
concern: instead of replacing xmin when we freeze, just set an
infomask bit that means "xmin is frozen" and leave the old, literal
xmin in place. FrozenTransactionId would still exist and still be
understood, of course, but new freezing operations wouldn't use it.

I have attempted to implement this. Trouble is, we're out of infomask
bits. Using an infomask2 bit might work but we don't have many of
them left either, so it's worth casting about a bit for a better
solution. Andres proposed using HEAP_MOVED_IN|HEAP_MOVED_OFF for
this purpose, but I think we're better off trying to reclaim those
bits in a future release. Exactly how to do that is a topic for
another email, but I believe it's very possible. What I'd like to
propose instead is using HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID to
indicate that xmin is frozen. This bit pattern isn't used for
anything else, so there's no confusion possible with existing data
already on disk, but it's necessary to audit all code that checks
HEAP_XMIN_INVALID to make sure it doesn't get confused. I've done
this, and there's little enough of it that it seems pretty easy to
handle.
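
Concretely, the encoding looks about like this (HeapTupleHeaderXminFrozen()
is the test the patch introduces; HEAP_XMIN_FROZEN is an illustrative name
for the bit combination, and the two underlying infomask values are the
existing ones from htup_details.h):

    /*
     * HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID are mutually exclusive
     * today, so setting both is a pattern that cannot occur on existing
     * data; it can therefore mean "xmin is frozen" while the literal
     * xmin value stays in place.
     */
    #define HEAP_XMIN_COMMITTED    0x0100   /* xmin known committed */
    #define HEAP_XMIN_INVALID      0x0200   /* xmin known aborted */
    #define HEAP_XMIN_FROZEN       (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)

    #define HeapTupleHeaderXminFrozen(tup) \
        (((tup)->t_infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN)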

A somewhat larger problem is that this requires auditing every place
that looks at a tuple xmin and deciding whether any changes are needed
to handle the possibility that the tuple may be frozen even though
xmin != FrozenTransactionId. This is a somewhat more significant
change, but it doesn't seem to be too bad. But there are a couple of
cases that are tricky enough that they seem worth expounding upon:

- When we follow HOT chains, we determine where the HOT chain ends by
matching the xmax of each tuple with the xmin of the next tuple. If
they don't match, we conclude that the HOT chain has ended. I
initially thought this logic might be buggy even as things stand today
if the latest tuple in the chain is frozen, but as Andres pointed out
to me, that's not so. If the newest tuple in the chain is all-visible
(the earliest point at which we can theoretically freeze it), all
earlier tuples are dead altogether, and heap_page_prune() is always
called after locking the buffer and before freezing, so any tuple we
freeze must be the first in its HOT chain. For the same reason, this
logic doesn't need any adjustment for the new freezing system: it's
never looking at anything old enough to be frozen in the first place.
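
To make that concrete, the continuation test is essentially the following
(a minimal sketch modeled loosely on heap_hot_search_buffer(); the function
and variable names are illustrative, not from the patch):

    #include "postgres.h"
    #include "access/htup_details.h"
    #include "access/transam.h"

    /*
     * Does the tuple at the next chain position continue the HOT chain
     * whose previous member had xmax == prior_xmax?
     */
    static bool
    hot_chain_continues(HeapTupleHeader next_tuple, TransactionId prior_xmax)
    {
        /* At the chain root there is no prior xmax to match against. */
        if (!TransactionIdIsValid(prior_xmax))
            return true;

        /* The chain continues only if xmax and the next xmin match. */
        return TransactionIdEquals(prior_xmax,
                                   HeapTupleHeaderGetXmin(next_tuple));
    }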

- Various procedural languages use the combination of TID and XMIN to
determine whether a function needs to be recompiled. Although the
possibility of this doing the wrong thing seems quite remote, it's not
obvious to me why it's theoretically correct even as things stand
today. Suppose that a previously-frozen tuple is vacuumed away and
another tuple is placed at the same TID and then frozen. Then, we
check whether the cache is still valid and, behold, it is. This would
actually get better with this patch, since it wouldn't be enough
merely for the old and new tuples to both be frozen; they'd have to
have had the same XID prior to freezing. I think that could only
happen if a backend sticks around for at least 2^32 transactions, but
I don't know what would prevent it in that case.
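
For reference, the validity test in question amounts to this (a sketch
patterned after plpgsql's compiled-function cache check; the function name
here is illustrative):

    #include "postgres.h"
    #include "access/htup_details.h"
    #include "access/transam.h"
    #include "storage/itemptr.h"

    /*
     * A cached function is still valid if its pg_proc tuple sits at the
     * same TID with the same xmin as when we compiled it.
     */
    static bool
    cached_function_still_valid(HeapTuple procTup,
                                TransactionId cached_xmin,
                                ItemPointer cached_tid)
    {
        return TransactionIdEquals(HeapTupleHeaderGetXmin(procTup->t_data),
                                   cached_xmin) &&
               ItemPointerEquals(&procTup->t_self, cached_tid);
    }

With freezing preserving the original xmin, a stale entry can only produce
a false match if the replacement tuple lands at the same TID *and* carries
the same pre-freeze XID.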

- heap_get_latest_tid() appears broken even without this patch. It's
only used on user-specified TIDs, either in a TID scan, or due to the
use of currtid_byreloid() and currtid_byrelname(). It attempts to find
the latest version of the tuple referenced by the given TID by
following the CTID links. Like HOT, it uses XMAX/XMIN matching to
detect when the chain is broken. However, unlike HOT, update chains
can in general span multiple blocks. It is not now nor has it ever
been safe to assume that the next tuple in the chain can't be frozen
before the previous one is vacuumed away. Andres came up with the
best example: suppose the tuple to be frozen physically precedes its
predecessor; then, an in-progress vacuum might reach the to-be-frozen
tuple before it reaches (and removes) the previous row version. In
newer releases, the same problem could be caused by vacuum's
occasional page-skipping behavior. As with the previous point, the
"don't actually change xmin when we freeze" approach actually makes it
harder for a chain to get "broken" when it shouldn't, but I suspect
it's just moving us from one set of extremely-obscure failure cases to
another.
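
The shape of that chain walk is roughly as follows (a sketch only: fetching
a tuple version by TID is abstracted into a hypothetical callback, and all
names are illustrative):

    #include "postgres.h"
    #include "access/htup_details.h"
    #include "access/transam.h"
    #include "storage/itemptr.h"

    typedef bool (*fetch_version_fn) (ItemPointer tid, HeapTupleData *tp);

    /*
     * Follow t_ctid links from 'tid' to the newest tuple version, leaving
     * the result in *tp.  The xmax/xmin match below is the step that a
     * frozen-but-not-yet-pruned chain member can confuse.
     */
    static void
    walk_update_chain(ItemPointerData tid, HeapTupleData *tp,
                      fetch_version_fn fetch_version)
    {
        TransactionId priorXmax = InvalidTransactionId;

        while (fetch_version(&tid, tp))
        {
            /* The chain is broken unless this xmin matches the prior xmax. */
            if (TransactionIdIsValid(priorXmax) &&
                !TransactionIdEquals(priorXmax,
                                     HeapTupleHeaderGetXmin(tp->t_data)))
                break;

            /* A tuple whose ctid points at itself is the latest version. */
            if (ItemPointerEquals(&tp->t_self, &tp->t_data->t_ctid))
                break;

            priorXmax = HeapTupleHeaderGetUpdateXid(tp->t_data);
            tid = tp->t_data->t_ctid;
        }
    }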

This patch probably needs some documentation updates. Suggestions as
to where would be appreciated.

As a general statement, I view this work as something that is likely
needed no matter which one of the "remove freezing" approaches that
have been proposed we choose to adopt. It does not fix anything in
and of itself, but it (hopefully) removes an objection to the entire
line of inquiry.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

  • Josh Berkus at May 28, 2013 at 11:11 pm

    On 05/28/2013 06:21 AM, Robert Haas wrote:
    As a general statement, I view this work as something that is likely
    needed no matter which one of the "remove freezing" approaches that
    have been proposed we choose to adopt. It does not fix anything in
    and of itself, but it (hopefully) removes an objection to the entire
    line of inquiry.
    Agreed. I have some ideas on how to reduce the impact of freezing as
    well (of course), and the description of your approach certainly seems
    to benefit them, especially as it removes the whole "forensic
    information" objection.

    One question though: if we're not removing the xmin, how do we know the
    maximum xid to which we can prune clog? I can imagine several ways
    given your approach.

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Andres Freund at May 28, 2013 at 11:27 pm

    On 2013-05-28 09:39:13 -0700, Josh Berkus wrote:
    On 05/28/2013 06:21 AM, Robert Haas wrote:
    As a general statement, I view this work as something that is likely
    needed no matter which one of the "remove freezing" approaches that
    have been proposed we choose to adopt. It does not fix anything in
    and of itself, but it (hopefully) removes an objection to the entire
    line of inquiry.
    Agreed. I have some ideas on how to reduce the impact of freezing as
    well (of course), and the description of your approach certainly seems
    to benefit them, especially as it removes the whole "forensic
    information" objection.

    One question though: if we're not removing the xmin, how do we know the
    maximum xid to which we can prune clog? I can imagine several ways
    given your approach.
    Simply don't count xids which are frozen. Currently we ignore an xid
    because it's a special value; after this, because the tuple has a certain
    hint bit (combination) set.

    Greetings,

    Andres Freund

    --
      Andres Freund http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Robert Haas at May 28, 2013 at 11:33 pm

    On Tue, May 28, 2013 at 7:27 PM, Andres Freund wrote:
    On 2013-05-28 09:39:13 -0700, Josh Berkus wrote:
    On 05/28/2013 06:21 AM, Robert Haas wrote:
    As a general statement, I view this work as something that is likely
    needed no matter which one of the "remove freezing" approaches that
    have been proposed we choose to adopt. It does not fix anything in
    and of itself, but it (hopefully) removes an objection to the entire
    line of inquiry.
    Agreed. I have some ideas on how to reduce the impact of freezing as
    well (of course), and the description of your approach certainly seems
    to benefit them, especially as it removes the whole "forensic
    information" objection.

    One question though: if we're not removing the xmin, how do we know the
    maximum xid to which we can prune clog? I can imagine several ways
    given your approach.
    Simply don't count xids which are frozen. Currently we ignore an xid
    because it's a special value; after this, because the tuple has a certain
    hint bit (combination) set.
    Right, what he said. Calculating the XID before which we no longer
    need CLOG is just a matter of looking at all the tuples that we don't
    know to be frozen and taking the oldest XID from among those. This
    patch changes the definition of "frozen" but that's a pretty minor
    detail of the CLOG-truncation calculation. So, in essence, this patch
    doesn't really make much difference in that area either way.
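
    In code form, the per-tuple step of that calculation might look like
    this (a sketch, assuming the HeapTupleHeaderXminFrozen() and
    HeapTupleHeaderGetRawXmin() macros discussed elsewhere in this thread;
    the function name is illustrative):

        /*
         * Frozen tuples no longer need CLOG, whatever their stored xmin
         * says; everything older than *oldest_xid can be truncated.
         */
        static void
        track_oldest_unfrozen(HeapTupleHeader tuple, TransactionId *oldest_xid)
        {
            TransactionId xmin = HeapTupleHeaderGetRawXmin(tuple);

            if (!HeapTupleHeaderXminFrozen(tuple) &&
                TransactionIdIsNormal(xmin) &&
                TransactionIdPrecedes(xmin, *oldest_xid))
                *oldest_xid = xmin;
        }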

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Andres Freund at May 29, 2013 at 12:00 am

    On 2013-05-28 09:21:27 -0400, Robert Haas wrote:
    I have attempted to implement this. Trouble is, we're out of infomask
    bits. Using an infomask2 bit might work but we don't have many of
    them left either, so it's worth casting about a bit for a better
    solution. Andres proposed using HEAP_MOVED_IN|HEAP_MOVED_OFF for
    this purpose, but I think we're better off trying to reclaim those
    bits in a future release. Exactly how to do that is a topic for
    another email, but I believe it's very possible. What I'd like to
    propose instead is using HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID to
    indicate that xmin is frozen. This bit pattern isn't used for
    anything else, so there's no confusion possible with existing data
    already on disk, but it's necessary to audit all code that checks
    HEAP_XMIN_INVALID to make sure it doesn't get confused. I've done
    this, and there's little enough of it that it seems pretty easy to
    handle.
    I only suggested MOVED_IN/OFF because I didn't remember
    HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID was still free ;). So, unless that
    combination could have been produced in the past in a way I couldn't
    find so far, I am all for it. I don't see a problem with breaking
    backward compatibility on that level and I don't think there's any way
    to get there without some level of low level compat break.
    - When we follow HOT chains, we determine where the HOT chain ends by
    matching the xmax of each tuple with the xmin of the next tuple. If
    they don't match, we conclude that the HOT chain has ended. I
    initially thought this logic might be buggy even as things stand today
    if the latest tuple in the chain is frozen, but as Andres pointed out
    to me, that's not so. If the newest tuple in the chain is all-visible
    (the earliest point at which we can theoretically freeze it), all
    earlier tuples are dead altogether, and heap_page_prune() is always
    called after locking the buffer and before freezing, so any tuple we
    freeze must be the first in its HOT chain. For the same reason, this
    logic doesn't need any adjustment for the new freezing system: it's
    never looking at anything old enough to be frozen in the first place.
    I am all for adding a comment why this is safe though. We thought about
    it for some while before we were convinced...
    - Various procedural languages use the combination of TID and XMIN to
    determine whether a function needs to be recompiled. Although the
    possibility of this doing the wrong thing seems quite remote, it's not
    obvious to me why it's theoretically correct even as things stand
    today. Suppose that a previously-frozen tuple is vacuumed away and
    another tuple is placed at the same TID and then frozen. Then, we
    check whether the cache is still valid and, behold, it is. This would
    actually get better with this patch, since it wouldn't be enough
    merely for the old and new tuples to both be frozen; they'd have to
    have had the same XID prior to freezing. I think that could only
    happen if a backend sticks around for at least 2^32 transactions, but
    I don't know what would prevent it in that case.
    Hm. As previously said, I am less than convinced of those ad hoc
    mechanisms and I think this should get properly integrated into the
    normal cache invalidation mechanisms.
    But: I think this is safe since we compare the stored/cached xmin/tid
    with one gotten from the SearchSysCache just before which will point to
    the correct location as of the last AcceptInvalidationMessages(). I
    can't think of a way where this would allow the case you describe.
    - heap_get_latest_tid() appears broken even without this patch. It's
    only used on user-specified TIDs, either in a TID scan, or due to the
    use of currtid_byreloid() and currtid_byrelname(). It attempts to find
    the latest version of the tuple referenced by the given TID by
    following the CTID links. Like HOT, it uses XMAX/XMIN matching to
    detect when the chain is broken. However, unlike HOT, update chains
    can in general span multiple blocks. It is not now nor has it ever
    been safe to assume that the next tuple in the chain can't be frozen
    before the previous one is vacuumed away. Andres came up with the
    best example: suppose the tuple to be frozen physically precedes its
    predecessor; then, an in-progress vacuum might reach the to-be-frozen
    tuple before it reaches (and removes) the previous row version. In
    newer releases, the same problem could be caused by vacuum's
    occasional page-skipping behavior. As with the previous point, the
    "don't actually change xmin when we freeze" approach actually makes it
    harder for a chain to get "broken" when it shouldn't, but I suspect
    it's just moving us from one set of extremely-obscure failure cases to
    another.
    I don't think this is especially problematic though. If you do a tidscan
    starting from a tid that is so old that it can be removed: you're doing
    it wrong. The tid could have been reused for something else anyway. I
    think the ctid chaining is only meaningful if the tuple got updated very
    recently, i.e. you hold a snapshot that prevents the removal of the
    root tuple.

    Nice work!

    Andres

    --
      Andres Freund http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Robert Haas at May 29, 2013 at 1:26 am

    On Tue, May 28, 2013 at 8:00 PM, Andres Freund wrote:
    I only suggested MOVED_IN/OFF because I didn't remember
    HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID was still free ;). So, unless that
    combination could have been produced in the past in a way I couldn't
    find so far, I am all for it. I don't see a problem with breaking
    backward compatibility on that level and I don't think there's any way
    to get there without some level of low level compat break.
    I'm not sure why you call it a compatibility break. It's true that an
    old PostgreSQL can't read new heaps, but we've never guaranteed that
    direction anyway, and every time we add or reinterpret any infomask
    bits, that's true. For example, the fklocks patch did that. What's
    important is that a new PostgreSQL will still be able to read old
    heaps; that is, pg_upgrade compatibility is preserved.
    I am all for adding a comment why this is safe though. We thought about
    it for some while before we were convinced...
    I'm fine with that, but the logic is present in multiple places, and I
    did not want to comment them all identically. If there's a central
    place in which a comment would be appropriate, let's add it there; or
    IOW, what do you suggest in detail?
    Hm. As previously said, I am less than convinced of those adhoc
    mechanisms and I think this should get properly integrated into the
    normal cache invalidation mechanisms.
    But: I think this is safe since we compare the stored/cached xmin/tid
    with one gotten from the SearchSysCache just before which will point to
    the correct location as of the last AcceptInvalidationMessages(). I
    can't think of a way where this would allow the case you describe.
    I don't understand why it can't. AcceptInvalidationMessages()
    guarantees that we're looking at the latest version of the catalog,
    but it doesn't say anything about whether the latest catalog state
    happens to look like any earlier catalog state.
    I don't think this is especially problematic though. If you do a tidscan
    starting from a tid that is so old that it can be removed: you're doing
    it wrong. The tid could have been reused for something else anyway. I
    think the ctid chaining is only meaningful if the tuple got updated very
    recently, i.e. you hold a snapshot that prevents the removal of the
    root tuple.
    That logic seems sound to me.
    Nice work!
    Thanks!

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Andres Freund at May 29, 2013 at 2:08 am

    On 2013-05-28 21:26:49 -0400, Robert Haas wrote:
    On Tue, May 28, 2013 at 8:00 PM, Andres Freund wrote:
    I only suggested MOVED_IN/OFF because I didn't remember
    HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID was still free ;). So, unless that
    combination could have been produced in the past in a way I couldn't
    find so far, I am all for it. I don't see a problem with breaking
    backward compatibility on that level and I don't think there's any way
    to get there without some level of low level compat break.
    I'm not sure why you call it a compatibility break. It's true that an
    old PostgreSQL can't read new heaps, but we've never guaranteed that
    direction anyway, and every time we add or reinterpret any infomask
    bits, that's true. For example, the fklocks patch did that. What's
    important is that a new PostgreSQL will still be able to read old
    heaps; that is, pg_upgrade compatibility is preserved.
    Oh, not the on-disk format. But as you said, code that looks at xmin is
    going to need to change. fklocks broke code that relied on
    HeapTupleHeaderGetXmax(); this would break code that looks at
    xmin. Removing/renaming *GetXmin similarly sounds like a good idea to
    make the breakage explicit.
    I am all for adding a comment why this is safe though. We thought about
    it for some while before we were convinced...
    I'm fine with that, but the logic is present in multiple places, and I
    did not want to comment them all identically. If there's a central
    place in which a comment would be appropriate, let's add it there; or
    IOW, what do you suggest in detail?
    That's a good point. Generally lots of this is underdocumented/commented
    and can only be learned by spending a year or so in the postgres code. I'd
    say rename README.HOT to README and add a section there and reference it
    from those two places? It already has a mostly general explanation of
    concurrent index builds. Don't have a better idea.
    Hm. As previously said, I am less than convinced of those ad hoc
    mechanisms and I think this should get properly integrated into the
    normal cache invalidation mechanisms.
    But: I think this is safe since we compare the stored/cached xmin/tid
    with one gotten from the SearchSysCache just before which will point to
    the correct location as of the last AcceptInvalidationMessages(). I
    can't think of a way where this would allow the case you describe.
    I don't understand why it can't. AcceptInvalidationMessages()
    guarantees that we're looking at the latest version of the catalog,
    but it doesn't say anything about whether the latest catalog state
    happens to look like any earlier catalog state.
    Well, AcceptInvalidationMessages() will get a new version of the tuple
    with a new tid/xmin combo. So what would need to happen is the function
    being updated (to a new location), then the old version needs to get
    removed, then the new one updated again, reusing the old
    location. All the while, either both versions need to have been frozen or
    we need to have wrapped around to the same xid. All that without the
    function being executed in between, which would have invalidated the old
    state and stored a new xmin/tid.
    But you're right, nothing except chance prevents that from happening,
    not sure what I thought of earlier.

    Greetings,

    Andres Freund

    --
      Andres Freund http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Robert Haas at May 29, 2013 at 11:43 am

    On Tue, May 28, 2013 at 10:08 PM, Andres Freund wrote:
    On 2013-05-28 21:26:49 -0400, Robert Haas wrote:
    On Tue, May 28, 2013 at 8:00 PM, Andres Freund wrote:
    I only suggested MOVED_IN/OFF because I didn't remember
    HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID was still free ;). So, unless that
    combination could have been produced in the past in a way I couldn't
    find so far, I am all for it. I don't see a problem with breaking
    backward compatibility on that level and I don't think there's any way
    to get there without some level of low level compat break.
    I'm not sure why you call it a compatibility break. It's true that an
    old PostgreSQL can't read new heaps, but we've never guaranteed that
    direction anyway, and every time we add or reinterpret any infomask
    bits, that's true. For example, the fklocks patch did that. What's
    important is that a new PostgreSQL will still be able to read old
    heaps; that is, pg_upgrade compatibility is preserved.
    Oh, not the on-disk format. But as you said, code that looks at xmin is
    going to need to change. fklocks broke code that relied on
    HeapTupleHeaderGetXmax(); this would break code that looks at
    xmin. Removing/renaming *GetXmin similarly sounds like a good idea to
    make the breakage explicit.
    I thought about that, but there are enough callers that don't care
    that it seemed like the wrong choice to me.
    I am all for adding a comment why this is safe though. We thought about
    it for some while before we were convinced...
    I'm fine with that, but the logic is present in multiple places, and I
    did not want to comment them all identically. If there's a central
    place in which a comment would be appropriate, let's add it there; or
    IOW, what do you suggest in detail?
    That's a good point. Generally lots of this is underdocumented/commented
    and can only be learned by spending a year or so in the postgres code. I'd
    say rename README.HOT to README and add a section there and reference it
    from those two places? It already has a mostly general explanation of
    concurrent index builds. Don't have a better idea.
    Anyone else have a suggestion?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Alvaro Herrera at May 29, 2013 at 3:55 pm

    Robert Haas wrote:
    On Tue, May 28, 2013 at 10:08 PM, Andres Freund wrote:
    On 2013-05-28 21:26:49 -0400, Robert Haas wrote:

    I am all for adding a comment why this is safe though. We thought about
    it for some while before we were convinced...
    I'm fine with that, but the logic is present in multiple places, and I
    did not want to comment them all identically. If there's a central
    place in which a comment would be appropriate, let's add it there; or
    IOW, what do you suggest in detail?
    That's a good point. Generally lots of this is underdocumented/commented
    and can only be learned by spending a year or so in the postgres code. I'd
    say rename README.HOT to README and add a section there and reference it
    from those two places? It already has a mostly general explanation of
    concurrent index builds. Don't have a better idea.
    Anyone else have a suggestion?
    I support the idea of using README files. I don't have an opinion on
    whether it's best to have a single file for everything (i.e. rename
    README.HOT and add to it) or just explain this in README.freeze or
    something like that.

    --
    Álvaro Herrera http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Peter Eisentraut at May 29, 2013 at 1:58 pm

    On 5/28/13 8:00 PM, Andres Freund wrote:
    - Various procedural languages use the combination of TID and XMIN to
    determine whether a function needs to be recompiled.
    Hm. As previously said, I am less than convinced of those ad hoc
    mechanisms and I think this should get properly integrated into the
    normal cache invalidation mechanisms.
    Yes please.
  • Andres Freund at Jun 24, 2013 at 12:11 pm

    On 2013-05-28 09:21:27 -0400, Robert Haas wrote:
    As a general statement, I view this work as something that is likely
    needed no matter which one of the "remove freezing" approaches that
    have been proposed we choose to adopt. It does not fix anything in
    and of itself, but it (hopefully) removes an objection to the entire
    line of inquiry.
    Agreed. And it also improves the status quo when debugging. I wish this
    had been the representation from the beginning.

    Some remarks:
    * I don't really like that HeapTupleHeaderXminFrozen() now is frequently
       performed without checking for FrozenTransactionId. I think the places
       where that's done are currently safe, but it seems likely that we'll
       introduce bugs due to people writing similar code.
       I think replacing *GetXmin by a wrapper that returns
       FrozenTransactionId if the hint bits tell us so would be a good
       idea. Then add *GetRawXmin for the rest (probably mostly display).
       Seems like it would make the patch actually smaller; a sketch of
       what I mean follows these remarks.

    * The PLs imo should set fn_xmin to FrozenTransactionId if the hint bits
       tell us the tuple is frozen.

    * I think rewrite_heap_dead_tuple needs to check for a frozen xmin and
       store that. We might be looking at a chain which was partially done in
       <9.4. Not sure if that's a realistic scenario, but I'd rather be safe.
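
    To be concrete, the wrapper pair I have in mind would look roughly like
    this (a sketch; the exact definitions are illustrative):

        /* Raw value, for display/forensics: whatever is physically stored. */
        #define HeapTupleHeaderGetRawXmin(tup) \
            ((tup)->t_choice.t_heap.t_xmin)

        /* Logical value: frozen tuples read as FrozenTransactionId. */
        #define HeapTupleHeaderGetXmin(tup) \
            (HeapTupleHeaderXminFrozen(tup) ? \
             FrozenTransactionId : HeapTupleHeaderGetRawXmin(tup))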

    Otherwise I don't see much that needs to be done before this can get
    committed.

    Greetings,

    Andres Freund

    --
      Andres Freund http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Robert Haas at Jul 2, 2013 at 8:30 pm

    On Mon, Jun 24, 2013 at 8:12 AM, Andres Freund wrote:
    Agreed. And it also improves the status quo when debugging. I wish this
    had been the representation from the beginning.

    Some remarks:
    * I don't really like that HeapTupleHeaderXminFrozen() now is frequently
    performed without checking for FrozenTransactionId. I think the places
    where that's done are currently safe, but it seems likely that we'll
    introduce bugs due to people writing similar code.
    I think replacing *GetXmin by a wrapper that returns
    FrozenTransactionId if the hint bits tell us so would be a good
    idea. Then add *GetRawXmin for the rest (probably mostly display).
    Seems like it would make the patch actually smaller.
    I did think about this approach, but it seemed like it would add
    cycles in a significant number of places. For example, consider the
    HeapTupleSatisfies() routines, which are perhaps PostgreSQL's finest
    example of a place where you DON'T want to add any cycles. All of the
    checks on xmin are conditional on HeapTupleHeaderXminCommitted()
    having been found already to be false. That implies that the frozen
    bits aren't set either, so if HeapTupleHeaderGetXmin() were to recheck
    the bits it would be a waste. As I got to the end of the patch I was
    a little dismayed by the number of places that did need adjustment to
    check both things, but there are still plenty of important places that
    don't.
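
    To illustrate the hot path I mean, greatly simplified (the real
    HeapTupleSatisfies* logic in tqual.c also handles in-progress and
    aborted transactions, snapshots, and more; names are illustrative):

        static bool
        xmin_committed(HeapTupleHeader tuple)
        {
            /* Fast path: the hint bit also covers frozen tuples. */
            if (tuple->t_infomask & HEAP_XMIN_COMMITTED)
                return true;

            /*
             * Slow path: the frozen bits are certainly unset here, so
             * reading the raw xmin needs no extra frozen-recheck.
             */
            return TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuple));
        }
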
    * The PLs imo should set fn_xmin to FrozenTransactionId if the hint bits
    tell us the tuple is frozen.
    Why? I thought about that, but it seemed to me that the purpose of
    that caching was to avoid confusing two functions whose pg_proc tuples
    ended up at the same TID. So there's a failure mechanism there: the
    tuple can get vacuumed away and replaced with a new tuple which then
    gets frozen, and everything (bogusly) matches. If the actual
    XID-prior-to-freezing has to match, ISTM that the chances of a false
    match are far lower. You have to get the new tuple at the same TID
    slot *and* the XID counter has to wrap back around to the
    exactly-right XID to get a false match on XID, within the lifetime of
    a single backend. That's not bloody likely. Remember, the whole
    point of the patch is to start freezing things sooner, so the scenario
    where both the original and replacement tuples are frozen is going to
    become more common.

    We also don't particularly *want* the freezing of a pg_proc tuple to
    force recompilations in every backend.
    * I think rewrite_heap_dead_tuple needs to check for a frozen xmin and
    store that. We might be looking at a chain which was partially done in
    <9.4. Not sure if that's a realistic scenario, but I'd rather be safe.
    IIUC, you're talking about the scenario where we have an update chain
    X -> Y, where X is dead but not actually removed and Y is
    (forensically) frozen. We're examining tuple Y and trying to
    determine whether X has been entered in rs_unresolved_tups. If, as I
    think you're proposing, we consider the xmin of Y to be
    FrozenTransactionId, we will definitely not find it - because the way
    it got into the table in the first place was based on the value
    returned by HeapTupleHeaderGetUpdateXid(). And that value is certain
    not to be FrozenTransactionId, because we never set the xmax of a
    tuple to FrozenTransactionId.

    There's no possibility of getting confused here; if X is still around
    at all, its xmax is of the same generation as Y's xmin. Otherwise,
    we've had an undetected XID wraparound.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Andres Freund at Jul 3, 2013 at 5:05 pm

    On 2013-07-02 16:29:56 -0400, Robert Haas wrote:
    On Mon, Jun 24, 2013 at 8:12 AM, Andres Freund wrote:
    Agreed. And it also improves the status quo when debugging. I wish this
    would have been the representation since the beginning.

    Some remarks:
    * I don't really like that HeapTupleHeaderXminFrozen() now is frequently
    performed without checking for FrozenTransactionId. I think the places
    where that's done are currently safe, but it seems likely that we'll
    introduce bugs due to people writing similar code.
    I think replacing *GetXmin by a wrapper that returns
    FrozenTransactionId if the hint bits tell us so would be a good
    idea. Then add *GetRawXmin for the rest (probably mostly display).
    Seems like it would make the patch actually smaller.
    I did think about this approach, but it seemed like it would add
    cycles in a significant number of places. For example, consider the
    HeapTupleSatisfies() routines, which are perhaps PostgreSQL's finest
    example of a place where you DON'T want to add any cycles. All of the
    checks on xmin are conditional on HeapTupleHeaderXminCommitted()
    having been found already to be false. That implies that the frozen
    bits aren't set either, so if HeapTupleHeaderGetXmin() were to recheck
    the bits it would be a waste. As I got to the end of the patch I was
    a little dismayed by the number of places that did need adjustment to
    check both things, but there are still plenty of important places that
    don't.
    Well, nothing would prevent using HeapTupleHeaderGetRawXmin() in
    those places. It's exactly the number of callsites that makes me think
    that somebody will get this wrong in the future.
    * The PLs imo should set fn_xmin to FrozenTransactionId if the hint bits
    tell us the tuple is frozen.
    Why? I thought about that, but it seemed to me that the purpose of
    that caching was to avoid confusing two functions whose pg_proc tuples
    ended up at the same TID.
    [good reasoning]
    Man. I hate this hack. I wonder what happens if somebody does a VACUUM
    FULL of pg_proc and there are a bunch of functions created in the same
    transaction...
    * I think rewrite_heap_dead_tuple needs to check for a frozen xmin and
    store that. We might be looking at a chain which was partially done in
    <9.4. Not sure if that's a realistic scenario, but I'd rather be safe.
    IIUC, you're talking about the scenario where we have an update chain
    X -> Y, where X is dead but not actually removed and Y is
    (forensically) frozen. We're examining tuple Y and trying to
    determine whether X has been entered in rs_unresolved_tups. If, as I
    think you're proposing, we consider the xmin of Y to be
    FrozenTransactionId, we will definitely not find it - because the way
    it got into the table in the first place was based on the value
    returned by HeapTupleHeaderGetUpdateXid(). And that value is certain
    not to be FrozenTransactionId, because we never set the xmax of a
    tuple to FrozenTransactionId.
    I am thinking of something slightly different. rewrite_heap_dead_tuple()
    now removes tuples/xids from the unresolved table that could be from a
    different xid epoch, since it unconditionally does a HASH_REMOVE if it
    finds an entry when doing a lookup using the *preserved* xid. Earlier that
    was harmless, since for frozen tuples it only ever used
    FrozenTransactionId, which obviously cannot be part of a valid chain and
    couldn't even get entered into unresolved_tups.

    I am not sure at all whether that can actually be harmful, but there isn't any
    reason we would need to do the delete, so I wouldn't. There can be
    situations complex enough that later parts of a ctid chain are dead and
    earlier ones are recently dead, such that I would rather be cautious.
    There's no possibility of getting confused here; if X is still around
    at all, its xmax is of the same generation as Y's xmin. Otherwise,
    we've had an undetected XID wraparound.
    Another issue I thought about is what we will return for SELECT xmin
    FROM blarg; Some people use that in their applications (IIRC
    skytools/pgq/londiste does so) and they might get confused if we
    suddenly return xids from the future. On the other hand, not returning
    the original xid would be a shame as well...

    Greetings,

    Andres Freund

    --
      Andres Freund http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Marko Kreen at Jul 3, 2013 at 6:54 pm

    On Wed, Jul 3, 2013 at 8:07 PM, Andres Freund wrote:
    On 2013-07-02 16:29:56 -0400, Robert Haas wrote:
    There's no possibility of getting confused here; if X is still around
    at all, its xmax is of the same generation as Y's xmin. Otherwise,
    we've had an undetected XID wraparound.
    Another issue I thought about is what we will return for SELECT xmin
    FROM blarg; Some people use that in their applications (IIRC
    skytools/pgq/londiste does so) and they might get confused if we
    suddenly return xids from the future. On the other hand, not returning
    the original xid would be a shame as well...
    Skytools/pgq/londiste use only txid_* APIs; they don't look at
    4-byte xids on table rows.

    --
    marko
  • Robert Haas at Jul 3, 2013 at 6:59 pm

    On Wed, Jul 3, 2013 at 1:07 PM, Andres Freund wrote:
    Well, nothing would prevent using HeapTupleHeaderGetRawXmin() in
    those places. It's exactly the number of callsites that makes me think
    that somebody will get this wrong in the future.
    Well, I guess I could go rework the whole patch that way. It's a fair
    request, but I kinda doubt it's going to make the patch smaller.
    * I think rewrite_heap_dead_tuple needs to check for a frozen xmin and
    store that. We might be looking at a chain which was partially done in
    <9.4. Not sure if that's a realistic scenario, but I'd rather be safe.
    IIUC, you're talking about the scenario where we have an update chain
    X -> Y, where X is dead but not actually removed and Y is
    (forensically) frozen. We're examining tuple Y and trying to
    determine whether X has been entered in rs_unresolved_tups. If, as I
    think you're proposing, we consider the xmin of Y to be
    FrozenTransactionId, we will definitely not find it - because the way
    it got into the table in the first place was based on the value
    returned by HeapTupleHeaderGetUpdateXid(). And that value is certain
    not to be FrozenTransactionId, because we never set the xmax of a
    tuple to FrozenTransactionId.
    I am thinking of something slightly different. rewrite_heap_dead_tuple()
    now removes tuples/xids from the unresolved table that could be from a
    different xid epoch, since it unconditionally does a HASH_REMOVE if it
    finds an entry when doing a lookup using the *preserved* xid. Earlier that
    was harmless, since for frozen tuples it only ever used
    FrozenTransactionId, which obviously cannot be part of a valid chain and
    couldn't even get entered into unresolved_tups.

    I am not sure at all whether that can actually be harmful, but there isn't any
    reason we would need to do the delete, so I wouldn't. There can be
    situations complex enough that later parts of a ctid chain are dead and
    earlier ones are recently dead, such that I would rather be cautious.
    OK, I think I see your point, and I think you're right.
    There's no possibility of getting confused here; if X is still around
    at all, its xmax is of the same generation as Y's xmin. Otherwise,
    we've had an undetected XID wraparound.
    Another issue I thought about is what we will return for SELECT xmin
    FROM blarg; Some people use that in their applications (IIRC
    skytools/pgq/londiste does so) and they might get confused if we
    suddenly return xids from the future. On the other hand, not returning
    the original xid would be a shame as well...
    Yeah. I think the system columns that we have today are pretty much
    crap. I wonder if we couldn't throw them out and replace them with
    some kind of functions that you can pass a row to. That would allow
    us to expose a lot more detail without adding a bajillion hidden
    columns, and for a bonus we'd save substantially on catalog bloat.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Josh Berkus at Jul 8, 2013 at 6:34 pm

    On 07/03/2013 11:59 AM, Robert Haas wrote:
    Yeah. I think the system columns that we have today are pretty much
    crap. I wonder if we couldn't throw them out and replace them with
    some kind of functions that you can pass a row to. That would allow
    us to expose a lot more detail without adding a bajillion hidden
    columns, and for a bonus we'd save substantially on catalog bloat.
    Where are we on this patch? Should I mark it Returned and you'll work
    on it for September?

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Robert Haas at Jul 8, 2013 at 8:38 pm

    On Jul 8, 2013, at 1:34 PM, Josh Berkus wrote:
    On 07/03/2013 11:59 AM, Robert Haas wrote:
    Yeah. I think the system columns that we have today are pretty much
    crap. I wonder if we couldn't throw them out and replace them with
    some kind of functions that you can pass a row to. That would allow
    us to expose a lot more detail without adding a bajillion hidden
    columns, and for a bonus we'd save substantially on catalog bloat.
    Where are we on this patch? Should I mark it Returned and you'll work
    on it for September?
    Sure.

    ...Robert
