So freezing multixacts is not all that easy. I mean, you just scan the
page looking for multis lesser than the cutoff; for those that are dead,
they can just be removed completely, but what about ones that still have
members running? This is pretty unlikely but not impossible.

If there's only one remaining member, the problem is easy: replace it
with that transaction's xid, and set the appropriate hint bits. But if
there's more than one, the only way out is to create a new multi. This
increases multixactid consumption, but I don't see any other option.
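As a rough illustration of the cases above, here is a minimal Python sketch. It is not PostgreSQL code: the function name `plan_multi_freeze`, the action strings, and the `is_running` callback are all made up for this example, and IDs are compared as plain integers, ignoring wraparound.

```python
from typing import Callable, Iterable, List, Tuple


def plan_multi_freeze(
    multi_id: int,
    members: Iterable[int],
    cutoff: int,
    is_running: Callable[[int], bool],
) -> Tuple[str, List[int]]:
    """Decide what to do with one multi found while scanning a page.

    Returns (action, xids) where xids are the members that matter for
    the chosen action. All names here are hypothetical.
    """
    members = list(members)
    if multi_id >= cutoff:
        # Not older than the cutoff: nothing to freeze.
        # (Assumption: plain integer comparison, no wraparound handling.)
        return ("keep", members)

    alive = [xid for xid in members if is_running(xid)]
    if not alive:
        # Every member has finished: the multi can simply be removed.
        return ("remove", [])
    if len(alive) == 1:
        # Exactly one member left: store that transaction's xid instead.
        return ("replace_with_xid", alive)
    # Several members still running: only a new multi can represent them.
    return ("create_new_multi", alive)
```

The last branch is the one that consumes a new multixact ID, which is the cost mentioned above.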

However, there are cases where not even that is possible -- consider
tuple freezing during WAL recovery. Recovery is going to need to
replace those multis with other multis, but it cannot create new multis
itself. The only solution here appears to be that when multis are
frozen in the master, replacement multis have to be logged too. So the
heap_freeze_tuple Xlog record will have a map of old multi to new. That
way, recovery can just determine the new multi to use for any particular
old multi; since multixact creation is also logged, we're certain that
the replacement value has already been defined.
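To make the proposed record layout concrete, here is a small Python sketch of what replay might do with an old-to-new map. Everything in it is hypothetical (`redo_freeze`, the `replacements` dict, the `replayed_multis` set); it only illustrates that recovery looks up a replacement that was already logged rather than creating one.

```python
from typing import Dict, Iterable, List, Set


def redo_freeze(
    tuple_multis: Iterable[int],
    replacements: Dict[int, int],
    replayed_multis: Set[int],
) -> List[int]:
    """Apply a freeze record's old->new multi map during replay.

    tuple_multis    -- multis currently stored on the tuples of the page
    replacements    -- map carried by the (hypothetical) freeze record
    replayed_multis -- multis whose creation records were already replayed
    """
    result = []
    for old in tuple_multis:
        if old not in replacements:
            # This multi was not frozen by the record; leave it alone.
            result.append(old)
            continue
        new = replacements[old]
        if new not in replayed_multis:
            # Creation is logged before use, so this should never happen.
            raise LookupError(f"multi {new} was never created in WAL")
        result.append(new)
    return result
```

For example, `redo_freeze([3, 8, 4], {3: 20, 4: 21}, {20, 21})` returns `[20, 8, 21]`: recovery never allocates a multi of its own.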

Sounds ugly, but not horrible.

Thoughts, opinions?

--
Álvaro Herrera <alvherre@alvh.no-ip.org>

  • Robert Haas at Feb 2, 2012 at 2:24 pm

    On Wed, Feb 1, 2012 at 11:33 PM, Alvaro Herrera wrote:
    > So freezing multixacts is not all that easy.  I mean, you just scan the
    > page looking for multis lesser than the cutoff; for those that are dead,
    > they can just be removed completely, but what about ones that still have
    > members running?  This is pretty unlikely but not impossible.

    Right.

    > If there's only one remaining member, the problem is easy: replace it
    > with that transaction's xid, and set the appropriate hint bits.  But if
    > there's more than one, the only way out is to create a new multi.  This
    > increases multixactid consumption, but I don't see any other option.

    Why do we need to freeze anything if the transactions are still
    running? We certainly don't freeze regular transaction IDs while the
    transactions are still running; it would give wrong answers. It's
    probably possible to do it for mxids, but why would you need to?

    Suppose you have a tuple A which is locked by a series of transactions
    T0, T1, T2, ...; AIUI, each new locker is going to have to create a
    new mxid with all the existing entries plus a new one for itself.
    But, unless I'm confused, as it's doing so, it can discard any entries
    for locks taken by transactions which are no longer running. So given
    an mxid with living members, any dead member in that mxid must have
    been living at the time the newest member was added. Surely we can't
    be consuming mxids anywhere near fast enough for that to be a problem.

    There could be an updating transaction involved as well, but if
    that's not running any more then it has either committed (in which
    case the tuple will be dead once the global-xmin advances past it) or
    aborted (in which case we can forget about it).

    > However, there are cases where not even that is possible -- consider
    > tuple freezing during WAL recovery.  Recovery is going to need to
    > replace those multis with other multis, but it cannot create new multis
    > itself.  The only solution here appears to be that when multis are
    > frozen in the master, replacement multis have to be logged too.  So the
    > heap_freeze_tuple Xlog record will have a map of old multi to new.  That
    > way, recovery can just determine the new multi to use for any particular
    > old multi; since multixact creation is also logged, we're certain that
    > the replacement value has already been defined.

    This doesn't sound right. Why would recovery need to create a multi
    that didn't exist on the master? Any multi it applies to a record
    should be one that it was told to apply by the master; and the master
    should have already WAL-logged the creation of that multi. I don't
    think that "replacement" mxids have to be logged; I think that *all*
    mxids have to be logged. Am I all wet?
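
The locking sequence described in this message (each new locker builds a fresh mxid from the still-running members plus itself) can be sketched as follows. This is hypothetical Python, not PostgreSQL code; `members_for_new_multi` and `is_running` are invented names.

```python
from typing import Callable, Iterable, List


def members_for_new_multi(
    existing: Iterable[int],
    new_locker: int,
    is_running: Callable[[int], bool],
) -> List[int]:
    """Member list for the mxid a new locker would create.

    Members of the old mxid that are no longer running are dropped,
    then the new locker is added (once).
    """
    kept = [xid for xid in existing if is_running(xid)]
    if new_locker not in kept:
        kept.append(new_locker)
    return kept
```

With T0 and T1 finished and T2 still running, a new locker T3 ends up with members `[2, 3]`, so any dead member left in a multi was alive when its newest member was added.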

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Alvaro Herrera at Feb 6, 2012 at 2:31 pm

    Excerpts from Robert Haas's message of Thu Feb 02 11:24:08 -0300 2012:
    > On Wed, Feb 1, 2012 at 11:33 PM, Alvaro Herrera wrote:
    >
    > > If there's only one remaining member, the problem is easy: replace it
    > > with that transaction's xid, and set the appropriate hint bits.  But if
    > > there's more than one, the only way out is to create a new multi.  This
    > > increases multixactid consumption, but I don't see any other option.
    >
    > Why do we need to freeze anything if the transactions are still
    > running? We certainly don't freeze regular transaction IDs while the
    > transactions are still running; it would give wrong answers. It's
    > probably possible to do it for mxids, but why would you need to?

    Well, I was thinking that we could keep generating mxids
    continuously, and if we didn't freeze the old running ones, we could
    overflow. So one way to deal with the problem would be rewriting the
    old ones into new ones. But it has occurred to me that instead of doing
    that we could simply disallow creation of new ones until the oldest ones
    have been closed and removed from tables -- which is more in line with
    what we do for Xids anyway.

    > Suppose you have a tuple A which is locked by a series of transactions
    > T0, T1, T2, ...; AIUI, each new locker is going to have to create a
    > new mxid with all the existing entries plus a new one for itself.
    > But, unless I'm confused, as it's doing so, it can discard any entries
    > for locks taken by transactions which are no longer running.

    That's correct. But the problem is a tuple that is locked or updated by
    a very old transaction that doesn't commit or rollback, and the tuple is
    never locked again. Eventually the Xid could remain live while the mxid
    is in wraparound danger.

    > So given
    > an mxid with living members, any dead member in that mxid must have
    > been living at the time the newest member was added. Surely we can't
    > be consuming mxids anywhere near fast enough for that to be a problem.

    Well, the problem is that while it should be rare to consume mxids as
    fast as necessary for this problem to show up, it *is* possible --
    unless we add some protection that they are not created until the old
    ones are frozen (which now means "removed").

    > > However, there are cases where not even that is possible -- consider
    > > tuple freezing during WAL recovery.  Recovery is going to need to
    > > replace those multis with other multis, but it cannot create new multis
    > > itself.  The only solution here appears to be that when multis are
    > > frozen in the master, replacement multis have to be logged too.  So the
    > > heap_freeze_tuple Xlog record will have a map of old multi to new.  That
    > > way, recovery can just determine the new multi to use for any particular
    > > old multi; since multixact creation is also logged, we're certain that
    > > the replacement value has already been defined.
    >
    > This doesn't sound right. Why would recovery need to create a multi
    > that didn't exist on the master? Any multi it applies to a record
    > should be one that it was told to apply by the master; and the master
    > should have already WAL-logged the creation of that multi. I don't
    > think that "replacement" mxids have to be logged; I think that *all*
    > mxids have to be logged. Am I all wet?

    Well, yeah, all mxids are logged, in particular those that would have
    been used for replacement. However I think I've discarded the idea of
    replacement altogether now, because it makes things simpler on both
    master and slave.

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Robert Haas at Feb 6, 2012 at 4:19 pm

    On Mon, Feb 6, 2012 at 9:31 AM, Alvaro Herrera wrote:
    > > Suppose you have a tuple A which is locked by a series of transactions
    > > T0, T1, T2, ...; AIUI, each new locker is going to have to create a
    > > new mxid with all the existing entries plus a new one for itself.
    > > But, unless I'm confused, as it's doing so, it can discard any entries
    > > for locks taken by transactions which are no longer running.
    >
    > That's correct.  But the problem is a tuple that is locked or updated by
    > a very old transaction that doesn't commit or rollback, and the tuple is
    > never locked again.  Eventually the Xid could remain live while the mxid
    > is in wraparound danger.

    Ah, I see. I think we should probably handle that the same way we do
    for XIDs: try to force autovac when things get tight, then start
    issuing warnings, and finally just refuse to assign any more MXIDs.
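
A minimal sketch of that escalation, assuming three made-up thresholds on the distance between the next MXID to assign and the oldest one still present in any table. The function and parameter names are hypothetical, and IDs are treated as plain integers with no wraparound arithmetic.

```python
def mxid_assignment_policy(
    next_mxid: int,
    oldest_mxid: int,
    vac_limit: int,
    warn_limit: int,
    stop_limit: int,
) -> str:
    """Pick the reaction to take before handing out another MXID.

    Assumes vac_limit <= warn_limit <= stop_limit; all three are
    illustrative values, not real server settings.
    """
    age = next_mxid - oldest_mxid
    if age >= stop_limit:
        return "refuse"            # no more MXIDs until old ones are gone
    if age >= warn_limit:
        return "warn"              # keep assigning, but complain loudly
    if age >= vac_limit:
        return "force_autovacuum"  # try to get the old multis cleaned up
    return "ok"
```

The checks run from most to least severe so that a single call reports the strongest reaction that applies.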

    Another thing that might make sense, for both XIDs and MXIDs, is to
    start killing transactions that are preventing vacuum/autovacuum from
    doing their thing. This could mean either killing the people who are
    holding back RecentGlobalXmin, so that we can actually freeze the old
    stuff; or killing people who are holding a conflicting lock, using the
    recovery-conflict stuff or some adaptation of it. We've made it
    fairly difficult to avoid having autovacuum run at all with the
    xidVacLimit/xidStopLimit stuff, but there's still no real defense
    against autovacuum running but failing to mitigate the problem, either
    because there's a long-running transaction holding a snapshot open, or
    because someone is sitting on a relation or buffer lock. This of
    course is off-topic from your patch here...

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Alvaro Herrera at Feb 6, 2012 at 8:11 pm

    Excerpts from Robert Haas's message of Mon Feb 06 13:19:14 -0300 2012:
    > On Mon, Feb 6, 2012 at 9:31 AM, Alvaro Herrera wrote:
    >
    > > > Suppose you have a tuple A which is locked by a series of transactions
    > > > T0, T1, T2, ...; AIUI, each new locker is going to have to create a
    > > > new mxid with all the existing entries plus a new one for itself.
    > > > But, unless I'm confused, as it's doing so, it can discard any entries
    > > > for locks taken by transactions which are no longer running.
    > >
    > > That's correct.  But the problem is a tuple that is locked or updated by
    > > a very old transaction that doesn't commit or rollback, and the tuple is
    > > never locked again.  Eventually the Xid could remain live while the mxid
    > > is in wraparound danger.
    >
    > Ah, I see. I think we should probably handle that the same way we do
    > for XIDs: try to force autovac when things get tight, then start
    > issuing warnings, and finally just refuse to assign any more MXIDs.

    Agreed.

    > Another thing that might make sense, for both XIDs and MXIDs, is to
    > start killing transactions that are preventing vacuum/autovacuum from
    > doing their thing. This could mean either killing the people who are
    > holding back RecentGlobalXmin, so that we can actually freeze the old
    > stuff; or killing people who are holding a conflicting lock, using the
    > recovery-conflict stuff or some adaptation of it.

    Yeah -- right now we only emit some innocuous-looking messages, which
    I've seen most people ignore until they get bitten by a forced
    anti-wraparound vacuum. It'd be nice to get more aggressive about that
    as the situation gets more critical.

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Simon Riggs at Feb 6, 2012 at 3:57 pm

    On Thu, Feb 2, 2012 at 4:33 AM, Alvaro Herrera wrote:

    > However, there are cases where not even that is possible -- consider
    > tuple freezing during WAL recovery.  Recovery is going to need to
    > replace those multis with other multis, but it cannot create new multis
    > itself.  The only solution here appears to be that when multis are
    > frozen in the master, replacement multis have to be logged too.  So the
    > heap_freeze_tuple Xlog record will have a map of old multi to new.  That
    > way, recovery can just determine the new multi to use for any particular
    > old multi; since multixact creation is also logged, we're certain that
    > the replacement value has already been defined.

    Multixacts are ignored during recovery. Why do anything at all?

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services

Discussion Overview
group: pgsql-hackers
category: postgresql
posted: Feb 2, '12 at 4:33a
active: Feb 6, '12 at 8:11p
posts: 6
users: 4
website: postgresql.org...
irc: #postgresql
