FAQ
Hackers,

I'm trying to figure out how to log the subtransaction operations.

A simple idea would be to add the committed subxact Xids to the
xl_xact_commit and xl_xact_abort structs. However, this will be a
variable length array, and those structs already have variable length
arrays at the end (filenodes to drop). I don't see how would I overlay
that into a C struct.

A different idea would be to write a XLog record for subxact aborts
(which would include Xids of all committed child subxacts), and
at main transaction commit write another (different) XLog record, where
XLogRecord.Xid = mainTransactionXid, which would carry the committed
subtransactions.

Does this make any sense? Ideas, comments?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"No reniegues de lo que alguna vez creíste"

Search Discussions

  • Tom Lane at May 11, 2004 at 3:06 am

    Alvaro Herrera writes:
    A simple idea would be to add the committed subxact Xids to the
    xl_xact_commit and xl_xact_abort structs. However, this will be a
    variable length array, and those structs already have variable length
    arrays at the end (filenodes to drop). I don't see how would I overlay
    that into a C struct.
    Just means you have to do some address arithmetic instead of being able
    to reference the additional data as a struct member. Tedious but hardly
    difficult. See for instance the handling of "move" data in
    XLOG_HEAP_MOVE records, or "unused" data in XLOG_HEAP_CLEAN.

    I'd not recommend emitting additional xlog records unless there's a
    genuine log-semantics-related reason for doing so. (Thinks about it...)
    It seems like you'd want to emit a subtrans abort record to carry
    information about file deletions, but there's no real need to emit
    subtrans commit records, is there?

    regards, tom lane
  • Simon Riggs at May 11, 2004 at 6:45 am

    On Tue, 2004-05-11 at 04:05, Tom Lane wrote:
    Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
    A simple idea would be to add the committed subxact Xids to the
    xl_xact_commit and xl_xact_abort structs. However, this will be a
    variable length array, and those structs already have variable length
    arrays at the end (filenodes to drop). I don't see how would I overlay
    that into a C struct.
    Just means you have to do some address arithmetic instead of being able
    to reference the additional data as a struct member. Tedious but hardly
    difficult. See for instance the handling of "move" data in
    XLOG_HEAP_MOVE records, or "unused" data in XLOG_HEAP_CLEAN.

    I'd not recommend emitting additional xlog records unless there's a
    genuine log-semantics-related reason for doing so. (Thinks about it...)
    It seems like you'd want to emit a subtrans abort record to carry
    information about file deletions, but there's no real need to emit
    subtrans commit records, is there?
    Should we write a subtrans commit record when WAL_DEBUG is set? We could
    remove it completely in the future when the interaction between PITR
    recovery and subtransactions have been fully covered off.

    My thinking is that subtransactions themselves are not valid recovery
    targets. As long as we can tell the top level commits from the
    subtransactions then it should be easy to recover. My thinking is that
    if a subtransaction id is wrongly quoted as a recover point, then this
    would be treated as a "rollforward to txnid X, but do not include this
    one". Does that make sense? Does everything we have support that?

    Best Regards, Simon Riggs
  • Alvaro Herrera at May 11, 2004 at 1:56 pm

    On Tue, May 11, 2004 at 07:44:53AM +0100, Simon Riggs wrote:

    Tom Lane wrote:
    Just means you have to do some address arithmetic instead of being able
    to reference the additional data as a struct member. Tedious but hardly
    difficult. See for instance the handling of "move" data in
    XLOG_HEAP_MOVE records, or "unused" data in XLOG_HEAP_CLEAN.
    Got it. I had skimmed looking for an example of this but didn't find
    this.
    Should we write a subtrans commit record when WAL_DEBUG is set? We could
    remove it completely in the future when the interaction between PITR
    recovery and subtransactions have been fully covered off.
    Actually, the only thing a subtransaction has to recover is mark its Xid
    as either committed of aborted in clog, and possible file deletions, so
    there shouldn't be too much interaction with your work.

    (Unrelated: note that after main transaction commit, a committed
    subtransaction is indistinguishable from a committed main transaction --
    and with the current idea of XLog I have, after recovering a transaction
    tree from XLog there won't be any mark in pg_subtrans. So the system
    will not be exactly as it was before but it won't matter.)
    My thinking is that subtransactions themselves are not valid recovery
    targets. As long as we can tell the top level commits from the
    subtransactions then it should be easy to recover. My thinking is that
    if a subtransaction id is wrongly quoted as a recover point, then this
    would be treated as a "rollforward to txnid X, but do not include this
    one". Does that make sense? Does everything we have support that?
    Hmm ... I think it should be forbidden to quote a subtrans Xid as
    rollforward point. Not sure if that can be done though, or how to do
    it.

    --
    Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
    "La gente vulgar solo piensa en pasar el tiempo;
    el que tiene talento, en aprovecharlo"
  • Tom Lane at May 11, 2004 at 2:39 pm

    Alvaro Herrera writes:
    Hmm ... I think it should be forbidden to quote a subtrans Xid as
    rollforward point. Not sure if that can be done though, or how to do
    it.
    Seems like a nonissue, unless the XLOG trace makes a subtrans look the
    same as a main trans, which it'd not do would it?

    We could allow specification of a subtrans ID to be interpreted the same
    as specification of its parent main trans. Dunno if that's actually
    useful to anyone. Actually, I'd think that people would generally
    specify recovery up to a particular timestamp, and not be interested in
    xact numbers at all ...

    regards, tom lane
  • Bruce Momjian at May 11, 2004 at 3:33 pm

    Tom Lane wrote:
    Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
    Hmm ... I think it should be forbidden to quote a subtrans Xid as
    rollforward point. Not sure if that can be done though, or how to do
    it.
    Seems like a nonissue, unless the XLOG trace makes a subtrans look the
    same as a main trans, which it'd not do would it?

    We could allow specification of a subtrans ID to be interpreted the same
    as specification of its parent main trans. Dunno if that's actually
    useful to anyone. Actually, I'd think that people would generally
    specify recovery up to a particular timestamp, and not be interested in
    xact numbers at all ...
    I don't think timestamp is going to be precise enough. Basically I can
    see someone saying I want recovery up to 4am, but anything more specific
    will need xid. I suggested that we write an xlog dump tool so you can
    see the xids (with some xid details) and rough timestamps stored in the
    WAL file and choose the xid for recovery.

    --
    Bruce Momjian | http://candle.pha.pa.us
    pgman@candle.pha.pa.us | (610) 359-1001
    + If your life is a hard drive, | 13 Roberts Road
    + Christ can be your backup. | Newtown Square, Pennsylvania 19073
  • Simon Riggs at May 11, 2004 at 8:26 pm

    On Tue, 2004-05-11 at 16:33, Bruce Momjian wrote:
    Tom Lane wrote:
    Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
    Hmm ... I think it should be forbidden to quote a subtrans Xid as
    rollforward point. Not sure if that can be done though, or how to do
    it.
    Seems like a nonissue, unless the XLOG trace makes a subtrans look the
    same as a main trans, which it'd not do would it?
    I agree that a subtrans xid should not be a valid rollforward point.

    Forgive me discussing what seems like obvious points - I'm sure you
    appreciate we need an exact statement of how/when to terminate recovery
    and that might be found in looking harder at the subtrans questions.
    This is my third re-write of this e-mail, since I keep thinking of
    additional things while going for the "definitive statement". I had
    thought this was straightforward...

    Currently, recovery loops until end of xlogs. There is no exit condition
    from the loop. There is not currently a timestamp on the xlogs -
    anywhere apart from the file date on each xlog.

    Xids are assigned sequentially to transactions as they start. However,
    Xids are not committed sequentially. Moreover, checkpoint records do not
    wait for transactions to complete, so a checkpoint could record an Xid,
    yet a lower Xid might still be in progress and commit sometime after the
    checkpoint. So, when we do a backup, we might take with us a pg_control
    that has a particular Xid, only to find lots of later committed, but
    earlier Xids in the xlogs. So Xid can have no lower bound. (and a fully
    formed clog is essential to recovery).

    If we go searching for a particular Xid, there is no way to tell whether
    an Xid suggested by a user is too big or too small for use as a recovery
    target. We need to recover - it is the only way to tell; if we find an
    Xid that matches, we stop. If not, we keep going until end of logs, when
    we need to issue a "recovered fully - the Xid you gave was not valid",
    which may take some time and is also very clearly not what was wanted.
    (If they had wanted full recovery, they would have asked).

    So searching on an Xid is inherently a poor way to recover. Which is a
    shame, because it seemed like an easy target. Unless of course, we live
    with this vagueness and get on and build the XLogSpy...

    Xlog records ARE written sequentially, so a timestamp written to the
    xlogs COULD be used as a target for halting recovery. We would be able
    to decide, ahead of starting recovery, whether we would be able to
    sensibly recover to that point by using the pg_control checkpoint time
    as the lower bound and the file write times of the highest xlog as the
    upper bound. Once decided that the target timestamp lies between upper
    and lower bounds, we begin recovery, knowing exactly where it will
    complete.

    During recovery, we would search for a timestamp. If found exactly,
    stop. If exceeded, stop. Any transactions not committed at that point
    are, as we say, out of luck. ....This approach has a certainty about it
    that I think is much better than the error prone Xid hunting approach,
    and is also more attuned to the human reality (time matters, Xids
    don't).

    Earlier, Bruce and I had discussed that for reasons of time pressure,
    the PITR code for this release would consist of
    a) recovery to a particular Xid
    b) later, a utility that allowed xlogs to be inspected to allow DBA to
    decide which is the correct Xid to recover to.
    Those ideas don't sound as good now....

    Therefore: action on me? - add a timestamp to EACH xlog record -
    something I had been shying away from.
    On Tue, 2004-05-11 at 14:56, Alvaro Herrera wrote:
    (Unrelated: note that after main transaction commit, a committed
    subtransaction is indistinguishable from a committed main transaction --
    and with the current idea of XLog I have, after recovering a transaction
    tree from XLog there won't be any mark in pg_subtrans. So the system
    will not be exactly as it was before but it won't matter.)
    I don't think we need a subtrans commit directly, since if the top-level
    commits after the subtrans has committed, then we're good.

    However, if a subtrans aborts, yet the top-level commits there will be
    data written to the database about an aborted transaction. We don't have
    Undo, so the subtrans clog must be updated to show that the subtrans
    aborted, otherwise we would read both the committed (top-level) and the
    uncommitted data (subtrans).

    Another way of putting it - if it was worth writing before a crash, it
    is worth recovering after a crash. Shurely?
    We could allow specification of a subtrans ID to be interpreted the same
    as specification of its parent main trans. Dunno if that's actually
    useful to anyone. Actually, I'd think that people would generally
    specify recovery up to a particular timestamp, and not be interested in
    xact numbers at all ...
    I don't think timestamp is going to be precise enough. Basically I can
    see someone saying I want recovery up to 4am, but anything more specific
    will need xid. I suggested that we write an xlog dump tool so you can
    see the xids (with some xid details) and rough timestamps stored in the
    WAL file and choose the xid for recovery.
    Bruce, As I started this e-mail (1st time), I completely agreed with
    you. I've now had to switch my thinking.

    (Doesn't effect archiving architecture....)

    I'm a little dazed....comments anyone?

    Best regards, Simon Riggs
  • Tom Lane at May 11, 2004 at 8:29 pm

    Simon Riggs writes:
    Currently, recovery loops until end of xlogs. There is no exit condition
    from the loop. There is not currently a timestamp on the xlogs -
    anywhere apart from the file date on each xlog.
    Sure there is: every transaction commit record (and I think abort as
    well) carries a timestamp. This was put in with PITR in mind in the
    very beginning.

    regards, tom lane
  • Simon Riggs at May 11, 2004 at 9:58 pm

    On Tue, 2004-05-11 at 21:29, Tom Lane wrote:
    Simon Riggs <simon@2ndquadrant.com> writes:
    Currently, recovery loops until end of xlogs. There is no exit condition
    from the loop. There is not currently a timestamp on the xlogs -
    anywhere apart from the file date on each xlog.
    Sure there is: every transaction commit record (and I think abort as
    well) carries a timestamp. This was put in with PITR in mind in the
    very beginning.
    When being wrong has a purpose, I just love it.

    I'll put my code-goggles back on the right way round.

    Thanks, Simon
  • Bruce Momjian at May 11, 2004 at 9:04 pm

    Simon Riggs wrote:
    During recovery, we would search for a timestamp. If found exactly,
    stop. If exceeded, stop. Any transactions not committed at that point
    are, as we say, out of luck. ....This approach has a certainty about it
    that I think is much better than the error prone Xid hunting approach,
    and is also more attuned to the human reality (time matters, Xids
    don't).

    Earlier, Bruce and I had discussed that for reasons of time pressure,
    the PITR code for this release would consist of
    a) recovery to a particular Xid
    b) later, a utility that allowed xlogs to be inspected to allow DBA to
    decide which is the correct Xid to recover to.
    Those ideas don't sound as good now....

    Therefore: action on me? - add a timestamp to EACH xlog record -
    something I had been shying away from.
    Many transactions are going to have the same timestamp because that just
    isn't precise enough to choose a particular transaction. I agree
    finding a particular xid in the logs is hard. We could just scan the
    logs to see if we find the xid before doing the recovery.

    Anyway, I though we agreed to just get total recovery working for 7.5
    and we can deal with recovery to a particular point later. Or we can
    just add timestamps to the wal file header and restore to a particular
    wal file date timestamp.

    --
    Bruce Momjian | http://candle.pha.pa.us
    pgman@candle.pha.pa.us | (610) 359-1001
    + If your life is a hard drive, | 13 Roberts Road
    + Christ can be your backup. | Newtown Square, Pennsylvania 19073
  • Simon Riggs at May 11, 2004 at 10:03 pm

    On Tue, 2004-05-11 at 22:01, Bruce Momjian wrote:
    Anyway, I though we agreed to just get total recovery working for 7.5
    and we can deal with recovery to a particular point later.
    Back on target now! Regards, Simon
  • Tom Lane at May 11, 2004 at 10:08 pm

    Bruce Momjian writes:
    Many transactions are going to have the same timestamp because that just
    isn't precise enough to choose a particular transaction.
    I think this argument is largely a red herring ... but if it makes you
    feel better, we could change the contents of the commit timestamp to
    be gettimeofday() output (seconds+microseconds) instead of just time()
    output. That should be precise enough for practical purposes.

    regards, tom lane
  • Bruce Momjian at May 11, 2004 at 10:10 pm

    Tom Lane wrote:
    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    Many transactions are going to have the same timestamp because that just
    isn't precise enough to choose a particular transaction.
    I think this argument is largely a red herring ... but if it makes you
    feel better, we could change the contents of the commit timestamp to
    be gettimeofday() output (seconds+microseconds) instead of just time()
    output. That should be precise enough for practical purposes.
    I am saying timestamp as used for specifying a recovery location might
    not be unique enough, no?

    --
    Bruce Momjian | http://candle.pha.pa.us
    pgman@candle.pha.pa.us | (610) 359-1001
    + If your life is a hard drive, | 13 Roberts Road
    + Christ can be your backup. | Newtown Square, Pennsylvania 19073
  • Tom Lane at May 11, 2004 at 11:04 pm

    Bruce Momjian writes:
    Tom Lane wrote:
    I think this argument is largely a red herring ... but if it makes you
    feel better, we could change the contents of the commit timestamp to
    be gettimeofday() output (seconds+microseconds) instead of just time()
    output. That should be precise enough for practical purposes.
    I am saying timestamp as used for specifying a recovery location might
    not be unique enough, no?
    Why not? I don't think there are going to be practical situations where
    the user knows that he wants transactions up till exactly 6:48:52 PM
    anyway. He'll be lucky if he knows that the junior DBA dropped the
    wrong table around 6:30 :-(. It's even less likely that he desperately
    needs to revert to before such a disaster but still get in a transaction
    that happened to commit at the same second. Take it down to the
    microsecond level and the use-case becomes vanishingly small.

    What is *very* likely, but has no chance of being supported by PITR,
    is that he wishes he could keep the results of transactions committed
    shortly after the error he's trying to undo. At the microsecond level,
    the transaction commit order is not that predictable anyway (except of
    course when one transaction's lock blocks another). I think it's
    pointless to get all exercised about whether you can stop on *this*
    transaction and not *that* one when they are committing so close
    together. Half the time the user will be wishing he could omit the
    first and include the second, but he doesn't get to do that, and so
    I say being able to point to the stopping xact by xid is just an
    exercise in technical hair-splitting. I don't think it solves any
    real-world problem.

    Another way of putting this is that one-second resolution in the states
    you have available to restore to is plenty good enough for real-world
    cases.

    regards, tom lane
  • Bruce Momjian at May 12, 2004 at 3:47 am

    Tom Lane wrote:
    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    Tom Lane wrote:
    I think this argument is largely a red herring ... but if it makes you
    feel better, we could change the contents of the commit timestamp to
    be gettimeofday() output (seconds+microseconds) instead of just time()
    output. That should be precise enough for practical purposes.
    I am saying timestamp as used for specifying a recovery location might
    not be unique enough, no?
    Why not? I don't think there are going to be practical situations where
    the user knows that he wants transactions up till exactly 6:48:52 PM
    anyway. He'll be lucky if he knows that the junior DBA dropped the
    wrong table around 6:30 :-(. It's even less likely that he desperately
    needs to revert to before such a disaster but still get in a transaction
    that happened to commit at the same second. Take it down to the
    microsecond level and the use-case becomes vanishingly small.
    Here is my logic. Once they have a way to dump the WAL contents, folks
    trying to recover to a specific point in the past are going to look at
    the WAL dump and hopefully identify the transaction that was bad. They
    then will want to roll back to just before that transaction.

    Do they subtract one second from the transaction? Seems it would be
    easier to just pick the xid that was just before the bad one. Also,
    considering the various time formats and timezone issues that it is
    simpler to just have them specify an xid.

    --
    Bruce Momjian | http://candle.pha.pa.us
    pgman@candle.pha.pa.us | (610) 359-1001
    + If your life is a hard drive, | 13 Roberts Road
    + Christ can be your backup. | Newtown Square, Pennsylvania 19073
  • Simon Riggs at May 12, 2004 at 7:23 am

    On Wed, 2004-05-12 at 04:47, Bruce Momjian wrote:
    Tom Lane wrote:
    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    Tom Lane wrote:
    I think this argument is largely a red herring ... but if it makes you
    feel better, we could change the contents of the commit timestamp to
    be gettimeofday() output (seconds+microseconds) instead of just time()
    output. That should be precise enough for practical purposes.
    I am saying timestamp as used for specifying a recovery location might
    not be unique enough, no?
    Why not? I don't think there are going to be practical situations where
    the user knows that he wants transactions up till exactly 6:48:52 PM
    anyway. He'll be lucky if he knows that the junior DBA dropped the
    wrong table around 6:30 :-(. It's even less likely that he desperately
    needs to revert to before such a disaster but still get in a transaction
    that happened to commit at the same second. Take it down to the
    microsecond level and the use-case becomes vanishingly small.
    Here is my logic. Once they have a way to dump the WAL contents, folks
    trying to recover to a specific point in the past are going to look at
    the WAL dump and hopefully identify the transaction that was bad. They
    then will want to roll back to just before that transaction.

    Do they subtract one second from the transaction? Seems it would be
    easier to just pick the xid that was just before the bad one. Also,
    considering the various time formats and timezone issues that it is
    simpler to just have them specify an xid.
    Well, I think I agree with both sides of this debate.

    Solution: provide both timestamp AND Xid capability. We assume that if
    they specify Xid, it is because they know and, for whatever reason,
    care, about the exact specification of where recovery stops.

    If you know a large statement just executed in error, then you want to
    restore back to just before the error.

    My earlier angst was based upon mistaking that there was no timestamp.
    There is now a simple choice of recovery targets and fairly simple to
    implement both. Design is now clear for me.

    Best Regards, Simon Riggs
  • Alvaro Herrera at May 11, 2004 at 9:26 pm

    On Tue, May 11, 2004 at 09:25:37PM +0100, Simon Riggs wrote:
    On Tue, 2004-05-11 at 16:33, Bruce Momjian wrote:
    Tom Lane wrote:
    Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
    Hmm ... I think it should be forbidden to quote a subtrans Xid as
    rollforward point. Not sure if that can be done though, or how to do
    it.
    Seems like a nonissue, unless the XLOG trace makes a subtrans look the
    same as a main trans, which it'd not do would it?
    I agree that a subtrans xid should not be a valid rollforward point.
    If you try to do that you'll fail because there will be no XLog record
    signalling the commit of a subtransaction. They will be marked
    committed as necessary as a subproduct of main transaction committing.

    Currently, recovery loops until end of xlogs. There is no exit condition
    from the loop. There is not currently a timestamp on the xlogs -
    anywhere apart from the file date on each xlog.
    Both xact commit and abort have timestamps in the XLog. I think valid
    recovery points are transaction commit/abort, not transaction start.

    If we go searching for a particular Xid, there is no way to tell whether
    an Xid suggested by a user is too big or too small for use as a recovery
    target. We need to recover - it is the only way to tell; if we find an
    Xid that matches, we stop. If not, we keep going until end of logs, when
    we need to issue a "recovered fully - the Xid you gave was not valid",
    which may take some time and is also very clearly not what was wanted.
    I think the user should first examine the logs with whatever tools are
    provided, and use a timestamp or a Xid listed in the XLog. If they use
    a Xid that's not listed, it's not our fault ...

    b) later, a utility that allowed xlogs to be inspected to allow DBA to
    decide which is the correct Xid to recover to.
    Why is this difficult? There are lots of subsys_desc() functions which
    already returns what's in each log record as a string. The tool could
    initially just dump that ...

    Therefore: action on me? - add a timestamp to EACH xlog record -
    something I had been shying away from.
    You only need timestamps in xl_xact_commit and xl_xact_abort, which are
    already there.
    On Tue, 2004-05-11 at 14:56, Alvaro Herrera wrote:
    (Unrelated: note that after main transaction commit, a committed
    subtransaction is indistinguishable from a committed main transaction --
    and with the current idea of XLog I have, after recovering a transaction
    tree from XLog there won't be any mark in pg_subtrans. So the system
    will not be exactly as it was before but it won't matter.)
    I don't think we need a subtrans commit directly, since if the top-level
    commits after the subtrans has committed, then we're good.
    If the subxact wrote a tuple, its Xid has to be in the pg_clog. Thus
    we need to recover the pg_clog write.
    However, if a subtrans aborts, yet the top-level commits there will be
    data written to the database about an aborted transaction. We don't have
    Undo, so the subtrans clog must be updated to show that the subtrans
    aborted, otherwise we would read both the committed (top-level) and the
    uncommitted data (subtrans).
    If the aborted subxact wrote a tuple, its Xid has to be in the pg_clog.
    Another way of putting it - if it was worth writing before a crash, it
    is worth recovering after a crash. Surely?
    Right. What I was saying is that we don't need pg_subtrans info,
    because that's only needed while the subtransaction is marked as
    "subcommitted" but it's parent hasn't committed or aborted yet. The
    subcommitted status is changed to committed/aborted when the main
    transaction commits or aborts; at recovery time, we already know if that
    happenned or not so we can mark it right away.

    --
    Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
    "La vida es para el que se aventura"
  • Tom Lane at May 11, 2004 at 9:54 pm

    Alvaro Herrera writes:
    Both xact commit and abort have timestamps in the XLog. I think valid
    recovery points are transaction commit/abort, not transaction start.
    A transaction abort cannot be an interesting stop point either; you may
    as well stop at the latest preceding commit. Whatever the aborted
    transaction did need not be entered into the database.

    The abort record timestamps may be useful for navigation (to help
    determine where you want to stop) but AFAICS only main transaction
    commit records are useful stopping points.

    regards, tom lane
  • Simon Riggs at May 11, 2004 at 10:08 pm
    Thanks to all of you for such swift advice and correction!
    Have a good evening...

    On Tue, 2004-05-11 at 22:26, Alvaro Herrera wrote:
    On Tue, May 11, 2004 at 09:25:37PM +0100, Simon Riggs wrote:
    On Tue, 2004-05-11 at 16:33, Bruce Momjian wrote:
    Tom Lane wrote:
    Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
    Both xact commit and abort have timestamps in the XLog. I think valid
    recovery points are transaction commit/abort, not transaction start.
    Agreed.
    If we go searching for a particular Xid, there is no way to tell whether
    an Xid suggested by a user is too big or too small for use as a recovery
    target. We need to recover - it is the only way to tell; if we find an
    Xid that matches, we stop. If not, we keep going until end of logs, when
    we need to issue a "recovered fully - the Xid you gave was not valid",
    which may take some time and is also very clearly not what was wanted.
    I think the user should first examine the logs with whatever tools are
    provided, and use a timestamp or a Xid listed in the XLog. If they use
    a Xid that's not listed, it's not our fault ...
    Blame is in the eye of the beholder. Anyway, because of above, no
    worries.
    b) later, a utility that allowed xlogs to be inspected to allow DBA to
    decide which is the correct Xid to recover to.
    Why is this difficult? There are lots of subsys_desc() functions which
    already returns what's in each log record as a string. The tool could
    initially just dump that ...
    It's not hard, its all a question of finite time resources.
    Therefore: action on me? - add a timestamp to EACH xlog record -
    something I had been shying away from.
    You only need timestamps in xl_xact_commit and xl_xact_abort, which are
    already there.
    On Tue, 2004-05-11 at 14:56, Alvaro Herrera wrote:
    (Unrelated: note that after main transaction commit, a committed
    subtransaction is indistinguishable from a committed main transaction --
    and with the current idea of XLog I have, after recovering a transaction
    tree from XLog there won't be any mark in pg_subtrans. So the system
    will not be exactly as it was before but it won't matter.)
    I don't think we need a subtrans commit directly, since if the top-level
    commits after the subtrans has committed, then we're good.
    If the subxact wrote a tuple, its Xid has to be in the pg_clog. Thus
    we need to recover the pg_clog write.
    However, if a subtrans aborts, yet the top-level commits there will be
    data written to the database about an aborted transaction. We don't have
    Undo, so the subtrans clog must be updated to show that the subtrans
    aborted, otherwise we would read both the committed (top-level) and the
    uncommitted data (subtrans).
    If the aborted subxact wrote a tuple, its Xid has to be in the pg_clog.
    Another way of putting it - if it was worth writing before a crash, it
    is worth recovering after a crash. Surely?
    Right. What I was saying is that we don't need pg_subtrans info,
    because that's only needed while the subtransaction is marked as
    "subcommitted" but it's parent hasn't committed or aborted yet. The
    subcommitted status is changed to committed/aborted when the main
    transaction commits or aborts; at recovery time, we already know if that
    happenned or not so we can mark it right away.
    Maybe I'm just not following you, which seems likely.

    Surely it is possible that a subtrans could abort, yet the fulltrans
    commits. Yet that doesn't in any way imply that the subtrans should
    change from aborted to committed?

    Anyway, I've raised my thoughts, so if you're still OK, thats fine by
    me. I'll get back to the wheel to see what can be spun...

    Best regards, Simon Riggs
  • Alvaro Herrera at May 11, 2004 at 10:30 pm

    On Tue, May 11, 2004 at 11:07:47PM +0100, Simon Riggs wrote:
    On Tue, 2004-05-11 at 22:26, Alvaro Herrera wrote:

    Right. What I was saying is that we don't need pg_subtrans info,
    because that's only needed while the subtransaction is marked as
    "subcommitted" but it's parent hasn't committed or aborted yet. The
    subcommitted status is changed to committed/aborted when the main
    transaction commits or aborts; at recovery time, we already know if that
    happenned or not so we can mark it right away.
    Maybe I'm just not following you, which seems likely.

    Surely it is possible that a subtrans could abort, yet the fulltrans
    commits. Yet that doesn't in any way imply that the subtrans should
    change from aborted to committed?
    The subtransaction will be recorded as aborted in pg_clog (I mean there
    will be a XLog record saying that it aborted). That protects it from
    showing as committed, no matter what happens to the parent xact.

    --
    Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
    "La tristeza es un muro entre dos jardines" (Khalil Gibran)
  • Tom Lane at May 11, 2004 at 10:19 pm

    Alvaro Herrera writes:
    On Tue, May 11, 2004 at 09:25:37PM +0100, Simon Riggs wrote:
    Another way of putting it - if it was worth writing before a crash, it
    is worth recovering after a crash. Surely?
    Right.
    Wrong. Only actions that were *committed* before the crash, or in
    general the selected recovery stopping time, need be recovered. The
    actions-so-far of transactions that were in progress at that time don't
    have to be redone. It doesn't really matter if we do do them; we have a
    consistent database state either way. I'm just pointing this out for
    the sake of clear thinking.
    What I was saying is that we don't need pg_subtrans info,
    because that's only needed while the subtransaction is marked as
    "subcommitted" but it's parent hasn't committed or aborted yet. The
    subcommitted status is changed to committed/aborted when the main
    transaction commits or aborts; at recovery time, we already know if that
    happenned or not so we can mark it right away.
    Agreed, if pg_subtrans is never consulted except for children of
    in-progress (or recently-completed) main transactions, then it need not
    be recovered. However, I'm not sure this is really true. Couldn't
    there be an entry in pg_clog showing a SUBTRANS COMMIT state immediately
    after a recovery? If the main transaction never gets to commit or
    abort, there would be nothing in XLOG to cue the recovery process to go
    and clear that clog entry. So without some further mechanism I'd expect
    the first backend that visits the tuple in question to go and consult
    pg_subtrans. There would need to be a way of knowing "XIDs older than
    this are by definition not to be sought in pg_subtrans". With a cutoff,
    you could assume that SUBTRANS COMMIT entries older than the cutoff can
    be changed to ABORT state.

    What's not clear to me is how we can advance that cutoff; it might have
    to go along with the clog truncation logic.

    regards, tom lane
  • Alvaro Herrera at May 11, 2004 at 3:40 pm

    On Tue, May 11, 2004 at 10:37:49AM -0400, Tom Lane wrote:
    Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
    Hmm ... I think it should be forbidden to quote a subtrans Xid as
    rollforward point. Not sure if that can be done though, or how to do
    it.
    Seems like a nonissue, unless the XLOG trace makes a subtrans look the
    same as a main trans, which it'd not do would it?
    Huh, certainly not.
    We could allow specification of a subtrans ID to be interpreted the same
    as specification of its parent main trans. Dunno if that's actually
    useful to anyone.
    Probably not.

    --
    Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
    "If it wasn't for my companion, I believe I'd be having
    the time of my life" (John Dunbar)
  • Andreas Zeugswetter at May 12, 2004 at 9:20 am

    I think this argument is largely a red herring ... but if it makes you
    feel better, we could change the contents of the commit timestamp to
    be gettimeofday() output (seconds+microseconds) instead of just time()
    output. That should be precise enough for practical purposes.
    I am saying timestamp as used for specifying a recovery location might
    not be unique enough, no?
    Maybe the api should allow a xid only in addition to a timestamp, for use when
    timestamp alone is not precise enough. That would solve the problem of not
    finding the xid. I would stay away from microseconds.

    Andreas

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedMay 10, '04 at 11:13p
activeMay 12, '04 at 9:20a
posts23
users5
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase