On Tue, 2004-05-11 at 16:33, Bruce Momjian wrote:
Tom Lane wrote:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
Hmm ... I think it should be forbidden to quote a subtrans Xid as
rollforward point. Not sure if that can be done though, or how to do
Seems like a nonissue, unless the XLOG trace makes a subtrans look the
same as a main trans, which it'd not do would it?
I agree that a subtrans xid should not be a valid rollforward point.

Forgive me discussing what seems like obvious points - I'm sure you
appreciate we need an exact statement of how/when to terminate recovery
and that might be found in looking harder at the subtrans questions.
This is my third re-write of this e-mail, since I keep thinking of
additional things while going for the "definitive statement". I had
thought this was straightforward...

Currently, recovery loops until end of xlogs. There is no exit condition
from the loop. There is currently no timestamp anywhere in the xlogs,
apart from the file date on each xlog.

Xids are assigned sequentially to transactions as they start. However,
Xids are not committed sequentially. Moreover, checkpoint records do not
wait for transactions to complete, so a checkpoint could record an Xid,
yet a lower Xid might still be in progress and commit sometime after the
checkpoint. So, when we do a backup, we might take with us a pg_control
that has a particular Xid, only to find many lower Xids in the xlogs
that committed later. So a target Xid can have no lower bound (and a
fully formed clog is essential to recovery).
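
To make the ordering problem concrete, here is a minimal sketch. The `LogRecord` type and the sample Xids are hypothetical, invented for illustration; they are not the real xlog record layout. It only shows that commit records which follow a checkpoint can carry Xids lower than the one the checkpoint recorded.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    kind: str  # "commit" or "checkpoint" (hypothetical record kinds)
    xid: int   # committing Xid, or the Xid recorded by the checkpoint


# Transactions 100..103 start in order; 101 is long-running.
xlog = [
    LogRecord("commit", 100),
    LogRecord("commit", 102),
    LogRecord("checkpoint", 104),  # the checkpoint does not wait for 101
    LogRecord("commit", 103),
    LogRecord("commit", 101),      # lower Xid, committed after the checkpoint
]


def commits_below_checkpoint_xid(log):
    """Xids committed after a checkpoint that are lower than the Xid it recorded."""
    late = []
    checkpoint_xid = None
    for rec in log:
        if rec.kind == "checkpoint":
            checkpoint_xid = rec.xid
        elif checkpoint_xid is not None and rec.xid < checkpoint_xid:
            late.append(rec.xid)
    return late


late_commits = commits_below_checkpoint_xid(xlog)
```

Here `late_commits` contains both 103 and 101: the Xid stored at checkpoint time says nothing about which lower Xids are still to commit.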

If we go searching for a particular Xid, there is no way to tell whether
an Xid suggested by a user is too big or too small for use as a recovery
target. We need to recover - it is the only way to tell; if we find an
Xid that matches, we stop. If not, we keep going until end of logs, at
which point we must report "recovered fully - the Xid you gave was not
valid", which may take some time and is also very clearly not what was wanted.
(If they had wanted full recovery, they would have asked).
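
A rough sketch of why the Xid search cannot exit early. The function name and the list-of-Xids log are assumptions made for illustration, not existing recovery code. Because commit order is not Xid order, comparing the current Xid against the target says nothing about whether the target is still ahead, so a miss is only detected at end of log.

```python
def recover_to_xid(commit_xids, target_xid):
    """Replay commit records in log order, stopping at target_xid if it appears.

    Returns (records_replayed, found). No early exit is possible on a miss:
    a lower or higher Xid may still commit later in the log.
    """
    replayed = 0
    for xid in commit_xids:
        replayed += 1
        if xid == target_xid:
            return replayed, True
    return replayed, False


log_order = [100, 102, 103, 101]  # hypothetical commit order in the xlogs

hit = recover_to_xid(log_order, 102)   # stops at the second record
miss = recover_to_xid(log_order, 999)  # replays the whole log, then reports failure
```

Note that stopping at 102 also leaves the lower Xid 101 unrecovered, since it committed later in the log.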

So searching on an Xid is inherently a poor way to recover. Which is a
shame, because it seemed like an easy target. Unless of course, we live
with this vagueness and get on and build the XLogSpy...

Xlog records ARE written sequentially, so a timestamp written to the
xlogs COULD be used as a target for halting recovery. We would be able
to decide, ahead of starting recovery, whether we would be able to
sensibly recover to that point by using the pg_control checkpoint time
as the lower bound and the file write times of the highest xlog as the
upper bound. Once decided that the target timestamp lies between upper
and lower bounds, we begin recovery, knowing exactly where it will
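
The pre-flight check described above might look something like this. It is a sketch only: `target_is_recoverable` and its parameters are hypothetical names, and in practice the two bounds would be read from pg_control and from the file date of the highest xlog rather than passed in by hand.

```python
from datetime import datetime


def target_is_recoverable(target, checkpoint_time, last_xlog_write_time):
    """True if the target timestamp lies between the two bounds.

    checkpoint_time:      lower bound, the checkpoint time from pg_control
    last_xlog_write_time: upper bound, file write time of the highest xlog
    """
    return checkpoint_time <= target <= last_xlog_write_time


checkpoint_time = datetime(2004, 5, 11, 2, 0)  # made-up sample values
last_write = datetime(2004, 5, 11, 16, 33)

ok = target_is_recoverable(datetime(2004, 5, 11, 4, 0), checkpoint_time, last_write)
too_early = target_is_recoverable(datetime(2004, 5, 10, 23, 0), checkpoint_time, last_write)
```

The point of the check is that a bad target is rejected before any replay starts, instead of after a full pass through the logs.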

During recovery, we would search for a timestamp. If found exactly,
stop. If exceeded, stop. Any transactions not committed at that point
are, as we say, out of luck. ....This approach has a certainty about it
that I think is much better than the error prone Xid hunting approach,
and is also more attuned to the human reality (time matters, Xids
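
A minimal sketch of the stop rule, assuming every xlog record carries a timestamp and that records arrive in log order. `replay_until` is a hypothetical name. Two details are my assumptions rather than anything decided above: a record whose timestamp equals the target is applied before stopping, and the first record past the target is not applied.

```python
def replay_until(records, target):
    """Apply (timestamp, payload) records in log order up to the target time.

    Stops on an exact match (after applying it) or as soon as a record's
    timestamp exceeds the target (without applying it).
    """
    applied = []
    for ts, payload in records:
        if ts > target:
            break  # exceeded: stop
        applied.append(payload)
        if ts == target:
            break  # found exactly: stop
    return applied


# Integer timestamps stand in for real ones to keep the example short.
records = [(1, "a"), (2, "b"), (4, "c"), (5, "d")]

exact = replay_until(records, 2)     # target matches a record
between = replay_until(records, 3)   # target falls between two records
```

Both calls end with the same two records applied, and in neither case does recovery need to read to the end of the log.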

Earlier, Bruce and I had discussed that for reasons of time pressure,
the PITR code for this release would consist of
a) recovery to a particular Xid
b) later, a utility that allowed xlogs to be inspected so the DBA can
decide which is the correct Xid to recover to.
Those ideas don't sound as good now....

Therefore: action on me? - add a timestamp to EACH xlog record -
something I had been shying away from.
On Tue, 2004-05-11 at 14:56, Alvaro Herrera wrote:
(Unrelated: note that after main transaction commit, a committed
subtransaction is indistinguishable from a committed main transaction --
and with the current idea of XLog I have, after recovering a transaction
tree from XLog there won't be any mark in pg_subtrans. So the system
will not be exactly as it was before but it won't matter.)
I don't think we need a subtrans commit directly, since if the top-level
commits after the subtrans has committed, then we're good.

However, if a subtrans aborts, yet the top-level commits there will be
data written to the database about an aborted transaction. We don't have
Undo, so the subtrans clog must be updated to show that the subtrans
aborted, otherwise we would read both the committed (top-level) and the
uncommitted data (subtrans).
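
A toy model of the visibility problem, not the real clog or tuple format. The dict-based `clog`, the `visible_rows` helper and the Xids are all hypothetical. With no Undo, the aborted subtransaction's rows are still physically present after recovery, so only its recorded status keeps them hidden.

```python
COMMITTED, ABORTED = "committed", "aborted"


def visible_rows(rows, clog):
    """Return values of rows whose writing Xid is marked committed in clog."""
    return [value for xid, value in rows if clog.get(xid) == COMMITTED]


# Top-level transaction 200 commits; its subtransaction 201 aborted.
rows = [(200, "top-level row"), (201, "subtrans row")]

clog_correct = {200: COMMITTED, 201: ABORTED}
clog_wrong = {200: COMMITTED, 201: COMMITTED}  # abort not recorded at recovery

good = visible_rows(rows, clog_correct)
bad = visible_rows(rows, clog_wrong)
```

With the abort recorded only the top-level row is visible; without it, the subtransaction's row leaks through as well.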

Another way of putting it - if it was worth writing before a crash, it
is worth recovering after a crash. Shurely?
We could allow specification of a subtrans ID to be interpreted the same
as specification of its parent main trans. Dunno if that's actually
useful to anyone. Actually, I'd think that people would generally
specify recovery up to a particular timestamp, and not be interested in
xact numbers at all ...
I don't think timestamp is going to be precise enough. Basically I can
see someone saying I want recovery up to 4am, but anything more specific
will need xid. I suggested that we write an xlog dump tool so you can
see the xids (with some xid details) and rough timestamps stored in the
WAL file and choose the xid for recovery.
Bruce, As I started this e-mail (1st time), I completely agreed with
you. I've now had to switch my thinking.

(Doesn't affect archiving architecture....)

I'm a little dazed....comments anyone?

Best regards, Simon Riggs

Discussion Overview
group: pgsql-hackers
posted: May 10, '04 at 11:13p
active: May 12, '04 at 9:20a