
On 30 May 2013 19:39, Robert Haas wrote:
On Thu, May 30, 2013 at 9:33 AM, Heikki Linnakangas wrote:
The reason we have to freeze is that otherwise our 32-bit XIDs wrap around
and become ambiguous. The obvious solution is to extend XIDs to 64 bits, but
that would waste a lot of space. The trick is to add a field to the page header
indicating the 'epoch' of the XID, while keeping the XIDs in tuple header
32-bit wide (*). Check.
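A minimal C sketch of the simple "epoch" mental model, assuming a single 32-bit epoch stored in the page header alongside the existing 32-bit tuple XIDs. The function name is hypothetical and not part of PostgreSQL.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the "epoch" mental model: the page header holds a
 * single 32-bit epoch and each tuple header keeps a 32-bit XID.
 * Not a PostgreSQL API.
 */
static uint64_t
xid_from_page_epoch(uint32_t page_epoch, uint32_t xid32)
{
    /* The epoch supplies the high 32 bits, the tuple XID the low 32 bits. */
    return ((uint64_t) page_epoch << 32) | xid32;
}
```

With page_epoch = 1, tuple XID 10 maps to 4294967306. A tuple XID such as 4000000000 left over from epoch 0 could not be expressed on that same page, which is the limitation the (*) note points out.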
The other reason we freeze is to truncate the clog. But with 64-bit XIDs, we
wouldn't actually need to change old XIDs on disk to FrozenXid. Instead, we
could implicitly treat anything older than relfrozenxid as frozen. Check.
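A minimal sketch of the implicit-freezing rule, assuming 64-bit values are available for both the tuple XID and relfrozenxid, and assuming (as today) that vacuum has already removed tuples left by aborted transactions before relfrozenxid is advanced. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of a tuple XID under 64-bit XIDs. */
typedef enum XidStatusSketch
{
    XID_FROZEN,      /* older than relfrozenxid: treat as frozen */
    XID_CHECK_CLOG   /* relfrozenxid or newer: consult the CLOG as usual */
} XidStatusSketch;

static XidStatusSketch
classify_xid(uint64_t xid64, uint64_t relfrozenxid64)
{
    /*
     * 64-bit XIDs never wrap in practice, so a plain comparison is enough;
     * the on-disk XID never has to be rewritten to FrozenXid.
     */
    if (xid64 < relfrozenxid64)
        return XID_FROZEN;
    return XID_CHECK_CLOG;
}
```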
That's the basic idea. Vacuum freeze only needs to remove dead tuples, but
doesn't need to dirty pages that contain no dead tuples.
Check.
Yes, this is the critical point. Large insert-only tables don't need
to be completely re-written twice.

Since we're not storing 64-bit wide XIDs on every tuple, we'd still need to
replace the XIDs with FrozenXid whenever the difference between the smallest
and largest XID on a page exceeds 2^31. But that would only happen when
you're updating the page, in which case the page is dirtied anyway, so it
wouldn't cause any extra I/O.
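A sketch of the page-level freeze described above, operating on the page's tuple XIDs after they have been widened to 64 bits. It assumes new_xid is the newest XID on the page and that any XID frozen here is already committed and visible to every snapshot; a real implementation would have to check that first. The function and constant names are illustrative; FROZEN_XID_SKETCH and FIRST_NORMAL_XID_SKETCH mirror PostgreSQL's FrozenTransactionId (2) and FirstNormalTransactionId (3). The sketch also freezes at a distance of exactly 2^31, since that distance does not fit in a signed 32-bit offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FROZEN_XID_SKETCH        UINT64_C(2)  /* mirrors FrozenTransactionId */
#define FIRST_NORMAL_XID_SKETCH  UINT64_C(3)  /* mirrors FirstNormalTransactionId */
#define XID_WINDOW_SKETCH        (UINT64_C(1) << 31)

/*
 * Hypothetical helper, called only while the page is already being modified
 * to store new_xid, so freezing here does not dirty the page an extra time
 * (though it may add WAL). xids[] holds the page's tuple XIDs, widened to
 * 64 bits. Every normal XID that is 2^31 or more older than new_xid is
 * replaced by the frozen marker. Returns the number of XIDs frozen.
 */
static size_t
freeze_for_new_xid(uint64_t *xids, size_t n, uint64_t new_xid)
{
    size_t frozen = 0;

    for (size_t i = 0; i < n; i++)
    {
        /* Skip special XIDs (invalid, bootstrap, already frozen). */
        if (xids[i] < FIRST_NORMAL_XID_SKETCH)
            continue;
        /* Only XIDs older than new_xid can be too far behind it. */
        if (xids[i] < new_xid && new_xid - xids[i] >= XID_WINDOW_SKETCH)
        {
            xids[i] = FROZEN_XID_SKETCH;
            frozen++;
        }
    }
    return frozen;
}
```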
It would cause some extra WAL activity, but it wouldn't dirty the page
an extra time.
This would also be the first step in allowing the clog to grow larger than 2
billion transactions, eliminating the need for anti-wraparound freezing
altogether. You'd still want to truncate the clog eventually, but it would
be nice to not be pressed against the wall with "run vacuum freeze now, or
the system will shut down".
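For a sense of scale, assuming PostgreSQL's current CLOG format of 2 status bits per transaction (a detail not stated in this thread), every 2^31 transactions of CLOG occupies:

```latex
2^{31}\ \text{transactions} \times 2\ \tfrac{\text{bits}}{\text{transaction}}
  = 2^{32}\ \text{bits} = 2^{29}\ \text{bytes} = 512\ \text{MiB}
```

so a CLOG allowed to grow past 2 billion transactions would add roughly another 512 MiB for each further 2^31 transactions it retains.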
Interesting. That seems like a major advantage.
(*) "Adding an epoch" is inaccurate, but I like to use that as my mental
model. If you just add a 32-bit epoch field, then you cannot have xids from
different epochs on the page, which would be a problem. In reality, you
would store one 64-bit XID value in the page header, and use that as the
"reference point" for all the 32-bit XIDs on the tuples. See existing
convert_txid() function for how that works. Another method is to store the
32-bit xid values in tuple headers as offsets from the per-page 64-bit
value, but then you'd always need to have the 64-bit value at hand when
interpreting the XIDs, even if they're all recent.
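The two encodings in the (*) note can be sketched in C as follows. Both functions are hypothetical: widen_xid_by_reference is a reference-point conversion similar in purpose to the convert_txid() function mentioned above, but not a copy of it, and FIRST_NORMAL_XID_SKETCH mirrors PostgreSQL's FirstNormalTransactionId (3).

```c
#include <assert.h>
#include <stdint.h>

#define FIRST_NORMAL_XID_SKETCH  UINT32_C(3)  /* mirrors FirstNormalTransactionId */

/*
 * Method 1 (reference point): the tuple stores the XID's low 32 bits, and
 * the reader picks the 64-bit XID closest to the page's 64-bit reference,
 * i.e. within [ref - 2^31, ref + 2^31). Special XIDs are returned as-is.
 */
static uint64_t
widen_xid_by_reference(uint64_t ref64, uint32_t xid32)
{
    if (xid32 < FIRST_NORMAL_XID_SKETCH)
        return xid32;
    /* Signed 32-bit distance from the reference's low 32 bits (wraps mod 2^32). */
    int32_t delta = (int32_t) (xid32 - (uint32_t) ref64);
    return ref64 + (int64_t) delta;
}

/*
 * Method 2 (offsets): the tuple stores xid - base, so decoding is a plain
 * addition, but the 64-bit base is needed for every XID, even recent ones.
 */
static uint64_t
decode_xid_offset(uint64_t base64, uint32_t offset32)
{
    return base64 + offset32;
}
```

With a reference of 2^32 + 10, the stored value 4000000000 is read as XID 4000000000 (just before the wrap) and 20 as 2^32 + 20. Under method 1 the stored value is still the XID's low 32 bits, so code that only deals with recent XIDs could keep comparing it directly; under method 2 the stored value means nothing without the base.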
As I see it, the main downsides of this approach are:

(1) It breaks binary compatibility (unless you do something to
provide for it, like putting the epoch in the special space).

(2) It consumes 8 bytes per page. I think it would be possible to get
this down to say 5 bytes per page pretty easily; we'd simply decide
that the low-order 3 bytes of the reference XID must always be 0.
Possibly you could even do with 4 bytes, or 4 bytes plus some number
of extra bits.
Yes, the idea of having a "base Xid" on every page is complicated and
breaks compatibility. The same idea can work well if we do this via tuple
headers.

(3) You still need to periodically scan the entire relation, or else
have a freeze map as Simon and Josh suggested.
I don't think that is needed with this approach.

(The freeze map was Andres' idea, not mine. I just accepted it as what
I thought was the only way forwards. Now I see other ways)
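Robert's point (2) suggests shrinking the per-page reference XID to about 5 bytes by requiring its low-order 3 bytes to be zero. A minimal sketch of that encoding, with hypothetical names and an arbitrarily chosen little-endian byte order:

```c
#include <assert.h>
#include <stdint.h>

#define REF_LOW_BITS 24  /* low-order 3 bytes of the reference are always 0 */
#define REF_LOW_MASK ((UINT64_C(1) << REF_LOW_BITS) - 1)

/* Round a desired reference XID down to a multiple of 2^24. */
static uint64_t
round_reference(uint64_t xid64)
{
    return xid64 & ~REF_LOW_MASK;
}

/* Store bits 24..63 of the reference (40 bits) in 5 bytes. */
static void
encode_reference(uint64_t ref64, uint8_t out[5])
{
    assert((ref64 & REF_LOW_MASK) == 0);
    uint64_t high = ref64 >> REF_LOW_BITS;
    for (int i = 0; i < 5; i++)
        out[i] = (uint8_t) (high >> (8 * i));
}

/* Rebuild the 64-bit reference; the low 24 bits come back as zero. */
static uint64_t
decode_reference(const uint8_t in[5])
{
    uint64_t high = 0;
    for (int i = 0; i < 5; i++)
        high |= (uint64_t) in[i] << (8 * i);
    return high << REF_LOW_BITS;
}
```

Rounding moves the reference less than 2^24 (16,777,216) XIDs below the ideal point, so if tuple XIDs are interpreted within 2^31 of the reference, the room left for newer XIDs shrinks by at most that amount.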
The upsides of this approach as compared with what Andres and I are
proposing are:

(1) It provides a stepping stone towards allowing indefinite expansion
of CLOG, which is quite appealing as an alternative to a hard
shut-down.
I would be against expansion of the CLOG beyond its current size. If
we have removed all aborted rows and marked hints, then we don't need
the CLOG values and can trim that down.

I don't mind the hints, it's the freezing we don't need.

convert_txid() function for how that works. Another method is to store the
32-bit xid values in tuple headers as offsets from the per-page 64-bit
value, but then you'd always need to have the 64-bit value at hand when
interpreting the XIDs, even if they're all recent.
You've touched here on the idea of putting the epoch in the tuple
header, which is where what I posted comes together. We don't need
anything at page level, we just need something on each tuple.

Please can you look at my recent post on how to put this in the tuple header?

--
  Simon Riggs http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Discussion Overview
group: pgsql-hackers
categories: postgresql
posted: May 30, '13 at 1:34p
active: Aug 30, '13 at 6:34p
posts: 39
users: 9
website: postgresql.org...