On Wed, Aug 28, 2013 at 10:58 AM, Ants Aasma wrote:
> I currently see the following courses of action:
>
> 1. Do nothing about the inconsistency, use a transient global counter
> for master commit order and commit record LSN for slaves.
> Pro: doesn't change any semantics
> Con: we are not making any progress towards cluster wide snapshots
> or even serializable transactions on slaves.
>
> 2. Create a new WAL record type that is inserted when a transaction
> becomes visible. LSN of this record determines transaction visibility
> order. Async transactions can be optimized to skip this record. This
> record does not need to be flushed.
> Pro: cluster wide consistency, replication method agnostic
> Con: one extra WAL record insertion per writing transaction. (32
> bytes of WAL per tx)
>
> 3. Use a transient global counter on master, send xid-csn pairs to
> slave via a side channel on the replication connection.
> Pro: Less overhead than WAL records
> Con: replication protocol needs (possibly invasive) changes, WAL
> shipping based replication can't use this mechanism, lots of extra
> code required.
>
> 4. Make the choice between 1 and 2 user configurable (it seems to me
> that it could even be changed without a restart).
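To make the ordering problem behind options 1 and 2 concrete, here is a toy Python model (hypothetical; nothing in it is PostgreSQL code). It assumes a single WAL stream of fixed-size records, that a synchronous commit becomes visible only after WAL is flushed past its commit record while an asynchronous commit is visible as soon as its record is inserted, and that a commit record carries a flag telling the standby whether a visibility record will follow. The master's publication order stands in for option 1's transient counter; the standby's order is what replaying WAL in LSN order gives it.

```python
from dataclasses import dataclass, field


@dataclass
class Wal:
    """Hypothetical WAL stream: every record gets a strictly increasing LSN."""
    insert_lsn: int = 0
    records: list = field(default_factory=list)  # (lsn, kind, xid, sync)

    def insert(self, kind, xid, sync=False, size=32):
        self.insert_lsn += size
        self.records.append((self.insert_lsn, kind, xid, sync))
        return self.insert_lsn


def run(visibility_records):
    """Return (master_order, standby_order) for one fixed timeline."""
    wal = Wal()
    master_order = []  # index in this list acts as option 1's transient counter

    def publish(xid, sync):
        master_order.append(xid)
        # Option 2: a synchronous commit logs the moment it becomes visible.
        # Asynchronous commits skip it; they are visible at their commit record.
        if visibility_records and sync:
            wal.insert("visible", xid)

    # Timeline: A (synchronous) inserts its commit record first and waits for
    # the flush; B (asynchronous) commits next and is visible immediately;
    # then the flush covering A's commit record completes.
    wal.insert("commit", "A", sync=True)
    wal.insert("commit", "B", sync=False)
    publish("B", sync=False)
    publish("A", sync=True)

    # Standby: replay in LSN order, publishing each xid at the record that
    # decides its visibility (the commit record unless a visibility record
    # is expected for it).
    standby_order = []
    for _lsn, kind, xid, sync in wal.records:
        waits_for_visible = visibility_records and sync
        if kind == "visible" or (kind == "commit" and not waits_for_visible):
            standby_order.append(xid)
    return master_order, standby_order
```

With visibility records off, `run(False)` returns `(['B', 'A'], ['A', 'B'])`: the master made B visible first, while a standby ordering by commit-record LSN sees A first. With them on, `run(True)` returns `(['B', 'A'], ['B', 'A'])`, at the cost of one extra record for the synchronous commit.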

I think approach #2 is dead on arrival, at least as a default policy.
It essentially amounts to requiring two commit records per transaction
rather than one, and I think that has no chance of being acceptable.
It's not just or even primarily the *volume* of WAL that I'm concerned
about so much as the feeling that hitting WAL twice rather than once
at the end of a transaction that may have only written one or two WAL
records to begin with is going to slow things down pretty
substantially, especially in high-concurrency scenarios.

I wouldn't entirely dismiss the idea of changing the user-visible
semantics. In addition to a WAL insertion pointer and a WAL flush
pointer, you'd have a WAL snapshot pointer, which could run ahead of
the flush pointer if the transactions were all asynchronous, but which
for synchronous transactions could not advance faster than the flush
pointer. Only users running a mix of synchronous_commit=on and
synchronous_commit=off would be harmed, and maybe we could convince
ourselves that's OK.
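One way to read the proposed WAL snapshot pointer, sketched in Python under stated assumptions: LSNs are plain integers, a commit is visible to new snapshots if and only if its commit record's LSN is at or below the pointer, and `snapshot_pointer` is a hypothetical helper, not an existing PostgreSQL function.

```python
def snapshot_pointer(insert_lsn, flush_lsn, sync_commit_lsns):
    """Hypothetical WAL snapshot pointer.

    A commit record at LSN c is visible iff c <= the returned value.
    Asynchronous commits are visible once inserted, so the pointer may run
    ahead of flush_lsn; it must stop before any synchronous commit record
    that has not been flushed yet.
    """
    pending = [c for c in sync_commit_lsns if c > flush_lsn]
    if not pending:
        # Nothing synchronous is waiting: everything inserted is visible.
        return insert_lsn
    # Stop just before the oldest unflushed synchronous commit record.
    return min(insert_lsn, min(pending) - 1)
```

For example, `snapshot_pointer(500, 200, [])` is 500, ahead of the flush pointer, while `snapshot_pointer(500, 200, [300])` is 299: an asynchronous commit at LSN 400 stays invisible until the synchronous commit at 300 is flushed. That delay is the harm to users mixing synchronous_commit=on and synchronous_commit=off.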

Still, there's no doubt that there is a downside there. Therefore,
I'm inclined to suggest that you implement #1. If, at a later time,
we want to make progress on the issue of cluster-wide snapshot
consistency, you could implement #2 or #3 as an optional feature that
can be turned on via some flag. However, I would recommend against
trying to do that in the initial patch; I think that doing either #2
or #3 is really a separate feature, and I think if you try to
incorporate all of that code into the main CSN patch it's just going
to be a distraction from what figures to be a very complicated patch
even in minimal form.

If you did choose to implement #2 as an option at some point, it would
probably be worth optimizing for the case where commit ordering and
visibility ordering match, and try to find a design where you only
need the extra WAL record when the orderings don't match. I'm not
sure exactly how to do that, but it might be worth investigating. I
don't think that's enough to save #2 as a default behavior, but it
might make it more palatable as an option.
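How to detect the cheap case is left open above. As a hedged starting point, the hypothetical helper below only measures the disagreement: given the order in which the master made transactions visible, it flags each transaction that becomes visible after some transaction with a later commit LSN, i.e. the ones that could not be ordered by commit LSN alone. It does not address how a standby replaying a commit record would know that a visibility record is still coming.

```python
def out_of_order_commits(visibility_order):
    """Return xids made visible after a transaction with a later commit LSN.

    visibility_order: list of (xid, commit_lsn) in the order the master made
    the transactions visible. Hypothetical helper for estimating how often
    commit (LSN) order and visibility order disagree.
    """
    flagged = []
    highest_visible_lsn = -1
    for xid, commit_lsn in visibility_order:
        if commit_lsn < highest_visible_lsn:
            # A transaction with a later commit record is already visible,
            # so ordering this xid by its commit LSN would be wrong.
            flagged.append(xid)
        else:
            highest_visible_lsn = commit_lsn
    return flagged
```

The rule is online: at publish time it needs only the highest commit LSN made visible so far, which the master could track. It can flag more transactions than strictly necessary; for `[("C", 300), ("A", 100), ("B", 200)]` it flags A and B, although repositioning C alone would suffice, and finding that minimum requires seeing the whole history.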

I agree with what others have said insofar as it would be nifty if we
could use the commit LSN as the commit sequence number. But I think
you've put your finger on why that's not likely to work out well.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Posted to pgsql-hackers on Aug 28, 2013 at 2:58 PM; thread active through Aug 29, 2013 at 10:24 PM.