On Fri, 2010-09-17 at 11:09 +0300, Heikki Linnakangas wrote:
(changed subject again.)
On 17/09/10 10:06, Simon Riggs wrote:
I don't think we can determine how far to implement without considering
both approaches in detail. With regard to your points below, I don't
think any of those points could be committed first.
Yeah, I think we need to decide on the desired feature set first, before
we dig deeper into the patches. The design and implementation will
fall out of that.
Well, we've discussed these things many times and talking hasn't got us
very far on its own. We need measurements and neutral assessments.
The patches are simple and we have time.
This isn't just about UI, there are significant and important
differences between the proposals in terms of the capability and control
I propose we develop both patches further and performance test them.
Many of the features I have proposed are performance related and people
need to be able to see what is important, and what is not. But not
through mere discussion, we need numbers to show which things matter and
which things don't. And those need to be derived objectively.
* Support multiple standbys with various synchronization levels.
* What happens if a synchronous standby isn't connected at the moment?
Return immediately vs. wait forever.
* Per-transaction control. Some transactions are important, others are not.
* Quorum commit. Wait until n standbys acknowledge. n=1 and n=all
servers can be seen as important special cases of this.
* async, recv, fsync and replay levels of synchronization.
That's a reasonable starting list of points, there may be others.
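To make the quorum item in the list concrete, here is a minimal sketch in Python. It is not taken from either patch; the function name `quorum_reached` and the standby names are hypothetical, and it only illustrates how n=1 and n=all fall out as special cases of "wait until n standbys acknowledge".

```python
def quorum_reached(acked, n):
    """Return True once at least n distinct standbys have acknowledged.

    acked: iterable of standby names that have acknowledged the commit
    n:     number of acknowledgements the commit must wait for
    """
    return len(set(acked)) >= n


# Hypothetical standby names, for illustration only.
standbys = ["standby_a", "standby_b", "standby_c"]

# n=1: the first acknowledgement releases the commit.
first_ack_is_enough = quorum_reached(["standby_b"], 1)

# n=all: every known standby must acknowledge.
all_required_not_met = quorum_reached(["standby_a", "standby_b"], len(standbys))
all_required_met = quorum_reached(standbys, len(standbys))
```

The sketch deliberately says nothing about what to do when fewer than n standbys are connected; that is the separate "return immediately vs. wait forever" question in the list.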
So what should the user interface be like? Given the 1st and 2nd
requirement, we need standby registration. If some standbys are
important and others are not, the master needs to distinguish between
them to be able to determine that a transaction is safely delivered to
the important standbys.
My patch provides those two requirements without standby registration,
so we very clearly don't "need" standby registration.
The question is do we want standby registration on master and if so,
For per-transaction control, ISTM it would be enough to have a simple
user-settable GUC like synchronous_commit. Let's call it
"synchronous_replication_commit" for now.
If you wish to change the name of the GUC away from the one I have
proposed, fine. Please note that aspect isn't important to me and I will
happily concede all such points to the majority view.
For non-critical transactions,
you can turn it off. That's very simple for developers to understand and
use. I don't think we need more fine-grained control than that at
transaction level, in all the use cases I can think of you have a stream
of important transactions, mixed with non-important ones like log
messages that you want to finish fast in a best-effort fashion.
Sounds like we're getting somewhere. See below.
I'm actually tempted to tie that to the existing synchronous_commit GUC, the
use case seems exactly the same.
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01001.php
Check the date!
I think that particular point is going to confuse us. It will draw much
bike shedding and won't help us decide between patches. It's a nicety
that can be left to a time after we have the core feature committed.
OTOH, if we do want fine-grained per-transaction control, a simple
boolean or even an enum GUC doesn't really cut it. For truly
fine-grained control you want to be able to specify exceptions like
"wait until this is replayed in slave named 'reporting'" or "don't wait
for acknowledgment from slave named 'uk-server'". With standby
registration, we can invent a syntax for specifying overriding rules in
the transaction. Something like SET replication_exceptions =
For the control between async/recv/fsync/replay, I like to think in
a) asynchronous vs synchronous
b) if it's synchronous, how synchronous is it? recv, fsync or replay?
I think it makes most sense to set sync vs. async in the master, and the
level of synchronicity in the slave. Although I have sympathy for the
argument that it's simpler if you configure it all from the master side
I have catered for such requests by suggesting a plugin that allows you
to implement that complexity without overburdening the core code.
This strikes me as an "ad absurdum" argument. Since the above
over-complexity would doubtless be seen as insane by Tom et al, it
attempts to persuade that we don't need recv, fsync and apply either.
Fujii has long talked about 4 levels of service also. Why change? I had
thought that part was pretty much agreed between all of us.
Without performance tests to demonstrate "why", these do sound hard to
understand. But we should note that DRBD offers recv ("B") and fsync
("C") as separate options. And Oracle implements all 3 of recv, fsync
and apply. Neither of them describes those options so simply and easily
as the way we are proposing with a 4 valued enum (with async as the
If we have only one option for sync_rep = 'on' which of recv | fsync |
apply would it implement? You don't mention that. Which do you choose?
For what reason do you make that restriction? The code doesn't get any
simpler, in my patch at least, from my perspective it would be a
restriction without benefit.
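As an illustration of the 4-valued enum, here is a minimal Python sketch. The names `SyncLevel` and `satisfies` are hypothetical, and the ordering async < recv < fsync < apply is an assumption made for the example rather than something either patch is known to guarantee.

```python
from enum import IntEnum


class SyncLevel(IntEnum):
    # Assumed ordering: each level implies the ones before it.
    ASYNC = 0   # do not wait for the standby at all
    RECV = 1    # standby has received the WAL
    FSYNC = 2   # standby has flushed the WAL to disk
    APPLY = 3   # standby has replayed the WAL


def satisfies(reached, requested):
    """True if a standby that has reached `reached` meets `requested`."""
    return reached >= requested


# A standby that has only fsynced satisfies a recv request but not apply.
fsync_meets_recv = satisfies(SyncLevel.FSYNC, SyncLevel.RECV)
fsync_meets_apply = satisfies(SyncLevel.FSYNC, SyncLevel.APPLY)
```

Under that assumption, a single sync_rep = 'on' setting would have to pick one of the three non-async values, which is the restriction being questioned above.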
I no longer seek to persuade by words alone. The existence of my patch
means that I think that only measurements and tests will show why I have
been saying these things. We need performance tests. I'm not ready for
them today, but will be very soon. I suspect you aren't either since
from earlier discussions you didn't appear to have much to say about overall
throughput, only about response times for single transactions. I'm happy
to be proved wrong there.
Putting all of that together, I think Fujii-san's standby.conf is pretty
What it needs is the additional GUC for transaction-level control.
The difference between the patches is not a simple matter of a GUC.
My proposal allows a single standby to provide efficient replies to
multiple requested durability levels all at the same time, with
efficient use of network resources. ISTM that because the other patch
cannot provide that you'd like to persuade us that we don't need that,
ever. You won't sell me on that point, cos I can see lots of uses for
Another use case for you:
* customer orders are important, but we want lots of them, so we use
recv mode for those.
* pricing data hardly ever changes, but when it does we need it to be
applied across the cluster so we don't get read mismatches, so those
rare transactions use apply mode.
If you don't want multiple modes at once, you don't need to use that
feature. But there is no reason to prevent people having the choice,
when a design exists that can provide it.
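The orders-and-pricing use case can be sketched roughly as follows, in Python. This is only an illustration under stated assumptions, not the code in the patch: it assumes one standby reply carries three WAL positions (received, fsynced, applied) as plain integers, and the names `StandbyReply`, `releasable` and the transaction labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StandbyReply:
    # Assumed: one reply reports how far the standby has got at each level.
    received: int
    fsynced: int
    applied: int


def releasable(waiters, reply):
    """Return names of waiting transactions that this reply releases.

    waiters: list of (name, commit_position, mode) tuples, where mode is
             one of "recv", "fsync" or "apply".
    """
    position = {
        "recv": reply.received,
        "fsync": reply.fsynced,
        "apply": reply.applied,
    }
    return [name for name, commit_pos, mode in waiters
            if position[mode] >= commit_pos]


waiters = [
    ("order-1", 100, "recv"),        # customer order, recv mode
    ("price-update", 90, "apply"),   # rare pricing change, apply mode
    ("order-2", 120, "recv"),
]

# One reply serves both modes: order-1 is released, the others keep waiting.
released = releasable(waiters, StandbyReply(received=110, fsynced=105, applied=80))
```

The point of the sketch is that a single message from a single standby can answer waiters at different durability levels, so mixing modes does not need extra round trips.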
(A separate and later point is that I would one day like to annotate
specific tables and functions with different modes, so a sysadmin can
point out which data is important at table level - which is what MySQL
provides by allowing choice of storage engine for particular tables.
Nobody cares about the specific engine, they care about the durability
implications of those choices. This isn't part of the current proposal,
just a later statement of direction.)
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services