On Fri, Mar 18, 2011 at 2:37 PM, Markus Wanner wrote:
> On 03/18/2011 02:40 PM, Kevin Grittner wrote:
>> Then the only thing you would consider sync replication, as far as I
>> can see, is two phase commit
>
> I think waiting for the ACK before actually making the changes from the
> transaction visible (COMMIT) would suffice for disallowing such an
> inconsistency to manifest. But obviously, MySQL decided it's not worth
> doing that, as it's such a rare event and a short period of time that
> may show inconsistencies...
There are fewer options for implementing this in MySQL because
replication requires a binlog on the master and that requires the
internal use of XA to keep the binlog and InnoDB in sync as they are
separate resource managers. In theory, this could be changed so that
commit is forced only for the binlog; after a crash, missing
transactions could then be copied from the binlog to InnoDB. But I
don't think this will ever change.
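
A minimal sketch of the crash-recovery side of that internal XA
arrangement, assuming the usual two-resource-manager rule: a transaction
left in the prepared state in InnoDB is committed if its XID made it
into the binlog and rolled back otherwise. The function name and the
plain integer XIDs are hypothetical, for illustration only; this is not
MySQL's actual code.

```python
def recover(prepared_xids, binlog_xids):
    """Decide the fate of transactions found prepared in InnoDB after a crash.

    prepared_xids: XIDs of transactions InnoDB reports as prepared.
    binlog_xids:   XIDs found in XID events in the binlog.
    Returns (to_commit, to_rollback), preserving input order.
    """
    in_binlog = set(binlog_xids)
    # The binlog is treated as the source of truth for the commit decision.
    to_commit = [x for x in prepared_xids if x in in_binlog]
    to_rollback = [x for x in prepared_xids if x not in in_binlog]
    return to_commit, to_rollback


if __name__ == "__main__":
    commit, rollback = recover([1, 2, 3], [2, 3, 4])
    print("commit:", commit, "rollback:", rollback)
```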
By "fewer options" I mean that commit in MySQL with InnoDB and the
binlog requires these steps:
1) prepare to InnoDB (force transaction log to disk for changes from
this transaction)
2) write binlog events from this transaction to the binlog
3) write XID event to the binlog (at this point transaction commit is
official, will survive a crash)
4) force binlog to disk
5) release row locks held by transaction in InnoDB
6) write commit record to InnoDB transaction log
7) force write of commit record to disk
Group commit is done for the fsyncs in steps 1 and 7. It is not done
for the fsync in step 4.
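
The steps above can be written down as a small table to make the fsync
cost explicit. This is only a model of the list in this message: the
Step class and helper names are made up for illustration, and the flags
simply restate which steps force data to disk and which of those
benefit from group commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    action: str
    fsync: bool = False         # step forces a log to disk
    group_commit: bool = False  # fsync can be shared across transactions


COMMIT_STEPS = [
    Step(1, "prepare to InnoDB", fsync=True, group_commit=True),
    Step(2, "write binlog events"),
    Step(3, "write XID event to binlog"),
    Step(4, "force binlog to disk", fsync=True),
    Step(5, "release row locks"),
    Step(6, "write commit record to InnoDB log"),
    Step(7, "force commit record to disk", fsync=True, group_commit=True),
]


def fsyncs_per_commit(steps):
    """Number of forced disk writes a single commit goes through."""
    return sum(1 for s in steps if s.fsync)


def ungrouped_fsync_steps(steps):
    """Steps whose fsync is paid by every transaction individually."""
    return [s.number for s in steps if s.fsync and not s.group_commit]


if __name__ == "__main__":
    print("fsyncs per commit:", fsyncs_per_commit(COMMIT_STEPS))
    print("not group committed:", ungrouped_fsync_steps(COMMIT_STEPS))
```

In this model the binlog fsync in step 4 is the one that every
transaction pays for on its own.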
Regardless, the processing above is complicated even without
semi-sync. AFAIK, the semi-sync code runs after step 7, but I have not
looked at the official version of the semi-sync code in MySQL, and my
memory of the work we did at Google is vague.
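
A rough way to see why the placement of the semi-sync wait matters for
the inconsistency discussed earlier in the thread: if the wait for the
replica ACK starts only after the step at which other sessions can see
the change, there is a window where the change is visible on the master
but not yet acknowledged. The sketch below assumes visibility begins at
step 6 (commit record written); that step number is an assumption made
for illustration, not something confirmed here, and the function name
is hypothetical.

```python
# Assumption: other sessions can see the change once step 6 has run.
VISIBLE_AT_STEP = 6


def window_exists(ack_wait_after_step, visible_at_step=VISIBLE_AT_STEP):
    """True if the change becomes visible before the ACK wait begins.

    ack_wait_after_step: the commit step after which the master starts
    waiting for the replica's acknowledgement.
    """
    return visible_at_step <= ack_wait_after_step


if __name__ == "__main__":
    # Wait placed after step 7, as recalled in this message.
    print("wait after step 7:", window_exists(7))
    # Hypothetical alternative: wait right after the binlog is forced.
    print("wait after step 4:", window_exists(4))
```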
It is great if Postgres doesn't have this issue. It wasn't clear to me
from lurking on this list. I hope your docs highlight the behavior, as
not having the issue is a big deal.