On Wed, Aug 28, 2013 at 9:02 AM, Andres Freund wrote:
>> During swap phase, process was waiting for transactions with older
>> snapshots than the one taken by transaction doing the swap as they
>> might hold the old index information. I think that we can get rid of
>> it thanks to the MVCC snapshots as other backends are now able to see
>> what is the correct index information to fetch.
>
> I don't see MVCC snapshots guaranteeing that. The only thing changed due
> to them is that other backends see a self consistent picture of the
> catalog (i.e. not either, neither or both versions of a tuple as
> earlier). It can still be out of date. And we rely on those not being
> out of date.
>
> I need to look into the patch for more details.

I agree with Andres. The only way in which the MVCC catalog snapshot
patch helps is that you can now do a transactional update on a system
catalog table without fearing that other backends will see the row as
nonexistent or duplicated. They will see exactly one version of the
row, just as you would naturally expect. However, a backend's
syscaches can still contain old versions of rows, and they can still
cache older versions of some tuples and newer versions of other
tuples. Those caches only get reloaded when shared-invalidation
messages are processed, and that only happens when the backend
acquires a lock on a new relation.
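
To make the staleness concrete, here is a toy model in Python. It is only
a sketch of the behaviour described above, not PostgreSQL's actual
syscache or sinval code: the Catalog and Backend classes and all of their
methods are hypothetical names invented for the illustration, and the
model assumes that taking a lock is the only point at which pending
invalidations are absorbed.

```python
class Catalog:
    """Toy shared catalog: one current value per key, plus an invalidation log."""

    def __init__(self):
        self.rows = {}
        self.inval_log = []  # keys invalidated so far, in order

    def update(self, key, value):
        # A transactional update: readers of `rows` see exactly one version.
        self.rows[key] = value
        self.inval_log.append(key)


class Backend:
    """Toy backend with a private cache that is refreshed only on demand."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.cache = {}
        self.next_msg = len(catalog.inval_log)  # first unread log entry

    def lookup(self, key):
        # A cache hit never consults the catalog, so it may be out of date.
        if key not in self.cache:
            self.cache[key] = self.catalog.rows[key]
        return self.cache[key]

    def accept_invalidations(self):
        # Drop every cached entry named by an unread invalidation message.
        for key in self.catalog.inval_log[self.next_msg:]:
            self.cache.pop(key, None)
        self.next_msg = len(self.catalog.inval_log)

    def lock_relation(self, name):
        # Assumption of this model: invalidations are processed only here.
        self.accept_invalidations()


def demo():
    catalog = Catalog()
    catalog.update("idx_a", "v1")
    catalog.update("idx_b", "v1")

    backend = Backend(catalog)
    backend.lookup("idx_a")  # caches idx_a at v1; idx_b is not cached yet

    catalog.update("idx_a", "v2")
    catalog.update("idx_b", "v2")

    # Old version of one entry, new version of the other.
    stale = (backend.lookup("idx_a"), backend.lookup("idx_b"))

    backend.lock_relation("some_table")
    fresh = (backend.lookup("idx_a"), backend.lookup("idx_b"))
    return stale, fresh
```

In this model demo() returns a mixed view before the lock and a
consistent one after it, even though the catalog itself never exposed
more than one version of either row.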

I have been of the opinion for some time now that the
shared-invalidation code is not a particularly good design for much of
what we need. Waiting for an old snapshot is often a proxy for
waiting long enough that we can be sure every other backend will
process the shared-invalidation message before it next uses any of the
cached data that will be invalidated by that message. However, it
would be better to be able to send invalidation messages in some way
that causes them to be processed more eagerly by other backends, and that
provides some more specific feedback on whether or not they have
actually been processed. Then we could send the invalidation
messages, wait just until everyone confirms that they have been seen,
which should hopefully happen quickly, and then proceed. This would
probably lead to much shorter waits. Or maybe we should have
individual backends process invalidations more frequently, and try to
set things up so that once an invalidation is sent, the sending
backend is immediately guaranteed that it will be processed soon
enough, and thus it doesn't need to wait at all. This is all pie in
the sky, though. I don't have a clear idea how to design something
that's an improvement over the (rather intricate) system we have.
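
Purely to illustrate the first idea (send, then wait only until every
backend confirms), here is a small Python sketch. It is not a design
proposal for PostgreSQL: the InvalidationBus class, backend_loop, and the
sequence-number acknowledgement scheme are all assumptions made up for
this example, and it ignores backends that are busy, blocked, or gone.

```python
import threading


class InvalidationBus:
    """Toy message bus with per-backend acknowledgement counters."""

    def __init__(self, n_backends):
        self._cond = threading.Condition()
        self._sent = 0                    # sequence number of the last message
        self._acked = [0] * n_backends    # highest sequence each backend confirmed

    def send(self):
        with self._cond:
            self._sent += 1
            self._cond.notify_all()
            return self._sent

    def wait_for_message(self, backend_id, timeout=None):
        # Returns the newest sequence number, or None if nothing arrived in time.
        with self._cond:
            pending = self._cond.wait_for(
                lambda: self._sent > self._acked[backend_id], timeout)
            return self._sent if pending else None

    def ack(self, backend_id, seq):
        with self._cond:
            self._acked[backend_id] = max(self._acked[backend_id], seq)
            self._cond.notify_all()

    def wait_until_acked(self, seq, timeout=None):
        # Sender side: block until every backend has confirmed `seq`.
        with self._cond:
            return self._cond.wait_for(
                lambda: min(self._acked) >= seq, timeout)


def backend_loop(bus, backend_id, cache, stop):
    # Each backend eagerly processes messages instead of waiting for a lock.
    while not stop.is_set():
        seq = bus.wait_for_message(backend_id, timeout=0.05)
        if seq is not None:
            cache.clear()            # discard cached data first...
            bus.ack(backend_id, seq)  # ...then confirm to the sender


def demo(n_backends=3):
    bus = InvalidationBus(n_backends)
    stop = threading.Event()
    caches = [{"idx": "old"} for _ in range(n_backends)]
    threads = [
        threading.Thread(target=backend_loop, args=(bus, i, caches[i], stop))
        for i in range(n_backends)
    ]
    for t in threads:
        t.start()

    seq = bus.send()
    confirmed = bus.wait_until_acked(seq, timeout=5.0)

    stop.set()
    for t in threads:
        t.join()
    return confirmed, caches
```

The point of the sketch is only that the sender's wait is bounded by how
quickly backends acknowledge, rather than by how long the oldest snapshot
lives. The hard part, which it skips entirely, is making real backends
respond that promptly.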

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
