So, as the testing rolls on, I started to see some failures in various
ALTER-FOREIGN-thingy commands. The cause proved to be that numerous
places in foreigncmds.c do this:

tuple = SearchSysCacheCopy(...);

... alter the tuple as needed ...

rel = heap_open(target-catalog, RowExclusiveLock);

simple_heap_update(rel, &tuple->t_self, tuple);

heap_close(rel, RowExclusiveLock);

rather than the more common pattern in which the catalog is opened
first. I confess to not having realized this myself (or if I ever did
know it, I'd forgotten), but *the above coding pattern is not safe*.
You must get your lock on the catalog *before* looking up the target
tuple, else its TID may be obsoleted by a concurrent vacuum full before
you've obtained lock on the catalog. Both update and delete operations
are at risk in this way.

foreigncmds.c is not hard to fix, but the scary aspect of this is the
possibility that we've made the same mistake elsewhere, or might do so
again in future. Some desultory examination of simple_heap_update and
simple_heap_delete calls didn't find any other instances, but I am not
sure I didn't miss anything. And this seems like an easy trap to fall
into when refactoring (the current work to try to unify operations like
ALTER OWNER could easily get into this kind of problem, for instance).

I tried to think of some practical way to mechanically test for this
type of error, but came up with nothing. Any ideas?

regards, tom lane

Search Discussions

  • Robert Haas at Aug 14, 2011 at 8:32 pm

    On Sun, Aug 14, 2011 at 2:21 PM, Tom Lane wrote:
    So, as the testing rolls on, I started to see some failures in various
    ALTER-FOREIGN-thingy commands.  The cause proved to be that numerous
    places in foreigncmds.c do this:

    tuple = SearchSysCacheCopy(...);

    ... alter the tuple as needed ...

    rel = heap_open(target-catalog, RowExclusiveLock);

    simple_heap_update(rel, &tuple->t_self, tuple);

    heap_close(rel, RowExclusiveLock);

    rather than the more common pattern in which the catalog is opened
    first.
    Interesting. I vaguely recall flipping some of those around (to put
    the lock acquisition first) before committing the 9.1-era foreign
    table patch; it didn't seem like an entirely healthy thing to do. But
    I didn't really have any concrete notion of why it might be dangerous.
    foreigncmds.c is not hard to fix, but the scary aspect of this is the
    possibility that we've made the same mistake elsewhere, or might do so
    again in future.  Some desultory examination of simple_heap_update and
    simple_heap_delete calls didn't find any other instances, but I am not
    sure I didn't miss anything.  And this seems like an easy trap to fall
    into when refactoring (the current work to try to unify operations like
    ALTER OWNER could easily get into this kind of problem, for instance).

    I tried to think of some practical way to mechanically test for this
    type of error, but came up with nothing.  Any ideas?
    Hmm. How about setting the TID to an illegal value of some kind when
    a catcache tuple is extracted without a table lock? Then any
    subsequent update or delete using that tuple would blow up. I think
    that'd be way too expensive to do in normal running but perhaps we
    could have a #define...

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedAug 14, '11 at 6:22p
activeAug 14, '11 at 8:32p
posts2
users2
websitepostgresql.org...
irc#postgresql

2 users in discussion

Tom Lane: 1 post Robert Haas: 1 post

People

Translate

site design / logo © 2022 Grokbase