I've been able to reproduce the syndrome reported here and here:
http://archives.postgresql.org/pgsql-performance/2011-02/msg00497.php
http://archives.postgresql.org/pgsql-general/2011-04/msg00487.php

It's a bit tricky to do it reliably, but once you get into the right
state it's reproducible. You need an open transaction sitting in
another session (I used "begin isolation level serializable;
select * from pg_enum;"), and then mix around the following commands

vacuum full pg_class;
reindex table pg_class;
reindex table pg_index;
vacuum full pg_index;

until you get a failure. I added some debugging elog's to the code
while I was chasing this, so my psql terminal window looks like this:

vft=# vacuum full pg_class;
VACUUM
vft=# reindex table pg_class;
WARNING: broken HOT chain at DELETE_IN_PROGRESS tuple 6/30
WARNING: setting indcheckxmin for 2662 in table 2610 fn 65231 tuple 1/52
WARNING: broken HOT chain at DELETE_IN_PROGRESS tuple 6/30
WARNING: broken HOT chain at DELETE_IN_PROGRESS tuple 6/31
WARNING: setting indcheckxmin for 2663 in table 2610 fn 65231 tuple 2/1
REINDEX
vft=# reindex table pg_index;
WARNING: broken HOT chain at RECENTLY_DEAD tuple 2/1
WARNING: setting indcheckxmin for 2678 in table 2610 fn 65231 tuple 1/47
WARNING: broken HOT chain at RECENTLY_DEAD tuple 2/1
WARNING: setting indcheckxmin for 2679 in table 2610 fn 65231 tuple 1/43
REINDEX
vft=# vacuum full pg_index;
ERROR: duplicate key value violates unique constraint "pg_index_indexrelid_index"
DETAIL: Key (indexrelid)=(2678) already exists.
vft=#

The failure is here:

#0 errfinish (dummy=0) at elog.c:369
#1 0x1f2b94 in _bt_check_unique (rel=0x400feb00, itup=0x4016e270,
heapRel=0x40101bb0, buf=205, offset=43, itup_scankey=0x40171040,
checkUnique=UNIQUE_CHECK_YES, is_unique=0x7b03c2b0 "\001")
at nbtinsert.c:404
#2 0x1f2698 in _bt_doinsert (rel=0x400feb00, itup=0x4016e270,
checkUnique=UNIQUE_CHECK_YES, heapRel=0x40101bb0) at nbtinsert.c:161
#3 0x1fa224 in btinsert (fcinfo=0x4005254c) at nbtree.c:229
#4 0x4fd6dc in FunctionCall6 (flinfo=0x0, arg1=0, arg2=0, arg3=0,
arg4=1075232940, arg5=1074797488, arg6=1) at fmgr.c:1417
#5 0x1f0454 in index_insert (indexRelation=0x400feb00, values=0x7b03bdf0,
isnull=0x7b03be70 "", heap_t_ctid=0x4016c0ac, heapRelation=0x40101bb0,
checkUnique=UNIQUE_CHECK_YES) at indexam.c:198
#6 0x250fcc in CatalogIndexInsert (indstate=0x0, heapTuple=0x4016c0a8)
at indexing.c:135
#7 0x25105c in CatalogUpdateIndexes (heapRel=0x0, heapTuple=0x4016c0a8)
at indexing.c:161
#8 0x2506e4 in reindex_index (indexId=2678, skip_constraint_checks=1 '\001')
at index.c:2516
#9 0x250898 in reindex_relation (relid=0, toast_too=0 '\000', flags=2)
at index.c:2630
#10 0x2b0ac8 in finish_heap_swap (OIDOldHeap=2610, OIDNewHeap=65187,
is_system_catalog=0 '\000', swap_toast_by_content=0 '\000',
check_constraints=0, frozenXid=654) at cluster.c:1367
#11 0x2afa04 in rebuild_relation (OldHeap=0xfea3, indexOid=0,
freeze_min_age=-1, freeze_table_age=-1) at cluster.c:609
#12 0x2af370 in cluster_rel (tableOid=2610, indexOid=0, recheck=0 '\000',
verbose=0 '\000', freeze_min_age=-1, freeze_table_age=-1) at cluster.c:394
#13 0x3060fc in vacuum_rel (relid=2610, vacstmt=0x400647b8, do_toast=0 '\000',
for_wraparound=0 '\000', scanned_all=0x7b03b880 "") at vacuum.c:970
#14 0x30500c in vacuum (vacstmt=0x400647b8, relid=2, do_toast=1 '\001',
bstrategy=0x40025458, for_wraparound=0 '\000', isTopLevel=1)
at vacuum.c:230

What is happening is that the preceding "reindex table pg_index" set the
indcheckxmin flag for pg_index's indexes. The new table built by vacuum
full contains no HOT chains at all, let alone broken ones, so at the end
of reindex_index we decide we ought to clear the indcheckxmin flag.
But we are not done swapping the old and new indexes, so the uniqueness
check doesn't work right --- I believe it's looking into the old index
and finding a TID that means something totally different in the new
table.

I had originally been working on the theory that the problem was coming
from the code that sets indcheckxmin in index_build(), but at least in
this manifestation it's actually the code that resets the flag in
reindex_index() that is failing. But IMO they are both buggy as can be,
because they are totally failing to consider the possibility that the
table/index being hacked on is pg_index itself. There are all sorts of
conceivable failure modes there, such as inserting the updated tuple
into the old copy of pg_index instead of the new one, and any of them
might come to pass anytime someone rearranges any of the code within
hailing distance of vacuum/reindex/etc. I suspect that related failures
are possible if other system catalogs need to get consulted during the
syscache fetch, too --- if one of them is the target table, we have the
same sort of issue of executing code that isn't expecting the catalogs
to be moving underneath it.

I thought for awhile about restructuring the reindexing code so that
these indcheckxmin updates don't happen until after we're fully out of
the reindexing and heap-swapping operation. This would be vaguely
similar to the pushups that reindex_relation has to go through to
reindex pg_class successfully. However, it would be pretty messy to
pass back the relevant state to someplace where it'd be safer to fix
the flag.

On further reflection, though, none of this should be necessary because
in reality there should never be any broken HOT chains in pg_index in
the first place. On that theory, the real bug is IndexBuildHeapScan's
laziness about determining whether a dead HOT tuple actually represents
a problem. We could try to make it disassemble the tuple and check the
index fields, but that seems a bit complicated and error-prone for
something that needs to be back-patched.

So, my proposal for a fix is this: change IndexBuildHeapScan so it
never sets ii_BrokenHotChain at all if is_system_catalog is true.
This amounts to assuming that no new indexes get added to system
catalogs after initdb, or at least not during concurrent operations
wherein indcheckxmin would be important.

Comments?

regards, tom lane

Search Discussions

  • Kevin Grittner at Apr 15, 2011 at 7:29 pm

    Tom Lane wrote:

    This amounts to assuming that no new indexes get added to system
    catalogs after initdb, or at least not during concurrent
    operations wherein indcheckxmin would be important.
    Sounds reasonable, but can we enforce it through locks rather than
    assuming, or isn't there a clear way to do that without risking
    deadlocks?

    -Kevin
  • Tom Lane at Apr 15, 2011 at 7:35 pm

    "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
    Tom Lane wrote:
    This amounts to assuming that no new indexes get added to system
    catalogs after initdb, or at least not during concurrent
    operations wherein indcheckxmin would be important.
    Sounds reasonable, but can we enforce it through locks rather than
    assuming, or isn't there a clear way to do that without risking
    deadlocks?
    Well, we already enforce it through the allow_system_table_mods mechanism:

    regression=# create index foo on pg_index(indcheckxmin);
    ERROR: permission denied: "pg_index" is a system catalog

    Yes, you can turn that off if you try hard enough, but it's on your own
    head to know the consequences.

    regards, tom lane
  • Kevin Grittner at Apr 15, 2011 at 7:49 pm

    Tom Lane wrote:
    "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
    can we enforce it through locks [?]
    Well, we already enforce it through the allow_system_table_mods
    mechanism:
    Good enough for me.

    -Kevin
  • Noah Misch at Apr 16, 2011 at 11:19 am

    On Fri, Apr 15, 2011 at 03:01:04PM -0400, Tom Lane wrote:
    What is happening is that the preceding "reindex table pg_index" set the
    indcheckxmin flag for pg_index's indexes. The new table built by vacuum
    full contains no HOT chains at all, let alone broken ones, so at the end
    of reindex_index we decide we ought to clear the indcheckxmin flag.
    But we are not done swapping the old and new indexes, so the uniqueness
    check doesn't work right --- I believe it's looking into the old index
    and finding a TID that means something totally different in the new
    table.
    For that matter, I believe it's also attempting to insert into the old index.
    I had originally been working on the theory that the problem was coming
    from the code that sets indcheckxmin in index_build(), but at least in
    this manifestation it's actually the code that resets the flag in
    reindex_index() that is failing. But IMO they are both buggy as can be,
    because they are totally failing to consider the possibility that the
    table/index being hacked on is pg_index itself. There are all sorts of
    conceivable failure modes there, such as inserting the updated tuple
    into the old copy of pg_index instead of the new one, and any of them
    might come to pass anytime someone rearranges any of the code within
    hailing distance of vacuum/reindex/etc. I suspect that related failures
    are possible if other system catalogs need to get consulted during the
    syscache fetch, too --- if one of them is the target table, we have the
    same sort of issue of executing code that isn't expecting the catalogs
    to be moving underneath it.
    I think we're safe _consulting_ the system catalogs, since systable_beginscan
    notes that case and does not use obsolete indexes. Any other system catalog
    _updates_ are potentially risky, though. Perhaps index_insert() and friends
    should assert that the index is not pending rebuild?
    So, my proposal for a fix is this: change IndexBuildHeapScan so it
    never sets ii_BrokenHotChain at all if is_system_catalog is true.
    This amounts to assuming that no new indexes get added to system
    catalogs after initdb, or at least not during concurrent operations
    wherein indcheckxmin would be important.
    Seems reasonable and effective.

    Thanks,
    nm
  • Tom Lane at Apr 16, 2011 at 3:18 pm

    Noah Misch writes:
    On Fri, Apr 15, 2011 at 03:01:04PM -0400, Tom Lane wrote:
    What is happening is that the preceding "reindex table pg_index" set the
    indcheckxmin flag for pg_index's indexes. The new table built by vacuum
    full contains no HOT chains at all, let alone broken ones, so at the end
    of reindex_index we decide we ought to clear the indcheckxmin flag.
    But we are not done swapping the old and new indexes, so the uniqueness
    check doesn't work right --- I believe it's looking into the old index
    and finding a TID that means something totally different in the new
    table.
    For that matter, I believe it's also attempting to insert into the old index.
    I wondered about that, but it seemed to me that if that happened, it
    ought to have much more visible symptoms --- ie, the committed row would
    not be findable through the new index. The typical case according to my
    testing was that the unique check wouldn't fail, so we should have been
    hearing reports of pg_index searches failing post-VACUUM, and we
    weren't. I did not expend the time to trace it down in detail, though,
    once I'd gotten the general picture of what was happening.
    I think we're safe _consulting_ the system catalogs, since systable_beginscan
    notes that case and does not use obsolete indexes. Any other system catalog
    _updates_ are potentially risky, though. Perhaps index_insert() and friends
    should assert that the index is not pending rebuild?
    [ squint... ] Won't that result in the rebuild failing outright? We
    can't remove an index from the pending list till after it's rebuilt,
    for obvious reasons.

    regards, tom lane
  • Noah Misch at Apr 16, 2011 at 4:48 pm

    On Sat, Apr 16, 2011 at 11:17:53AM -0400, Tom Lane wrote:
    Noah Misch <noah@leadboat.com> writes:
    On Fri, Apr 15, 2011 at 03:01:04PM -0400, Tom Lane wrote:
    What is happening is that the preceding "reindex table pg_index" set the
    indcheckxmin flag for pg_index's indexes. The new table built by vacuum
    full contains no HOT chains at all, let alone broken ones, so at the end
    of reindex_index we decide we ought to clear the indcheckxmin flag.
    But we are not done swapping the old and new indexes, so the uniqueness
    check doesn't work right --- I believe it's looking into the old index
    and finding a TID that means something totally different in the new
    table.
    For that matter, I believe it's also attempting to insert into the old index.
    I wondered about that, but it seemed to me that if that happened, it
    ought to have much more visible symptoms --- ie, the committed row would
    not be findable through the new index. The typical case according to my
    testing was that the unique check wouldn't fail, so we should have been
    hearing reports of pg_index searches failing post-VACUUM, and we
    weren't. I did not expend the time to trace it down in detail, though,
    once I'd gotten the general picture of what was happening.
    I'm not 100% sure, either.
    I think we're safe _consulting_ the system catalogs, since systable_beginscan
    notes that case and does not use obsolete indexes. Any other system catalog
    _updates_ are potentially risky, though. Perhaps index_insert() and friends
    should assert that the index is not pending rebuild?
    [ squint... ] Won't that result in the rebuild failing outright? We
    can't remove an index from the pending list till after it's rebuilt,
    for obvious reasons.
    That would be a problem if an ambuild function were to call back through the
    indexam.c layer to add an individual entry. No core access method does that,
    and there's nothing I can see to recommend doing it.

    nm
  • Tom Lane at Apr 16, 2011 at 5:04 pm

    Noah Misch writes:
    On Sat, Apr 16, 2011 at 11:17:53AM -0400, Tom Lane wrote:
    [ squint... ] Won't that result in the rebuild failing outright? We
    can't remove an index from the pending list till after it's rebuilt,
    for obvious reasons.
    That would be a problem if an ambuild function were to call back through the
    indexam.c layer to add an individual entry. No core access method does that,
    and there's nothing I can see to recommend doing it.
    Oh ... hm, I was thinking that ambuild calls were also validated using
    that macro, but now I see they're not even done in this file. So it
    probably does work to do it like that. It needs rather more than no
    commentary, though.

    regards, tom lane

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedApr 15, '11 at 7:01p
activeApr 16, '11 at 5:04p
posts8
users3
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2021 Grokbase