We've had a number of odd things going on that I can't really explain
and that don't seem to produce any log entries. Here's some info:

- This is running PostgreSQL 8.2.4 on a Solaris 10 machine.
- I reran the dump after posting and these problems did not recur.
- We have a number of replicated schemas and tables on this server.
There were other problems with the replication earlier in
the evening.
- We have been having some very odd problems where our replication
scripts hang intermittently. For the life of me I can't figure out why,
but when this happens, I look for processes that have been idle in
transaction for more than one day and kill them. That seems to allow the
replication to finish. I have a few users who use a variety of
products to view and manipulate the data in these tables (Tableau,
Access, Excel, EMS, phpPgAdmin, DbVisualizer), and it seems like some
connections/transactions never terminate, but I can't figure out which
ones or why. I've been struggling with this problem for some time, but
the stalled replication has never affected the dump before.
I was actually hoping that this error would help shed light on the
replication problem.
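
A minimal sketch of how the stale sessions described above could be listed before killing anything. It assumes the 8.2-era layout of pg_stat_activity (procpid, current_query) and that stats_command_string is enabled so current_query is populated. As far as I know 8.2 has no per-transaction start time in this view, so query_start (the start of the last statement) is used here only as an approximation of how long the session has been sitting idle in transaction.

```sql
-- Sketch for PostgreSQL 8.2; later releases renamed these columns
-- (procpid -> pid, current_query -> state/query).
SELECT procpid,
       usename,
       datname,
       client_addr,      -- helps tie the session back to a user's machine
       backend_start,
       query_start
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
  AND query_start < now() - interval '1 day'   -- approximation, see note above
ORDER BY query_start;
```

The procpid values returned are the operating-system process IDs of the backends, and client_addr may help narrow down which client tool left the transaction open.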


Tom Lane wrote:

Mija Lee <mija@scharp.org> writes:

I have a script that I use to do regular dumps of my database. Over the
weekend it failed, and produced the following error message. I'm not
sure why this would have happened, how I would find out which index is
referenced by 136451098, or where this select came from.

It sounds like system catalog corruption, which is not good :-(.

pg_dump.sqlhost: Error message from server: ERROR: cache lookup failed
for index 136451098
pg_dump.sqlhost: The command was: SELECT t.tableoid, t.oid, t.relname as
indexname, pg_catalog.pg_get_indexdef(i.indexrelid) as indexdef,
t.relnatts as indnkeys, i.indkey, i.indisclustered, c.contype,
c.conname, c.tableoid as contableoid, c.oid as conoid, (SELECT spcname
FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) as
tablespace, array_to_string(t.reloptions, ', ') as options FROM
pg_catalog.pg_index i JOIN pg_catalog.pg_class t ON (t.oid =
i.indexrelid) LEFT JOIN pg_catalog.pg_depend d ON (d.classid =
t.tableoid AND d.objid = t.oid AND d.deptype = 'i') LEFT JOIN
pg_catalog.pg_constraint c ON (d.refclassid = c.tableoid AND d.refobjid
= c.oid) WHERE i.indrelid = '136451090'::pg_catalog.oid ORDER BY indexname

That looks like pg_dump's query to get information about the indexes of
a particular table. So apparently the problem index is one of the ones
for the table with OID 136451090. The easiest way to find out which
table that is:

    select '136451090'::regclass;

Trying \d on each of that table's indexes in succession would tell you
which one is trashed.
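
As a complement to trying \d on each index, the following sketch lists the pg_index rows for that table without calling pg_get_indexdef(), which is the function call that appears in the failing query. The LEFT JOIN and the index_name alias are illustrative choices, not part of pg_dump's query. A NULL index_name would point to a pg_index row with no matching pg_class entry, which is only one of several ways the catalogs could be damaged.

```sql
-- List every index recorded for the table with OID 136451090.
-- LEFT JOIN keeps pg_index rows even when pg_class has no matching entry.
SELECT i.indexrelid,
       c.relname AS index_name   -- NULL here suggests an orphaned pg_index row
FROM pg_catalog.pg_index i
LEFT JOIN pg_catalog.pg_class c ON c.oid = i.indexrelid
WHERE i.indrelid = '136451090'::pg_catalog.oid
ORDER BY i.indexrelid;
```

If 136451098 shows up in the indexrelid column, that row is the one the error message is complaining about.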

As for fixing it, the $64 question is how extensive the catalog
corruption is. I see no very good reason to hope that only this one index
is affected :-(. What you probably want to do is try to get a clean
pg_dump, then initdb and reload. At least that's how I'd approach it,
rather than hoping that there are no lurking problems remaining after you
hack your way around the one you can see.

What I'd try first is a REINDEX on pg_class. If that doesn't help,
try to delete the pg_index row linking 136451098 and 136451090.
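
The two steps above might look roughly like the following. This is a sketch under assumptions, not a tested recipe: it assumes a superuser session, and because both statements touch system catalogs, a file-system-level backup beforehand would be prudent. The transaction ends in ROLLBACK on purpose so nothing is removed until the result has been checked.

```sql
-- Step 1: rebuild the indexes on pg_class, then retry the dump.
REINDEX TABLE pg_catalog.pg_class;

-- Step 2 (only if step 1 does not help): inspect the suspect row,
-- then delete it inside a transaction.
BEGIN;

SELECT indexrelid, indrelid
FROM pg_catalog.pg_index
WHERE indexrelid = '136451098'::pg_catalog.oid
  AND indrelid   = '136451090'::pg_catalog.oid;

DELETE FROM pg_catalog.pg_index
WHERE indexrelid = '136451098'::pg_catalog.oid
  AND indrelid   = '136451090'::pg_catalog.oid;

-- Change this to COMMIT only once the DELETE reports exactly one row.
ROLLBACK;
```

Matching on both OIDs keeps the DELETE from touching any other index of the same table.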

What PG version is this, anyway, and did anything weird happen on your
system that might explain data corruption?

regards, tom lane

Group: pgsql-novice
Posted: Dec 10, '07 at 9:25p
Active: Dec 11, '07 at 5:38p
