Hi:

I have a script that I use to do regular dumps of my database. Over the
weekend it failed, and produced the following error message. I'm not
sure why this would have happened, how I would find out which index is
referenced by 136451098, or where this select came from.

pg_dump.sqlhost: Error message from server: ERROR: cache lookup failed
for index 136451098
pg_dump.sqlhost: The command was: SELECT t.tableoid, t.oid, t.relname as
indexname, pg_catalog.pg_get_indexdef(i.indexrelid) as indexdef,
t.relnatts as indnkeys, i.indkey, i.indisclustered, c.contype,
c.conname, c.tableoid as contableoid, c.oid as conoid, (SELECT spcname
FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) as
tablespace, array_to_string(t.reloptions, ', ') as options FROM
pg_catalog.pg_index i JOIN pg_catalog.pg_class t ON (t.oid =
i.indexrelid) LEFT JOIN pg_catalog.pg_depend d ON (d.classid =
t.tableoid AND d.objid = t.oid AND d.deptype = 'i') LEFT JOIN
pg_catalog.pg_constraint c ON (d.refclassid = c.tableoid AND d.refobjid
= c.oid) WHERE i.indrelid = '136451090'::pg_catalog.oid ORDER BY indexname


Any help would be greatly appreciated.

Mija

Search Discussions

  • Tom Lane at Dec 10, 2007 at 10:49 pm

    Mija Lee writes:
    I have a script that I use to do regular dumps of my database. Over the
    weekend it failed, and produced the following error message. I'm not
    sure why this would have happened, how I would find out which index is
    referenced by 136451098, or where this select came from.
    It sounds like system catalog corruption, which is not good :-(.
    pg_dump.sqlhost: Error message from server: ERROR: cache lookup failed
    for index 136451098
    pg_dump.sqlhost: The command was: SELECT t.tableoid, t.oid, t.relname as
    indexname, pg_catalog.pg_get_indexdef(i.indexrelid) as indexdef,
    t.relnatts as indnkeys, i.indkey, i.indisclustered, c.contype,
    c.conname, c.tableoid as contableoid, c.oid as conoid, (SELECT spcname
    FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) as
    tablespace, array_to_string(t.reloptions, ', ') as options FROM
    pg_catalog.pg_index i JOIN pg_catalog.pg_class t ON (t.oid =
    i.indexrelid) LEFT JOIN pg_catalog.pg_depend d ON (d.classid =
    t.tableoid AND d.objid = t.oid AND d.deptype = 'i') LEFT JOIN
    pg_catalog.pg_constraint c ON (d.refclassid = c.tableoid AND d.refobjid
    = c.oid) WHERE i.indrelid = '136451090'::pg_catalog.oid ORDER BY indexname
    That looks like pg_dump's query to get information about the indexes of
    a particular table. So apparently the problem index is one of the ones
    for the table with OID 136451090. The easiest way to find out which one
    that is is
    select '136451090'::regclass;
    Trying \d on each of that table's indexes in succession would tell you
    which one is trashed.

    As for fixing it, the $64 question is how extensive is the catalog
    corruption. I see no very good reason to hope that only this one index
    is affected :-(. What you probably want to do is try to get a clean
    pg_dump then initdb and reload --- at least that's how I'd approach it,
    rather than hoping that there's no lurking problems remaining after you
    hack your way around the one you can see.

    What I'd try first is a REINDEX on pg_class. If that doesn't help,
    try to delete the pg_index row linking 136451098 and 136451090.

    What PG version is this, anyway, and did anything weird happen on your
    system that might explain data corruption?

    regards, tom lane
  • Mija Lee at Dec 11, 2007 at 4:33 pm
    We've had a number of odd things that have been going on that I can't
    really explain, and that don't seem to result in log entries. Here's
    some info:

    - this is running 8.2.4 on a solaris 10 machine
    - I reran the dump after posting and these problems did not reoccur
    - We have a number of replicated schemas and tables on this server.
    There were other problems with the replication that happened earlier in
    the evening.
    - we have been having some very odd problems where our replication
    scripts hang intermittantly. For the life of me I can't figure out why,
    but when this happens, I look for processes that are idle in transaction
    that are more than one day old and kill them. That seems to allow the
    replication to finish. I have a few users that use a variety of
    products to view and manipulate the data in these tables (tableau,
    access, excel, ems, phppgadmin, dbvisualizer) and it seems like some
    connections/transactions never terminate, but I can't figure out which
    ones or why. I've been struggling with this problem for some time, but
    have never had an issue with the stalled replication affecting the dump.
    I was actually hoping that this error would help shed light on the
    replication problem.

    Mija

    Tom Lane wrote:
    Mija Lee <mija@scharp.org> writes:
    I have a script that I use to do regular dumps of my database. Over the
    weekend it failed, and produced the following error message. I'm not
    sure why this would have happened, how I would find out which index is
    referenced by 136451098, or where this select came from.
    It sounds like system catalog corruption, which is not good :-(.
    pg_dump.sqlhost: Error message from server: ERROR: cache lookup failed
    for index 136451098
    pg_dump.sqlhost: The command was: SELECT t.tableoid, t.oid, t.relname as
    indexname, pg_catalog.pg_get_indexdef(i.indexrelid) as indexdef,
    t.relnatts as indnkeys, i.indkey, i.indisclustered, c.contype,
    c.conname, c.tableoid as contableoid, c.oid as conoid, (SELECT spcname
    FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) as
    tablespace, array_to_string(t.reloptions, ', ') as options FROM
    pg_catalog.pg_index i JOIN pg_catalog.pg_class t ON (t.oid =
    i.indexrelid) LEFT JOIN pg_catalog.pg_depend d ON (d.classid =
    t.tableoid AND d.objid = t.oid AND d.deptype = 'i') LEFT JOIN
    pg_catalog.pg_constraint c ON (d.refclassid = c.tableoid AND d.refobjid
    = c.oid) WHERE i.indrelid = '136451090'::pg_catalog.oid ORDER BY indexname
    That looks like pg_dump's query to get information about the indexes of
    a particular table. So apparently the problem index is one of the ones
    for the table with OID 136451090. The easiest way to find out which one
    that is is
    select '136451090'::regclass;
    Trying \d on each of that table's indexes in succession would tell you
    which one is trashed.

    As for fixing it, the $64 question is how extensive is the catalog
    corruption. I see no very good reason to hope that only this one index
    is affected :-(. What you probably want to do is try to get a clean
    pg_dump then initdb and reload --- at least that's how I'd approach it,
    rather than hoping that there's no lurking problems remaining after you
    hack your way around the one you can see.

    What I'd try first is a REINDEX on pg_class. If that doesn't help,
    try to delete the pg_index row linking 136451098 and 136451090.

    What PG version is this, anyway, and did anything weird happen on your
    system that might explain data corruption?

    regards, tom lane
  • Tom Lane at Dec 11, 2007 at 5:16 pm

    Mija Lee writes:
    We've had a number of odd things that have been going on that I can't
    really explain, and that don't seem to result in log entries. Here's
    some info:
    - this is running 8.2.4 on a solaris 10 machine
    - I reran the dump after posting and these problems did not reoccur
    - We have a number of replicated schemas and tables on this server.
    There were other problems with the replication that happened earlier in
    the evening.
    Hmm, are you using Slony? If so, you ought to take this to the Slony
    mailing lists. There are various restrictions in Slony on what it
    assumes can be done to a replicated table, and I believe that things
    like what you are seeing are one of the possible consequences of
    breaking Slony's expectations. That's about as far as my knowledge
    goes though.

    regards, tom lane
  • Mija Lee at Dec 11, 2007 at 5:38 pm
    We are not using slony - we are replicating from sql server using perl.

    Tom Lane wrote:
    Mija Lee <mija@scharp.org> writes:
    We've had a number of odd things that have been going on that I can't
    really explain, and that don't seem to result in log entries. Here's
    some info:
    - this is running 8.2.4 on a solaris 10 machine
    - I reran the dump after posting and these problems did not reoccur
    - We have a number of replicated schemas and tables on this server.
    There were other problems with the replication that happened earlier in
    the evening.
    Hmm, are you using Slony? If so, you ought to take this to the Slony
    mailing lists. There are various restrictions in Slony on what it
    assumes can be done to a replicated table, and I believe that things
    like what you are seeing are one of the possible consequences of
    breaking Slony's expectations. That's about as far as my knowledge
    goes though.

    regards, tom lane

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-novice @
categoriespostgresql
postedDec 10, '07 at 9:25p
activeDec 11, '07 at 5:38p
posts5
users2
websitepostgresql.org
irc#postgresql

2 users in discussion

Mija Lee: 3 posts Tom Lane: 2 posts

People

Translate

site design / logo © 2021 Grokbase