Howdy

Running PostgreSQL 7.2.1 on RedHat Linux

This past weekend, the RAID array took a hit and crashed
the database. I get a few errors like:

[snip]
DEBUG: invalid secondary checkpoint record
FATAL 2: unable to locate a valid checkpoint record
DEBUG: proc_exit(2)
DEBUG: shmem_exit(2)
DEBUG: exit(2)
DEBUG: reaping dead processes
DEBUG: startup process (pid 23543) exited with exit code 2
DEBUG: aborting startup due to startup process failure
[/snip]

I'm reading around and I see a bit of information about
using pg_resetxlog, but I don't know much about it.

1) How do you use pg_resetxlog?
2) Am I at risk of losing all of my data
if I use this?
3) Some info at groups.google.com suggest that I will
have to initdb - I figured that would be good anyway, but
how does pg_resetxlog play into this?

TIA!

-X

Search Discussions

  • Tom Lane at Oct 20, 2003 at 4:37 pm

    "Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
    1) How do you use pg_resetxlog?
    Read the README file for it.
    2) Am I at risk of losing all of my data
    if I use this?
    You're at risk of having corrupted data, but I think that's true already
    because of the disk failure.
    3) Some info at groups.google.com suggest that I will
    have to initdb - I figured that would be good anyway, but
    how does pg_resetxlog play into this?
    After pg_resetxlog you should be able to dump your database. I'd
    recommend doing that, then initdb, reload the dump, look for
    inconsistencies ...

    regards, tom lane
  • Tom Lane at Oct 20, 2003 at 4:39 pm

    "Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
    Running PostgreSQL 7.2.1 on RedHat Linux
    PS: if you are still using 7.2.* then you should *definitely* be on
    7.2.4. Since your DB is down anyway, this seems like a good time for an
    upgrade ...

    regards, tom lane
  • Johnson, Shaunn at Oct 21, 2003 at 6:45 am
    --Thanks to all that replied

    --I did the pg_resetxlog -f $PGDATA and
    --it generated a new log, but when I try to
    --restart the database, it fails. I look in the
    --logs and I see:

    [snip]
    Oct 21 01:32:13 localsvr postgres[2598]: [1] FATAL 2:
    control file context is broken
    Oct 21 01:32:14 localsvr postgresql: Starting postgresql
    service: failed
    [/snip]

    --There doesn't seem to be much info in groups.google.com
    --about this message. What does this mean and how can
    --I fix the DB (so I can dump it into a new version)?

    --Thanks!

    [snip from past email]
    This past weekend, the RAID array took a hit and crashed
    the database. I get a few errors like:
    [snip]
    DEBUG: invalid secondary checkpoint record
    FATAL 2: unable to locate a valid checkpoint record
    DEBUG: proc_exit(2)
    DEBUG: shmem_exit(2)
    DEBUG: exit(2)
    DEBUG: reaping dead processes
    DEBUG: startup process (pid 23543) exited with exit code 2
    DEBUG: aborting startup due to startup process failure
    [/snip]

    [/snip from past email]

    -X
  • Tom Lane at Oct 21, 2003 at 2:18 pm

    "Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
    --I did the pg_resetxlog -f $PGDATA and
    --it generated a new log, but when I try to
    --restart the database, it fails. I look in the
    --logs and I see:
    [snip]
    Oct 21 01:32:13 localsvr postgres[2598]: [1] FATAL 2:
    control file context is broken
    Are you sure you used the right version of pg_resetxlog? This seems to
    indicate that pg_resetxlog produced a bogus pg_control file.
    pg_controldata might help to identify the problem.

    regards, tom lane
  • Johnson, Shaunn at Oct 21, 2003 at 8:49 am
    --silly question:

    --is it possible to build a new DB and point it to
    --the existing location / files of the old DB
    --and mount data that way?

    --i know ... i'm grasping for straws ... this is
    --just about a retarded / rhetorical question ...

    -X
  • Johnson, Shaunn at Oct 21, 2003 at 3:03 pm
    --you're right about the contrib version - it's wrong.
    --can i d/l an old version and recompile it?

    -X
    -----Original Message-----
    From: Tom Lane
    To: Johnson, Shaunn
    Cc: pgsql-general@postgresql.org
    Sent: 10/21/03 10:17 AM
    Subject: Re: [GENERAL] how to use pg_resetxlog

    "Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
    --I did the pg_resetxlog -f $PGDATA and
    --it generated a new log, but when I try to
    --restart the database, it fails. I look in the
    --logs and I see:
    [snip]
    Oct 21 01:32:13 localsvr postgres[2598]: [1] FATAL 2:
    control file context is broken
    Are you sure you used the right version of pg_resetxlog? This seems to
    indicate that pg_resetxlog produced a bogus pg_control file.
    pg_controldata might help to identify the problem.

    regards, tom lane
  • Tom Lane at Oct 21, 2003 at 3:16 pm

    "Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
    --you're right about the contrib version - it's wrong.
    --can i d/l an old version and recompile it?
    Should work. You'll need a complete source tarball for 7.2.*.

    regards, tom lane
  • Johnson, Shaunn at Oct 21, 2003 at 3:53 pm
    --howdy:

    --installed the 7.2.1 tarball and configured / gmake / install
    --the pg_resetxlog. when i restarted postmaster, it says 'ok',
    --but when i tried to get in via the command line, i got this:
    [snip]
    psql: FATAL 1: cannot open pg_attribute: No such file or directory
    [/snip]

    --did i miss a step?

    --thanks again for all of your help!

    -X

    -----Original Message-----
    From: Tom Lane
    To: Johnson, Shaunn
    Cc: 'pgsql-general@postgresql.org '
    Sent: 10/21/03 11:15 AM
    Subject: Re: [GENERAL] how to use pg_resetxlog

    "Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
    --you're right about the contrib version - it's wrong.
    --can i d/l an old version and recompile it?
    Should work. You'll need a complete source tarball for 7.2.*.

    regards, tom lane
  • Tom Lane at Oct 21, 2003 at 4:03 pm

    "Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
    --installed the 7.2.1 tarball and configured / gmake / install
    --the pg_resetxlog. when i restarted postmaster, it says 'ok',
    --but when i tried to get in via the command line, i got this:
    [snip]
    psql: FATAL 1: cannot open pg_attribute: No such file or directory
    [/snip]
    --did i miss a step?
    You probably used too small a value for the initial XID setting. You
    need to redo pg_resetxlog with a -x value larger than the number of
    transactions elapsed in your database. You can determine a suitable
    number to use by looking in $PGDATA/pg_clog to see the largest file
    number there (beware, they are in hex). Add one, then multiply by 1024*1024.

    regards, tom lane
  • Johnson, Shaunn at Oct 21, 2003 at 4:43 pm
    --i'm still not sure what's going on.

    --i did the following
    [snip]
    bash-2.05$ export PGDATA=/var/lib/pgsql/data/
    bash-2.05$ echo $PGDATA
    /var/lib/pgsql/data/
    bash-2.05$ ./pg_resetxlog -x 11534336 $PGDATA
    XLOG reset.

    ---

    bash-2.05$ ./pg_resetxlog -n /var/lib/pgsql/data/
    pg_control values:

    pg_control version number: 71
    Catalog version number: 200201121
    Current log file id: 1014
    Next log file segment: 57
    Latest checkpoint's StartUpID: 25
    Latest checkpoint's NextXID: 11534337
    Latest checkpoint's NextOID: 2528235005
    Database block size: 8192
    Blocks per segment of large relation: 131072
    LC_COLLATE: C
    LC_CTYPE: C

    [/snip]

    --and even though the db says it's okay, i still
    --get this :

    [snip]
    psql: FATAL 1: cannot open pg_attribute: No such file or directory
    [/snip]

    --i realize i must be getting rather boring right about
    --now, but i really appreciate the info / guidance.
    --thanks!

    -X

    -----Original Message-----
    From: Tom Lane
    To: Johnson, Shaunn
    Cc: ''pgsql-general@postgresql.org ' '
    Sent: 10/21/03 12:03 PM
    Subject: Re: [GENERAL] how to use pg_resetxlog

    "Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
    --installed the 7.2.1 tarball and configured / gmake / install
    --the pg_resetxlog. when i restarted postmaster, it says 'ok',
    --but when i tried to get in via the command line, i got this:
    [snip]
    psql: FATAL 1: cannot open pg_attribute: No such file or directory
    [/snip]
    --did i miss a step?
    You probably used too small a value for the initial XID setting. You
    need to redo pg_resetxlog with a -x value larger than the number of
    transactions elapsed in your database. You can determine a suitable
    number to use by looking in $PGDATA/pg_clog to see the largest file
    number there (beware, they are in hex). Add one, then multiply by
    1024*1024.

    regards, tom lane
  • Tom Lane at Oct 21, 2003 at 5:40 pm

    "Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
    --i realize i must be getting rather boring right about
    --now, but i really appreciate the info / guidance.
    Somewhere along here you may need to give up and revert to whatever your
    most recent backup is :-(.

    Taking a second look at the error message:
    psql: FATAL 1: cannot open pg_attribute: No such file or directory
    I realized that this probably isn't an XID issue as I first thought.
    Either the pg_attribute file is actually missing (is there a file named
    '1249' in the database directory?) or the relfilenode value for it is
    clobbered in pg_class. In the latter case you could maybe find and fix
    the erroneous data with a hex editor, but one would have to wonder what
    else has been clobbered during the disk crash.

    Thinking about it, it could be that the clobber is only in the database's
    pg_internal.init cache file --- you could try renaming that out of the
    way to see if the system can successfully make a new one.

    Also, this failure should only affect one database in the cluster ---
    can you connect to any other ones?

    regards, tom lane
  • Dennis Gearon at Oct 21, 2003 at 6:24 pm

    Tom Lane wrote:
    I realized that this probably isn't an XID issue as I first thought.
    Either the pg_attribute file is actually missing (is there a file named
    '1249' in the database directory?) or the relfilenode value for it is
    clobbered in pg_class. In the latter case you could maybe find and fix
    the erroneous data with a hex editor, but one would have to wonder what
    else has been clobbered during the disk crash.

    Thinking about it, it could be that the clobber is only in the database's
    pg_internal.init cache file --- you could try renaming that out of the
    way to see if the system can successfully make a new one.

    Also, this failure should only affect one database in the cluster ---
    can you connect to any other ones?

    regards, tom lane
    Is there in any graphical picture of the system files, utilities, and
    how the interact with each other?

    Or in how to set up Postgres?

    I would be willing to draw it up if someone wanted to send me pencil
    drawings and go throught the editing iterations with me.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-general @
categoriespostgresql
postedOct 20, '03 at 1:19p
activeOct 21, '03 at 6:24p
posts13
users3
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase