FAQ
Less than a week ago we installed a database and new software in a production environment.

Today someone was attempting to install the same software in a test environment, and realized too late that a script he ran was run against the production database. It dropped three tables.

We need to recover those tables. This is what we have to work with.

1) We have a file system backup from 3AM.

2) We have not yet instituted a daily postgres backup with pg_dump.

3) We have all the WAL files since going into production (unarchived, only 6 files needed to cover the period).

There are two ways I can think of to try to recover the data.

1) I presume that we can restore the entire directory from the 3am backup (to a different physical location of course) and then export the data in the three tables to csv files and reimport it. With that approach, is there anything that should be done to test the integrity of the data?

2) I should think that I could also restore the data from the WAL files, but when I create a recovery.conf file and use pg_resetxlog.exe, I can get it to do its thing without complaint (renames recovery.conf to recovery.done) or I get the following error in pg_log:

LOG: database system was shut down at 2010-07-06 13:27:42 EDT
LOG: starting archive recovery
LOG: restore_command = 'donothing.bat'
LOG: invalid magic number 0000 in log file 0, segment 31, offset 0
LOG: invalid primary checkpoint record.
LOG: invalid magic number 0000 in log file 0, segment 31, offset 0
LOG: invalid secondary checkpoint record
PANIC: could not locate a valid checkpoint record

In neither case does it restore any data.

Note: for the recovery procedure, I have a BAT file that literally does nothing. The reason is that I've copied the proper WAL files into the pg_xlog directory, so there is no archiving per se.

I have looked at the options pg_resetxlog.exe has and can't figure out if there's some parameter I can set to make it restore the data. Is it not possible, without doing a checkpoint? Can I construct a check point manually?

John

Search Discussions

  • Jesper Krogh at Jul 7, 2010 at 4:54 am

    On 2010-07-07 03:24, John T. Dow wrote:
    In neither case does it restore any data.

    Note: for the recovery procedure, I have a BAT file that literally does nothing. The reason is that I've copied the proper WAL files into the pg_xlog directory, so there is no archiving per se.
    Ok, try to fix that.. as per:
    24.3.3 here:
    http://www.postgresql.org/docs/8.4/interactive/continuous-archiving.html

    Assuming that the filesystem backup was made using
    pg_start_backup()/pg_stop_backup() it will just work.

    Jesper
  • John T. Dow at Jul 7, 2010 at 7:12 pm
    I changed the recovery.conf file so it has one statement:

    restore_command = 'copy D:\devj\WCSD\received\pg_xlog\%f %p'



    This is what I now get in pg_log:

    LOG: database system was shut down at 2010-07-07 14:48:34 EDT
    LOG: starting archive recovery
    LOG: restore_command = 'copy D:\devj\WCSD\received\pg_xlog\%f %p'
    LOG: could not open file "pg_xlog/0000000700000000000000BE" (log file 0, segment 190): No such file or directory
    LOG: invalid primary checkpoint record
    LOG: could not open file "pg_xlog/0000000700000000000000BE" (log file 0, segment 190): No such file or directory
    LOG: invalid secondary checkpoint record
    PANIC: could not locate a valid checkpoint record



    And as before, nothing is restored.

    These are the "archived" files:

    0000000100000000000000B6
    0000000100000000000000B7
    0000000100000000000000B8
    0000000100000000000000B9
    0000000100000000000000BA
    0000000100000000000000BB
    0000000100000000000000BC

    It's looking for timeline 7 but the timeline of the archived files is 1. (Big deal, I can rename them.)

    It's also looking for file BE, but the files are B6 through BC.


    I also tried with this: restore_command = 'copy D:\devj\WCSD\received\%f %p'


    John







    On Wed, 07 Jul 2010 06:33:01 +0200, Jesper Krogh wrote:

    On 2010-07-07 03:24, John T. Dow wrote:
    In neither case does it restore any data.
    Note: for the recovery procedure, I have a BAT file that literally does nothing. The reason is that I've copied the proper WAL files into the pg_xlog directory, so there is no archiving per se
    Ok, try to fix that.. as per:
    24.3.3 here:
    http://www.postgresql.org/docs/8.4/interactive/continuous-archiving.html

    Assuming that the filesystem backup was made using
    pg_start_backup()/pg_stop_backup() it will just work.

    --
    Jesper
  • John T. Dow at Jul 8, 2010 at 8:29 pm

    On Wed, 07 Jul 2010 22:18:13 +0200, Jesper Krogh wrote:
    On 2010-07-07 20:54, John T. Dow wrote:
    I changed the recovery.conf file so it has one statement:

    restore_command = 'copy D:\devj\WCSD\received\pg_xlog\%f %p'



    This is what I now get in pg_log:

    LOG: database system was shut down at 2010-07-07 14:48:34 EDT
    LOG: starting archive recovery
    LOG: restore_command = 'copy D:\devj\WCSD\received\pg_xlog\%f %p'
    LOG: could not open file "pg_xlog/0000000700000000000000BE" (log file 0, segment 190): No such file or directory
    LOG: invalid primary checkpoint record
    LOG: could not open file "pg_xlog/0000000700000000000000BE" (log file 0, segment 190): No such file or directory
    LOG: invalid secondary checkpoint record
    PANIC: could not locate a valid checkpoint record



    And as before, nothing is restored.

    These are the "archived" files:

    0000000100000000000000B6
    0000000100000000000000B7
    0000000100000000000000B8
    0000000100000000000000B9
    0000000100000000000000BA
    0000000100000000000000BB
    0000000100000000000000BC

    It's looking for timeline 7 but the timeline of the archived files is 1. (Big deal, I can rename them.)

    It's also looking for file BE, but the files are B6 through BC.


    I also tried with this: restore_command = 'copy D:\devj\WCSD\received\%f %p'
    then you either have a wrong base backup (did you run
    pg_start_backup/pg_stop_backup)? or
    the wrong set of WAL-files for your database.

    --
    Jesper



    We did not do pg_start_backup/pg_stop_backup.

    I was naively hoping that the WAL files would work on a "similar" database (ie same tables, but created independently).

    I understand that WAL files can only be applied to the database which they are based on, or an exact copy of the database. Therefore we need to copy the existing database files (file copy) and then try this technique on the copy. That will take some doing, as people are not in the same city, don't work the same hours, etc.

    What if we never did pg_start_backup? Will it work anyway?

    Also, it is unclear from the documentation whether you must have archived wal files to replay the wal files. There seems to be no way to replay them other than with recovery.conf, and recovery.conf seems to require a restore_command that actually does something. Is it necessary to clear any wal files from the pg_xlog directory, and is it necessary that all wal files to be replayed must be copied into pg_xlog using the restore_command? This things aren't really stated very clearly.

    John
  • Tom Lane at Jul 8, 2010 at 8:58 pm

    "John T. Dow" <john@johntdow.com> writes:
    We did not do pg_start_backup/pg_stop_backup. Ugh.
    What if we never did pg_start_backup? Will it work anyway?
    You could maybe make it work, if you had full_page_writes turned on
    and have a continuous series of WAL files extending back to before
    the manual filesystem backup was started. What pg_start_backup mainly
    does for you is to automate things and make sure there is a well-defined
    spot at which a successful replay can be started. It's *not* going to
    "just work" without pg_start_backup, though. You'd need to manually
    fake up a suitable backup label file, and maybe some other hacking.
    Otherwise what's likely to happen is that the recovery goes through
    but leaves you with a corrupted database anyway.

    If the data is worth this much trouble to you, I'd suggest hiring a
    Postgres consultant who's experienced in data recovery.
    Also, it is unclear from the documentation whether you must have
    archived wal files to replay the wal files. There seems to be no way
    to replay them other than with recovery.conf, and recovery.conf seems
    to require a restore_command that actually does something.
    You can just have it copy from pg_xlog, if all the files you need are
    in pg_xlog. That's a pretty uncommon situation though, so there's not
    any special easy case for it.

    regards, tom lane
  • John T. Dow at Jul 8, 2010 at 9:21 pm
    Tom

    You've helped clarify things. Thanks.

    Our situation is kind of a special case and I was trying to learn from it what can be done with the WAL files.

    We only started the production system a few days ago, so we have all the WAL files. I don't care of we'd end up with a corrupted database because I wouldn't do a recovery to the original database but to a copy. The intention is to copy out the data for three small tables.

    As I understand it, one should do a pg_start_backup, then do a file system backup, then do pg_stop_backup.

    Several questions:

    First, if the file system back is run automatically at 3AM, what's the best way to do the start/stop backup?

    Second, what about doing a pg_dump instead of a file system backup?

    Third, the manual in 22.3.2 says "It is also possible to make a backup dump while the postmaster is stopped. In this case, you obviously cannot use pg_start_backup or pg_stop_backup, and you will therefore be left to your own devices to keep track of which backup dump is which and how far back the associated WAL files go. It is generally better to follow the on-line backup procedure above." That implies that I can do what I'm trying to do, but it doesn't say how. You indicate that it's messy. I hoped that meant that one can manually replay wal files, one by one, but I guess not.

    John



    On Thu, 08 Jul 2010 16:58:18 -0400, Tom Lane wrote:

    "John T. Dow" <john@johntdow.com> writes:
    We did not do pg_start_backup/pg_stop_backup. Ugh.
    What if we never did pg_start_backup? Will it work anyway?
    You could maybe make it work, if you had full_page_writes turned on
    and have a continuous series of WAL files extending back to before
    the manual filesystem backup was started. What pg_start_backup mainly
    does for you is to automate things and make sure there is a well-defined
    spot at which a successful replay can be started. It's *not* going to
    "just work" without pg_start_backup, though. You'd need to manually
    fake up a suitable backup label file, and maybe some other hacking.
    Otherwise what's likely to happen is that the recovery goes through
    but leaves you with a corrupted database anyway.

    If the data is worth this much trouble to you, I'd suggest hiring a
    Postgres consultant who's experienced in data recovery.
    Also, it is unclear from the documentation whether you must have
    archived wal files to replay the wal files. There seems to be no way
    to replay them other than with recovery.conf, and recovery.conf seems
    to require a restore_command that actually does something.
    You can just have it copy from pg_xlog, if all the files you need are
    in pg_xlog. That's a pretty uncommon situation though, so there's not
    any special easy case for it.

    regards, tom lane

    --
    Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-novice

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-novice @
categoriespostgresql
postedJul 7, '10 at 1:25a
activeJul 8, '10 at 9:21p
posts6
users3
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase