In CVS tip, try running the regression tests against an installed
postmaster (i.e., make installcheck); then as soon as the tests are
done, kill -9 the bgwriter process to force a database restart.
Most of the time you'll get a PANIC during recovery:

LOG: background writer process (PID 2493) was terminated by signal 9
LOG: server process (PID 2493) was terminated by signal 9
LOG: terminating any other active server processes
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted at 2004-08-04 14:26:23 EDT
LOG: checkpoint record is at 0/4C1CA28
LOG: redo record is at 0/4BFD510; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 11269; next OID: 294376
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/4BFD510
PANIC: could not create directory "/home/postgres/testversion/data/pg_tblspc/301180/163304": No such file or directory
LOG: startup process (PID 4560) was terminated by signal 6
LOG: aborting startup due to startup process failure

The panic is here:

(gdb) bt
#0 0xc0141220 in ?? () from /usr/lib/libc.1
#1 0xc00aa7ec in ?? () from /usr/lib/libc.1
#2 0xc008c2b8 in ?? () from /usr/lib/libc.1
#3 0xc0086d9c in ?? () from /usr/lib/libc.1
#4 0x2c6080 in errfinish (dummy=1) at elog.c:454
#5 0x185984 in TablespaceCreateDbspace (spcNode=1074100592, dbNode=0,
    isRedo=1 '\001') at tablespace.c:140
#6 0x23c90c in smgrcreate (reln=0x400a1d80, isTemp=0 '\000', isRedo=1 '\001')
    at smgr.c:327
#7 0x23d6cc in smgr_redo (lsn={xlogid = 0, xrecoff = 86455912},
    record=0x40067be8) at smgr.c:876
#8 0x115714 in StartupXLOG () at xlog.c:4229
#9 0x11dc5c in BootstrapMain (argc=4, argv=0x7b03b630) at bootstrap.c:426
#10 0x20b7dc in StartChildProcess (xlop=2) at postmaster.c:3233

and of course the problem is that log replay is not prepared to cope
with a reference to a table that's in a tablespace that no longer
exists. The regression tests trigger the problem because they do a
DROP TABLESPACE near the end.

This is impossible to fix nicely because the information to reconstruct
the tablespace is simply not available. We could make an ordinary
directory (not a symlink) under pg_tblspc and then limp along in the
expectation that it would get removed before we finish replay. Or we
could just skip logged operations on files within the tablespace, but
that feels pretty uncomfortable to me --- it amounts to deliberately
discarding data ...

Any thoughts?

regards, tom lane
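
A minimal sketch of the first option above (make an ordinary directory under pg_tblspc and carry on), written in Python rather than the backend's C. The function names and the temporary data directory are hypothetical; only the path layout pg_tblspc/<tablespace OID>/<database OID> and the two OIDs come from the PANIC message. It is an illustration of the idea, not the code in tablespace.c.

```python
import os
import tempfile


def create_dbspace_strict(pgdata, spc_oid, db_oid):
    """Mimic the failing behaviour: create only the last path component."""
    path = os.path.join(pgdata, "pg_tblspc", str(spc_oid), str(db_oid))
    os.mkdir(path)  # raises FileNotFoundError if pg_tblspc/<spc_oid> is gone
    return path


def create_dbspace_lenient(pgdata, spc_oid, db_oid):
    """Hypothetical fallback: build missing parents as ordinary directories."""
    path = os.path.join(pgdata, "pg_tblspc", str(spc_oid), str(db_oid))
    os.makedirs(path, exist_ok=True)
    return path


def demo():
    with tempfile.TemporaryDirectory() as pgdata:
        os.mkdir(os.path.join(pgdata, "pg_tblspc"))
        try:
            create_dbspace_strict(pgdata, 301180, 163304)
            strict_failed = False
        except FileNotFoundError:
            strict_failed = True
        path = create_dbspace_lenient(pgdata, 301180, 163304)
        spc_entry = os.path.dirname(path)
        # (strict attempt failed, directory now exists, entry is a symlink)
        return strict_failed, os.path.isdir(path), os.path.islink(spc_entry)
```

The lenient variant lets replay proceed, but the tablespace entry it leaves behind is a plain directory inside the data directory, which is exactly the "limp along" compromise described above.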


  • Kevin Brown at Aug 5, 2004 at 12:55 am

    Tom Lane wrote:
    > In CVS tip, try running the regression tests against an installed
    > postmaster (i.e., make installcheck); then as soon as the tests are
    > done, kill -9 the bgwriter process to force a database restart.
    > Most of the time you'll get a PANIC during recovery: [...]
    > This is impossible to fix nicely because the information to reconstruct
    > the tablespace is simply not available. We could make an ordinary
    > directory (not a symlink) under pg_tblspc and then limp along in the
    > expectation that it would get removed before we finish replay. Or we
    > could just skip logged operations on files within the tablespace, but
    > that feels pretty uncomfortable to me --- it amounts to deliberately
    > discarding data ...
    >
    > Any thoughts?

    How is a dropped table handled by the recovery code? Doesn't it present
    the same sort of issues (though on a smaller scale)?



    --
    Kevin Brown [email protected]
  • Tom Lane at Aug 5, 2004 at 2:48 am

    Kevin Brown writes:
    > Tom Lane wrote:
    >> This is impossible to fix nicely because the information to reconstruct
    >> the tablespace is simply not available. We could make an ordinary
    >> directory (not a symlink) under pg_tblspc and then limp along in the
    >> expectation that it would get removed before we finish replay. Or we
    >> could just skip logged operations on files within the tablespace, but
    >> that feels pretty uncomfortable to me --- it amounts to deliberately
    >> discarding data ...
    > How is a dropped table handled by the recovery code? Doesn't it present
    > the same sort of issues (though on a smaller scale)?

    Not really. If the replay code encounters an update to a table file
    that's not there, it simply creates the file and plows ahead. The thing
    that I'm stuck on about tablespaces is that if the symlink in
    $PGDATA/pg_tblspc isn't there, there's no evident way to recreate it
    correctly --- we have no idea where it was supposed to point.

    regards, tom lane
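
To illustrate the distinction drawn in this reply, here is a small Python sketch, under stated assumptions, of "create the file and plow ahead". The helper name redo_page_write, the 8192-byte page size, and the file name 16384 are all hypothetical stand-ins. A missing file in an existing directory is created on demand; a missing directory (standing in for the vanished tablespace symlink) cannot be repaired the same way, because nothing records where the link pointed.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed page size for this sketch


def redo_page_write(path, block_no, page):
    """Write one page image at its block offset, creating the file if absent."""
    flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o600)
    try:
        os.lseek(fd, block_no * BLOCK_SIZE, os.SEEK_SET)
        os.write(fd, page)
    finally:
        os.close(fd)


def demo():
    with tempfile.TemporaryDirectory() as base:
        page = b"\x01" * BLOCK_SIZE
        # Missing file in an existing directory: created on demand.
        table_file = os.path.join(base, "16384")
        redo_page_write(table_file, 2, page)
        size = os.path.getsize(table_file)
        # Missing directory: O_CREAT only creates the final component.
        lost = os.path.join(base, "pg_tblspc", "301180", "16384")
        try:
            redo_page_write(lost, 0, page)
            dir_failed = False
        except FileNotFoundError:
            dir_failed = True
        return size, dir_failed
```
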
  • Gavin Sherry at Aug 5, 2004 at 3:04 am

    On Wed, 4 Aug 2004, Tom Lane wrote:

    > Kevin Brown <[email protected]> writes:
    >> Tom Lane wrote:
    >>> This is impossible to fix nicely because the information to reconstruct
    >>> the tablespace is simply not available. We could make an ordinary
    >>> directory (not a symlink) under pg_tblspc and then limp along in the
    >>> expectation that it would get removed before we finish replay. Or we
    >>> could just skip logged operations on files within the tablespace, but
    >>> that feels pretty uncomfortable to me --- it amounts to deliberately
    >>> discarding data ...
    >> How is a dropped table handled by the recovery code? Doesn't it present
    >> the same sort of issues (though on a smaller scale)?
    > Not really. If the replay code encounters an update to a table file
    > that's not there, it simply creates the file and plows ahead. The thing
    > that I'm stuck on about tablespaces is that if the symlink in
    > $PGDATA/pg_tblspc isn't there, there's no evident way to recreate it
    > correctly --- we have no idea where it was supposed to point.

    I don't think we have any choice but to log the symlink creation. Will
    this solve the problem?

    Gavin
  • Tom Lane at Aug 5, 2004 at 3:09 am

    Gavin Sherry writes:
    > On Wed, 4 Aug 2004, Tom Lane wrote:
    >> Not really. If the replay code encounters an update to a table file
    >> that's not there, it simply creates the file and plows ahead. The thing
    >> that I'm stuck on about tablespaces is that if the symlink in
    >> $PGDATA/pg_tblspc isn't there, there's no evident way to recreate it
    >> correctly --- we have no idea where it was supposed to point.
    > I don't think we have any choice but to log the symlink creation. Will
    > this solve the problem?

    We do need to do that, but it will *not* solve this problem. The
    scenario that causes the problem is

    CREATE TABLESPACE
    ...
    much time passes
    ...
    CHECKPOINT
    ...
    modify tables in tablespace
    drop tables in tablespace
    DROP TABLESPACE
    ...
    system crash

    Now the system needs to replay from the last checkpoint. It's going to
    hit updates to tables that aren't there anymore in a tablespace that's
    not there anymore. There will not be anything in the replayed part of
    the log that will give a clue where that tablespace was physically.

    regards, tom lane
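
The scenario in this message can be modelled with a toy log. The sketch below is Python and entirely hypothetical: the Record layout, the LSN values, and the location /mnt/ts1 are invented for illustration, and it assumes (as proposed earlier in the thread) that CREATE TABLESPACE is logged together with its location. It also simplifies by starting replay right after the checkpoint record, whereas the real redo pointer can precede it.

```python
from collections import namedtuple

# Toy WAL record: lsn, kind of operation, tablespace OID, symlink target.
Record = namedtuple("Record", "lsn kind spc location")

WAL = [
    Record(1, "CREATE TABLESPACE", 301180, "/mnt/ts1"),
    Record(2, "CHECKPOINT", None, None),
    Record(3, "UPDATE", 301180, None),
    Record(4, "DROP TABLE", 301180, None),
    Record(5, "DROP TABLESPACE", 301180, None),
]


def replayed_records(wal):
    """Return what crash recovery would see: records after the last checkpoint."""
    last_ckpt = max(i for i, r in enumerate(wal) if r.kind == "CHECKPOINT")
    return wal[last_ckpt + 1:]


def known_locations(records):
    """Collect tablespace locations mentioned in the given records."""
    return {r.spc: r.location for r in records if r.kind == "CREATE TABLESPACE"}
```

Replay starts at the UPDATE on tablespace 301180, and known_locations over the replayed records is empty: the only record carrying the location lies before the checkpoint.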
  • Gavin Sherry at Aug 5, 2004 at 3:23 am

    On Wed, 4 Aug 2004, Tom Lane wrote:

    > Gavin Sherry <[email protected]> writes:
    >> On Wed, 4 Aug 2004, Tom Lane wrote:
    >>> Not really. If the replay code encounters an update to a table file
    >>> that's not there, it simply creates the file and plows ahead. The thing
    >>> that I'm stuck on about tablespaces is that if the symlink in
    >>> $PGDATA/pg_tblspc isn't there, there's no evident way to recreate it
    >>> correctly --- we have no idea where it was supposed to point.
    >> I don't think we have any choice but to log the symlink creation. Will
    >> this solve the problem?
    > We do need to do that, but it will *not* solve this problem. The
    > scenario that causes the problem is
    >
    > CREATE TABLESPACE
    > ...
    > much time passes
    > ...
    > CHECKPOINT
    > ...
    > modify tables in tablespace
    > drop tables in tablespace
    > DROP TABLESPACE
    > ...
    > system crash
    >
    > Now the system needs to replay from the last checkpoint. It's going to
    > hit updates to tables that aren't there anymore in a tablespace that's
    > not there anymore. There will not be anything in the replayed part of
    > the log that will give a clue where that tablespace was physically.

    Ahh, yes of course.

    Seems like the best way would be to create the path under pg_tblspc as
    directories and plough ahead, like you said. The only alternative that
    comes to mind is that we could keep all the directory structure and
    symlinks around until the next checkpoint. But that would be messy and may
    well not solve the problem anyway for things like PITR.

    Gavin
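
The alternative floated here, keeping the symlinks around until the next checkpoint, might look roughly like the following Python model. The class TablespaceLinks and its methods are hypothetical, and an in-memory dict stands in for the symlinks under pg_tblspc; this is a sketch of the bookkeeping only, not of any real backend code.

```python
class TablespaceLinks:
    """Toy model: tablespace OID -> symlink target, with deferred removal."""

    def __init__(self):
        self.links = {}         # entries that still exist "on disk"
        self.pending_drop = []  # OIDs dropped since the last checkpoint

    def create(self, oid, target):
        self.links[oid] = target

    def drop(self, oid):
        # Do not remove the link yet; replay from the previous
        # checkpoint may still need to resolve it.
        self.pending_drop.append(oid)

    def checkpoint(self):
        # Once a checkpoint completes, crash recovery no longer replays
        # records older than it, so the deferred removals can be applied.
        for oid in self.pending_drop:
            self.links.pop(oid, None)
        self.pending_drop.clear()

    def resolve(self, oid):
        return self.links.get(oid)
```

As the message notes, this only covers crash recovery from the latest checkpoint; a PITR replay spanning many checkpoints would still run into links that are long gone.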
  • Greg Stark at Aug 5, 2004 at 3:57 am

    Gavin Sherry writes:

    > CREATE TABLESPACE
    > ...
    > much time passes
    > ...
    > CHECKPOINT
    > ...
    > modify tables in tablespace
    > drop tables in tablespace
    > DROP TABLESPACE
    > ...
    > system crash

    What happens here if no tablespaces are involved?

    It just creates bogus tables with partial data, counting on the restore to see
    the drop table command later and delete the corrupt tables?

    Does that pose any danger with PITR? The scenario above seems ok since if the
    PITR starting point is after the drop table/tablespace then presumably the
    recovery target has to be after that as well? Is there any other scenario
    where the partial data files could escape the recovery process?

    --
    greg
  • Christopher Kings-Lynne at Aug 5, 2004 at 3:38 am

    > We do need to do that, but it will *not* solve this problem. The
    > scenario that causes the problem is
    >
    > CREATE TABLESPACE
    > ...
    > much time passes
    > ...
    > CHECKPOINT
    > ...
    > modify tables in tablespace
    > drop tables in tablespace
    > DROP TABLESPACE
    > ...
    > system crash
    >
    > Now the system needs to replay from the last checkpoint. It's going to
    > hit updates to tables that aren't there anymore in a tablespace that's
    > not there anymore. There will not be anything in the replayed part of
    > the log that will give a clue where that tablespace was physically.

    Maybe we need to create a new system tablespace: pg_recovery

    Then when this situation occurs, if the tablespace cannot be located, we
    recreate the objects in the system 'pg_recovery' tablespace or something.

    I dunno :)

    Chris
  • Andrew Dunstan at Aug 5, 2004 at 3:47 am
    Tom Lane said:
    > The
    > scenario that causes the problem is
    >
    > CREATE TABLESPACE
    > ...
    > much time passes
    > ...
    > CHECKPOINT
    > ...
    > modify tables in tablespace
    > drop tables in tablespace
    > DROP TABLESPACE
    > ...
    > system crash
    >
    > Now the system needs to replay from the last checkpoint. It's going to
    > hit updates to tables that aren't there anymore in a tablespace that's
    > not there anymore. There will not be anything in the replayed part of
    > the log that will give a clue where that tablespace was physically.

    Could we create the tables in the default tablespace? Or create a dummy
    tablespace (since it's not there we expect it to be removed anyway, don't
    we?) I guess the big danger would be running out of disk space, but maybe
    that is a lower risk than this one.

    cheers

    andrew
  • Bruce Momjian at Aug 5, 2004 at 4:03 am

    Andrew Dunstan wrote:
    > Tom Lane said:
    >> The
    >> scenario that causes the problem is
    >>
    >> CREATE TABLESPACE
    >> ...
    >> much time passes
    >> ...
    >> CHECKPOINT
    >> ...
    >> modify tables in tablespace
    >> drop tables in tablespace
    >> DROP TABLESPACE
    >> ...
    >> system crash
    >>
    >> Now the system needs to replay from the last checkpoint. It's going to
    >> hit updates to tables that aren't there anymore in a tablespace that's
    >> not there anymore. There will not be anything in the replayed part of
    >> the log that will give a clue where that tablespace was physically.
    > Could we create the tables in the default tablespace? Or create a dummy
    > tablespace (since it's not there we expect it to be removed anyway, don't
    > we?) I guess the big danger would be running out of disk space, but maybe
    > that is a lower risk than this one.

    Uh, why is the symlink not going to be there already?

    --
    Bruce Momjian | http://candle.pha.pa.us
    [email protected] | (610) 359-1001
    + If your life is a hard drive, | 13 Roberts Road
    + Christ can be your backup. | Newtown Square, Pennsylvania 19073
  • Tom Lane at Aug 5, 2004 at 4:11 am

    Bruce Momjian writes:
    > Uh, why is the symlink not going to be there already?

    Because we removed it at the DROP TABLESPACE.

    regards, tom lane
  • Christopher Kings-Lynne at Aug 5, 2004 at 4:38 am

    >> Uh, why is the symlink not going to be there already?
    >
    > Because we removed it at the DROP TABLESPACE.

    Maybe we could avoid removing it until the next checkpoint? Or is that
    not enough? Maybe it could stay there forever :/

    Chris
  • Tom Lane at Aug 5, 2004 at 4:59 am

    Christopher Kings-Lynne writes:
    > Maybe we could avoid removing it until the next checkpoint? Or is that
    > not enough. Maybe it could stay there forever :/

    Part of the problem here is that this code has to serve several
    purposes. We have different scenarios to worry about:

    * crash recovery from the most recent checkpoint

    * PITR replay over a long interval (many checkpoints)

    * recovery in the face of a partially corrupt filesystem

    It's the last one that is mostly bothering me at the moment. I don't
    want us to throw away data simply because the filesystem forgot an
    inode. Yeah, we might not have enough data in the WAL log to completely
    reconstruct a table, but we should push out what we do have, *not* toss
    it into the bit bucket.

    In the first case (straight crash recovery) I think it is true that any
    reference to a missing file is a reference to a file that will get
    deleted before recovery finishes. But I don't think that holds for PITR
    (we might be asked to stop short of where the table gets deleted) nor
    for the case where there's been filesystem damage.

    regards, tom lane
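
The difference between the first two scenarios in this message can be made concrete with a toy replay loop. The Python below is a hypothetical sketch: the record tuples, the relation name t1, the LSN values, and the function unresolved_after_replay are invented for illustration. It tracks relations that were touched while their file was missing and forgets them once a later drop record explains the absence.

```python
def unresolved_after_replay(records, files_on_disk, stop_lsn=None):
    """Replay (lsn, kind, rel) tuples; return relations still missing at the end."""
    missing = set()
    for lsn, kind, rel in records:
        if stop_lsn is not None and lsn > stop_lsn:
            break  # PITR-style recovery target reached
        if kind == "update" and rel not in files_on_disk:
            missing.add(rel)      # touched a file that is not there
        elif kind == "drop":
            missing.discard(rel)  # the later drop explains the absence
    return missing


RECORDS = [(10, "update", "t1"), (11, "update", "t1"), (12, "drop", "t1")]

# Straight crash recovery replays to the end of the log.
crash_recovery = unresolved_after_replay(RECORDS, files_on_disk=set())
# PITR asked to stop short of the drop leaves the reference unexplained.
pitr_early_stop = unresolved_after_replay(RECORDS, files_on_disk=set(), stop_lsn=11)
```

In the crash-recovery run every missing reference is resolved by the time replay ends; in the early-stop run t1 remains unexplained, which is the case where discarding the updates would lose data. Filesystem damage would show up the same way, as a reference that no later drop ever accounts for.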
  • Kevin Brown at Aug 8, 2004 at 7:09 pm

    Tom Lane wrote:
    > Christopher Kings-Lynne <[email protected]> writes:
    >> Maybe we could avoid removing it until the next checkpoint? Or is that
    >> not enough. Maybe it could stay there forever :/
    > Part of the problem here is that this code has to serve several
    > purposes. We have different scenarios to worry about:
    >
    > * crash recovery from the most recent checkpoint
    >
    > * PITR replay over a long interval (many checkpoints)
    >
    > * recovery in the face of a partially corrupt filesystem
    >
    > It's the last one that is mostly bothering me at the moment. I don't
    > want us to throw away data simply because the filesystem forgot an
    > inode. Yeah, we might not have enough data in the WAL log to completely
    > reconstruct a table, but we should push out what we do have, *not* toss
    > it into the bit bucket.

    I like the idea tossed out by one of the others the most: create a
    "recovery" system tablespace, and use it to resolve issues like this.

    The question is: what do you do with the tables in that tablespace once
    recovery is complete? Leave them there? That's certainly a possibility
    (in fact, it seems the best option, especially now that we're doing
    PITR), but it means that the DBA would have to periodically clean up that
    tablespace so that it doesn't run out of space during a later recovery.
    Actually, it seems to me to be the only option that isn't the equivalent
    of throwing away the data...

    > In the first case (straight crash recovery) I think it is true that any
    > reference to a missing file is a reference to a file that will get
    > deleted before recovery finishes. But I don't think that holds for PITR
    > (we might be asked to stop short of where the table gets deleted) nor
    > for the case where there's been filesystem damage.

    But doesn't PITR assume that a full filesystem-level restore of the
    database as it was prior to the events in the first event log being
    replayed has been done? In that event, wouldn't the PITR process Just
    Work?


    --
    Kevin Brown [email protected]
  • Bruce Momjian at Aug 13, 2004 at 4:02 am
    Did we resolve this?

    ---------------------------(end of broadcast)---------------------------
    TIP 3: if posting/reading through Usenet, please send an appropriate
    subscribe-nomail command to [email protected] so that your
    message can get through to the mailing list cleanly
    --
    Bruce Momjian | http://candle.pha.pa.us
    [email protected] | (610) 359-1001
    + If your life is a hard drive, | 13 Roberts Road
    + Christ can be your backup. | Newtown Square, Pennsylvania 19073
  • Tom Lane at Aug 13, 2004 at 4:12 am

    Bruce Momjian writes:
    > Did we resolve this?

    No, it's an open issue.

    regards, tom lane
  • Bruce Momjian at Aug 15, 2004 at 12:57 am
    Added to open items:

    * fix recovery of DROP TABLESPACE after checkpoint


  • Bruce Momjian at Oct 6, 2004 at 5:34 pm
    Is this fixed?

  • Tom Lane at Oct 6, 2004 at 7:16 pm

    Bruce Momjian writes:
    > Is this fixed?

    Yes.

    regards, tom lane

Discussion Overview
group: pgsql-hackers @
categories: postgresql
posted: Aug 4, '04 at 6:40p
active: Oct 6, '04 at 7:16p
posts: 19
users: 7
website: postgresql.org...
irc: #postgresql