On Marko Kreen's detailed suggestion, I've implemented a restartable
recovery mode for archive recovery (aka PITR). Restart points are known
as recovery checkpoints and are normally taken every 100 checkpoints in
the log to ensure good recovery performance.

An additional mode
standby_mode = 'true'
can also be specified, which ensures that a recovery checkpoint occurs
for each checkpoint in the logs.
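For illustration, the two recovery settings might be combined like this (a sketch only; the restore_command path is hypothetical, and only standby_mode is new in this patch):

```
restore_command = 'cp /mnt/server/archive/%f %p'   # ordinary archive recovery
standby_mode = 'true'    # proposed: take a recovery checkpoint at every
                         # checkpoint record found in the logs, rather than
                         # every 100th
```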

Some other code refactorings are included, though all changes are isolated
to xlog.c and pg_control.h; code comments welcome.

Applies cleanly to cvstip, passes make check.

Further detailed testing is very desirable. I've tested restarting a
recovery twice, and things worked successfully.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


  • Andreas Seltenreich at Jul 16, 2006 at 12:22 pm

    Simon Riggs writes:

    [2. text/x-patch; restartableRecovery.patch]
    Hmm, wouldn't you have to reboot the resource managers at each
    checkpoint? I'm afraid otherwise things like postponed page splits
    could get lost on restart from a later checkpoint.

    regards,
    andreas
  • Tom Lane at Jul 16, 2006 at 2:52 pm

    Andreas Seltenreich writes:
    Simon Riggs <simon@2ndquadrant.com> writes:
    [2. text/x-patch; restartableRecovery.patch]
    Hmm, wouldn't you have to reboot the resource managers at each
    checkpoint? I'm afraid otherwise things like postponed page splits
    could get lost on restart from a later checkpoint.
    Ouch. That's a bit nasty. You can't just apply a postponed split at
    checkpoint time, because the WAL record could easily be somewhere after
    the checkpoint, leading to duplicate insertions. Right offhand I don't
    see how to make this work :-(

    regards, tom lane
  • Simon Riggs at Jul 16, 2006 at 3:58 pm

    On Sun, 2006-07-16 at 10:51 -0400, Tom Lane wrote:
    Andreas Seltenreich <andreas+pg@gate450.dyndns.org> writes:
    Simon Riggs <simon@2ndquadrant.com> writes:
    [2. text/x-patch; restartableRecovery.patch]
    Hmm, wouldn't you have to reboot the resource managers at each
    checkpoint? I'm afraid otherwise things like postponed page splits
    could get lost on restart from a later checkpoint.
    Ouch. That's a bit nasty. You can't just apply a postponed split at
    checkpoint time, because the WAL record could easily be somewhere after
    the checkpoint, leading to duplicate insertions. Right offhand I don't
    see how to make this work :-(
    Yes, ouch. So much for gung-ho code sprints; thanks Andreas.

    To do this we would need to have another rmgr specific routine that gets
    called at a recovery checkpoint. This would then write to disk the
    current state of the incomplete multi-WAL actions, in some manner.
    During the startup routines we would check for any pre-existing state
    files and use those to initialise the incomplete action cache. Cleanup
    would then discard all state files.

    That allows us to not-forget actions, but it doesn't help us if there
    are problems repeating actions twice. We would at least know that we are
    in a potential double-action zone and could give different kinds of
    errors or handling.

    Or we could simply mark any index incomplete-needs-rebuild if it had a
    page split during the overlap between the last known-good recovery
    checkpoint and the following one. But that leads to a randomly bounded
    recovery time, in which case it might have been better to start from
    scratch anyway.

    Given time available for 8.2, neither one is a quick fix.

    --
    Simon Riggs
    EnterpriseDB http://www.enterprisedb.com
  • Simon Riggs at Aug 1, 2006 at 12:04 am

    On Sun, 2006-07-16 at 20:56 +0100, Simon Riggs wrote:
    On Sun, 2006-07-16 at 15:33 -0400, Tom Lane wrote:
    Simon Riggs <simon@2ndquadrant.com> writes:
    On Sun, 2006-07-16 at 12:40 -0400, Tom Lane wrote:
    A compromise that might be good enough is to add an rmgr routine defined
    as "bool is_idle(void)" that tests whether the rmgr has any open state
    to worry about. Then, recovery checkpoints are done only if all rmgrs
    say they are idle.
    Perhaps that should be extended to say whether there are any
    non-idempotent changes made in the last checkpoint period. That might
    cover a wider set of potential actions.
    Perhaps best to call it safe_to_checkpoint(), and not pre-judge what
    reasons the rmgr might have for not wanting to restart here.
    You read my mind.
    If we are only going to do a recovery checkpoint at every Nth checkpoint
    record, then occasionally having to skip one seems no big problem ---
    just do it at the first subsequent record that is safe.
    Got it.
    I've implemented this for BTree, GIN, GIST using an additional rmgr
    function bool rm_safe_restartpoint(void)

    The functions are actually trivial, assuming I've understood how GIST
    and GIN handle their xlogging.

    "Recovery checkpoints" are now renamed "restartpoints" to avoid
    confusion with checkpoints. So checkpoints occur during normal
    processing (only) and restartpoints occur during recovery (only).

    Updated patch enclosed, which I believe has no conflicts with the other
    patches on xlog.c just submitted.

    Much additional testing required, but the underlying concepts are very
    simple really. Andreas: any further gotchas? :-)

    --
    Simon Riggs
    EnterpriseDB http://www.enterprisedb.com
  • Bruce Momjian at Aug 1, 2006 at 12:10 am
    Nice. I was going to ask if this could make it into 8.2.

    ---------------------------------------------------------------------------

    Simon Riggs wrote:
    On Sun, 2006-07-16 at 20:56 +0100, Simon Riggs wrote:
    On Sun, 2006-07-16 at 15:33 -0400, Tom Lane wrote:
    Simon Riggs <simon@2ndquadrant.com> writes:
    On Sun, 2006-07-16 at 12:40 -0400, Tom Lane wrote:
    A compromise that might be good enough is to add an rmgr routine defined
    as "bool is_idle(void)" that tests whether the rmgr has any open state
    to worry about. Then, recovery checkpoints are done only if all rmgrs
    say they are idle.
    Perhaps that should be extended to say whether there are any
    non-idempotent changes made in the last checkpoint period. That might
    cover a wider set of potential actions.
    Perhaps best to call it safe_to_checkpoint(), and not pre-judge what
    reasons the rmgr might have for not wanting to restart here.
    You read my mind.
    If we are only going to do a recovery checkpoint at every Nth checkpoint
    record, then occasionally having to skip one seems no big problem ---
    just do it at the first subsequent record that is safe.
    Got it.
    I've implemented this for BTree, GIN, GIST using an additional rmgr
    function bool rm_safe_restartpoint(void)

    The functions are actually trivial, assuming I've understood how GIST
    and GIN handle their xlogging.

    "Recovery checkpoints" are now renamed "restartpoints" to avoid
    confusion with checkpoints. So checkpoints occur during normal
    processing (only) and restartpoints occur during recovery (only).

    Updated patch enclosed, which I believe has no conflicts with the other
    patches on xlog.c just submitted.

    Much additional testing required, but the underlying concepts are very
    simple really. Andreas: any further gotchas? :-)

    --
    Simon Riggs
    EnterpriseDB http://www.enterprisedb.com
    [ Attachment, skipping... ]
    --
    Bruce Momjian bruce@momjian.us
    EnterpriseDB http://www.enterprisedb.com

    + If your life is a hard drive, Christ can be your backup. +
  • Tom Lane at Aug 7, 2006 at 5:06 pm

    Simon Riggs writes:
    I've implemented this for BTree, GIN, GIST using an additional rmgr
    function bool rm_safe_restartpoint(void)
    ...
    "Recovery checkpoints" are now renamed "restartpoints" to avoid
    confusion with checkpoints. So checkpoints occur during normal
    processing (only) and restartpoints occur during recovery (only).
    Applied with revisions. As submitted the patch pushed backup_label out
    of the way immediately upon reading it, which is no good: you need to be
    sure that the starting checkpoint location is written to pg_control
    first, else an immediate crash would allow the thing to try to start
    from whatever checkpoint is listed in the backed-up pg_control. Also,
    the minimum recovery stopping point that's obtained using the label file
    still has to be enforced if there's a crash during the replay sequence.
    I felt the best way to do that was to copy the minimum stopping point
    into pg_control, so that's what the code does now.
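    The ordering constraint Tom describes can be sketched as follows (pseudocode only; field and file names are illustrative, not the actual xlog.c code):

    ```
    read backup_label -> (start_checkpoint_loc, min_recovery_point)
    pg_control.checkPoint       := start_checkpoint_loc
    pg_control.minRecoveryPoint := min_recovery_point
    flush pg_control to disk          -- durable *before* the label moves
    rename backup_label -> backup_label.old
    -- a crash at any later point during replay still finds the correct
    -- starting checkpoint in pg_control, and the minimum stopping point
    -- is still enforced
    ```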

    Also, as I mentioned earlier, I think that doing restartpoints on the
    basis of elapsed time is simpler and more useful than having an explicit
    distinction between "normal" and "standby" modes. We can always invent
    a standby_mode flag later if we need one, but we don't need it for this.

    regards, tom lane
  • Simon Riggs at Aug 9, 2006 at 11:48 am

    On Mon, 2006-08-07 at 13:05 -0400, Tom Lane wrote:
    Simon Riggs <simon@2ndquadrant.com> writes:
    I've implemented this for BTree, GIN, GIST using an additional rmgr
    function bool rm_safe_restartpoint(void)
    ...
    "Recovery checkpoints" are now renamed "restartpoints" to avoid
    confusion with checkpoints. So checkpoints occur during normal
    processing (only) and restartpoints occur during recovery (only).
    Applied with revisions.
    err....CheckPointGuts() :-) I guess patch reviews need some spicing up.
    As submitted the patch pushed backup_label out
    of the way immediately upon reading it, which is no good: you need to be
    sure that the starting checkpoint location is written to pg_control
    first, else an immediate crash would allow the thing to try to start
    from whatever checkpoint is listed in the backed-up pg_control. Also,
    the minimum recovery stopping point that's obtained using the label file
    still has to be enforced if there's a crash during the replay sequence.
    I felt the best way to do that was to copy the minimum stopping point
    into pg_control, so that's what the code does now.
    Thanks for checking that.
    Also, as I mentioned earlier, I think that doing restartpoints on the
    basis of elapsed time is simpler and more useful than having an explicit
    distinction between "normal" and "standby" modes. We can always invent
    a standby_mode flag later if we need one, but we don't need it for this.
    OK, agreed.

    The original thinking was that writing a restartpoint was more crucial
    when in standby mode; but this way we get better performance and a low
    ceiling on the restart time, should a restart ever occur at the worst
    moment.

    Thanks again to Marko for the concept.

    I'll work on the docs for backup.sgml also.

    --
    Simon Riggs
    EnterpriseDB http://www.enterprisedb.com

Discussion Overview
group: pgsql-patches @ postgresql
posted: Jul 11, '06 at 8:14p
active: Aug 9, '06 at 11:48a
posts: 8
users: 4
website: postgresql.org
irc: #postgresql
