Currently, if you take a backup with "pg_basebackup -x" (which means it
will include all the WAL to required restore within the backup dump),
and hit Ctrl-C while the WAL is being streamed, you end up with a data
directory that you can start postmaster from, even though the backup was
not complete. So what appears to be a valid backup - it starts up fine -
can actually be corrupt.

I put in a check against that back in March, but it had to be reverted
because it broke crash recovery when the system crashed while a
pg_start_backup() based backup was in progress:

http://archives.postgresql.org/message-id/4DA58686.1050501@enterprisedb.com

Here's a patch to add it back in a more fine-grained fashion. The patch
adds an extra line to backup_label, indicating whether the backup was
taken with pg_start/stop_backup(), or by streaming (= pg_basebackup).
For a backup taken with pg_start_backup(), the behavior is kept the same
as it has been - if the end-of-backup record is not reached during crash
recovery, the database starts up anyway. But for a streamed backup, you
get an error at startup.

I think this is a nice additional safeguard to have, making streamed
backups more robust. I'd like to add this to 9.1, but it required an
extra field to be added to the control file, so it would force an
initdb. It's probably not worth that. Or, we could sneak in the extra
boolean field to some currently unused pad space in the ControlFile struct.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Search Discussions

  • Alvaro Herrera at Aug 9, 2011 at 3:20 pm

    Excerpts from Heikki Linnakangas's message of mar ago 09 05:00:00 -0400 2011:

    I think this is a nice additional safeguard to have, making streamed
    backups more robust. I'd like to add this to 9.1, but it required an
    extra field to be added to the control file, so it would force an
    initdb. It's probably not worth that. Or, we could sneak in the extra
    boolean field to some currently unused pad space in the ControlFile struct.
    How about making the new backup_label field optional? If absent, assume
    current behavior.

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Heikki Linnakangas at Aug 9, 2011 at 3:22 pm

    On 09.08.2011 18:20, Alvaro Herrera wrote:
    Excerpts from Heikki Linnakangas's message of mar ago 09 05:00:00 -0400 2011:
    I think this is a nice additional safeguard to have, making streamed
    backups more robust. I'd like to add this to 9.1, but it required an
    extra field to be added to the control file, so it would force an
    initdb. It's probably not worth that. Or, we could sneak in the extra
    boolean field to some currently unused pad space in the ControlFile struct.
    How about making the new backup_label field optional? If absent, assume
    current behavior.
    That's how I actually did it in the patch. However, the problem wrt.
    requiring initdb is not the new field in backup_label, it's the new
    field in the control file.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Tom Lane at Aug 9, 2011 at 4:07 pm

    Heikki Linnakangas writes:
    On 09.08.2011 18:20, Alvaro Herrera wrote:
    How about making the new backup_label field optional? If absent, assume
    current behavior.
    That's how I actually did it in the patch. However, the problem wrt.
    requiring initdb is not the new field in backup_label, it's the new
    field in the control file.
    Yeah. I think it's too late to be fooling with pg_control for 9.1.
    Just fix it in HEAD.

    regards, tom lane
  • Heikki Linnakangas at Aug 10, 2011 at 9:27 am

    On 09.08.2011 19:07, Tom Lane wrote:
    Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes:
    On 09.08.2011 18:20, Alvaro Herrera wrote:
    How about making the new backup_label field optional? If absent, assume
    current behavior.
    That's how I actually did it in the patch. However, the problem wrt.
    requiring initdb is not the new field in backup_label, it's the new
    field in the control file.
    Yeah. I think it's too late to be fooling with pg_control for 9.1.
    Just fix it in HEAD.
    Done.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Magnus Hagander at Aug 10, 2011 at 9:35 am

    On Tue, Aug 9, 2011 at 18:07, Tom Lane wrote:
    Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
    On 09.08.2011 18:20, Alvaro Herrera wrote:
    How about making the new backup_label field optional?  If absent, assume
    current behavior.
    That's how I actually did it in the patch. However, the problem wrt.
    requiring initdb is not the new field in backup_label, it's the new
    field in the control file.
    Yeah.  I think it's too late to be fooling with pg_control for 9.1.
    Just fix it in HEAD.
    Should we add a note to the documentation of pg_basebackup in 9.1
    telling people to take care about the failure case? Or add a signal
    handler in the pg_basebackup client emitting a warning about it?
  • Heikki Linnakangas at Aug 10, 2011 at 10:44 am

    On 10.08.2011 12:29, Magnus Hagander wrote:
    On Tue, Aug 9, 2011 at 18:07, Tom Lanewrote:
    Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes:
    On 09.08.2011 18:20, Alvaro Herrera wrote:
    How about making the new backup_label field optional? If absent, assume
    current behavior.
    That's how I actually did it in the patch. However, the problem wrt.
    requiring initdb is not the new field in backup_label, it's the new
    field in the control file.
    Yeah. I think it's too late to be fooling with pg_control for 9.1.
    Just fix it in HEAD.
    Should we add a note to the documentation of pg_basebackup in 9.1
    telling people to take care about the failure case?
    Something like "Note: if you abort the backup before it's finished, the
    backup won't be valid" ? That seems pretty obvious to me, hardly worth
    documenting.
    Or add a signal
    handler in the pg_basebackup client emitting a warning about it?
    We don't have such a signal handler pg_dump either. I don't think we
    should add it.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Magnus Hagander at Aug 10, 2011 at 10:53 am

    On Wed, Aug 10, 2011 at 12:44, Heikki Linnakangas wrote:
    On 10.08.2011 12:29, Magnus Hagander wrote:

    On Tue, Aug 9, 2011 at 18:07, Tom Lanewrote:
    Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>  writes:
    On 09.08.2011 18:20, Alvaro Herrera wrote:

    How about making the new backup_label field optional?  If absent,
    assume
    current behavior.
    That's how I actually did it in the patch. However, the problem wrt.
    requiring initdb is not the new field in backup_label, it's the new
    field in the control file.
    Yeah.  I think it's too late to be fooling with pg_control for 9.1.
    Just fix it in HEAD.
    Should we add a note to the documentation of pg_basebackup in 9.1
    telling people to take care about the failure case?
    Something like "Note: if you abort the backup before it's finished, the
    backup won't be valid" ? That seems pretty obvious to me, hardly worth
    documenting.
    I meant something more along the line of that it looks ok, but may be corrupted.

    Or add a signal
    handler in the pg_basebackup client emitting a warning about it?
    We don't have such a signal handler pg_dump either. I don't think we should
    add it.
    Hmm. I guess an aborted pg_dump will also "look ok but actually be
    corrupt" (or incomplete). Good point.
  • Robert Haas at Aug 10, 2011 at 12:19 pm

    On Wed, Aug 10, 2011 at 6:53 AM, Magnus Hagander wrote:
    On Wed, Aug 10, 2011 at 12:44, Heikki Linnakangas
    wrote:
    On 10.08.2011 12:29, Magnus Hagander wrote:

    On Tue, Aug 9, 2011 at 18:07, Tom Lanewrote:
    Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>  writes:
    On 09.08.2011 18:20, Alvaro Herrera wrote:

    How about making the new backup_label field optional?  If absent,
    assume
    current behavior.
    That's how I actually did it in the patch. However, the problem wrt.
    requiring initdb is not the new field in backup_label, it's the new
    field in the control file.
    Yeah.  I think it's too late to be fooling with pg_control for 9.1.
    Just fix it in HEAD.
    Should we add a note to the documentation of pg_basebackup in 9.1
    telling people to take care about the failure case?
    Something like "Note: if you abort the backup before it's finished, the
    backup won't be valid" ? That seems pretty obvious to me, hardly worth
    documenting.
    I meant something more along the line of that it looks ok, but may be corrupted.
    Yeah. I'm frankly pretty nervous about shipping 9.1 with this
    problem, but note that I don't have a better idea. I'd favor making
    pg_basebackup emit a warning or maybe even remove the backup if it's
    aborted midway through.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Simon Riggs at Aug 10, 2011 at 12:34 pm

    On Wed, Aug 10, 2011 at 1:19 PM, Robert Haas wrote:
    On Wed, Aug 10, 2011 at 6:53 AM, Magnus Hagander wrote:
    On Wed, Aug 10, 2011 at 12:44, Heikki Linnakangas
    wrote:
    On 10.08.2011 12:29, Magnus Hagander wrote:

    On Tue, Aug 9, 2011 at 18:07, Tom Lanewrote:
    Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>  writes:
    On 09.08.2011 18:20, Alvaro Herrera wrote:

    How about making the new backup_label field optional?  If absent,
    assume
    current behavior.
    That's how I actually did it in the patch. However, the problem wrt.
    requiring initdb is not the new field in backup_label, it's the new
    field in the control file.
    Yeah.  I think it's too late to be fooling with pg_control for 9.1.
    Just fix it in HEAD.
    Should we add a note to the documentation of pg_basebackup in 9.1
    telling people to take care about the failure case?
    Something like "Note: if you abort the backup before it's finished, the
    backup won't be valid" ? That seems pretty obvious to me, hardly worth
    documenting.
    I meant something more along the line of that it looks ok, but may be corrupted.
    Yeah.  I'm frankly pretty nervous about shipping 9.1 with this
    problem, but note that I don't have a better idea.  I'd favor making
    pg_basebackup emit a warning or maybe even remove the backup if it's
    aborted midway through.
    I don't understand why we need to change pg_control for this?

    Why can't we just add a line to backup_label as the first action of
    pg_basebackup and then updated it the last action to show the backup
    set is complete?

    That would be safe for 9.1

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Heikki Linnakangas at Aug 10, 2011 at 4:34 pm

    On 10.08.2011 15:34, Simon Riggs wrote:
    On Wed, Aug 10, 2011 at 1:19 PM, Robert Haaswrote:
    On Wed, Aug 10, 2011 at 6:53 AM, Magnus Haganderwrote:
    On Wed, Aug 10, 2011 at 12:44, Heikki Linnakangas
    wrote:
    On 10.08.2011 12:29, Magnus Hagander wrote:

    On Tue, Aug 9, 2011 at 18:07, Tom Lanewrote:
    Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes:
    On 09.08.2011 18:20, Alvaro Herrera wrote:

    How about making the new backup_label field optional? If absent,
    assume
    current behavior.
    That's how I actually did it in the patch. However, the problem wrt.
    requiring initdb is not the new field in backup_label, it's the new
    field in the control file.
    Yeah. I think it's too late to be fooling with pg_control for 9.1.
    Just fix it in HEAD.
    Should we add a note to the documentation of pg_basebackup in 9.1
    telling people to take care about the failure case?
    Something like "Note: if you abort the backup before it's finished, the
    backup won't be valid" ? That seems pretty obvious to me, hardly worth
    documenting.
    I meant something more along the line of that it looks ok, but may be corrupted.
    Yeah. I'm frankly pretty nervous about shipping 9.1 with this
    problem, but note that I don't have a better idea. I'd favor making
    pg_basebackup emit a warning or maybe even remove the backup if it's
    aborted midway through.
    I don't understand why we need to change pg_control for this?

    Why can't we just add a line to backup_label as the first action of
    pg_basebackup and then updated it the last action to show the backup
    set is complete?
    Hmm, that's not possible for the 'tar' output, but would work for 'dir'
    output. Another similar idea would be to withhold the control file in
    memory until the end of backup, and append it to the output as last. The
    backup can't be restored until the control file is written out.

    That won't protect from more complicated scenarios, like if you take the
    backup without the -x flag, and copy some but not all of the required
    WAL files manually to the pg_xlog directory. But it'd be much better
    than nothing for 9.1.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Tom Lane at Aug 10, 2011 at 5:59 pm

    Heikki Linnakangas writes:
    Hmm, that's not possible for the 'tar' output, but would work for 'dir'
    output. Another similar idea would be to withhold the control file in
    memory until the end of backup, and append it to the output as last. The
    backup can't be restored until the control file is written out.
    That won't protect from more complicated scenarios, like if you take the
    backup without the -x flag, and copy some but not all of the required
    WAL files manually to the pg_xlog directory. But it'd be much better
    than nothing for 9.1.
    Maybe we're overcomplicating this. What about changing pg_basebackup to
    print a message when the backup is completely sent/received? People
    would get used to that quickly, and would know to be suspicious if they
    didn't see it.

    regards, tom lane
  • Robert Haas at Aug 10, 2011 at 6:01 pm

    On Wed, Aug 10, 2011 at 1:45 PM, Tom Lane wrote:
    Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
    Hmm, that's not possible for the 'tar' output, but would work for 'dir'
    output. Another similar idea would be to withhold the control file in
    memory until the end of backup, and append it to the output as last. The
    backup can't be restored until the control file is written out.
    That won't protect from more complicated scenarios, like if you take the
    backup without the -x flag, and copy some but not all of the required
    WAL files manually to the pg_xlog directory. But it'd be much better
    than nothing for 9.1.
    Maybe we're overcomplicating this.  What about changing pg_basebackup to
    print a message when the backup is completely sent/received?  People
    would get used to that quickly, and would know to be suspicious if they
    didn't see it.
    Yeah, but would they be sufficiently suspicious to think "oh, my
    backup is hopeless corrupted even if it seems to work"?

    I think a clearer warning is needed, at the very least, and if there's
    a way to prevent it altogether at least in straightforward cases, that
    would be even better.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Magnus Hagander at Aug 10, 2011 at 6:16 pm

    On Wed, Aug 10, 2011 at 19:45, Tom Lane wrote:
    Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
    Hmm, that's not possible for the 'tar' output, but would work for 'dir'
    output. Another similar idea would be to withhold the control file in
    memory until the end of backup, and append it to the output as last. The
    backup can't be restored until the control file is written out.
    That won't protect from more complicated scenarios, like if you take the
    backup without the -x flag, and copy some but not all of the required
    WAL files manually to the pg_xlog directory. But it'd be much better
    than nothing for 9.1.
    Maybe we're overcomplicating this.  What about changing pg_basebackup to
    print a message when the backup is completely sent/received?  People
    would get used to that quickly, and would know to be suspicious if they
    didn't see it.
    That would suck for scripts, and have people redirect the output to
    /dev/null instead, wouldn't it? And it violates the "unix expectation"
    that is that a successful command will not write anything to it's
    output...
  • Fujii Masao at Aug 16, 2011 at 1:10 am

    On Thu, Aug 11, 2011 at 1:34 AM, Heikki Linnakangas wrote:
    Hmm, that's not possible for the 'tar' output, but would work for 'dir'
    output. Another similar idea would be to withhold the control file in memory
    until the end of backup, and append it to the output as last. The backup
    can't be restored until the control file is written out.

    That won't protect from more complicated scenarios, like if you take the
    backup without the -x flag, and copy some but not all of the required WAL
    files manually to the pg_xlog directory. But it'd be much better than
    nothing for 9.1.
    We need to skip checking whether we've reached the end backup location
    only when the server crashes while base backup using pg_start_backup. Right?
    We can do this by *not* initializing ControlFile->backupStartPoint if the server
    is doing crash recovery and backupEndRequired is false. What about the attached
    patch?

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Heikki Linnakangas at Aug 17, 2011 at 8:49 am

    On 16.08.2011 04:10, Fujii Masao wrote:
    On Thu, Aug 11, 2011 at 1:34 AM, Heikki Linnakangas
    wrote:
    Hmm, that's not possible for the 'tar' output, but would work for 'dir'
    output. Another similar idea would be to withhold the control file in memory
    until the end of backup, and append it to the output as last. The backup
    can't be restored until the control file is written out.

    That won't protect from more complicated scenarios, like if you take the
    backup without the -x flag, and copy some but not all of the required WAL
    files manually to the pg_xlog directory. But it'd be much better than
    nothing for 9.1.
    We need to skip checking whether we've reached the end backup location
    only when the server crashes while base backup using pg_start_backup. Right? Yes.
    We can do this by *not* initializing ControlFile->backupStartPoint if the server
    is doing crash recovery and backupEndRequired is false. What about the attached
    patch?
    Hmm, this behaves slightly differently, if you first try to start the
    restored server without recovery.conf, stop recovery, and restart it
    after adding recovery.conf. But I guess that's not a big deal, the check
    is simply skipped in that case, which is what always happens without
    this patch anyway. Committed this to 9.1, but kept master as it was.

    (sorry for the delay, I wanted to fix the bogus comment as soon as I saw
    it, but needed some time to ponder the rest of the patch)

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Fujii Masao at Aug 17, 2011 at 9:26 am

    On Wed, Aug 17, 2011 at 5:49 PM, Heikki Linnakangas wrote:
    Hmm, this behaves slightly differently, if you first try to start the
    restored server without recovery.conf, stop recovery, and restart it after
    adding recovery.conf. But I guess that's not a big deal, the check is simply
    skipped in that case, which is what always happens without this patch
    anyway.
    Oh, I forgot to consider that case. Yeah, I agree with you.
    Committed this to 9.1,
    Thanks a lot!
    but kept master as it was.
    So, in master, we should change pg_controldata.c and pg_resetxlog.c for
    new pg_control field "backupEndRequired"?
    (sorry for the delay, I wanted to fix the bogus comment as soon as I saw it,
    but needed some time to ponder the rest of the patch)
    NM. Thanks!

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Heikki Linnakangas at Aug 17, 2011 at 9:38 am

    On 17.08.2011 12:26, Fujii Masao wrote:
    So, in master, we should change pg_controldata.c and pg_resetxlog.c for
    new pg_control field "backupEndRequired"?
    Ah, good catch! Fixed.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Dimitri Fontaine at Aug 12, 2011 at 9:10 pm

    Magnus Hagander writes:
    Or add a signal
    handler in the pg_basebackup client emitting a warning about it?
    We don't have such a signal handler pg_dump either. I don't think we should
    add it.
    Hmm. I guess an aborted pg_dump will also "look ok but actually be
    corrupt" (or incomplete). Good point.
    What about having the signal handler corrupt the backup by adding some
    garbage into it? Now the failure case is obvious…

    Regards,
    --
    Dimitri Fontaine
    http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedAug 9, '11 at 9:00a
activeAug 17, '11 at 9:38a
posts19
users8
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2021 Grokbase