I will provide a patch which can execute pg_start/stop_backup,
including a solution to the above comments and conditions, in the next stage.
Then please review.
Done.


* Procedure

1. Call pg_start_backup('x') on the standby.
2. Take a backup of the data dir.
3. Call pg_stop_backup() on the standby.
4. Copy the control file on the standby to the backup.
5. Check with pg_controldata that the copied control file shows the state during hot standby.
-> If the standby is promoted between 3. and 4., the backup cannot be recovered.
-> In pg_control, "Minimum recovery ending location" equals 0/0.
-> The backup-end record is not written.

* Not addressed yet

* full_page_writes = off
-> If the primary has "full_page_writes = off", archive recovery may not work
correctly. Therefore the standby may need to check in the WAL whether
"full_page_writes = off" is set.

--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL:045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

  • Cédric Villemain at Aug 5, 2011 at 8:02 am

    2011/8/5 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    I will provide a patch which can execute pg_start/stop_backup,
    including a solution to the above comments and conditions, in the next stage.
    Then please review.
    Done.

    Great!

    * Procedure

    1. Call pg_start_backup('x') on the standby.
    2. Take a backup of the data dir.
    3. Call pg_stop_backup() on the standby.
    4. Copy the control file on the standby to the backup.
    5. Check with pg_controldata that the copied control file shows the state during hot standby.
    -> If the standby is promoted between 3. and 4., the backup cannot be recovered.
    -> In pg_control, "Minimum recovery ending location" equals 0/0.
    -> The backup-end record is not written.

    * Not addressed yet

    * full_page_writes = off
    -> If the primary has "full_page_writes = off", archive recovery may not work
    correctly. Therefore the standby may need to check in the WAL whether
    "full_page_writes = off" is set.

    Doesn't having a standby make full_page_writes = on in all cases
    (bypassing the configuration)?

    --
    Cédric Villemain +33 (0)6 20 30 22 52
    http://2ndQuadrant.fr/
    PostgreSQL: Support 24x7 - Développement, Expertise et Formation
  • Jun Ishiduka at Aug 15, 2011 at 8:48 am

    * Not addressed yet

    * full_page_writes = off
    -> If the primary has "full_page_writes = off", archive recovery may not work
    correctly. Therefore the standby may need to check in the WAL whether
    "full_page_writes = off" is set.

    Doesn't having a standby make full_page_writes = on in all cases
    (bypassing the configuration)?

    What do you mean?


    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Robert Haas at Aug 15, 2011 at 11:52 am

    2011/8/15 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    * Not addressed yet

    * full_page_writes = off
    -> If the primary has "full_page_writes = off", archive recovery may not work
    correctly. Therefore the standby may need to check in the WAL whether
    "full_page_writes = off" is set.

    Doesn't having a standby make full_page_writes = on in all cases
    (bypassing the configuration)?

    What do you mean?

    Yeah. full_page_writes is a WAL generation parameter. Standbys don't
    generate WAL. I think you just have to insist that the master has it
    on.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Jun Ishiduka at Aug 16, 2011 at 6:10 am

    * Not addressed yet

    * full_page_writes = off
    -> If the primary has "full_page_writes = off", archive recovery may not work
    correctly. Therefore the standby may need to check in the WAL whether
    "full_page_writes = off" is set.

    Doesn't having a standby make full_page_writes = on in all cases
    (bypassing the configuration)?

    What do you mean?

    Thanks.

    This has the following two problems.
    * pg_start_backup() must set full_page_writes to 'on' on the master that
    is actually writing the WAL, not on the standby.
    * The standby need not be connected to the master that is actually writing
    the WAL.
    (Ex. Standby2 in Cascade Replication: Master - Standby1 - Standby2)

    I'm not sure how I should resolve these problems.

    Regards.

    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Steve Singer at Aug 16, 2011 at 3:29 pm

    On 11-08-16 02:09 AM, Jun Ishiduka wrote:
    Thanks.

    This has the following two problems.
    * pg_start_backup() must set full_page_writes to 'on' on the master that
    is actually writing the WAL, not on the standby.

    Is there any way to tell from the WAL segments if they contain the full
    page data? If so could you verify this on the second slave when it is
    brought up? Or can you track this on the first slave and produce an
    error in either pg_start_backup() or pg_stop_backup()?

    I see in xlog.h XLR_BKP_REMOVABLE, the comment above it says that this
    flag is used to indicate that the archiver can compress the full page
    blocks to non-full page blocks. I am not familiar with where in the code
    this actually happens but will this cause issues if the first standby is
    processing WAL files from the archive?

    * The standby need not be connected to the master that is actually writing
    the WAL.
    (Ex. Standby2 in Cascade Replication: Master - Standby1 - Standby2)

    I'm not sure how I should resolve these problems.

    Regards.

  • Jun Ishiduka at Aug 17, 2011 at 9:00 am

    Is there any way to tell from the WAL segments if they contain the full
    page data? If so could you verify this on the second slave when it is
    brought up? Or can you track this on the first slave and produce an
    error in either pg_start_backup() or pg_stop_backup()?

    Sure.
    I will make a patch with a way to tell from the WAL segments whether they
    contain full-page data.

    I see in xlog.h XLR_BKP_REMOVABLE, the comment above it says that this
    flag is used to indicate that the archiver can compress the full page
    blocks to non-full page blocks. I am not familiar with where in the code
    this actually happens but will this cause issues if the first standby is
    processing WAL files from the archive?

    I checked the flag in xlog.c; it seems to be set only in
    XLogInsert(). I will consider whether it can be used.


    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Fujii Masao at Aug 17, 2011 at 10:19 am

    2011/8/17 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    I see in xlog.h XLR_BKP_REMOVABLE, the comment above it says that this
    flag is used to indicate that the archiver can compress the full page
    blocks to non-full page blocks. I am not familiar with where in the code
    this actually happens but will this cause issues if the first standby is
    processing WAL files from the archive?
    I checked the flag in xlog.c; it seems to be set only in
    XLogInsert(). I will consider whether it can be used.

    That flag cannot be used to check whether full-page writes were
    skipped, because it appears only in records that carry full-page data,
    not in those that do not.

    The straightforward approach to address the problem you raised is to log
    the change of full_page_writes on the master. Since such a WAL record is also
    replicated to the standby, the standby can know whether full_page_writes is
    enabled or not in the master, from the WAL record. If it's disabled,
    pg_start_backup() in the standby should emit an error and refuse standby-only
    backup. If the WAL record indicating that full_page_writes was disabled
    on the master arrives during standby-only backup, the standby should cancel
    the backup.
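
As a rough illustration of the approach described above, here is a minimal Python sketch of the state the standby would have to track. It is not PostgreSQL code: the class `StandbyBackupState`, its methods, and the idea of a dedicated WAL record carrying the new full_page_writes value are assumptions made only for this example.

```python
class BackupError(Exception):
    """Raised when a standby-only backup must be refused or cancelled."""


class StandbyBackupState:
    """Hypothetical model of what the standby tracks; not PostgreSQL code."""

    def __init__(self, master_fpw=True):
        # Last full_page_writes value of the master, as learned from WAL.
        self.master_fpw = master_fpw
        self.backup_in_progress = False
        self.backup_cancelled = False

    def replay_fpw_record(self, fpw_enabled):
        """Replay an (assumed) WAL record that logs a full_page_writes change."""
        self.master_fpw = fpw_enabled
        if self.backup_in_progress and not fpw_enabled:
            # FPW was turned off on the master while the backup was running.
            self.backup_cancelled = True

    def start_backup(self):
        """Refuse a standby-only backup if the master has FPW disabled."""
        if not self.master_fpw:
            raise BackupError("full_page_writes is off on the master")
        self.backup_in_progress = True
        self.backup_cancelled = False

    def stop_backup(self):
        """Fail if FPW was disabled at any point during the backup."""
        if not self.backup_in_progress:
            raise BackupError("no backup in progress")
        self.backup_in_progress = False
        if self.backup_cancelled:
            raise BackupError("full_page_writes was disabled during the backup")
```

In this model the cancellation flag stays set until the next start_backup(), so turning full_page_writes off and back on again during a backup still makes stop_backup() fail.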

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Robert Haas at Aug 17, 2011 at 12:40 pm

    On Wed, Aug 17, 2011 at 6:19 AM, Fujii Masao wrote:
    2011/8/17 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    I see in xlog.h XLR_BKP_REMOVABLE, the comment above it says that this
    flag is used to indicate that the archiver can compress the full page
    blocks to non-full page blocks. I am not familiar with where in the code
    this actually happens but will this cause issues if the first standby is
    processing WAL files from the archive?
    I checked the flag in xlog.c; it seems to be set only in
    XLogInsert(). I will consider whether it can be used.

    That flag cannot be used to check whether full-page writes were
    skipped, because it appears only in records that carry full-page data,
    not in those that do not.

    The straightforward approach to address the problem you raised is to log
    the change of full_page_writes on the master. Since such a WAL record is also
    replicated to the standby, the standby can know whether full_page_writes is
    enabled or not in the master, from the WAL record. If it's disabled,
    pg_start_backup() in the standby should emit an error and refuse standby-only
    backup. If the WAL record indicating that full_page_writes was disabled
    on the master arrives during standby-only backup, the standby should cancel
    the backup.
    Seems like something we could add to XLOG_PARAMETER_CHANGE fairly easily.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Fujii Masao at Aug 17, 2011 at 1:53 pm

    On Wed, Aug 17, 2011 at 9:40 PM, Robert Haas wrote:
    On Wed, Aug 17, 2011 at 6:19 AM, Fujii Masao wrote:
    The straightforward approach to address the problem you raised is to log
    the change of full_page_writes on the master. Since such a WAL record is also
    replicated to the standby, the standby can know whether full_page_writes is
    enabled or not in the master, from the WAL record. If it's disabled,
    pg_start_backup() in the standby should emit an error and refuse standby-only
    backup. If the WAL record indicating that full_page_writes was disabled
    on the master arrives during standby-only backup, the standby should cancel
    the backup.
    Seems like something we could add to XLOG_PARAMETER_CHANGE fairly easily.
    I'm afraid it's not so easy. Since fpw can be changed by SIGHUP, it's not
    easy to ensure that the change of fpw is logged ahead of the actual
    change in behavior. Probably we need to make the backend which first
    detects the change of fpw log it.

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Robert Haas at Aug 17, 2011 at 3:09 pm

    On Wed, Aug 17, 2011 at 9:53 AM, Fujii Masao wrote:
    On Wed, Aug 17, 2011 at 9:40 PM, Robert Haas wrote:
    On Wed, Aug 17, 2011 at 6:19 AM, Fujii Masao wrote:
    The straightforward approach to address the problem you raised is to log
    the change of full_page_writes on the master. Since such a WAL record is also
    replicated to the standby, the standby can know whether full_page_writes is
    enabled or not in the master, from the WAL record. If it's disabled,
    pg_start_backup() in the standby should emit an error and refuse standby-only
    backup. If the WAL record indicating that full_page_writes was disabled
    on the master arrives during standby-only backup, the standby should cancel
    the backup.
    Seems like something we could add to XLOG_PARAMETER_CHANGE fairly easily.
    I'm afraid it's not so easy. Since fpw can be changed by SIGHUP, it's not
    easy to ensure that the change of fpw is logged ahead of the actual
    change in behavior. Probably we need to make the backend which first
    detects the change of fpw log it.

    Ugh, you're right. But then you might have problems if the state
    changes again before all backends have picked up the previous change.
    What I've thought about before is making one backend (say, bgwriter)
    store its latest value in shared memory, protected by some lock that
    would already be held at the time the value is needed. Everyone else
    uses the shared memory copy instead of relying on their local value.
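
The shared-memory idea above could look roughly like the following Python sketch. A `threading.Lock` stands in for the lock that would already be held when the value is needed, and the `wal` list stands in for WAL; `SharedFullPageWrites`, `publish` and `read` are hypothetical names, not PostgreSQL functions.

```python
import threading


class SharedFullPageWrites:
    """Hypothetical stand-in for a full_page_writes value kept in shared memory."""

    def __init__(self, initial, wal):
        self._lock = threading.Lock()  # models the lock already held by readers
        self._value = initial
        self._wal = wal                # a plain list standing in for WAL

    def publish(self, new_value):
        """Called only by the one designated process (say, bgwriter)."""
        with self._lock:
            if new_value == self._value:
                return False
            # Log the change first, then switch the behavior, both under the
            # same lock, so no reader can see the new value before it is logged.
            self._wal.append(("fpw_change", new_value))
            self._value = new_value
            return True

    def read(self):
        """Called by everyone else instead of relying on a local value."""
        with self._lock:
            return self._value
```

Because only one process ever publishes, a second change cannot overtake the first: each change is logged and applied in a single critical section.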

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Fujii Masao at Aug 18, 2011 at 1:43 am

    On Thu, Aug 18, 2011 at 12:09 AM, Robert Haas wrote:
    Ugh, you're right. But then you might have problems if the state
    changes again before all backends have picked up the previous change.

    Right.

    What I've thought about before is making one backend (say, bgwriter)
    store its latest value in shared memory, protected by some lock that
    would already be held at the time the value is needed. Everyone else
    uses the shared memory copy instead of relying on their local value.

    Sounds reasonable.

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Jun Ishiduka at Aug 16, 2011 at 6:14 am

    * Not addressed yet

    * full_page_writes = off
    -> If the primary has "full_page_writes = off", archive recovery may not work
    correctly. Therefore the standby may need to check in the WAL whether
    "full_page_writes = off" is set.

    Doesn't having a standby make full_page_writes = on in all cases
    (bypassing the configuration)?

    What do you mean?

    Yeah. full_page_writes is a WAL generation parameter. Standbys don't
    generate WAL. I think you just have to insist that the master has it
    on.

    Thanks.

    This has the following two problems.
    * pg_start_backup() must set full_page_writes to 'on' on the master that
    is actually writing the WAL, not on the standby.
    * The standby need not be connected to the master that is actually writing
    the WAL.
    (Ex. Standby2 in Cascade Replication: Master - Standby1 - Standby2)

    I'm not sure how I should resolve these problems.

    Regards.



    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Fujii Masao at Aug 18, 2011 at 2:13 am

    2011/8/5 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    * Procedure

    1. Call pg_start_backup('x') on the standby.
    2. Take a backup of the data dir.
    3. Call pg_stop_backup() on the standby.
    4. Copy the control file on the standby to the backup.
    5. Check with pg_controldata that the copied control file shows the state during hot standby.
    -> If the standby is promoted between 3. and 4., the backup cannot be recovered.
    -> In pg_control, "Minimum recovery ending location" equals 0/0.
    -> The backup-end record is not written.

    What if we do #4 before #3? Would the backup get corrupted? My guess is
    that the backup is still valid even if we copy pg_control before executing
    pg_stop_backup(). That would not require #5, because if the standby
    promotion happens before pg_stop_backup(), pg_stop_backup() can
    detect that status change and cancel the backup.

    #5 looks fragile. If we can get rid of it, the procedure becomes more
    robust, I think.

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Jun Ishiduka at Aug 18, 2011 at 5:49 am

    * Procedure

    1. Call pg_start_backup('x') on the standby.
    2. Take a backup of the data dir.
    3. Call pg_stop_backup() on the standby.
    4. Copy the control file on the standby to the backup.
    5. Check with pg_controldata that the copied control file shows the state during hot standby.
    -> If the standby is promoted between 3. and 4., the backup cannot be recovered.
    -> In pg_control, "Minimum recovery ending location" equals 0/0.
    -> The backup-end record is not written.

    What if we do #4 before #3? Would the backup get corrupted? My guess is
    that the backup is still valid even if we copy pg_control before executing
    pg_stop_backup(). That would not require #5, because if the standby
    promotion happens before pg_stop_backup(), pg_stop_backup() can
    detect that status change and cancel the backup.

    #5 looks fragile. If we can get rid of it, the procedure becomes more
    robust, I think.

    Sure, you're right.

    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Jun Ishiduka at Sep 12, 2011 at 6:48 am
    Hi, I created a patch in response to the comments.


    * Procedure
    1. Call pg_start_backup('x') on hot standby.
    2. Take a backup of the data dir.
    3. Copy the control file on hot standby to the backup.
    4. Call pg_stop_backup() on hot standby.


    * Behavior
    (take backup)
    If we execute pg_start_backup() on hot standby, it executes a restartpoint,
    writes the string "FROM: slave" in backup_label and changes the backup mode,
    but does not forcibly change full_page_writes to "on".

    If we execute pg_stop_backup() on hot standby, it renames backup_label
    and changes the backup mode, but neither writes a backup-end record and
    history file nor waits for WAL archiving to complete.
    pg_stop_backup() returns the MinRecoveryPoint as its result.

    If we execute pg_stop_backup() on a server that has been promoted, an error
    message is output, because it reads the backup_label.

    (recovery)
    If we recover with a backup taken on hot standby, MinRecoveryPoint in
    the control file copied in step 3 of the above procedure is used instead
    of the backup-end record.

    When recovery first starts, BackupEndPoint in the control file is set to
    the same value as MinRecoveryPoint. This is for remembering the value of
    MinRecoveryPoint during recovery.

    The HINT message ("If this has ...") is always output when we recover with
    a backup taken on hot standby.


    * Problem
    The full_page_writes problem.
    This has the following two problems.
    * pg_start_backup() must set full_page_writes to 'on' on the master that
    is actually writing the WAL, not on the standby.
    * The standby need not be connected to the master that is actually writing
    the WAL.
    (Ex. Standby2 in Cascade Replication: Master - Standby1 - Standby2)

    I'm not sure how I should resolve these problems.
    Status: Considering
    (Latest: http://archives.postgresql.org/pgsql-hackers/2011-08/msg00880.php)


    Regards.


    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Jun Ishiduka at Sep 13, 2011 at 7:02 am
    Updated the patch.

    Changes:
    * the user sets full_page_writes to 'on' (in the documentation)
    * read "FROM: XX" in backup_label (in xlog.c)
    * check the server status when pg_stop_backup is executed (in xlog.c)

    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Fujii Masao at Sep 21, 2011 at 2:50 am
    2011/9/13 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    Updated the patch.

    Changes:
    * the user sets full_page_writes to 'on' (in the documentation)
    * read "FROM: XX" in backup_label (in xlog.c)
    * check the server status when pg_stop_backup is executed (in xlog.c)

    Thanks for updating the patch.

    Before reviewing the patch, to encourage people to comment and
    review the patch, I explain what this patch provides:

    This patch provides the capability to take a base backup during recovery,
    i.e., from the standby server. This is a very useful feature to offload the
    expense of periodic backups from the master. That backup procedure is
    similar to that during normal running, but slightly different:

    1. Execute pg_start_backup on the standby. To execute a query on the
    standby, hot standby must be enabled.

    2. Perform a file system backup on the standby.

    3. Copy the pg_control file from the cluster directory on the standby to
    the backup as follows:

    cp $PGDATA/global/pg_control /mnt/server/backupdir/global

    4. Execute pg_stop_backup on the standby.

    The backup taken by the above procedure is available for an archive
    recovery or standby server.

    If the standby is promoted during a backup, pg_stop_backup() detects
    the change of the server status and fails. The data backed up before the
    promotion is invalid and not available for recovery.

    Taking a backup from the standby by using pg_basebackup is still not
    possible. But we can relax that restriction after applying this patch.

    To take a base backup during recovery safely, several parameters
    must be set properly. Hot standby must be enabled on the standby, i.e.,
    wal_level must be set to hot_standby on the master and hot_standby must
    be enabled on the standby. FPW (full page writes) is required for a base
    backup, so full_page_writes must be enabled on the master.
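
A minimal sketch of steps 2 and 3 in Python, assuming the backup is a plain file copy: every file is copied first and global/pg_control is copied last. The function name and the returned list are illustrative assumptions; pg_start_backup() and pg_stop_backup() would still have to be called on the standby before and after it.

```python
import os
import shutil


def copy_data_dir_control_file_last(pgdata, backup_dir):
    """Copy pgdata into backup_dir, copying global/pg_control after all other files.

    Returns the relative paths in the order they were copied.
    """
    control_rel = os.path.join("global", "pg_control")
    copied = []
    for root, _dirs, files in os.walk(pgdata):
        rel_root = os.path.relpath(root, pgdata)
        dest_root = os.path.normpath(os.path.join(backup_dir, rel_root))
        os.makedirs(dest_root, exist_ok=True)
        for name in files:
            rel_path = os.path.normpath(os.path.join(rel_root, name))
            if rel_path == control_rel:
                continue  # postponed until every other file has been copied
            shutil.copy2(os.path.join(root, name), os.path.join(backup_dir, rel_path))
            copied.append(rel_path)
    # Step 3 of the procedure: the control file goes last.
    shutil.copy2(os.path.join(pgdata, control_rel),
                 os.path.join(backup_dir, control_rel))
    copied.append(control_rel)
    return copied
```

Any other backup tool works as well, as long as it preserves the same ordering for the control file.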

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Magnus Hagander at Sep 21, 2011 at 5:13 am

    On Wed, Sep 21, 2011 at 04:50, Fujii Masao wrote:
    2011/9/13 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    Updated the patch.

    Changes:
    * the user sets full_page_writes to 'on' (in the documentation)
    * read "FROM: XX" in backup_label (in xlog.c)
    * check the server status when pg_stop_backup is executed (in xlog.c)

    Thanks for updating the patch.

    Before reviewing the patch, to encourage people to comment and
    review the patch, I explain what this patch provides:

    This patch provides the capability to take a base backup during recovery,
    i.e., from the standby server. This is a very useful feature to offload the
    expense of periodic backups from the master. That backup procedure is
    similar to that during normal running, but slightly different:

    1. Execute pg_start_backup on the standby. To execute a query on the
    standby, hot standby must be enabled.

    2. Perform a file system backup on the standby.

    3. Copy the pg_control file from the cluster directory on the standby to
    the backup as follows:

    cp $PGDATA/global/pg_control /mnt/server/backupdir/global
    But this is done as part of step 2 already. I assume what this really
    means is that the pg_control file must be the last file backed up?

    (Since there are certainly a lot of other ways to do the backup than just
    cp to a mounted directory..)

    4. Execute pg_stop_backup on the standby.

    The backup taken by the above procedure is available for an archive
    recovery or standby server.

    If the standby is promoted during a backup, pg_stop_backup() detects
    the change of the server status and fails. The data backed up before the
    promotion is invalid and not available for recovery.

    Taking a backup from the standby by using pg_basebackup is still not
    possible. But we can relax that restriction after applying this patch.
    I think that this is going to be very important, particularly given
    the requirements on pt 3 above. (But yes, it certainly doesn't have to
    be done as part of this patch, but it really should be the plan to
    have this included in the same version)

    To take a base backup during recovery safely, several parameters
    must be set properly. Hot standby must be enabled on the standby, i.e.,
    wal_level must be set to hot_standby on the master and hot_standby must
    be enabled on the standby. FPW (full page writes) is required for a base
    backup, so full_page_writes must be enabled on the master.

    Presumably pg_start_backup() will check this. And we'll somehow track
    this before pg_stop_backup() as well? (For evil things such as
    the user changing FPW from on to off and then back to on again during
    a backup, which will make it look correct both during start and stop,
    but incorrect in the middle - pg_stop_backup needs to fail in that
    case as well.)
  • Fujii Masao at Sep 21, 2011 at 6:24 am

    On Wed, Sep 21, 2011 at 2:13 PM, Magnus Hagander wrote:
    On Wed, Sep 21, 2011 at 04:50, Fujii Masao wrote:
    3. Copy the pg_control file from the cluster directory on the standby to
    the backup as follows:

    cp $PGDATA/global/pg_control /mnt/server/backupdir/global
    But this is done as part of step 2 already. I assume what this really
    means is that the pg_control file must be the last file backed up?
    Yes.

    When we perform an archive recovery from a backup taken during
    normal processing, we get the backup end location from the backup-end
    WAL record which was written by pg_stop_backup(). But since no WAL
    writing is allowed during recovery, pg_stop_backup() on the standby
    cannot write a backup-end WAL record. So, in his patch, instead of
    a backup-end WAL record, the startup process uses the minimum
    recovery point recorded in pg_control, which has been included in the
    backup, as the backup end location. BTW, the backup end location is
    used to check whether recovery has reached a consistent state
    (i.e., end-of-backup).

    To use the minimum recovery point in pg_control as the backup end
    location safely, pg_control must be backed up last. Otherwise, a data
    page with an LSN newer than the minimum recovery point
    might be included in the backup.
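
The role of the backup end location can be sketched as follows in Python. This is a simplified model, not the startup process's actual code: LSNs are assumed to be strings in the "hi/lo" hexadecimal form that pg_controldata prints (for example 0/0), and the function names are hypothetical.

```python
def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' in hexadecimal into one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def reached_consistency(replayed_lsn, backup_end_lsn):
    """Recovery is consistent once replay has reached the backup end location."""
    return parse_lsn(replayed_lsn) >= parse_lsn(backup_end_lsn)


def backup_is_safe(page_lsns, min_recovery_point):
    """True if no backed-up page is newer than the minimum recovery point.

    If pg_control were copied before some data files, a page with a newer
    LSN could end up in the backup and this check would fail.
    """
    limit = parse_lsn(min_recovery_point)
    return all(parse_lsn(lsn) <= limit for lsn in page_lsns)
```

Copying pg_control last is what guarantees the second condition: the minimum recovery point it records is then at least as new as every page already in the backup.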

    (Since there are certainly a lot of other ways to do the backup than just
    cp to a mounted directory..)

    Yes. The above command I described is just an example.

    4. Execute pg_stop_backup on the standby.

    The backup taken by the above procedure is available for an archive
    recovery or standby server.

    If the standby is promoted during a backup, pg_stop_backup() detects
    the change of the server status and fails. The data backed up before the
    promotion is invalid and not available for recovery.

    Taking a backup from the standby by using pg_basebackup is still not
    possible. But we can relax that restriction after applying this patch.
    I think that this is going to be very important, particularly given
    the requirements on pt 3 above. (But yes, it certainly doesn't have to
    be done as part of this patch, but it really should be the plan to
    have this included in the same version)
    Agreed.

    To take a base backup during recovery safely, several parameters
    must be set properly. Hot standby must be enabled on the standby, i.e.,
    wal_level must be set to hot_standby on the master and hot_standby must
    be enabled on the standby. FPW (full page writes) is required for a base
    backup, so full_page_writes must be enabled on the master.

    Presumably pg_start_backup() will check this. And we'll somehow track
    this before pg_stop_backup() as well? (For evil things such as
    the user changing FPW from on to off and then back to on again during
    a backup, which will make it look correct both during start and stop,
    but incorrect in the middle - pg_stop_backup needs to fail in that
    case as well.)

    Right. As I suggested upthread, to address that problem, we need to log
    the change of FPW on the master, and then we need to check whether
    such a WAL record is replayed on the standby during the backup. If so,
    pg_stop_backup() should emit an error.

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Magnus Hagander at Sep 21, 2011 at 8:34 am

    On Wed, Sep 21, 2011 at 08:23, Fujii Masao wrote:
    On Wed, Sep 21, 2011 at 2:13 PM, Magnus Hagander wrote:
    On Wed, Sep 21, 2011 at 04:50, Fujii Masao wrote:
    3. Copy the pg_control file from the cluster directory on the standby to
    the backup as follows:

    cp $PGDATA/global/pg_control /mnt/server/backupdir/global
    But this is done as part of step 2 already. I assume what this really
    means is that the pg_control file must be the last file backed up?
    Yes.

    When we perform an archive recovery from a backup taken during
    normal processing, we get the backup end location from the backup-end
    WAL record which was written by pg_stop_backup(). But since no WAL
    writing is allowed during recovery, pg_stop_backup() on the standby
    cannot write a backup-end WAL record. So, in his patch, instead of
    a backup-end WAL record, the startup process uses the minimum
    recovery point recorded in pg_control, which has been included in the
    backup, as the backup end location. BTW, the backup end location is
    used to check whether recovery has reached a consistent state
    (i.e., end-of-backup).

    To use the minimum recovery point in pg_control as the backup end
    location safely, pg_control must be backed up last. Otherwise, a data
    page with an LSN newer than the minimum recovery point
    might be included in the backup.

    Ah, check.

    (Since there are certainly a lot of other ways to do the backup than just
    cp to a mounted directory..)
    Yes. The above command I described is just an example.
    ok.

    4. Execute pg_stop_backup on the standby.

    The backup taken by the above procedure is available for an archive
    recovery or standby server.
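
    The ordering constraint in step 3 can be sketched as follows. This is a
    minimal illustration, not part of the patch: the function name and the
    directory walk are assumptions, and it only models the copy order. It
    would have to run between pg_start_backup() and pg_stop_backup(), which
    it does not call, and it skips empty directories.

```python
import os
import shutil

# Relative location of the control file inside the cluster directory.
CONTROL_FILE = os.path.join("global", "pg_control")


def copy_cluster_control_file_last(pgdata, backupdir):
    """Copy every file under pgdata except pg_control, then copy pg_control.

    Hypothetical helper: returns the relative paths in the order copied.
    """
    copied = []
    for root, _dirs, files in os.walk(pgdata):
        for name in sorted(files):
            rel = os.path.relpath(os.path.join(root, name), pgdata)
            if rel == CONTROL_FILE:
                continue  # deferred until everything else has been copied
            dest = os.path.join(backupdir, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy2(os.path.join(pgdata, rel), dest)
            copied.append(rel)

    # pg_control goes last, so the minimum recovery point it records is not
    # older than the LSN of any data page already placed in the backup.
    dest = os.path.join(backupdir, CONTROL_FILE)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.copy2(os.path.join(pgdata, CONTROL_FILE), dest)
    copied.append(CONTROL_FILE)
    return copied
```

    Any backup tool works as long as it preserves this property: pg_control
    is read from the standby only after all other files have been copied.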

    If the standby is promoted during a backup, pg_stop_backup() detects
    the change of the server status and fails. The data backed up before the
    promotion is invalid and not available for recovery.

    Taking a backup from the standby by using pg_basebackup is still not
    possible. But we can relax that restriction after applying this patch.
    I think that this is going to be very important, particularly given
    the requirements on pt 3 above. (But yes, it certainly doesn't have to
    be done as part of this patch, but it really should be the plan to
    have this included in the same version)
    Agreed.
    To take a base backup during recovery safely, some parameters
    must be set properly. Hot standby must be enabled on the standby, i.e.,
    wal_level and hot_standby must be enabled on the master and the standby,
    respectively. FPW (full page writes) is required for a base backup,
    so full_page_writes must be enabled on the master.
    Presumably pg_start_backup() will check this. And we'll somehow track
    this before pg_stop_backup() as well? (for evil things such as
    the user changing FPW from on to off and then back to on again during
    a backup, which will make it look correct both during start and stop,
    but incorrect in the middle - pg_stop_backup needs to fail in that
    case as well)
    Right. As I suggested upthread, to address that problem, we need to log
    the change of FPW on the master, and then we need to check whether
    such a WAL is replayed on the standby during the backup. If it's done,
    pg_stop_backup() should emit an error.
    I somehow missed this thread completely, so I didn't catch your
    previous comments - oops, sorry. The important point being that we
    need to track if/when this happens even if it has been reset to a
    valid value. So we can't just check the state of the variable at the
    beginning and at the end.
  • Fujii Masao at Sep 22, 2011 at 12:13 pm

    On Wed, Sep 21, 2011 at 5:34 PM, Magnus Hagander wrote:
    On Wed, Sep 21, 2011 at 08:23, Fujii Masao wrote:
    On Wed, Sep 21, 2011 at 2:13 PM, Magnus Hagander wrote:
    Presumably pg_start_backup() will check this. And we'll somehow track
    this before pg_stop_backup() as well? (for evil things such as
    the user changing FPW from on to off and then back to on again during
    a backup, which will make it look correct both during start and stop,
    but incorrect in the middle - pg_stop_backup needs to fail in that
    case as well)
    Right. As I suggested upthread, to address that problem, we need to log
    the change of FPW on the master, and then we need to check whether
    such a WAL is replayed on the standby during the backup. If it's done,
    pg_stop_backup() should emit an error.
    I somehow missed this thread completely, so I didn't catch your
    previous comments - oops, sorry. The important point being that we
    need to track if/when this happens even if it has been reset to a
    valid value. So we can't just check the state of the variable at the
    beginning and at the end.
    Right. Let me explain again what I'm thinking.

    When FPW is changed, the master always writes the WAL record
    which contains the current value of FPW. This means that the standby
    can track all changes of FPW by reading WAL records.

    The standby has two flags: one indicates whether FPW has always
    been TRUE since the last restartpoint. The other indicates whether FPW
    has always been TRUE since the last pg_start_backup(). The standby
    can maintain those flags by reading WAL records streamed from
    the master.

    If the former flag is FALSE (i.e., the WAL records which
    the standby has replayed since the last restartpoint might not contain
    the required FPW), pg_start_backup() fails. If the latter flag is
    FALSE (i.e., the WAL records which the standby has replayed
    during the backup might not contain the required FPW),
    pg_stop_backup() fails.

    If I'm not missing something, this approach can address the problem
    which you're concerned about.
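
    A minimal sketch of the two-flag idea above, written as a plain Python
    model rather than backend code. The class and method names are
    hypothetical, and resetting the first flag to the current FPW value at
    each restartpoint is an assumption about how "since the last
    restartpoint" would be maintained.

```python
class FpwTracker:
    """Toy model of the standby-side bookkeeping for the master's FPW."""

    def __init__(self, fpw_on=True):
        self.fpw = fpw_on                     # last FPW value seen in WAL
        self.on_since_restartpoint = fpw_on   # flag 1
        self.on_since_backup_start = fpw_on   # flag 2
        self.backup_in_progress = False

    def replay_fpw_change(self, new_value):
        """Called when a WAL record carrying the master's FPW is replayed."""
        self.fpw = new_value
        if not new_value:
            # Once FPW has been off, both flags stay FALSE until reset.
            self.on_since_restartpoint = False
            self.on_since_backup_start = False

    def restartpoint(self):
        # Assumption: a restartpoint starts a new window for flag 1.
        self.on_since_restartpoint = self.fpw

    def start_backup(self):
        if not self.on_since_restartpoint:
            raise RuntimeError("FPW was off since the last restartpoint")
        self.on_since_backup_start = True
        self.backup_in_progress = True

    def stop_backup(self):
        if not self.backup_in_progress:
            raise RuntimeError("no backup in progress")
        self.backup_in_progress = False
        if not self.on_since_backup_start:
            raise RuntimeError("FPW was off at some point during the backup")
```

    With this model, switching FPW off and back on during a backup makes
    stop_backup() fail even though FPW is on at both ends, which is the case
    Magnus raised.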

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Magnus Hagander at Sep 22, 2011 at 3:44 pm

    On Thu, Sep 22, 2011 at 14:13, Fujii Masao wrote:
    On Wed, Sep 21, 2011 at 5:34 PM, Magnus Hagander wrote:
    On Wed, Sep 21, 2011 at 08:23, Fujii Masao wrote:
    On Wed, Sep 21, 2011 at 2:13 PM, Magnus Hagander wrote:
    Presumably pg_start_backup() will check this. And we'll somehow track
    this before pg_stop_backup() as well? (for evil things such as
    the user changing FPW from on to off and then back to on again during
    a backup, which will make it look correct both during start and stop,
    but incorrect in the middle - pg_stop_backup needs to fail in that
    case as well)
    Right. As I suggested upthread, to address that problem, we need to log
    the change of FPW on the master, and then we need to check whether
    such a WAL is replayed on the standby during the backup. If it's done,
    pg_stop_backup() should emit an error.
    I somehow missed this thread completely, so I didn't catch your
    previous comments - oops, sorry. The important point being that we
    need to track if/when this happens even if it has been reset to a
    valid value. So we can't just check the state of the variable at the
    beginning and at the end.
    Right. Let me explain again what I'm thinking.

    When FPW is changed, the master always writes the WAL record
    which contains the current value of FPW. This means that the standby
    can track all changes of FPW by reading WAL records.

    The standby has two flags: one indicates whether FPW has always
    been TRUE since the last restartpoint. The other indicates whether FPW
    has always been TRUE since the last pg_start_backup(). The standby
    can maintain those flags by reading WAL records streamed from
    the master.

    If the former flag is FALSE (i.e., the WAL records which
    the standby has replayed since the last restartpoint might not contain
    the required FPW), pg_start_backup() fails. If the latter flag is
    FALSE (i.e., the WAL records which the standby has replayed
    during the backup might not contain the required FPW),
    pg_stop_backup() fails.

    If I'm not missing something, this approach can address the problem
    which you're concerned about.
    Yeah, it sounds safe to me.

    Would it make sense for pg_start_backup() to have the ability to wait
    for the next restartpoint in a case like this, if we know that FPW has
    been set? Instead of failing? Or maybe that's just overcomplicating
    things when trying to be user-friendly.
  • Fujii Masao at Sep 26, 2011 at 12:19 pm

    On Fri, Sep 23, 2011 at 12:44 AM, Magnus Hagander wrote:
    Would it make sense for pg_start_backup() to have the ability to wait
    for the next restartpoint in a case like this, if we know that FPW has
    been set? Instead of failing? Or maybe that's just overcomplicating
    things when trying to be user-friendly.
    I don't think that it's worth adding code for such a feature, because I believe
    there are not many users who enable FPW on-the-fly for standby-only backup
    and would use such a feature.

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Josh Berkus at Sep 21, 2011 at 4:52 pm
    Fujii,

    I haven't really been following your latest patches about taking backups
    from the standby and cascading replication, but I wanted to see if it
    fulfills another TODO: the ability to "remaster" (that is, designate the
    "lead standby" as the new master) without needing to copy WAL files.

    Supporting remastering using streaming replication only was on your TODO
    list when we closed 9.1. It seems like this would get solved as a
    side-effect, but I wanted to confirm that.

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Fujii Masao at Sep 26, 2011 at 8:07 am

    On Thu, Sep 22, 2011 at 1:52 AM, Josh Berkus wrote:
    Fujii,

    I haven't really been following your latest patches about taking backups
    from the standby and cascading replication, but I wanted to see if it
    fulfills another TODO: the ability to "remaster" (that is, designate the
    "lead standby" as the new master) without needing to copy WAL files.
    Sorry, I could not follow you. I believe that we can "remaster" even in 9.1.
    When the master crashes, we can choose the "lead standby" by comparing
    each standby's replay location, and can promote it with pg_ctl promote.
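
    Comparing replay locations can be sketched like this. It is only an
    illustration: the standby names are made up, the sample locations are
    borrowed from log lines elsewhere in this thread, and how the locations
    are collected from each standby is left out.

```python
def parse_lsn(text):
    """Convert a location such as '0/6000298' into a comparable integer.

    The two hexadecimal halves are treated as the high and low 32 bits.
    """
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def pick_lead_standby(replay_locations):
    """Return the name of the standby that has replayed the most WAL.

    replay_locations maps a (hypothetical) standby name to its replay
    location in 'X/Y' text form.
    """
    return max(replay_locations, key=lambda name: parse_lsn(replay_locations[name]))


candidates = {
    "standby1": "0/6000298",
    "standby2": "0/90000E8",
    "standby3": "0/60002F0",
}
lead = pick_lead_standby(candidates)
```

    The locations must be compared numerically; a plain string comparison
    would order "0/10000000" before "0/9000000".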

    What "remaster" feature are you expecting we should develop in 9.2?

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Fujii Masao at Sep 22, 2011 at 1:25 pm

    On Wed, Sep 21, 2011 at 11:50 AM, Fujii Masao wrote:
    2011/9/13 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    Update patch.

    Changes:
    * set 'on' full_page_writes by user (in document)
    * read "FROM: XX" in backup_label (in xlog.c)
    * check status when pg_stop_backup is executed (in xlog.c)
    Thanks for updating the patch.

    Before reviewing the patch, to encourage people to comment and
    review the patch, I explain what this patch provides:
    Attached is the updated version of the patch. I refactored the code, fixed
    some bugs, added lots of source code comments, improved the document,
    but didn't change the basic design. Please check this patch, and let's use
    this patch as the base if you agree with that.

    In the current patch, there is no safeguard for preventing users from
    taking backup during recovery when FPW is disabled. This is unsafe.
    Are you planning to implement such a safeguard?

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Steve Singer at Sep 26, 2011 at 2:44 am

    On 11-09-22 09:24 AM, Fujii Masao wrote:
    On Wed, Sep 21, 2011 at 11:50 AM, Fujii Masao wrote:
    2011/9/13 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    Update patch.

    Changes:
    * set 'on' full_page_writes by user (in document)
    * read "FROM: XX" in backup_label (in xlog.c)
    * check status when pg_stop_backup is executed (in xlog.c)
    Thanks for updating the patch.

    Before reviewing the patch, to encourage people to comment and
    review the patch, I explain what this patch provides:
    Attached is the updated version of the patch. I refactored the code, fixed
    some bugs, added lots of source code comments, improved the document,
    but didn't change the basic design. Please check this patch, and let's use
    this patch as the base if you agree with that.
    I have looked at both Jun's patch from Sept 13 and Fujii's updates to
    the patch. I agree that Fujii's updated version should be used as the
    basis for changes going forward. My comments below refer to that
    version (unless otherwise noted).


    In backup.sgml, in the new section titled "Making a Base Backup during
    Recovery", I would prefer to see some mention in the title that this
    procedure is for standby servers, i.e. "Making a Base Backup from a Standby
    Database". Users who have set up a hot-standby database should be
    familiar with the 'standby' terminology. I agree that the "during
    recovery" description is technically correct but I'm not sure someone
    who is looking through the manual for instructions on making a base
    backup from their standby will realize this is the section they should read.

    Around line 969 where you give an example of copying the control file I
    would be a bit clearer that this is an example command. Ie (Copy the
    pg_control file from the cluster directory to the global sub-directory
    of the backup. For example "cp $PGDATA/global/pg_control
    /mnt/server/backupdir/global")


    Testing Notes
    -----------------------------

    I created a standby server from a base backup of another standby server.
    On this new standby server I then

    1. Ran pg_start_backup('3'); and left the psql connection open
    2. touch /tmp/3 -- my trigger_file

    ssinger@ssinger-laptop:/usr/local/pgsql92git/bin$ LOG: trigger file
    found: /tmp/3
    FATAL: terminating walreceiver process due to administrator command
    LOG: restored log file "000000010000000000000006" from archive
    LOG: record with zero length at 0/60002F0
    LOG: restored log file "000000010000000000000006" from archive
    LOG: redo done at 0/6000298
    LOG: restored log file "000000010000000000000006" from archive
    PANIC: record with zero length at 0/6000298
    LOG: startup process (PID 19011) was terminated by signal 6: Aborted
    LOG: terminating any other active server processes
    WARNING: terminating connection because of crash of another server process
    DETAIL: The postmaster has commanded this server process to roll back
    the current transaction and exit, because another server process exited
    abnormally and possibly corrupted shared memory.
    HINT: In a moment you should be able to reconnect to the database and
    repeat your command.

    The new postmaster (the one trying to be promoted) dies. This is
    somewhat repeatable.

    ----

    If a base backup is in progress on a recovery database and that recovery
    database is promoted to master, then following the promotion (if you don't
    restart the postmaster) I see
    select pg_stop_backup();
    ERROR: database system status mismatches between pg_start_backup() and
    pg_stop_backup()

    If you restart the postmaster this goes away. When the postmaster
    leaves recovery mode I think it should abort an existing base backup so
    pg_stop_backup() will say no backup in progress, or give an error
    message on pg_stop_backup() saying that the base backup won't be
    usable. The above error doesn't really tell the user why there is a
    mismatch.

    ---------

    In my testing a few times I got into a situation where a standby server
    coming from a recovery target took a while to finish recovery (this is
    on a database with no activity). Then when I tried promoting that
    server to master I got

    LOG: trigger file found: /tmp/3
    FATAL: terminating walreceiver process due to administrator command
    LOG: restored log file "000000010000000000000009" from archive
    LOG: restored log file "000000010000000000000009" from archive
    LOG: redo done at 0/90000E8
    LOG: restored log file "000000010000000000000009" from archive
    PANIC: unexpected pageaddr 0/6000000 in log file 0, segment 9, offset 0
    LOG: startup process (PID 1804) was terminated by signal 6: Aborted
    LOG: terminating any other active server processes


    It is *possible* I mixed up the order of a step somewhere since my
    testing isn't script based. A standby server that 'looks' okay but can't
    actually be promoted is dangerous.

    This version of the patch (I was testing the Sept 22nd version) seems
    less stable than how I remember the version from the July CF. Maybe I'm
    just testing it harder or maybe something has been broken.


    In the current patch, there is no safeguard for preventing users from
    taking backup during recovery when FPW is disabled. This is unsafe.
    Are you planning to implement such a safeguard?
    I agree with Fujii that we need a way (on the recovery machine) to
    detect if the master doesn't have FPW on. The ideas up-thread on how to
    do this sound good.

    Regards,


  • Fujii Masao at Sep 27, 2011 at 2:56 am

    On Mon, Sep 26, 2011 at 11:39 AM, Steve Singer wrote:
    I have looked at both Jun's patch from Sept 13 and Fujii's updates to the
    patch.  I agree that Fujii's updated version should be used as the basis for
    changes going forward.   My comments below refer to that version (unless
    otherwise noted).
    Thanks for the tests and comments!
    In backup.sgml, in the new section titled "Making a Base Backup during
    Recovery", I would prefer to see some mention in the title that this
    procedure is for standby servers, i.e. "Making a Base Backup from a Standby
    Database". Users who have set up a hot-standby database should be familiar
    with the 'standby' terminology. I agree that the "during recovery"
    description is technically correct but I'm not sure someone who is looking
    through the manual for instructions on making a base backup from their
    standby will realize this is the section they should read.
    I used the term "recovery" rather than "standby" because we can take
    a backup even from the server in normal archive recovery mode but not
    standby mode. But there are not many users who take a backup during
    normal archive recovery, so I agree that the term "standby" is better to
    be used in the document. Will change.
    Around line 969 where you give an example of copying the control file I
    would be a bit clearer that this is an example command.  Ie (Copy the
    pg_control file from the cluster directory to the global sub-directory of
    the backup.  For example "cp $PGDATA/global/pg_control
    /mnt/server/backupdir/global")
    Looks better. Will change.
    Testing Notes
    -----------------------------

    I created a standby server from a base backup of another standby server. On
    this new standby server I then

    1. Ran pg_start_backup('3'); and left the psql connection open
    2. touch /tmp/3 -- my trigger_file

    ssinger@ssinger-laptop:/usr/local/pgsql92git/bin$ LOG:  trigger file found:
    /tmp/3
    FATAL:  terminating walreceiver process due to administrator command
    LOG:  restored log file "000000010000000000000006" from archive
    LOG:  record with zero length at 0/60002F0
    LOG:  restored log file "000000010000000000000006" from archive
    LOG:  redo done at 0/6000298
    LOG:  restored log file "000000010000000000000006" from archive
    PANIC:  record with zero length at 0/6000298
    LOG:  startup process (PID 19011) was terminated by signal 6: Aborted
    LOG:  terminating any other active server processes
    WARNING:  terminating connection because of crash of another server process
    DETAIL:  The postmaster has commanded this server process to roll back the
    current transaction and exit, because another server process exited
    abnormally and possibly corrupted shared memory.
    HINT:  In a moment you should be able to reconnect to the database and
    repeat your command.

    The new postmaster (the one trying to be promoted) dies.  This is somewhat
    repeatable.
    Looks weird. Though the WAL record starting from 0/6000298 was read
    successfully, the re-fetch of the same record fails at the end of recovery.
    One possible cause is corruption of the archived WAL file. What
    restore_command on the standby and archive_command on the master
    are you using? Could you confirm that there is no chance of archived
    WAL files being overwritten in your environment?

    I tried to reproduce this problem several times, but I could not. Could
    you provide the test case which reproduces the problem?
    If a base backup is in progress on a recovery database and that recovery
    database is promoted to master, then following the promotion (if you don't
    restart the postmaster) I see
    select pg_stop_backup();
    ERROR:  database system status mismatches between pg_start_backup() and
    pg_stop_backup()

    If you restart the postmaster this goes away.  When the postmaster leaves
    recovery mode I think it should abort an existing base backup so
    pg_stop_backup() will say no backup in progress,
    I don't think that it's a good idea to cancel the backup when promoting
    the standby. If we do so, we need to correctly handle the case where
    cancellation of the backup and pg_start_backup/pg_stop_backup are
    performed at the same time. We can simply do that by protecting those
    whole operations, including pg_start_backup's checkpoint, with an lwlock.
    But I don't think that it's worth introducing a new lwlock only for that,
    and it's not good to hold an lwlock through a time-consuming checkpoint
    operation. Of course we can avoid such an lwlock, but that would require
    more complicated code.
    or give an error message on
    pg_stop_backup() saying that the base backup won't be usable.  The above
    error doesn't really tell the user why there is a mismatch.
    What about the following error message?

    ERROR: pg_stop_backup() was executed during normal processing though
    pg_start_backup() was executed during recovery
    HINT: The database backup will not be usable.

    Or do you have a better idea?
    In my testing a few times I got into a situation where a standby server
    coming from a recovery target took a while to finish recovery (this is on a
    database with no activity).  Then when I tried promoting that server to
    master I got

    LOG:  trigger file found: /tmp/3
    FATAL:  terminating walreceiver process due to administrator command
    LOG:  restored log file "000000010000000000000009" from archive
    LOG:  restored log file "000000010000000000000009" from archive
    LOG:  redo done at 0/90000E8
    LOG:  restored log file "000000010000000000000009" from archive
    PANIC:  unexpected pageaddr 0/6000000 in log file 0, segment 9, offset
    LOG:  startup process (PID 1804) was terminated by signal 6: Aborted
    LOG:  terminating any other active server processes

    It is *possible* I mixed up the order of a step somewhere since my testing
    isn't script based. A standby server that 'looks' okay but can't actually be
    promoted is dangerous.
    Looks like the same problem as the above. Another weird point is that
    the same archived WAL file is restored two times before redo is done.
    I'm not sure why this happens... Could you provide the test case which
    reproduces this problem? Will diagnose.

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Fujii Masao at Sep 27, 2011 at 5:51 am

    On Tue, Sep 27, 2011 at 11:56 AM, Fujii Masao wrote:
    In backup.sgml, in the new section titled "Making a Base Backup during
    Recovery", I would prefer to see some mention in the title that this
    procedure is for standby servers, i.e. "Making a Base Backup from a Standby
    Database". Users who have set up a hot-standby database should be familiar
    with the 'standby' terminology. I agree that the "during recovery"
    description is technically correct but I'm not sure someone who is looking
    through the manual for instructions on making a base backup from their
    standby will realize this is the section they should read.
    I used the term "recovery" rather than "standby" because we can take
    a backup even from the server in normal archive recovery mode but not
    standby mode. But there are not many users who take a backup during
    normal archive recovery, so I agree that the term "standby" is better to
    be used in the document. Will change.
    Done.
    Around line 969 where you give an example of copying the control file I
    would be a bit clearer that this is an example command.  Ie (Copy the
    pg_control file from the cluster directory to the global sub-directory of
    the backup.  For example "cp $PGDATA/global/pg_control
    /mnt/server/backupdir/global")
    Looks better. Will change.
    Done.
    or give an error message on
    pg_stop_backup() saying that the base backup won't be usable.  The above
    error doesn't really tell the user why there is a mismatch.
    What about the following error message?

    ERROR:  pg_stop_backup() was executed during normal processing though
    pg_start_backup() was executed during recovery
    HINT:  The database backup will not be usable.
    Done. I attached the new version of the patch.

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Jun Ishiduka at Oct 9, 2011 at 6:11 pm
    I created a patch that handles FPW.
    It is based on Fujii's patch (ver 9).

    Manage the server's own FPW in shared memory (on master)
    * The startup and walwriter processes update it. startup initializes it
    after REDO. walwriter updates it when started or on receiving SIGHUP.

    Insert WAL including the current value of FPW (on master)
    * At the same time as the update, they insert a WAL record (named
    XLOG_FPW_CHANGE). XLOG_FPW_CHANGE carries the changed value of FPW.
    * When it creates a CHECKPOINT, it adds the current value of FPW to the
    CHECKPOINT WAL record.

    Manage the master's FPW in local memory in startup (on standby)
    * It obtains the value of the master's FPW by reading XLOG_FPW_CHANGE
    during REDO.

    Check at pg_start_backup/pg_stop_backup (on standby)
    * It checks using these two values:
    * the master's FPW at the latest CHECKPOINT
    * the master's current FPW from XLOG_FPW_CHANGE

    Regards.


    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Simon Riggs at Oct 9, 2011 at 6:56 pm

    2011/10/9 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:

    Insert WAL including the current value of FPW (on master)
    * At the same time as the update, they insert a WAL record (named
    XLOG_FPW_CHANGE). XLOG_FPW_CHANGE carries the changed value of FPW.
    * When it creates a CHECKPOINT, it adds the current value of FPW to the
    CHECKPOINT WAL record.
    I can't see a reason why we would use a new WAL record for this,
    rather than modify the XLOG_PARAMETER_CHANGE record type which was
    created for a very similar reason.
    The code would be much simpler if we just extend
    XLOG_PARAMETER_CHANGE, so please can we do that?

    The log message "full_page_writes on master is set invalid more than
    once during online backup" should read "at least once" rather than
    "more than once".

    lastFpwDisabledLSN needs to be initialized.

    Is there a reason to add lastFpwDisabledLSN onto the Control file? If
    we log parameters after every checkpoint then we'll know the values
    when we startup. If we keep logging parameters this way we'll end up
    with a very awkward and large control file. I would personally prefer
    to avoid that, but that thought could go either way. Let's see if
    anyone else thinks that also.

    Looks good.

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Jun Ishiduka at Oct 11, 2011 at 10:19 am

    I can't see a reason why we would use a new WAL record for this,
    rather than modify the XLOG_PARAMETER_CHANGE record type which was
    created for a very similar reason.
    The code would be much simpler if we just extend
    XLOG_PARAMETER_CHANGE, so please can we do that?
    Sure.
    The log message "full_page_writes on master is set invalid more than
    once during online backup" should read "at least once" rather than
    "more than once".
    Yes.
    lastFpwDisabledLSN needs to be initialized.
    I think that is not needed, because all values in XLogCtl are initialized to 0.
    Is there a reason to add lastFpwDisabledLSN onto the Control file? If
    we log parameters after every checkpoint then we'll know the values
    when we startup. If we keep logging parameters this way we'll end up
    with a very awkward and large control file. I would personally prefer
    to avoid that, but that thought could go either way. Let's see if
    anyone else thinks that also.
    Yes. I will add this to CreateCheckPoint().

    Image:
    CreateCheckPoint()
    {
        if (!shutdown && XLogStandbyInfoActive())
        {
            LogStandbySnapshot()
            XLogReportParameters()
        }
    }

    XLogReportParameters()
    {
        if (fpw == 'off' || ... )
            XLOGINSERT()
    }

    However, it will write XLOG_PARAMETER_CHANGE at every checkpoint when FPW is 'off'.
    (It will increase the amount of WAL.)
    Is that OK?
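
    To put a number on that concern, here is a rough Python model of the
    "Image" above. It is a sketch under assumptions, not backend code: the
    record labels other than XLOG_PARAMETER_CHANGE are placeholders, and
    other_params_changed is a hypothetical stand-in for the condition the
    pseudo-code elides with "...".

```python
def create_checkpoint(shutdown, standby_info_active, fpw_on,
                      other_params_changed=False):
    """Return the WAL records one checkpoint would emit in this toy model."""
    records = []
    if not shutdown and standby_info_active:
        records.append("STANDBY_SNAPSHOT")  # placeholder label
        # Mirrors XLogReportParameters(): log when FPW is off, or when the
        # elided condition (modelled by other_params_changed) holds.
        if (not fpw_on) or other_params_changed:
            records.append("XLOG_PARAMETER_CHANGE")
    records.append("CHECKPOINT")  # placeholder label
    return records


def parameter_records(checkpoints, fpw_on):
    """Count XLOG_PARAMETER_CHANGE records over a run of online checkpoints."""
    return sum(
        create_checkpoint(False, True, fpw_on).count("XLOG_PARAMETER_CHANGE")
        for _ in range(checkpoints)
    )


# Assuming one checkpoint every 5 minutes, a day has 288 checkpoints.
per_day_fpw_off = parameter_records(288, fpw_on=False)
per_day_fpw_on = parameter_records(288, fpw_on=True)
```

    Under that assumption the extra cost is one small record per checkpoint
    while FPW is off, and nothing while it is on.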


    Regards.

    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Jun Ishiduka at Oct 11, 2011 at 3:19 pm

    Done.

    Updated patch attached.

    Regards.

    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Steve Singer at Oct 11, 2011 at 9:44 pm

    On 11-10-11 11:17 AM, Jun Ishiduka wrote:
    Done.

    Updated patch attached.
    I have taken Jun's latest patch and applied it on top of Fujii's most
    recent patch. I did some testing with the result but nothing thorough
    enough to stumble on any race conditions.

    Some testing notes
    ------------------------------
    select pg_start_backup('x');
    ERROR: full_page_writes on master is set invalid at least once since
    latest checkpoint

    I think this error should be rewritten as
    ERROR: full_page_writes on master has been off at some point since
    latest checkpoint

    We should be using 'off' instead of 'invalid' since that is what
    the user sets it to.


    I switched full_page_writes=on on the master,

    did a pg_start_backup() on slave1.

    Then I switched full_page_writes=off on the master, did a reload +
    checkpoint.

    I was able to then do my backup of slave1, copy the control file, and
    pg_stop_backup().
    When I did the test slave2 started okay, but is this safe? Do we need a
    warning from pg_stop_backup() that is printed if it is detected that
    full_page_writes was turned off on the master during the backup period?


    Code Notes
    ---------------------
    *** 6865,6870 ****
    --- 6871,6886 ----
    /* Pre-scan prepared transactions to find out the range of XIDs present */
    oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);

    + /*
    + * The startup updates FPW in shaerd-memory after REDO. However, it must
    + * perform before writing the WAL of the CHECKPOINT. The reason is that
    + * it uses a value of fpw in shared-memory when it writes a WAL of its
    + * CHECKPOTNT.
    + */

    Minor typo above at 'CHECKPOTNT'



    If my concern about full page writes being switched to off in the middle
    of a backup is unfounded then I think this patch is ready for a
    committer. They can clean the two editorial changes when they apply the
    patches.

    If do_pg_stop_backup is going to need some logic to recheck the full
    page write status then an updated patch is required.





  • Jun Ishiduka at Oct 12, 2011 at 2:46 am

    Some testing notes
    ------------------------------
    select pg_start_backup('x');
    ERROR: full_page_writes on master is set invalid at least once since
    latest checkpoint

    I think this error should be rewritten as
    ERROR: full_page_writes on master has been off at some point since
    latest checkpoint

    We should be using 'off' instead of 'invalid' since that is what
    the user sets it to.
    Sure.

    I set full_page_writes=on on the master and did a pg_start_backup() on
    slave1.

    Then I set full_page_writes=off on the master and did a reload +
    checkpoint.

    I was then able to do my backup of slave1, copy the control file, and
    run pg_stop_backup().

    When I did the test slave2 started okay, but is this safe? Do we need a
    warning from pg_stop_backup() that is printed if it is detected that
    full_page_writes was turned off on the master during the backup period?
    I reproduced this as well.

    pg_stop_backup() fails in most cases.
    However, it succeeds if both of the following conditions hold:
    * the checkpoint is done before walwriter receives SIGHUP.
    * slave1 has not yet received the WAL record for 'off' that is written on SIGHUP.


    Minor typo above at 'CHECKPOTNT'
    Yes.

    If my concern about full page writes being switched to off in the middle
    of a backup is unfounded then I think this patch is ready for a
    committer. They can clean the two editorial changes when they apply the
    patches.
    Yes. I'll clean these up when I fix the comments.

    If do_pg_stop_backup is going to need some logic to recheck the full
    page write status then an updated patch is required.
    The patch already contains that logic.


    Regards.

    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Jun Ishiduka at Oct 12, 2011 at 7:29 am

    Some testing notes
    ------------------------------
    select pg_start_backup('x');
    ERROR: full_page_writes on master is set invalid at least once since
    latest checkpoint

    I think this error should be rewritten as
    ERROR: full_page_writes on master has been off at some point since
    latest checkpoint

    We should be using 'off' instead of 'invalid' since that is what
    the user sets it to.
    Sure.

    Minor typo above at 'CHECKPOTNT'
    Yes.

    I updated the patch to address the comments above.

    Regards.


    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Fujii Masao at Oct 12, 2011 at 7:54 am

    2011/10/12 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    ERROR: full_page_writes on master is set invalid at least once since
    latest checkpoint

    I think this error should be rewritten as
    ERROR: full_page_writes on master has been off at some point since
    latest checkpoint

    We should be using 'off' instead of 'invalid' since that is what
    the user sets it to.
    Sure.
    What about the following message? It sounds more precise to me.

    ERROR: WAL generated with full_page_writes=off was replayed since last
    restartpoint
    I updated the patch to address the comments above.
    Thanks for updating the patch! Here are my comments:

    * don't yet have the insert lock, forcePageWrites could change under us,
    * but we'll recheck it once we have the lock.
    */
    - doPageWrites = fullPageWrites || Insert->forcePageWrites;
    + doPageWrites = Insert->fullPageWrites || Insert->forcePageWrites;

    The source comment needs to be modified.

    * just turned off, we could recompute the record without full pages, but
    * we choose not to bother.)
    */
    - if (Insert->forcePageWrites && !doPageWrites)
    + if ((Insert->fullPageWrites || Insert->forcePageWrites) && !doPageWrites)

    Same as above.

    + LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
    + XLogCtl->Insert.fullPageWrites = fullPageWrites;
    + LWLockRelease(WALInsertLock);

    I don't think WALInsertLock needs to be held here because there is no
    concurrently running process which can access Insert.fullPageWrites.
    For example, Insert->currpos and Insert->LogwrtResult are also changed
    without the lock there.

    The source comment of XLogReportParameters() needs to be modified.

    XLogReportParameters() should skip writing WAL if full_page_writes has not been
    changed by SIGHUP.

    XLogReportParameters() should skip updating pg_control if no parameter related
    to hot standby has been changed.

    + if (!fpw_manager)
    + {
    + LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
    + fpw = XLogCtl->Insert.fullPageWrites;
    + LWLockRelease(WALInsertLock);

    It's safe to take WALInsertLock in shared mode here.

    In checkpoint, XLogReportParameters() is called only when wal_level is
    hot_standby.
    OTOH, in walwriter, it's always called even when wal_level is not hot_standby.
    Can't we skip calling XLogReportParameters() whenever wal_level is not
    hot_standby?

    In do_pg_start_backup() and do_pg_stop_backup(), the spinlock must be held to
    see XLogCtl->lastFpwDisabledLSN.

    + /* check whether the master's FPW is 'off' since pg_start_backup. */
    + if (recovery_in_progress && XLByteLE(startpoint, XLogCtl->lastFpwDisabledLSN))
    + ereport(ERROR,
    + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
    + errmsg("full_page_writes on master has been off at some point during online backup")));

    What about changing the error message to:
    ERROR: WAL generated with full_page_writes=off was replayed during online backup

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Jun Ishiduka at Oct 13, 2011 at 4:33 am

    ERROR: full_page_writes on master is set invalid at least once since
    latest checkpoint

    I think this error should be rewritten as
    ERROR: full_page_writes on master has been off at some point since
    latest checkpoint

    We should be using 'off' instead of 'invalid' since that is what
    the user sets it to.
    Sure.
    What about the following message? It sounds more precise to me.

    ERROR: WAL generated with full_page_writes=off was replayed since last
    restartpoint
    Okay, I changed the patch to use this message.
    If someone has a better idea, I will reconsider.

    I updated the patch to address the comments above.
    Thanks for updating the patch! Here are my comments:

    * don't yet have the insert lock, forcePageWrites could change under us,
    * but we'll recheck it once we have the lock.
    */
    - doPageWrites = fullPageWrites || Insert->forcePageWrites;
    + doPageWrites = Insert->fullPageWrites || Insert->forcePageWrites;

    The source comment needs to be modified.

    * just turned off, we could recompute the record without full pages, but
    * we choose not to bother.)
    */
    - if (Insert->forcePageWrites && !doPageWrites)
    + if ((Insert->fullPageWrites || Insert->forcePageWrites) && !doPageWrites)

    Same as above.
    Sure.

    XLogReportParameters() should skip writing WAL if full_page_writes has not been
    changed by SIGHUP.

    XLogReportParameters() should skip updating pg_control if no parameter related
    to hot standby has been changed.
    Yes.

    In checkpoint, XLogReportParameters() is called only when wal_level is
    hot_standby.
    OTOH, in walwriter, it's always called even when wal_level is not hot_standby.
    Can't we skip calling XLogReportParameters() whenever wal_level is not
    hot_standby?
    Yes, it is possible.

    In do_pg_start_backup() and do_pg_stop_backup(), the spinlock must be held to
    see XLogCtl->lastFpwDisabledLSN.
    Yes.

    What about changing the error message to:
    ERROR: WAL generated with full_page_writes=off was replayed during online backup
    Okay, too.


    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Jun Ishiduka at Oct 13, 2011 at 5:03 am
    Sorry, I did not answer all of Fujii's comments earlier.
    Here are the remaining answers.

    + LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
    + XLogCtl->Insert.fullPageWrites = fullPageWrites;
    + LWLockRelease(WALInsertLock);

    I don't think WALInsertLock needs to be held here because there is no
    concurrently running process which can access Insert.fullPageWrites.
    For example, Insert->currpos and Insert->LogwrtResult are also changed
    without the lock there.
    Yes.

    The source comment of XLogReportParameters() needs to be modified.
    Yes, too.


    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Jun Ishiduka at Oct 13, 2011 at 9:41 am

    Done.
    I updated the patch to address the comments above.

    Regards.

    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Fujii Masao at Oct 14, 2011 at 12:28 pm

    2011/10/13 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    I updated the patch to address the comments above.
    Thanks for updating the patch!

    As I suggested in the reply to Simon, I think that the change of FPW
    should be WAL-logged separately from that of HS parameters. ISTM
    packing them in one WAL record makes XLogReportParameters()
    quite confusing. Thoughts?

    if (!shutdown && XLogStandbyInfoActive())
    + {
    LogStandbySnapshot(&checkPoint.oldestActiveXid, &checkPoint.nextXid);
    + XLogReportParameters(REPORT_ON_BACKEND);
    + }

    Why doesn't the change of FPW need to be WAL-logged when
    shutdown checkpoint is performed? It's helpful to add the comment
    explaining why.


    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Jun Ishiduka at Oct 15, 2011 at 1:37 am

    As I suggested in the reply to Simon, I think that the change of FPW
    should be WAL-logged separately from that of HS parameters. ISTM
    packing them in one WAL record makes XLogReportParameters()
    quite confusing. Thoughts?
    I would like to wait for Simon's reply. I don't think we can decide how
    this code should look without it.

    if (!shutdown && XLogStandbyInfoActive())
    + {
    LogStandbySnapshot(&checkPoint.oldestActiveXid, &checkPoint.nextXid);
    + XLogReportParameters(REPORT_ON_BACKEND);
    + }

    Why doesn't the change of FPW need to be WAL-logged when
    shutdown checkpoint is performed? It's helpful to add the comment
    explaining why.
    Sure. I will update the patch soon.



    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Jun Ishiduka at Oct 15, 2011 at 2:14 am

    if (!shutdown && XLogStandbyInfoActive())
    + {
    LogStandbySnapshot(&checkPoint.oldestActiveXid, &checkPoint.nextXid);
    + XLogReportParameters(REPORT_ON_BACKEND);
    + }

    Why doesn't the change of FPW need to be WAL-logged when
    shutdown checkpoint is performed? It's helpful to add the comment
    explaining why.
    Sure. I will update the patch soon.
    Done.
    Please check this.

    Regards.

    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Fujii Masao at Oct 17, 2011 at 7:16 am
    2011/10/15 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    if (!shutdown && XLogStandbyInfoActive())
    +   {
    LogStandbySnapshot(&checkPoint.oldestActiveXid, &checkPoint.nextXid);
    +           XLogReportParameters(REPORT_ON_BACKEND);
    +   }

    Why doesn't the change of FPW need to be WAL-logged when
    shutdown checkpoint is performed? It's helpful to add the comment
    explaining why.
    Sure. I will update the patch soon.
    Done.
    + /*
    + * The backend writes WAL of FPW at checkpoint. However, The backend do
    + * not need to write WAL of FPW at checkpoint shutdown because it
    + * performs when startup finishes.
    + */
    + XLogReportParameters(REPORT_ON_BACKEND);

    I'm still unclear why that WAL doesn't need to be written at shutdown
    checkpoint.
    Anyway, the first sentence in the above comments is not right. Not a backend but
    a bgwriter writes that WAL at checkpoint.

    The second also seems not to be right. It implies that a shutdown checkpoint is
    performed only at end of startup. But it may be done when smart or fast shutdown
    is requested.

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Jun Ishiduka at Oct 18, 2011 at 6:28 am

    + /*
    + * The backend writes WAL of FPW at checkpoint. However, The backend do
    + * not need to write WAL of FPW at checkpoint shutdown because it
    + * performs when startup finishes.
    + */
    + XLogReportParameters(REPORT_ON_BACKEND);

    I'm still unclear why that WAL doesn't need to be written at shutdown
    checkpoint.
    Anyway, the first sentence in the above comments is not right. Not a backend but
    a bgwriter writes that WAL at checkpoint.

    The second also seems not to be right. It implies that a shutdown checkpoint is
    performed only at end of startup. But it may be done when smart or fast shutdown
    is requested.

    Okay.
    I will change the comment to the following.

    /*
    * The bgwriter writes WAL of FPW at checkpoint. But does not at shutdown.
    * Because XLogReportParameters() is always called at the end of startup
    * process, it does not need to be called at shutdown.
    */


    In addition, I will rename the macro:

    REPORT_ON_BACKEND -> REPORT_ON_BGWRITER


    Regards.

    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Jun Ishiduka at Oct 19, 2011 at 2:49 am

    I have updated the patch as described above.
    Please check it.

    Regards.

    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Jun Ishiduka at Oct 19, 2011 at 7:40 am

    As I suggested in the reply to Simon, I think that the change of FPW
    should be WAL-logged separately from that of HS parameters. ISTM
    packing them in one WAL record makes XLogReportParameters()
    quite confusing. Thoughts?
    I updated the patch as you suggested (so that the change of FPW
    is WAL-logged separately from that of HS parameters).

    I would like to use this patch as the base if there are no other opinions.

    Regards.


    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
  • Fujii Masao at Oct 24, 2011 at 4:31 pm

    2011/10/19 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    As I suggested in the reply to Simon, I think that the change of FPW
    should be WAL-logged separately from that of HS parameters. ISTM
    packing them in one WAL record makes XLogReportParameters()
    quite confusing. Thoughts?
    I updated the patch as you suggested (so that the change of FPW
    is WAL-logged separately from that of HS parameters).

    I would like to use this patch as the base if there are no other opinions.
    Thanks for updating the patch!

    Attached is the updated version of the patch. I merged your patch into
    standby_online_backup_09_fujii.patch, refactored the code, fixed some
    bugs, added lots of source code comments, but didn't change the basic
    design that you proposed.

    In your patch, FPW is always WAL-logged at startup even when FPW has
    not been changed since the last shutdown. I don't think that's required.
    I changed the recovery code so that it keeps track of the last FPW value
    indicated by a WAL record. Then, at the end of startup, if that FPW is equal
    to the FPW specified in postgresql.conf (which means that FPW has not been
    changed since the last shutdown or crash), WAL-logging of FPW is skipped.
    This change prevents unnecessary WAL-logging. Thoughts?

    Is the patch well-formed enough to mark as ready-for-committer? It would
    be very helpful if you review the patch.

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Heikki Linnakangas at Oct 24, 2011 at 4:26 pm

    On 24.10.2011 15:29, Fujii Masao wrote:
    + <listitem>
    + <para>
    + Copy the pg_control file from the cluster directory to the global
    + sub-directory of the backup. For example:
    + <programlisting>
    + cp $PGDATA/global/pg_control /mnt/server/backupdir/global
    + </programlisting>
    + </para>
    + </listitem>
    Why is this step required? The control file is overwritten by
    information from the backup_label anyway, no?
    + <listitem>
    + <para>
    + Again connect to the database as a superuser, and execute
    + <function>pg_stop_backup</>. This terminates the backup mode, but does not
    + perform a switch to the next WAL segment, create a backup history file and
    + wait for all required WAL segments to be archived,
    + unlike that during normal processing.
    + </para>
    + </listitem>
    How do you ensure that all the required WAL segments have been archived,
    then?
    + </orderedlist>
    + </para>
    +
    + <para>
    + You cannot use the <application>pg_basebackup</> tool to take the backup
    + from the standby.
    + </para>
    Why not? We have cascading replication now.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Fujii Masao at Oct 25, 2011 at 5:12 am
    Thanks for the review!

    On Tue, Oct 25, 2011 at 12:24 AM, Heikki Linnakangas
    wrote:
    On 24.10.2011 15:29, Fujii Masao wrote:

    +    <listitem>
    +     <para>
    +      Copy the pg_control file from the cluster directory to the global
    +      sub-directory of the backup. For example:
    + <programlisting>
    + cp $PGDATA/global/pg_control /mnt/server/backupdir/global
    + </programlisting>
    +     </para>
    +    </listitem>
    Why is this step required? The control file is overwritten by information
    from the backup_label anyway, no?
    Yes, when recovery starts, the control file is overwritten. But before that,
    we retrieve the minimum recovery point from the control file. Then it's used
    as the backup end location.

    During recovery, pg_stop_backup() cannot write an end-of-backup record.
    So, in standby-only backup, another way to retrieve the backup end location
    (instead of an end-of-backup record) is required. Ishiduka-san used the
    control file for that, according to your suggestion ;)
    http://archives.postgresql.org/pgsql-hackers/2011-05/msg01405.php
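
    As a rough illustration of using the minimum recovery point as the backup
    end location, the check during recovery might be modelled like this. It is
    a sketch under stated assumptions: the type and function names are
    hypothetical, and treating 0/0 as "no usable end location" follows the
    note earlier in the thread that a control file showing 0/0 cannot be
    recovered from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two-part WAL location, modelled on the XLogRecPtr of that era. */
typedef struct { uint32_t xlogid; uint32_t xrecoff; } Lsn;

static_assert(sizeof(Lsn) == 8, "Lsn is expected to be two 32-bit halves");

typedef enum
{
    BACKUP_END_UNKNOWN,      /* control file carries no usable location */
    BACKUP_END_NOT_REACHED,  /* replay must continue before consistency */
    BACKUP_END_REACHED       /* replay has passed the backup end location */
} BackupEndStatus;

static bool
lsn_is_invalid(Lsn p)
{
    return p.xlogid == 0 && p.xrecoff == 0;
}

static bool
lsn_le(Lsn a, Lsn b)
{
    return a.xlogid < b.xlogid ||
           (a.xlogid == b.xlogid && a.xrecoff <= b.xrecoff);
}

/*
 * min_recovery_point: value read from the copied pg_control.
 * replayed_upto:      how far WAL replay has progressed.
 */
BackupEndStatus
check_backup_end(Lsn min_recovery_point, Lsn replayed_upto)
{
    if (lsn_is_invalid(min_recovery_point))
        return BACKUP_END_UNKNOWN;

    return lsn_le(min_recovery_point, replayed_upto)
        ? BACKUP_END_REACHED
        : BACKUP_END_NOT_REACHED;
}
```

    This is why the control file has to be copied after the data directory:
    its minimum recovery point must cover everything the file copy could have
    seen.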
    +    <listitem>
    +     <para>
    +      Again connect to the database as a superuser, and execute
    +      <function>pg_stop_backup</>. This terminates the backup mode, but does not
    +      perform a switch to the next WAL segment, create a backup history file and
    +      wait for all required WAL segments to be archived,
    +      unlike that during normal processing.
    +     </para>
    +    </listitem>
    How do you ensure that all the required WAL segments have been archived,
    then?
    The patch doesn't provide any capability to ensure that; IOW, it assumes that's
    the user's responsibility. A user who wants to ensure that needs to calculate
    the backup start and end WAL files from the results of pg_start_backup()
    and pg_stop_backup() respectively, and wait until those files have
    appeared in the archive. Also, if a required WAL file has not been archived
    yet, the user might need to execute pg_switch_xlog() on the master.
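
    For the "calculate the WAL files" step, a user-side helper might look like
    the sketch below. It assumes the default 16 MB segment size and the usual
    24-hex-digit file name layout (timeline, log id, segment);
    wal_file_name_for_lsn() is a hypothetical helper, not a PostgreSQL
    function, and the timeline ID has to be supplied by the caller. A location
    that falls exactly on a segment boundary may need special handling, which
    this sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed default WAL segment size of 16 MB. */
#define WAL_SEG_SIZE (16u * 1024u * 1024u)

/*
 * Convert an LSN string such as "0/3000020" (as returned by
 * pg_start_backup() or pg_stop_backup()) into a WAL file name.
 * Returns false if the text cannot be parsed or the buffer is too small.
 */
bool
wal_file_name_for_lsn(unsigned int timeline, const char *lsn_text,
                      char *out, size_t out_size)
{
    unsigned int xlogid;
    unsigned int xrecoff;
    char         trailing;

    assert(lsn_text != NULL && out != NULL);

    if (out_size < 25)          /* 24 hex digits plus the terminator */
        return false;

    /* Exactly two hex fields must match, with nothing after them. */
    if (sscanf(lsn_text, "%X/%X%c", &xlogid, &xrecoff, &trailing) != 2)
        return false;

    snprintf(out, out_size, "%08X%08X%08X",
             timeline, xlogid, xrecoff / WAL_SEG_SIZE);
    return true;
}
```

    For example, location 0/3000020 on timeline 1 maps to
    000000010000000000000003. The backup is usable once every file from the
    start file through the end file has appeared in the archive.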

    If we change pg_stop_backup() so that, even during recovery, it waits until
    all required WAL files have been archived, we would need to WAL-log
    the completion of WAL archiving on the master. This would enable the standby to
    check whether the specified WAL files have been archived. Should we change
    the patch in this way? Even if we do, you still might need to execute
    pg_switch_xlog() on the master additionally, and pg_stop_backup() might keep
    waiting indefinitely if the master is not in progress.
    +   </orderedlist>
    +    </para>
    +
    +    <para>
    +     You cannot use the <application>pg_basebackup</> tool to take the backup
    +     from the standby.
    +    </para>
    Why not? We have cascading replication now.
    Because no one has implemented that feature.

    Yeah, we have cascading replication, but without adopting the standby-only
    backup patch, pg_basebackup cannot execute do_pg_start_backup() and
    do_pg_stop_backup() during recovery. So we can think that the patch that
    Ishiduka-san proposed is the first step to extend pg_basebackup so that it
    can take backup from the standby.

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center

Discussion Overview
group: pgsql-hackers
categories: postgresql
posted: Aug 5, '11 at 6:47a
active: Jan 26, '12 at 6:09a
posts: 91
users: 10
website: postgresql.org...
irc: #postgresql
