Hi,

I got the following assertion failure when I promoted the standby.

2013-05-04 00:12:31 JST sby1 LOG: received promote request
2013-05-04 00:12:31 JST sby1 FATAL: terminating walreceiver process
due to administrator command
2013-05-04 00:12:31 JST sby1 LOG: redo done at 0/6FFE038
2013-05-04 00:12:31 JST sby1 LOG: last completed transaction was at
log time 2013-05-04 00:12:25.878909+09
2013-05-04 00:12:31 JST sby1 LOG: selected new timeline ID: 2
2013-05-04 00:12:31 JST sby1 LOG: archive recovery complete
2013-05-04 00:12:31 JST sby1 LOG: checkpoint starting:
TRAP: FailedAssertion("!(sentPtr <= sendTimeLineValidUpto)", File:
"walsender.c", Line: 1465)
2013-05-04 00:12:31 JST sby1 LOG: autovacuum launcher started

The way to reproduce this is:

1. Create one master A, one standby B, and one cascade standby C.
2. Run pgbench -i -s 10
3. Promote the standby B before pgbench -i finishes

Regards,

--
Fujii Masao


  • Amit Langote at May 5, 2013 at 9:13 am
    Hello,

    I tried reproducing the scenario. Note that I did not archive xlogs
    (that is, archive_command = '/bin/true' and the corresponding
    restore_command = '/bin/false'). I performed the steps you mentioned
    and found the following:

    ***** Log on Standby-1:
    [Standby-1]LOG: database system was interrupted; last known up at
    2013-05-05 14:05:08 IST
    [Standby-1]LOG: creating missing WAL directory "pg_xlog/archive_status"
    [Standby-1]LOG: entering standby mode
    [Standby-1]LOG: started streaming WAL from primary at 0/2000000 on timeline 1
    [Standby-1]LOG: redo starts at 0/2000024
    [Standby-1]LOG: consistent recovery state reached at 0/20000DC
    [Standby-1]LOG: database system is ready to accept read only connections
    [Standby-1]LOG: received promote request
    [Standby-1]FATAL: terminating walreceiver process due to administrator command
    [Standby-1]LOG: invalid magic number 0000 in log segment
    000000010000000000000003, offset 5316608
    [Standby-1]LOG: redo done at 0/350F0B8
    [Standby-1]LOG: last completed transaction was at log time 2013-05-05
    14:05:14.571492+05:30
    [Standby-1]LOG: selected new timeline ID: 2
    [Standby-1]LOG: archive recovery complete
    [Standby-1]ERROR: server switched off timeline 1 at 0/3510B14, but walsender already streamed up to 0/3512000
    [Standby-1]LOG: database system is ready to accept connections
    [Standby-1]LOG: autovacuum launcher started


    ***** Log on Standby-2:
    [Standby-2]LOG: database system was interrupted while in recovery at
    log time 2013-05-05 14:05:07 IST
    [Standby-2]HINT: If this has occurred more than once some data might
    be corrupted and you might need to choose an earlier recovery target.
    [Standby-2]LOG: creating missing WAL directory "pg_xlog/archive_status"
    [Standby-2]LOG: entering standby mode
    [Standby-2]LOG: started streaming WAL from primary at 0/2000000 on timeline 1
    [Standby-2]LOG: redo starts at 0/2000024
    [Standby-2]LOG: consistent recovery state reached at 0/3000000
    [Standby-2]LOG: database system is ready to accept read only connections
    [Standby-2]FATAL: could not receive data from WAL stream: ERROR: server switched off timeline 1 at 0/3510B14, but walsender already streamed up to 0/3512000
    [Standby-2]LOG: invalid magic number 0000 in log segment
    000000010000000000000003, offset 5316608
    [Standby-2]LOG: fetching timeline history file for timeline 2 from
    primary server
    [Standby-2]LOG: started streaming WAL from primary at 0/3000000 on timeline 1
    [Standby-2]LOG: replication terminated by primary server
    [Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14
    [Standby-2]LOG: restarted WAL streaming at 0/3000000 on timeline 1
    [Standby-2]LOG: replication terminated by primary server
    [Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14
    [Standby-2]LOG: restarted WAL streaming at 0/3000000 on timeline 1
    [Standby-2]LOG: replication terminated by primary server
    [Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14
    ... (the last three lines repeat indefinitely)

    ***** Also, from the ps output, this is the state of the wal sender
    (on Standby-1) and wal receiver (on Standby-2):

    amit 8084 5675 0 14:13 ? 00:00:00 postgres: wal receiver
    process restarting at 0/3000000
    amit 8085 5648 0 14:13 ? 00:00:00 postgres: wal sender
    process amit [local] idle


    Is this related to the assertion failure that you have reported?

    --

    Amit Langote
  • Heikki Linnakangas at May 6, 2013 at 10:10 am

    On 05.05.2013 12:13, Amit Langote wrote:
    > Hello,
    >
    > I tried reproducing the scenario. Note that I did not archive xlogs
    > (that is, archive_command = '/bin/true' and the corresponding
    > restore_command = '/bin/false'). I performed the steps you mentioned
    > and found the following:
    > [snip]
    > [Standby-2]FATAL: could not receive data from WAL stream: ERROR: server switched off timeline 1 at 0/3510B14, but walsender already streamed up to 0/3512000
    >
    > Is this related to the assertion failure that you have reported?

    Yep, same issue. If you compile with assertions disabled, you get that
    error instead of the assertion. They both check for the same thing; it
    was an oversight to have both.

    - Heikki
  • Heikki Linnakangas at May 6, 2013 at 10:08 am

    On 03.05.2013 18:17, Fujii Masao wrote:
    > I got the following assertion failure when I promoted the standby.
    > [snip]
    > TRAP: FailedAssertion("!(sentPtr <= sendTimeLineValidUpto)", File:
    > "walsender.c", Line: 1465)
    >
    > The way to reproduce this is:
    >
    > 1. Create one master A, one standby B, and one cascade standby C.
    > 2. Run pgbench -i -s 10
    > 3. Promote the standby B before pgbench -i finishes

    I was able to reproduce this. The assertion checks that if the system is
    promoted at WAL location X, we must not have already sent WAL at > X to
    the client. As the code stands, that assumption is wrong; the walsender
    will merrily stream WAL that hasn't been replayed yet, and the system
    can be promoted before replaying all the WAL that has been streamed to a
    cascading standby. The comment in GetStandbyFlushRecPtr(), which is the
    function that determines how far the WAL may be streamed to a cascading
    standby, says this:
    /*
    * We can safely send what's already been replayed. Also, if walreceiver
    * is streaming WAL from the same timeline, we can send anything that
    * it has streamed, but hasn't been replayed yet.
    */
    There seem to be two bugs here:

    1. This used to work in 9.2, because the startup process would always
    replay all the WAL present in pg_xlog before promoting (the WAL present
    in pg_xlog was streamed from master). But the refactorings in xlog.c in
    9.3devel broke that, so that the startup process can promote earlier.

    2. Even after fixing the logic in xlog.c, there is still a corner-case
    where the startup process can promote before all the WAL that has been
    received from walreceiver has been replayed. That happens if the WAL
    streaming is terminated at a page boundary, rather than at a record
    boundary. For example, we might have received WAL up to the page
    boundary at 0/5BFA000, but the last *complete* record that we have
    received ends at 0/5BF9BD8.

    To fix the second issue, I think two things need to happen. First, we
    need to suppress the check in walsender. Second, we need to teach the
    WAL replay to back off when that happens. At the moment, the replay in
    the cascading standby gets stuck, trying to fetch the next page
    containing the rest of the partial WAL record. Instead, it should throw away
    the partial record it has, and resync at the end of the last replayed
    record. I think 9.2 has the same bug, BTW. Without support for timeline
    switches over streaming replication, it was just more difficult to hit.

    - Heikki
  • Heikki Linnakangas at May 7, 2013 at 6:35 pm

    On 06.05.2013 13:08, Heikki Linnakangas wrote:
    > [snip]
    > To fix the second issue, I think two things need to happen. First, we
    > need to suppress the check in walsender. Second, we need to teach the
    > WAL replay to back off when that happens. At the moment, the replay in
    > the cascading standby gets stuck, trying to fetch the next page
    > containing the rest of the partial WAL record. Instead, it should
    > throw away the partial record it has, and resync at the end of the
    > last replayed record.

    I came up with the attached patch for this
    (fix-standby-promotion-assert-fail-2.patch). You will also need to apply
    fast-promotion-quick-fix.patch to work around the bug in fast promotion
    I reported here:
    http://www.postgresql.org/message-id/5188CFFA.3020209@vmware.com.

    pg_receivexlog has a variant of the same bug. If the server has sent WAL
    up to a segment boundary, and the segment boundary splits a WAL record in
    half, and the server is then promoted so that the promotion checkpoint
    (or end-of-recovery) record goes to the previous segment, pg_receivexlog
    will not fetch the segment containing the checkpoint record correctly.
    When following a timeline switch, it should restart streaming from the
    new timeline at the point where the timeline switch occurred, rather
    than at the point on the old timeline where it stopped. Usually it's the
    same thing, but not if the streaming was paused at a page or segment
    boundary.

    To fix this for pg_receivexlog, I added the start position of the next
    timeline to the result set that the server sends at end of WAL
    streaming. Previously, it only sent the TLI of the next timeline, and it
    was assumed that the starting point is the same as the point where
    streaming was stopped.

    It's a bit late to change it, but I have to say I don't like this whole
    business of relaying WAL to a cascading standby that hasn't been
    replayed. It would be a lot simpler if you could assume that whatever
    gets streamed downstream has already been replayed upstream. I
    understand that it's nice from a performance point of view, and because
    a cascading standby isn't then lagged behind if the replay gets stalled
    in the upstream server because of a hot standby conflict. But still...

    I'm going to sleep over this and continue testing tomorrow. Please have
    a look.
    > I think 9.2 has the same bug, BTW. Without support for timeline
    > switches over streaming replication, it was just more difficult to hit.

    Looking closer, I believe 9.2 is OK. Recovery loops back to retry the
    whole record correctly; it was just the timeline switch over streaming
    replication code that got confused.

    - Heikki

Discussion Overview
group: pgsql-hackers
categories: postgresql
posted: May 3, '13 at 3:17p
active: May 7, '13 at 6:35p
posts: 5
users: 3
website: postgresql.org...
irc: #postgresql
