FAQ
So I've been delaying moving some production boxes over to 9.0.4 from
9.0.2 because hot standby fails with:
(this is on the "hot standby" machine that connects to the master)

2011-06-08 11:40:48 MDT [6072]: [2-1] user= LOG: entering standby mode
2011-06-08 11:40:48 MDT [6072]: [3-1] user= DEBUG: checkpoint record
is at 86/E5D725F0
2011-06-08 11:40:48 MDT [6072]: [4-1] user= DEBUG: redo record is at
86/E39E8248; shutdown FALSE
2011-06-08 11:40:48 MDT [6072]: [5-1] user= DEBUG: next transaction
ID: 0/35456371; next OID: 34090526
2011-06-08 11:40:48 MDT [6072]: [6-1] user= DEBUG: next MultiXactId:
523; next MultiXactOffset: 1046
2011-06-08 11:40:48 MDT [6072]: [7-1] user= DEBUG: oldest unfrozen
transaction ID: 654, in database 1
2011-06-08 11:40:48 MDT [6072]: [8-1] user= DEBUG: transaction ID
wrap limit is 2147484301, limited by database with OID 1
2011-06-08 11:40:48 MDT [6072]: [9-1] user= DEBUG: initializing for hot standby
2011-06-08 11:40:48 MDT [6072]: [10-1] user= LOG: redo starts at 86/E39E8248
2011-06-08 11:40:48 MDT [6072]: [11-1] user= LOG: invalid record
length at 86/E39F2010
2011-06-08 11:40:48 MDT [6074]: [1-1] user= LOG: streaming
replication successfully connected to primary
2011-06-08 11:40:49 MDT [6072]: [12-1] user= LOG: invalid record
length at 86/E3A16010
2011-06-08 11:40:49 MDT [6074]: [2-1] user= FATAL: terminating
walreceiver process due to administrator command
2011-06-08 11:40:49 MDT [6072]: [13-1] user= LOG: invalid record
length at 86/E3A3C010
2011-06-08 11:40:53 MDT [6072]: [14-1] user= LOG: invalid record
length at 86/E3A54010
2011-06-08 11:40:53 MDT [6075]: [1-1] user= FATAL: terminating
walreceiver process due to administrator command
2011-06-08 11:40:53 MDT [6072]: [15-1] user= LOG: invalid record
length at 86/E3A74010
2011-06-08 11:40:58 MDT [6076]: [1-1] user= LOG: streaming
replication successfully connected to primary
2011-06-08 11:40:59 MDT [6072]: [16-1] user= LOG: invalid record
length at 86/E3AC6010
2011-06-08 11:40:59 MDT [6076]: [2-1] user= FATAL: terminating
walreceiver process due to administrator command
2011-06-08 11:40:59 MDT [6072]: [17-1] user= LOG: invalid record
length at 86/E3ACC010
2011-06-08 11:41:03 MDT [6072]: [18-1] user= LOG: invalid record
length at 86/E3B32010
2011-06-08 11:41:03 MDT [6078]: [1-1] user= FATAL: terminating
walreceiver process due to administrator command
[ repeats... ]

Originally I thought there might be some corner case bug in 9.0.3 or
9.0.4. However after recompiling 9.0.2 with gcc 4.6 and hitting the
same problem-- I tried compiling 9.0.4 with gcc 4.5 and it seemed to
work great. I then tired various optimization levels on 4.6:
-O0: works
-O1: works
-O2: fails
-Os: works

I suppose the next step is to narrow it down to a specific flag -O2
uses... But I thought I would post here first-- maybe someone else has
hit this? Or maybe someone has a bright idea on how to narrow this
down?

# linux 2.6.39.1 x86_64 AMD opteron box
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.6.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /build/src/gcc-4.6-20110603/configure --prefix=/usr
--libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/
--enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++
--enable-shared --enable-threads=posix --with-system-zlib
--enable-__cxa_atexit --disable-libunwind-exceptions
--enable-clocale=gnu --enable-gnu-unique-object
--enable-linker-build-id --with-ppl --enable-cloog-backend=isl
--enable-lto --enable-gold --enable-ld=default --enable-plugin
--with-plugin-ld=ld.gold --disable-multilib --disable-libstdcxx-pch
--enable-checking=release
Thread model: posix
gcc version 4.6.0 20110603 (prerelease) (GCC)

Search Discussions

  • Tom Lane at Jun 8, 2011 at 6:49 pm

    Alex Hunsaker writes:
    So I've been delaying moving some production boxes over to 9.0.4 from
    2011-06-08 11:41:03 MDT [6078]: [1-1] user= FATAL: terminating
    walreceiver process due to administrator command
    [ repeats... ]
    I suppose the next step is to narrow it down to a specific flag -O2
    uses... But I thought I would post here first-- maybe someone else has
    hit this? Or maybe someone has a bright idea on how to narrow this
    down?
    Maybe using a "prerelease" gcc version isn't such a hot idea for
    production. It's very, very, very difficult to see how the behavior you
    describe isn't a compiler bug. (Well, I could also believe that
    something external is repeatedly hitting the walreceiver with a SIGTERM,
    but it's hard to square that with the behavior changing when you
    recompile with different -O levels ...)

    It might be useful to strace the postmaster and walreceiver processes
    just to see if any signal is actually being sent or received.

    regards, tom lane
  • Alex Hunsaker at Jun 8, 2011 at 6:58 pm

    On Wed, Jun 8, 2011 at 12:49, Tom Lane wrote:
    Alex Hunsaker <badalex@gmail.com> writes:
    So I've been delaying moving some production boxes over to 9.0.4 from
    2011-06-08 11:41:03 MDT [6078]: [1-1] user= FATAL:  terminating
    walreceiver process due to administrator command
    [ repeats... ]
    I suppose the next step is to narrow it down to a specific flag -O2
    uses... But I thought I would post here first-- maybe someone else has
    hit this? Or maybe someone has a bright idea on how to narrow this
    down?
    Maybe using a "prerelease" gcc version isn't such a hot idea for
    production.  It's very, very, very difficult to see how the behavior you
    describe isn't a compiler bug.
    Yeah :-). However ill note it looks like its the default compiler for
    fedora 15, ubuntu natty and debian sid.
    It might be useful to strace the postmaster and walreceiver processes
    just to see if any signal is actually being sent or received.
    Will do.
  • Alex Hunsaker at Jun 8, 2011 at 8:15 pm

    On Wed, Jun 8, 2011 at 12:58, Alex Hunsaker wrote:
    On Wed, Jun 8, 2011 at 12:49, Tom Lane wrote:
    It might be useful to strace the postmaster and walreceiver processes
    just to see if any signal is actually being sent or received.
    Will do.
    Find it attached.

    Corresponding logs:
    2011-06-08 14:01:21 MDT [27779]: [1-1] user= LOG: could not create
    IPv6 socket: Address family not supported by protocol
    2011-06-08 14:01:21 MDT [27781]: [1-1] user= LOG: database system was
    shut down in recovery at 2011-06-08 14:00:37 MDT
    2011-06-08 14:01:21 MDT [27781]: [2-1] user= LOG: entering standby mode
    2011-06-08 14:01:21 MDT [27781]: [3-1] user= DEBUG: checkpoint record
    is at 86/F6E76708
    2011-06-08 14:01:21 MDT [27781]: [4-1] user= DEBUG: redo record is at
    86/F4DF9BF0; shutdown FALSE
    2011-06-08 14:01:21 MDT [27781]: [5-1] user= DEBUG: next transaction
    ID: 0/35484525; next OID: 34123294
    2011-06-08 14:01:21 MDT [27781]: [6-1] user= DEBUG: next MultiXactId:
    523; next MultiXactOffset: 1046
    2011-06-08 14:01:21 MDT [27781]: [7-1] user= DEBUG: oldest unfrozen
    transaction ID: 654, in database 1
    2011-06-08 14:01:21 MDT [27781]: [8-1] user= DEBUG: transaction ID
    wrap limit is 2147484301, limited by database with OID 1
    2011-06-08 14:01:21 MDT [27781]: [9-1] user= DEBUG: initializing for
    hot standby
    2011-06-08 14:01:21 MDT [27781]: [10-1] user= LOG: redo starts at 86/F4DF9BF0
    2011-06-08 14:01:21 MDT [27781]: [11-1] user= LOG: invalid record
    length at 86/F4E62010
    2011-06-08 14:01:21 MDT [27783]: [1-1] user= LOG: streaming
    replication successfully connected to primary
    2011-06-08 14:01:22 MDT [27781]: [12-1] user= LOG: invalid record
    length at 86/F4E82010
    2011-06-08 14:01:24 MDT [27783]: [2-1] user= FATAL: terminating
    walreceiver process due to administrator command
    2011-06-08 14:01:24 MDT [27781]: [13-1] user= LOG: invalid record
    length at 86/F4EEC010
    2011-06-08 14:01:26 MDT [27784]: [1-1] user= LOG: streaming
    replication successfully connected to primary
    2011-06-08 14:01:26 MDT [27781]: [14-1] user= LOG: invalid record
    length at 86/F4F1E010
    2011-06-08 14:01:26 MDT [27784]: [2-1] user= FATAL: terminating
    walreceiver process due to administrator command
    2011-06-08 14:01:26 MDT [27781]: [15-1] user= LOG: invalid record
    length at 86/F4F22010
    2011-06-08 14:01:31 MDT [27785]: [1-1] user= LOG: streaming
    replication successfully connected to primary
    ^C2011-06-08 14:01:32 MDT [27779]: [2-1] user= LOG: received fast
    shutdown request
    2011-06-08 14:01:32 MDT [27785]: [2-1] user= FATAL: terminating
    walreceiver process due to administrator command
    2011-06-08 14:01:32 MDT [27782]: [1-1] user= LOG: shutting down
    2011-06-08 14:01:32 MDT [27782]: [2-1] user= LOG: database system is shut down
  • Tom Lane at Jun 8, 2011 at 8:31 pm

    Alex Hunsaker writes:
    On Wed, Jun 8, 2011 at 12:49, Tom Lane wrote:
    It might be useful to strace the postmaster and walreceiver processes
    just to see if any signal is actually being sent or received.
    Find it attached.
    Well, the trace shows exactly what I thought was happening: each time
    the startup process hits one of these:
    2011-06-08 14:01:22 MDT [27781]: [12-1] user= LOG: invalid record
    length at 86/F4E82010
    it sends a SIGTERM to kill the walreceiver, because it thinks this
    indicates a walreceiver problem. Then we launch another one and manage
    to process a few more WAL records, lather rinse repeat.

    So it's interesting that this only happens with a particular gcc version,
    because it's not apparent to me why it works properly for anybody.
    Isn't hitting a zero record length an expected case when we run ahead of
    the amount of WAL produced by the master?

    regards, tom lane
  • Fujii Masao at Jun 9, 2011 at 2:37 am

    On Thu, Jun 9, 2011 at 5:31 AM, Tom Lane wrote:
    So it's interesting that this only happens with a particular gcc version,
    because it's not apparent to me why it works properly for anybody.
    Isn't hitting a zero record length an expected case when we run ahead of
    the amount of WAL produced by the master?
    At least while walreceiver is running, recovery doesn't go ahead of the
    last receive location. So that's not an expected case.

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Mark Kirkwood at Jun 8, 2011 at 10:20 pm

    On 09/06/11 06:58, Alex Hunsaker wrote:
    Yeah :-). However ill note it looks like its the default compiler for
    fedora 15, ubuntu natty and debian sid.
    FWIW Ubuntu natty uses gcc 4.5.2, probably just as as well in the light
    of your findings :-)

    Cheers

    Mark
  • Alex Hunsaker at Jun 8, 2011 at 10:56 pm

    On Wed, Jun 8, 2011 at 16:20, Mark Kirkwood wrote:
    On 09/06/11 06:58, Alex Hunsaker wrote:

    Yeah :-). However ill note it looks like its the default compiler for
    fedora 15, ubuntu natty and debian sid.
    FWIW Ubuntu natty uses gcc 4.5.2, probably just as as well in the light of
    your findings :-)
    Yeah I was just looking at distrowatch, its "snapshot natty" that uses
    4.6.0. ubuntu 11.04 uses 4.5.2 like you said.

    http://distrowatch.com/table.php?distribution=ubuntu
  • Tom Lane at Jun 10, 2011 at 6:39 pm

    Mark Kirkwood writes:
    On 09/06/11 06:58, Alex Hunsaker wrote:
    Yeah :-). However ill note it looks like its the default compiler for
    fedora 15, ubuntu natty and debian sid.
    FWIW Ubuntu natty uses gcc 4.5.2, probably just as as well in the light
    of your findings :-)
    I've been able to reproduce this on released Fedora 15, and sure enough
    it is a compiler bug. The problem comes from these fragments of
    ReadRecord():

    ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
    {
    XLogRecPtr tmpRecPtr = EndRecPtr;

    if (RecPtr == NULL)
    RecPtr = &tmpRecPtr;

    targetRecOff = RecPtr->xrecoff % XLOG_BLCKSZ;

    if (targetRecOff == 0)
    tmpRecPtr.xrecoff += pageHeaderSize;

    record = (XLogRecord *) ((char *) readBuf + RecPtr->xrecoff % XLOG_BLCKSZ);

    gcc 4.6.0 is assuming that the value it first fetches for RecPtr->xrecoff
    is still usable for computing the "record" pointer, despite the possible
    intervening update of tmpRecPtr. That of course leads to "record"
    pointing to the wrong place, which results in an incorrect conclusion
    that we're looking at an invalid record header, which leads to killing
    and restarting the walreceiver. I haven't traced what happens after
    that, but apparently in standby mode we'll come back to ReadRecord with
    a record pointer that's already advanced over the page header, else it'd
    be an infinite loop.

    Note that this means that crash recovery, not only standby mode, is
    broken with this compiler.

    I've filed a bug report for this:
    https://bugzilla.redhat.com/show_bug.cgi?id=712480
    but I wouldn't care to hold my breath while waiting for a fix to appear
    upstream, let alone propagate to all the distros already using 4.6.0.
    I think we need a workaround.

    The obvious question to ask here is why are we updating
    "tmpRecPtr.xrecoff" and not "RecPtr->xrecoff"? The latter choice would
    be a lot more readable, since the immediately surrounding code doesn't
    refer to tmpRecPtr. I think the idea is to guarantee that the caller's
    referenced record pointer (if any) isn't modified, but if RecPtr is not
    pointing at tmpRecPtr here, we have got big problems anyway.

    So I'm tempted to propose this fix:

    if (targetRecOff == 0)
    {
    /*
    * Can only get here in the continuing-from-prev-page case, because
    * XRecOffIsValid eliminated the zero-page-offset case otherwise. Need
    * to skip over the new page's header.
    */
    - tmpRecPtr.xrecoff += pageHeaderSize;
    + Assert(RecPtr == &tmpRecPtr);
    + RecPtr->xrecoff += pageHeaderSize;
    targetRecOff = pageHeaderSize;
    }

    Another possibility, which might be less ugly, is to delete the above
    code entirely and handle the page-header case in the RecPtr == NULL
    branch a few lines above.

    Comments?

    regards, tom lane
  • Alex Hunsaker at Jun 10, 2011 at 8:00 pm

    On Fri, Jun 10, 2011 at 12:38, Tom Lane wrote:
    I've been able to reproduce this on released Fedora 15, and sure enough
    it is a compiler bug.  The problem comes from these fragments of
    ReadRecord():
    [ ... ]
    Whoa, awesome. I spent a few more hours comparing the assembly-- then
    I tried compiling a subset of xlog.c with some educated guesses as to
    what function might be getting mis-compiled. Obviously my guesses were
    not educated enough! :-)
    I've filed a bug report for this:
    https://bugzilla.redhat.com/show_bug.cgi?id=712480
    but I wouldn't care to hold my breath while waiting for a fix to appear
    upstream, let alone propagate to all the distros already using 4.6.0.
    I wouldn't hold my breath either.
    I think we need a workaround.

    The obvious question to ask here is why are we updating
    "tmpRecPtr.xrecoff" and not "RecPtr->xrecoff"?  The latter choice would
    be a lot more readable, since the immediately surrounding code doesn't
    refer to tmpRecPtr.  I think the idea is to guarantee that the caller's
    referenced record pointer (if any) isn't modified, but if RecPtr is not
    pointing at tmpRecPtr here, we have got big problems anyway.
    Hrm, Couldn't we change all the references to tmpRecPtr to use RecPtr
    instead? (Except of course where we assign RecPtr = &tmpRecPtr); I
    think that would make the code look a lot less confused. Something
    like the attached?

    FYI Im happy to test whatever you fix you propose for a few days on
    this machine. It gets a fair amount of traffic... hopefully enough to
    exercise some possible corner cases.
  • Tom Lane at Jun 10, 2011 at 8:24 pm

    Alex Hunsaker writes:
    On Fri, Jun 10, 2011 at 12:38, Tom Lane wrote:
    I think we need a workaround.
    My second idea about moving the test up doesn't work, because we can't
    know the page header size until after we've read the page. But I've
    verified that the attached patch does make the problem go away on my
    F15 box.
    Hrm, Couldn't we change all the references to tmpRecPtr to use RecPtr
    instead? (Except of course where we assign RecPtr = &tmpRecPtr); I
    think that would make the code look a lot less confused. Something
    like the attached?
    Yeah, we could do that too; slightly modified version of your change
    included in the attached.

    regards, tom lane
  • Alex Hunsaker at Jun 10, 2011 at 8:52 pm

    On Fri, Jun 10, 2011 at 14:24, Tom Lane wrote:
    Alex Hunsaker <badalex@gmail.com> writes:
    Hrm, Couldn't we change all the references to tmpRecPtr to use RecPtr
    instead? (Except of course where we assign RecPtr = &tmpRecPtr); I
    think that would make the code look a lot less confused. Something
    like the attached?
    Yeah, we could do that too; slightly modified version of your change
    included in the attached.
    A cassert enabled build seems to be working, ill leave it running over
    the weekend...
  • Alex Hunsaker at Jun 8, 2011 at 6:53 pm

    On Wed, Jun 8, 2011 at 12:12, Alex Hunsaker wrote:
    So I've been delaying moving some production boxes over to 9.0.4 from
    9.0.2 because hot standby fails with:
    (this is on the "hot standby" machine that connects to the master)
    [ ...]
    2011-06-08 11:41:03 MDT [6072]: [18-1] user= LOG:  invalid record
    length at 86/E3B32010
    2011-06-08 11:41:03 MDT [6078]: [1-1] user= FATAL:  terminating
    walreceiver process due to administrator command
    [ repeats... ]
    [...] I then tired various optimization levels on 4.6:
    -O0: works
    -O1: works
    -O2: fails
    -Os: works
    So I tracked it down to "-fgcse", that is CFLAGS="-O2 -fno-gcse"
    ./configure works. I then took a few guesses and compiled all of
    postgres with -O2, then manually recompiled xlog.c with -f-no-gcse.
    that combination seems to work.

    [ One thing im not sure is why -Os works, I tried -O2 and added all
    the -fno-XXX options it says -Os adds. I suppose its either they turn
    off/on other optimizations the man page does not mention, or I guess
    thats compiler bugs for ya ]
  • Tom Lane at Jun 8, 2011 at 7:05 pm

    Alex Hunsaker writes:
    So I tracked it down to "-fgcse", that is CFLAGS="-O2 -fno-gcse"
    ./configure works. I then took a few guesses and compiled all of
    postgres with -O2, then manually recompiled xlog.c with -f-no-gcse.
    that combination seems to work.
    Huh, interesting. So the bug must be lurking somewhere around the logic
    that deals with failedSources: somehow we're getting to the
    ShutdownWalRcv call in XLogPageRead.

    regards, tom lane

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedJun 8, '11 at 6:16p
activeJun 10, '11 at 8:52p
posts14
users4
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2021 Grokbase