After quite a bit of hair-pulling trying to install Fedora 19 Alpha,
I've been able to reproduce the initdb-time failure that's currently
being seen on buildfarm member anchovy, and was also complained of
recently by an end user:
http://www.postgresql.org/message-id/CAOD=oQ-kq3Eg5SOvRYOVxDuqibVWC8R0wEivPsMGcyzZY-nfzA@mail.gmail.com

It is exactly what I suspected, namely that gcc 4.8.0 is applying an
optimization that breaks our code; the odd thing though is that it's
not breaking 9.2 or HEAD, just the older branches.

It turns out that what's happening is that with
-faggressive-loop-optimizations turned on (as it is by default),
gcc decides that loops that iterate over the elements of an int2vector
can iterate at most once, because int2vector is declared with a fixed
size values[] array:

     int16 values[1]; /* VARIABLE LENGTH ARRAY */
} int2vector; /* VARIABLE LENGTH STRUCT */

Now, gcc does know better than to make such an assumption
unconditionally, but what I discovered is that it *will* assume this if
the int2vector is declared as a non-last element of a larger struct,
so that (in gcc's little mind anyway) it couldn't possibly really be
a variable-length array.
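A minimal C sketch of the layout argument above. The types `demo_int2vector` and `demo_pg_index` and the field `indnext` are hypothetical stand-ins, not the real PostgreSQL declarations, and the slot count assumes a typical ABI where `int16_t` is 2-byte aligned. When the one-element array is followed by another member, the declared type leaves room for exactly one element before that member starts. An optimizer can use that fact to bound a loop over `values[]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for int2vector, using the one-element-array idiom. */
typedef struct
{
    int16_t dim1;       /* number of elements actually stored */
    int16_t values[1];  /* VARIABLE LENGTH ARRAY */
} demo_int2vector;

/* Hypothetical stand-in for a pre-9.2 catalog row: the vector is NOT the
 * last member, so the declared type leaves room for one element only. */
typedef struct
{
    int16_t         indnatts;   /* number of key columns */
    demo_int2vector indkey;     /* non-last member */
    int16_t         indnext;    /* hypothetical column that follows indkey */
} demo_pg_index;

/* How many int16_t slots fit between indkey.values and the next member. */
static size_t indkey_slots_before_next_member(void)
{
    return (offsetof(demo_pg_index, indnext) -
            offsetof(demo_pg_index, indkey.values)) / sizeof(int16_t);
}

/* Same shape as a BuildIndexInfo()-style copy loop. Indexing past
 * values[0] would run into indnext, so an optimizer may assume the loop
 * body executes at most once. */
static void copy_index_keys(const demo_pg_index *row, int16_t *out)
{
    assert(out != NULL);
    for (int i = 0; i < row->indnatts; i++)
        out[i] = row->indkey.values[i];
}
```

Only `indnatts == 1` is well defined here. A larger count would index past `values[0]` into `indnext`, which is exactly the case gcc 4.8 is entitled to assume never happens.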

In other words, the reason 9.2 and up don't break is commit
8137f2c32322c624e0431fac1621e8e9315202f9, which arranged to hide
non-fixed-offset catalog columns from the compiler. Without that,
gcc decides that for instance pg_index.indkey cannot have more than one
member. That breaks the loop in BuildIndexInfo() that copies the key
column numbers into an IndexInfo, leading to the observed failure.
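Roughly, the 9.2 approach works because the compiler never sees a member after the variable-length one. Here is a hedged sketch of that idea. The struct and field names are hypothetical. The `CATALOG_VARLEN` guard is modeled on the one used in the 9.2-era catalog headers to hide columns that are not at a fixed offset, and it is never defined when compiling C code. Indexing a trailing one-element array past its bound is still the old struct hack rather than standard C, but it is the case compilers tolerate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for int2vector (one-element-array idiom). */
typedef struct
{
    int16_t dim1;       /* number of elements actually stored */
    int16_t values[1];  /* VARIABLE LENGTH ARRAY */
} demo_int2vector;

/* Hypothetical 9.2-style catalog row: indkey is the last member the C
 * compiler sees. CATALOG_VARLEN is never defined, so the later
 * variable-length column exists only as documentation. */
typedef struct
{
    int16_t         indnatts;   /* number of key columns */
    demo_int2vector indkey;     /* last visible member */
#ifdef CATALOG_VARLEN
    demo_int2vector indoption;  /* hidden: its offset is not fixed */
#endif
} demo_pg_index;

/* Allocate a zeroed row with room for n key columns. The size comes from
 * offsetof, not from the declared bound of values[]. */
static demo_pg_index *make_row(int16_t n)
{
    assert(n >= 0);
    size_t sz = offsetof(demo_pg_index, indkey.values) + (size_t) n * sizeof(int16_t);
    if (sz < sizeof(demo_pg_index))
        sz = sizeof(demo_pg_index);
    demo_pg_index *row = calloc(1, sz);
    if (row != NULL)
    {
        row->indnatts = n;
        row->indkey.dim1 = n;
    }
    return row;
}

/* The BuildIndexInfo()-style copy loop, now over a trailing array. */
static void copy_index_keys(const demo_pg_index *row, int16_t *out)
{
    for (int i = 0; i < row->indnatts; i++)
        out[i] = row->indkey.values[i];
}
```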

Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
I think we need to do something to prevent it from breaking 8.4.x and
9.0.x. It looks like our choices are (1) teach configure to enable
-fno-aggressive-loop-optimizations if the compiler recognizes it,
or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.

I'm a bit leaning towards (1), mainly because I'm not excited about
fighting a compiler arms race in the back branches.

It also strikes me that we ought to take this as a warning sign
that we need to work on getting rid of coding like the above in favor
of genuine "flexible arrays", before the gcc boys think of some other
overly-cute optimization based on the assumption that an array declared
with a fixed size really is fixed.

    regards, tom lane


  • Peter Geoghegan at Apr 5, 2013 at 10:28 pm

    On Fri, Apr 5, 2013 at 11:14 PM, Tom Lane wrote:
    > It also strikes me that we ought to take this as a warning sign
    > that we need to work on getting rid of coding like the above in favor
    > of genuine "flexible arrays", before the gcc boys think of some other
    > overly-cute optimization based on the assumption that an array declared
    > with a fixed size really is fixed.
    The traditional argument against that has been that that's a C99
    feature. However, since it appears that even MSVC supports flexible
    arrays (which are described as a "Microsoft extension", so may not
    have identical semantics), it might be possible to do this across the
    board without contorting the code with preprocessor hacks. That's
    something that I'd certainly be in favor of pursuing.


    --
    Peter Geoghegan
  • Tom Lane at Apr 5, 2013 at 10:38 pm

    Peter Geoghegan writes:
    > On Fri, Apr 5, 2013 at 11:14 PM, Tom Lane wrote:
    >> It also strikes me that we ought to take this as a warning sign
    >> that we need to work on getting rid of coding like the above in favor
    >> of genuine "flexible arrays", before the gcc boys think of some other
    >> overly-cute optimization based on the assumption that an array declared
    >> with a fixed size really is fixed.
    > The traditional argument against that has been that that's a C99
    > feature.
    Well, we already have a solution for that, see FLEXIBLE_ARRAY_MEMBER.
    But up to now we've just supposed that that was a code beautification
    thing and there was no particular urgency to convert all applicable
    places to use that notation.

    Since there's a potential to break code with such changes (we'd have to
    fix any uses of sizeof on the struct type), it's been very far down the
    to-do list. But now it appears that we're taking risks if we *don't*
    change it.
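To make the `sizeof` hazard concrete, here is a sketch with two hypothetical types. One uses the old one-element array and the other uses a C99 flexible array member. `sizeof` counts the declared element (plus any padding) in the first case and no elements in the second. A size formula written for the old declaration therefore quietly under-allocates after conversion, while computing the size from `offsetof` works for both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical old idiom: one declared element, so sizeof() already counts it. */
typedef struct
{
    int32_t len;
    int16_t values[1];
} old_vec;

/* Hypothetical C99 flexible array member: sizeof() counts no elements. */
typedef struct
{
    int32_t len;
    int16_t values[];
} flex_vec;

/* Pre-conversion size formula: assumes one element is already included. */
static size_t old_formula(size_t struct_size, size_t n)
{
    assert(n >= 1);
    return struct_size + (n - 1) * sizeof(int16_t);
}

/* Size formula that is correct for either declaration. */
#define SAFE_SIZE(type, n) (offsetof(type, values) + (size_t) (n) * sizeof(int16_t))
```

On common ABIs with 4-byte-aligned `int32_t`, `sizeof(old_vec)` is 8 and `sizeof(flex_vec)` is 4. So `old_formula(sizeof(flex_vec), 3)` yields 8 bytes where `SAFE_SIZE(flex_vec, 3)` says 10 are needed.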

        regards, tom lane
  • Andres Freund at Apr 5, 2013 at 10:40 pm

    On 2013-04-05 23:28:03 +0100, Peter Geoghegan wrote:
    > On Fri, Apr 5, 2013 at 11:14 PM, Tom Lane wrote:
    >> It also strikes me that we ought to take this as a warning sign
    >> that we need to work on getting rid of coding like the above in favor
    >> of genuine "flexible arrays", before the gcc boys think of some other
    >> overly-cute optimization based on the assumption that an array declared
    >> with a fixed size really is fixed.
    > The traditional argument against that has been that that's a C99
    > feature. However, since it appears that even MSVC supports flexible
    > arrays (which are described as a "Microsoft extension", so may not
    > have identical semantics), it might be possible to do this across the
    > board without contorting the code with preprocessor hacks. That's
    > something that I'd certainly be in favor of pursuing.
    The respective macro magic is already in place, it's just not used in all
    places. The problem is more that we can't easily use it in all places
    because e.g. in the one case mentioned here the array isn't in the last
    place *in the back branches*.

    Greetings,

    Andres Freund

    --
      Andres Freund http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Peter Geoghegan at Apr 5, 2013 at 10:44 pm

    On Fri, Apr 5, 2013 at 11:39 PM, Andres Freund wrote:
    > The respective macro magic is already in place, it's just not used in all
    > places. The problem is more that we can't easily use it in all places
    > because e.g. in the one case mentioned here the array isn't in the last
    > place *in the back branches*.
    Are you proposing that we use the FLEXIBLE_ARRAY_MEMBER macro in every
    single place where we currently use the one element array pattern? I
    count one place where we currently use FLEXIBLE_ARRAY_MEMBER. It'd be
    pretty ugly to have that everywhere, in my opinion.


    --
    Peter Geoghegan
  • Tom Lane at Apr 5, 2013 at 10:50 pm

    Peter Geoghegan writes:
    > Are you proposing that we use the FLEXIBLE_ARRAY_MEMBER macro in every
    > single place where we currently use the one element array pattern?
    Yup, exactly.
    > I count one place where we currently use FLEXIBLE_ARRAY_MEMBER. It'd be
    > pretty ugly to have that everywhere, in my opinion.
    Hm, I see 4 places in HEAD. But in any case, is

         int16 values[1]; /* VARIABLE LENGTH ARRAY */
    } int2vector; /* VARIABLE LENGTH STRUCT */

    really better than

         int16 values[FLEXIBLE_ARRAY_MEMBER];
    } int2vector;

    ? I don't think so. Relying on comments to tell about critical
    semantics of a data structure isn't really nice if you can do it
    in a way that is standards-blessed and (some) compilers understand.
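A sketch of the converted declaration in use, assuming a C99 compiler. In the real tree `FLEXIBLE_ARRAY_MEMBER` comes from configure. Here it is defined locally, expanding to nothing, so the example stands alone. `demo_int2vector` and `demo_int2vector_make` are hypothetical names. The allocation uses `offsetof` rather than `sizeof`, for the reason given earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#ifndef FLEXIBLE_ARRAY_MEMBER
#define FLEXIBLE_ARRAY_MEMBER   /* empty: assumes C99 flexible array support */
#endif

/* Hypothetical int2vector-like type using the macro. */
typedef struct
{
    int32_t dim1;                           /* number of elements */
    int16_t values[FLEXIBLE_ARRAY_MEMBER];  /* must be the last member */
} demo_int2vector;

/* Build a vector holding a copy of n int16 values; caller frees with free(). */
static demo_int2vector *demo_int2vector_make(const int16_t *src, int32_t n)
{
    assert(n >= 0);
    demo_int2vector *v = malloc(offsetof(demo_int2vector, values) +
                                (size_t) n * sizeof(int16_t));
    if (v == NULL)
        return NULL;
    v->dim1 = n;
    for (int32_t i = 0; i < n; i++)
        v->values[i] = src[i];
    return v;
}
```

When the macro expands to nothing, C99 accepts the flexible member only as the last member of a struct. Declaring `values[FLEXIBLE_ARRAY_MEMBER]` anywhere else is then a compile-time error, which is what makes such a conversion a useful way to find misplaced variable-length arrays.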

        regards, tom lane
  • Peter Geoghegan at Apr 5, 2013 at 11:01 pm

    On Fri, Apr 5, 2013 at 11:50 PM, Tom Lane wrote:
    > Hm, I see 4 places in HEAD. But in any case, is
    My mistake. I had REL9_2_STABLE checked out.
    > int16 values[1]; /* VARIABLE LENGTH ARRAY */
    > } int2vector; /* VARIABLE LENGTH STRUCT */
    >
    > really better than
    >
    > int16 values[FLEXIBLE_ARRAY_MEMBER];
    > } int2vector;
    >
    > ? I don't think so.
    I can see your point. Now that I look at the comments beside
    FLEXIBLE_ARRAY_MEMBER, I see that indeed, as I suspected, the
    Microsoft flexible array members are not completely compatible with
    C99 style flexible arrays, so this may be the least-worst option.

    --
    Peter Geoghegan
  • Peter Eisentraut at Apr 6, 2013 at 1:50 am

    On Fri, 2013-04-05 at 23:44 +0100, Peter Geoghegan wrote:
    > On Fri, Apr 5, 2013 at 11:39 PM, Andres Freund wrote:
    >> The respective macro magic is already in place, it's just not used in all
    >> places. The problem is more that we can't easily use it in all places
    >> because e.g. in the one case mentioned here the array isn't in the last
    >> place *in the back branches*.
    > Are you proposing that we use the FLEXIBLE_ARRAY_MEMBER macro in every
    > single place where we currently use the one element array pattern? I
    > count one place where we currently use FLEXIBLE_ARRAY_MEMBER. It'd be
    > pretty ugly to have that everywhere, in my opinion.
    Background: The reason I put in that one use of FLEXIBLE_ARRAY_MEMBER is
    that at one point clang threw a warning about the old coding. There
    were no warnings about the other sites that use array[1].

    The reason that the whole code wasn't converted right away was (besides
    a lot of legwork with sizeof and offsetof) that flexible array members
    aren't allowed in the middle of structs. This eventually led to the
    mentioned commit 8137f2c32322c624e0431fac1621e8e9315202f9.

    If someone wants to go through and change the rest of the code to use
    FLEXIBLE_ARRAY_MEMBER, I won't mind. But I think it actually has
    nothing to do with the current bug or future-proofing anything. All
    compilers tolerate the current coding.
  • Tom Lane at Apr 6, 2013 at 4:59 pm

    Peter Eisentraut writes:
    > The reason that the whole code wasn't converted right away was (besides
    > a lot of legwork with sizeof and offsetof) that flexible array members
    > aren't allowed in the middle of structs. This eventually led to the
    > mentioned commit 8137f2c32322c624e0431fac1621e8e9315202f9.
    > If someone wants to go through and change the rest of the code to use
    > FLEXIBLE_ARRAY_MEMBER, I won't mind. But I think it actually has
    > nothing to do with the current bug or future-proofing anything. All
    > compilers tolerate the current coding.
    The reason I'm thinking it's a good idea is that it would expose any
    remaining places where we have nominally var-length arrays embedded in
    larger structs. Now that I've seen the failures with gcc 4.8.0, I'm
    quite worried that there might be some more declarations like that
    which we've not identified yet, but that by chance aren't causing
    obvious failures today. (This is also why I'm not that excited about
    trying to fix things "properly" in the back branches compared to
    selecting -fno-aggressive-loop-optimizations: I'm afraid there might
    be more to it than just the one commit.)

        regards, tom lane
  • Peter Eisentraut at Apr 30, 2013 at 3:37 am

    On Sat, 2013-04-06 at 12:59 -0400, Tom Lane wrote:
    > The reason I'm thinking it's a good idea is that it would expose any
    > remaining places where we have nominally var-length arrays embedded in
    > larger structs. Now that I've seen the failures with gcc 4.8.0, I'm
    > quite worried that there might be some more declarations like that
    > which we've not identified yet, but that by chance aren't causing
    > obvious failures today.
    Here is a rough patch that replaces almost all occurrences of
    something[1] in a struct with FLEXIBLE_ARRAY_MEMBER. It crashes left
    and right (because of sizeof issues, probably), but at least so far the
    compiler hasn't complained about any flexible-array members not at the
    end of the struct, which is what it did last time. So the answer to
    your concern so far is negative.

    Completing this patch will be quite a bit more debugging work. Some
    kind of electric fence for palloc would be helpful.
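To illustrate the kind of check an "electric fence" for palloc could provide, here is a deliberately simplified sketch built on plain `malloc`. Each block remembers its requested size and is followed by canary bytes that a check routine verifies. The names `fence_alloc`, `fence_check` and `fence_free` and the canary scheme are hypothetical, not an existing PostgreSQL API. An object allocated too small by a stale `sizeof` calculation would usually trip the check once something writes past its end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define FENCE_BYTES 8      /* hypothetical: canary bytes placed after each block */
#define FENCE_VALUE 0x7E   /* hypothetical: arbitrary canary pattern */

/* Header kept in front of the user block; the extra union members only pad
 * it so that the returned pointer stays reasonably aligned. */
typedef union
{
    size_t      requested;  /* bytes the caller asked for */
    long double align_ld;
    void       *align_ptr;
} fence_header;

/* Allocate `size` usable bytes followed by FENCE_BYTES canary bytes. */
static void *fence_alloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(fence_header) + size + FENCE_BYTES);
    if (raw == NULL)
        return NULL;
    ((fence_header *) raw)->requested = size;
    memset(raw + sizeof(fence_header) + size, FENCE_VALUE, FENCE_BYTES);
    return raw + sizeof(fence_header);
}

/* Return true if the canary bytes after the block are still intact. */
static bool fence_check(const void *ptr)
{
    const unsigned char *user = ptr;
    const fence_header *hdr = (const fence_header *) (user - sizeof(fence_header));
    for (size_t i = 0; i < FENCE_BYTES; i++)
        if (user[hdr->requested + i] != FENCE_VALUE)
            return false;
    return true;
}

/* Release a block, asserting that nothing wrote past its end. */
static void fence_free(void *ptr)
{
    if (ptr == NULL)
        return;
    assert(fence_check(ptr));
    free((unsigned char *) ptr - sizeof(fence_header));
}
```

This only reports an overrun after the fact. Tools such as Electric Fence or Valgrind flag the offending access as it happens, which is why Valgrind support comes up later in the thread.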
  • Andres Freund at Jul 23, 2013 at 8:30 am

    On 2013-04-29 23:37:43 -0400, Peter Eisentraut wrote:
    > On Sat, 2013-04-06 at 12:59 -0400, Tom Lane wrote:
    >> The reason I'm thinking it's a good idea is that it would expose any
    >> remaining places where we have nominally var-length arrays embedded in
    >> larger structs. Now that I've seen the failures with gcc 4.8.0, I'm
    >> quite worried that there might be some more declarations like that
    >> which we've not identified yet, but that by chance aren't causing
    >> obvious failures today.
    > Here is a rough patch that replaces almost all occurrences of
    > something[1] in a struct with FLEXIBLE_ARRAY_MEMBER. It crashes left
    > and right (because of sizeof issues, probably), but at least so far the
    > compiler hasn't complained about any flexible-array members not at the
    > end of the struct, which is what it did last time. So the answer to
    > your concern so far is negative.
    I think this point in the cycle would be a good one to apply something
    like this.
    > Completing this patch will be quite a bit more debugging work. Some
    > kind of electric fence for palloc would be helpful.
    Noah's recently added valgrind mode should provide this.

    Do you have an updated version of this patch already? I'd be willing to
    make a pass over it to check whether I find any missed updates...

    Greetings,

    Andres Freund

  • Tom Lane at Apr 5, 2013 at 10:45 pm

    Andres Freund writes:
    > The respective macro magic is already in place, it's just not used in all
    > places. The problem is more that we can't easily use it in all places
    > because e.g. in the one case mentioned here the array isn't in the last
    > place *in the back branches*.
    I don't think we should try to back-patch such changes; there seems too
    much risk of breaking third-party code because of the sizeof() issue.
    But it'd be a good idea to have it in place before we find ourselves
    having to do -fno-aggressive-loop-optimizations or some such even in
    up-to-date branches.

    (I'm actually even more worried about gcc bugs that make this type of
    assumption than about intentional changes on their part.)

        regards, tom lane
  • Gavin Flower at Apr 5, 2013 at 10:29 pm

    On 06/04/13 11:14, Tom Lane wrote:
    > It also strikes me that we ought to take this as a warning sign
    > that we need to work on getting rid of coding like the above in favor
    > of genuine "flexible arrays", before the gcc boys think of some other
    > overly-cute optimization based on the assumption that an array declared
    > with a fixed size really is fixed.
    I am probably missing something here!

    I would have thought it reasonable for a compiler to assume 'an array
    declared with a fixed size really is fixed.'!

    Seems dangerous to play tricks like that (though I admit to doing nasty
    things like that in COBOL many many years ago!).


    Cheers,
    Gavin
  • Peter Eisentraut at Apr 6, 2013 at 1:45 am

    On Fri, 2013-04-05 at 18:14 -0400, Tom Lane wrote:
    > Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
    > I think we need to do something to prevent it from breaking 8.4.x and
    > 9.0.x. It looks like our choices are (1) teach configure to enable
    > -fno-aggressive-loop-optimizations if the compiler recognizes it,
    > or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.
    Using a fixed-size struct member as a flexible one has always been a
    violation of the C standard, although a widely tolerated one. Doing
    that in the middle of a struct, however, is totally wrong, and the
    compiler is perfectly in its right to make a mess of it. Even flexible
    array members are not allowed in the middle of a struct.

    So I think this is not a compiler bug or an arms race. We just need to
    fix the code. So I'm in favor of backporting.
  • Robert Haas at Apr 7, 2013 at 1:11 am

    On Fri, Apr 5, 2013 at 9:45 PM, Peter Eisentraut wrote:
    > So I think this is not a compiler bug or an arms race. We just need to
    > fix the code. So I'm in favor of backporting.
    I can certainly see this argument. I understand Tom's point about an
    arms race, but back-porting this doesn't feel terribly risky to me.
    The thing is, if the arms race is escalating faster than we're
    comfortable with, we can always opt out at a later time; it's not as
    if back-porting this fix now commits us irrevocably.

    Then, too, I tend to think this is more our fault than gcc's - for a change.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Peter Eisentraut at Apr 8, 2013 at 8:42 pm

    On 4/5/13 6:14 PM, Tom Lane wrote:
    > Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
    > I think we need to do something to prevent it from breaking 8.4.x and
    > 9.0.x. It looks like our choices are (1) teach configure to enable
    > -fno-aggressive-loop-optimizations if the compiler recognizes it,
    > or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.
    >
    > I'm a bit leaning towards (1), mainly because I'm not excited about
    > fighting a compiler arms race in the back branches.
    At the moment, I wouldn't do anything. At least until we have converted
    master to use flexible array members completely, and we have learned the
    extent of the issue.

    The problem manifests itself easily through the regression tests, so
    there is no guessing about whether a particular combination of versions
    will work. Someone who uses a cutting-edge compiler with a somewhat old
    PG release is doing something special anyway, so they should have the
    required skills to put in the workaround.

    I would rather avoid patching in specific compiler options for specific
    versions. These things come and go, but releases live a long time. How
    do we know -fno-aggressive-loop-optimizations is the only option we need
    in the long run? I'd rather see a direct crash or a known code fix.

    As an aside, we already have -fno-strict-aliasing and -fwrapv. Add more
    and it will begin to read like

    -fmy-code -fis-broken -fhelp-me

    ;-)
  • Gavin Flower at Apr 8, 2013 at 8:53 pm

    On 09/04/13 08:41, Peter Eisentraut wrote:
    > As an aside, we already have -fno-strict-aliasing and -fwrapv. Add more
    > and it will begin to read like
    >
    > -fmy-code -fis-broken -fhelp-me
    >
    > ;-)

    -fno-break-my-code
  • Greg Stark at Apr 30, 2013 at 3:04 pm

    On Fri, Apr 5, 2013 at 11:14 PM, Tom Lane wrote:
    > Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
    > I think we need to do something to prevent it from breaking 8.4.x and
    > 9.0.x. It looks like our choices are (1) teach configure to enable
    > -fno-aggressive-loop-optimizations if the compiler recognizes it,
    > or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.
    >
    > I'm a bit leaning towards (1), mainly because I'm not excited about
    I'm confused. I would have described (1) as entering an arms race.
    Each new optimization related to arrays and structs would need a new
    flag. Whereas (2) makes the code pretty common traditional code that
    gcc is going to need to tolerate for the foreseeable future.

    --
    greg

Discussion Overview
group: pgsql-hackers
categories: postgresql
posted: Apr 5, '13 at 10:14p
active: Jul 23, '13 at 8:30a
posts: 18
users: 7
website: postgresql.org...
irc: #postgresql
