Hello!

Here is the problem:


We have a database that works perfectly on an Xserve G5 (PPC) running 10.4.8,
but we encounter multiple postmaster and postgres crashes when we try it on an
Intel Mac.

We tried on a 10.4.8 Intel Xserve Xeon and on a 10.4.8 Intel MacBook Pro,
with Postgres 8.1.5 and 8.2.1, and saw the same issues on both.

The postmaster launches fine, and no problem is logged.
But after a while, when the machine begins to receive many requests, I see
almost regular crashes from some postgres or postmaster processes.

The Postgres log is not much help, because the crashes seem to occur
"randomly" with any request, even with the autovacuum.


The CrashLog shows one of these two codes for every crash:

Exception: EXC_BAD_ACCESS (0x0001)
Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x9003b7e0

Exception: EXC_BAD_ACCESS (0x0001)
Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x00000000

I suspected a memory problem, so I tried running the server without any
special configuration in postgresql.conf such as shared_buffers (except
allowing external connections), but the problem remains.


I don't know why it works on PPC and not on Intel. Must I try recompiling
Postgres from source with special options?

Does anyone have an idea or a hint?
Thank you in advance!


Marc



PS: This actually occurred on these platforms:

CPU: MacBook Pro Core 2 Duo & Xserve Xeon (both Intel CPUs)
OS Version: 10.4.8 (Build 8N1051) & (Build 8N1215)
Postgres Version: 8.1.5 (MacPorts & Entropy ports), 8.2.1 (Entropy port)


  • Tom Lane at Jan 31, 2007 at 3:34 pm

    Marc Simonin writes:
    The CrashLog shows one of these two codes for every crash:
    Exception: EXC_BAD_ACCESS (0x0001)
    Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x9003b7e0
    Exception: EXC_BAD_ACCESS (0x0001)
    Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x00000000

    Always those same addresses? If so I'd wonder about a corrupt-data
    problem. Can you get a stack trace from the core files?

    regards, tom lane
  • Marc Simonin at Feb 1, 2007 at 12:36 pm
    That's a good point!
    Let's see...
    In fact, I found that across 69 crashes (sic!), I had only 8 different
    addresses.
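
    A tally like this can be reproduced with a short script. The following is
    only a sketch, assuming the crash log uses the "Codes: ... at 0x..." line
    format quoted in this thread; the function name and the sample text are
    illustrative and not part of the original report.

```python
import re
from collections import Counter

# Matches the "Codes:" line of a crash log entry, for example:
#   Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x9003b7e0
FAULT_RE = re.compile(
    r"Codes:\s+\w+\s+\(0x[0-9a-fA-F]+\)\s+at\s+(0x[0-9a-fA-F]+)"
)


def count_fault_addresses(crashlog_text):
    """Return a Counter mapping each faulting address to its crash count."""
    return Counter(m.group(1).lower() for m in FAULT_RE.finditer(crashlog_text))


if __name__ == "__main__":
    # Illustrative sample built from the addresses quoted in this thread.
    sample = """
    Exception: EXC_BAD_ACCESS (0x0001)
    Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x9003b7e0

    Exception: EXC_BAD_ACCESS (0x0001)
    Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x00000000

    Exception: EXC_BAD_ACCESS (0x0001)
    Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x9003b7e0
    """
    for address, crashes in count_fault_addresses(sample).most_common():
        print(address, crashes)
```

    To run it on a real log, read the crash log file into a string and pass it
    to count_fault_addresses(); the number of keys in the result is the number
    of distinct faulting addresses.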


    But if it's really a corrupt-data problem, could the same database
    really work without errors on another platform (PPC G5)?

    Note that the crashing database is a clean new Postgres install, into
    which I then loaded the data from another machine with pg_dumpall | psql.
    I always do it this way because I have never had this kind of problem, but
    I'm a newbie in the great PG world :-)


    Best regards,
    Marc Simonin



    All the crashes look like this one: some process terminated by signal 10.
    ****************
    Log file

    CETLOG: autovacuum: processing database "bozo"
    CETLOG: autovacuum process (PID 24305) was terminated by signal 10
    CETLOG: terminating any other active server processes
    CETWARNING: terminating connection because of crash of another server process
    CETDETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    CETHINT: In a moment you should be able to reconnect to the database and repeat your command.
    CETLOG: all server processes terminated; reinitializing

    ***************


    In fact... I don't really know where to find this stack trace :-D
    But I have pasted one of the crash logs here and attached the entire
    gzipped CrashLog file (if it passes through the mailing list!).


    Hope it helps!

    ***************

    Command: postmaster
    Path: /opt/local/lib/pgsql8/bin/postmaster
    Parent: postmaster [120]

    Version: ??? (???)

    PID: 18876
    Thread: 0

    Exception: EXC_BAD_ACCESS (0x0001)
    Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x9003b7bc

    Thread 0 Crashed:
    0 postmaster 0x002635f1 AllocSetAlloc + 1158
    1 postmaster 0x0026407d MemoryContextAllocZero + 105
    2 postmaster 0x0023b74b InitCatCache + 189
    3 postmaster 0x00244110 InitCatalogCache + 281
    4 postmaster 0x00256d65 InitPostgres + 710
    5 postmaster 0x001ab757 PostgresMain + 4366
    6 postmaster 0x00176ca7 BackendRun + 2173
    7 postmaster 0x00176141 BackendStartup + 197
    8 postmaster 0x00173a1b ServerLoop + 614
    9 postmaster 0x001731be PostmasterMain + 4390
    10 postmaster 0x0011c96a main + 660
    11 postmaster 0x0000196a _start + 216
    12 postmaster 0x00001891 start + 41

    Thread 0 crashed with X86 Thread State (32-bit):
    eax: 0x00000000 ebx: 0x00263179 ecx: 0x0001dfe0 edx: 0x9003b7bc
    edi: 0x002efd90 esi: 0x0000000b ebp: 0xbfffedf8 esp: 0xbfffed90
    ss: 0x0000001f efl: 0x00010206 eip: 0x002635f1 cs: 0x00000017
    ds: 0x0000001f es: 0x0000001f fs: 0x00000000 gs: 0x00000037

    Binary Images Description:
    0x1000 - 0x2f0fff postmaster /opt/local/lib/pgsql8/bin/postmaster
    0x387000 - 0x3b9fff libssl.0.9.8.dylib /opt/local/lib/libssl.0.9.8.dylib
    0x3cc000 - 0x3ddfff libz.1.dylib /opt/local/lib/libz.1.dylib
    0x505000 - 0x5f4fff libcrypto.0.9.8.dylib /opt/local/lib/libcrypto.0.9.8.dylib
    0x65d000 - 0x67afff libreadline.5.1.dylib /opt/local/lib/libreadline.5.1.dylib
    0x8fe00000 - 0x8fe49fff dyld 46.1 /usr/lib/dyld
    0x90000000 - 0x9016ffff libSystem.B.dylib /usr/lib/libSystem.B.dylib
    0x901bf000 - 0x901c1fff libmathCommon.A.dylib /usr/lib/system/libmathCommon.A.dylib
    0x90bcf000 - 0x90bd6fff libgcc_s.1.dylib /usr/lib/libgcc_s.1.dylib
    0x94960000 - 0x9497dfff libresolv.9.dylib /usr/lib/libresolv.9.dylib
    0x95a2e000 - 0x95a5cfff libncurses.5.4.dylib /usr/lib/libncurses.5.4.dylib
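
    The address ranges in the binary image list can be used to see which
    loaded image a given address belongs to. The sketch below copies the
    ranges from this crash log; the table name and function name are
    illustrative. Looked up this way, the faulting address 0x9003b7bc lies in
    the range listed for libSystem.B.dylib, while the instruction pointer
    (eip 0x002635f1) lies in the postmaster binary. This only locates the
    addresses; it does not by itself say whether the cause is hardware or
    software.

```python
# Address ranges copied from the "Binary Images Description" section of the
# crash log in this thread: (start, end inclusive, image name).
BINARY_IMAGES = [
    (0x00001000, 0x002F0FFF, "postmaster"),
    (0x00387000, 0x003B9FFF, "libssl.0.9.8.dylib"),
    (0x003CC000, 0x003DDFFF, "libz.1.dylib"),
    (0x00505000, 0x005F4FFF, "libcrypto.0.9.8.dylib"),
    (0x0065D000, 0x0067AFFF, "libreadline.5.1.dylib"),
    (0x8FE00000, 0x8FE49FFF, "dyld"),
    (0x90000000, 0x9016FFFF, "libSystem.B.dylib"),
    (0x901BF000, 0x901C1FFF, "libmathCommon.A.dylib"),
    (0x90BCF000, 0x90BD6FFF, "libgcc_s.1.dylib"),
    (0x94960000, 0x9497DFFF, "libresolv.9.dylib"),
    (0x95A2E000, 0x95A5CFFF, "libncurses.5.4.dylib"),
]


def find_image(address):
    """Return the name of the image whose range contains address, or None."""
    for start, end, name in BINARY_IMAGES:
        if start <= address <= end:
            return name
    return None


if __name__ == "__main__":
    # Faulting address, instruction pointer, and the null address that
    # appears in other crash entries in this thread.
    for addr in (0x9003B7BC, 0x002635F1, 0x00000000):
        print(hex(addr), find_image(addr))
```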



  • Tom Lane at Feb 1, 2007 at 5:29 pm

    Marc Simonin writes:
    But I have pasted one of the crash logs here and attached the entire
    gzipped CrashLog file (if it passes through the mailing list!).

    Wow, those stack traces are all over the map, aren't they. Either you
    are hitting a dozen different Postgres bugs that no one else has ever
    seen, or you've got a flaky machine. I think the second is considerably
    more likely --- especially since several of the crashes are in startup
    code that every backend process ought to execute exactly the same way
    every time.
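
    One way to quantify how scattered the crashes are is to group them by the
    function at the top of the crashed thread's stack. This is a rough sketch
    only: it assumes each entry contains a "Thread N Crashed:" line followed
    by frame lines in the format of the crash log posted earlier in this
    thread. The helper name is illustrative, and the second entry in the
    sample (function name and address) is made up for demonstration.

```python
import re
from collections import Counter

# Captures the symbol name of frame 0 right after a "Thread N Crashed:" line,
# for example:
#   Thread 0 Crashed:
#   0 postmaster 0x002635f1 AllocSetAlloc + 1158
TOP_FRAME_RE = re.compile(
    r"Thread \d+ Crashed:\s*\n\s*0\s+\S+\s+0x[0-9a-fA-F]+\s+(\S+)"
)


def count_top_frames(crashlog_text):
    """Return a Counter of the top-of-stack function for each crash entry."""
    return Counter(TOP_FRAME_RE.findall(crashlog_text))


if __name__ == "__main__":
    sample = """
    Thread 0 Crashed:
    0 postmaster 0x002635f1 AllocSetAlloc + 1158
    1 postmaster 0x0026407d MemoryContextAllocZero + 105

    Thread 0 Crashed:
    0 postmaster 0x00123456 hypothetical_function + 12
    """
    for function, crashes in count_top_frames(sample).most_common():
        print(function, crashes)
```

    Many distinct top frames with only one or two crashes each would match the
    "all over the map" pattern described above, whereas a single reproducible
    software bug would more often show the same few frames again and again.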

    Perhaps bad RAM, or a bad motherboard? I've also seen machines go nuts
    like this if the fan froze up, allowing the CPU to overheat. Anyway,
    take it back to Apple ... I hope it's still under warranty ...

    regards, tom lane
  • Fabrice Vincent at Feb 2, 2007 at 6:24 pm
    Hi,

    Marc is unavailable today, so I am taking over to move forward with our
    crash issue.

    Tom, it is very unlikely that the issue lies with the hardware, as we
    tested on 2 brand-new machines and both exhibit exactly the same symptoms
    even though they are different models...

    Would you have any other idea of where to look for the cause of these
    crashes? For example, could the crashes be caused by some system library
    rather than the Postgres code itself?
    Also, is there any debugging option we could turn on on the faulty systems
    in order to pinpoint where the bug is located?

    Thanks a million for your help.

    Best regards.
    Fabrice

  • Tom Lane at Feb 2, 2007 at 6:20 pm

    Fabrice Vincent writes:
    Tom, it is very unlikely that the issue lies with the hardware, as we
    tested on 2 brand-new machines and both exhibit exactly the same symptoms
    even though they are different models...

    [ shrug... ] It's not impossible that you've got two lemons ... stranger
    things have happened. One pretty obvious opportunity for a common-mode
    failure is if you loaded them up with RAM chips from the same batch.

    The symptoms shown in your crashreporter logs don't look anything like
    a software problem to me: they're not consistent, and a lot of the
    crashes are in code that is exercised exactly the same way on every
    process start. Also, we're not seeing reports of similar problems from
    anyone else running PG on Intel Mac; which is definitely a nonempty
    population --- there's one in the buildfarm for instance, and it's
    showing zero failures in the back branches:
    http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=jackal&br=REL8_1_STABLE

    So I'm going to stick to my bet that it's a hardware problem.

    regards, tom lane
