Following up on my own message. The version with full debugging support
works flawlessly. Compiling with partial debug gives:
--------------------------------------
Core file created by program "postgres"
signal Bus error at [interval_accum:1575 +0x8,0x1201d7848]
Source not available
warning: Files compiled -g3: parameter values probably wrong
(dbx) t
0 interval_accum(fcinfo = 0x14019b140) ["timestamp.c":1575, 0x1201d7848]
1 advance_transition_function(peraggstate = 0x14019ac60,
newVal = (unallocated - symbol optimized away),
isNull = (unallocated - symbol optimized away))
["nodeAgg.c":283, 0x12010e67c]
2 ExecAgg(node = 0x14019a2d0) ["nodeAgg.c":555, 0x12010eb6c]
3 ExecProcNode(node = 0x14019a2d0, parent = (nil))
["execProcnode.c":347, 0x12010791c]
4 ExecutePlan(estate = 0x14019a970, plan = 0x14019a2d0,
operation = (unallocated - symbol optimized away), numberTuples = 0,
direction = 708, destfunc = 0x14019b1f0)
["execMain.c":976, 0x1201058e4]
5 ExecutorRun(queryDesc = 0x14019b1f0, estate = 0x14019a970,
feature = (unallocated - symbol optimized away), count = 0)
["execMain.c":199, 0x12010491c]
6 ProcessQuery(parsetree = 0x14019a938, plan = 0x14019a970,
dest = (unallocated - symbol optimized away))
["pquery.c":293, 0x120192ba4]
7 pg_exec_query_string(query_string = 0x1401a1060 =
"select avg(f1) from interval_tbl;",
dest = (unallocated - symbol optimized away),
parse_context = 0x140109340)
["postgres.c":782, 0x120190424]
----------------------------
Looking at line 1575 of timestamp.c, I see:
/*
* XXX memcpy, instead of just extracting a pointer, to work around
* buggy array code: it won't ensure proper alignment of Interval
* objects on machines where double requires 8-byte alignment. That
* should be fixed, but in the meantime...
*/
memcpy(&sumX, DatumGetIntervalP(transdatums[0]), sizeof(Interval));
memcpy(&N, DatumGetIntervalP(transdatums[1]), sizeof(Interval));

I have no idea what array code is buggy, but for now all I know is that this
does not work with Compaq's compiler at -O4. I will try to find out which lower
optimization level, if any, hides the problem.
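
For readers unfamiliar with the idiom described in the quoted comment, here is a minimal sketch in C of copying a struct out of possibly misaligned storage with memcpy instead of dereferencing a cast pointer. The `DemoInterval` struct and both helper functions are hypothetical stand-ins for illustration; they are not the definitions from timestamp.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Interval: a double plus an int. */
typedef struct
{
    double time;    /* typically wants 8-byte alignment */
    int    month;
} DemoInterval;

/*
 * Copy a DemoInterval out of a raw byte buffer at an arbitrary offset.
 * The source is only ever handled as a byte pointer, so nothing in the
 * types tells the compiler that it is 8-byte aligned.
 */
DemoInterval
read_interval_at(const unsigned char *buf, size_t buflen, size_t offset)
{
    DemoInterval result;

    assert(offset + sizeof(DemoInterval) <= buflen);
    memcpy(&result, buf + offset, sizeof(DemoInterval));
    return result;
}

/* Store a DemoInterval at an arbitrary (possibly misaligned) offset. */
void
write_interval_at(unsigned char *buf, size_t buflen, size_t offset,
                  const DemoInterval *value)
{
    assert(offset + sizeof(DemoInterval) <= buflen);
    memcpy(buf + offset, value, sizeof(DemoInterval));
}
```

Writing a value at offset 4 of a byte buffer and reading it back through `read_interval_at` should return the same fields, regardless of how the buffer itself is aligned.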

Regards, Bernd
At 14:01 16.11.01 -0500, Tom Lane wrote:
"Tegge, Bernd" <tegge@repas-aeg.de> writes:
The interval test fails with the following msg:
--- 216,222 ----
-- known to change the allowed input syntax for type interval without
-- updating pg_aggregate.agginitval
select avg(f1) from interval_tbl;
! server closed the connection unexpectedly
This is bad :-(

I tried to reproduce the problem on the Alpha available at
SourceForge's compile farm. No luck --- regression tests run
perfectly there, at least with vanilla configuration (I used
"configure --enable-cassert"). So it doesn't seem hardware-
specific, but perhaps it depends on the OS.
Did you use gcc or Compaq's Alpha compilers?

warp.dr.repas.de$ uname -a
OSF1 warp.dr.repas.de V5.0 910 alpha
warp.dr.repas.de$ cc -V
cc (cc)
Tru64 UNIX Compiler Driver 5.0
Compaq C V6.1-011 on Digital UNIX V5.0 (Rev. 910)
warp.dr.repas.de$ flex -V
flex version 2.5.4
warp.dr.repas.de$ bison -V
GNU Bison version 1.28

No core file and I don't know which of the many error messages in
postmaster.log are normal and which are not.
I've run the tests with debug enabled, and I see no further output
after
DEBUG: query: select avg(f1) from interval_tbl;
Is there not even a report of the backend crashing? If the postmaster
did not log a child-exit message then there's something more than a
plain old backend crash here.
Interval normally runs in a group of multiple parallel tests. I changed
the setup so that it ran as the only test. The result:
--------------------------------
test interval ...
Unaligned access pid=339539 <postgres> va=0x14019b1bc pc=0x1201d7844
ra=0x1201d77e0 inst=0x8c000000
FAILED
DEBUG: plan: { AGG :startup_cost 22.50 :total_cost 22.50 :rows 1 :width 12
:qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 1186
:restypmod -1 :resname avg :reskey 0 :reskeyop 0 :ressortgroupref 0
:resjunk false } :expr { AGGREG :aggname avg :basetype 1186 :aggtype 1186
:target { VAR :varno 0 :varattno 1 :vartype 1186 :vartypmod -1
:varlevelsup 0 :varnoold 1 :varoattno 1} :aggstar false
:aggdistinct false }}) :qpqual <> :lefttree { SEQSCAN :startup_cost 0.00
:total_cost 20.00 :rows 1000 :width 12 :qptargetlist ({ TARGETENTRY
:resdom { RESDOM :resno 1 :restype 1186 :restypmod -1 :resname <>
:reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR
:varno 1 :varattno 1 :vartype 1186 :vartypmod -1 :varlevelsup 0
:varnoold 1 :varoattno 1}}) :qpqual <> :lefttree <> :righttree <>
:extprm () :locprm () :initplan <> :nprm 0 :scanrelid 1 } :righttree <>
:extprm () :locprm () :initplan <> :nprm 0 }
DEBUG: ProcessQuery
DEBUG: reaping dead processes
DEBUG: child process (pid 339539) was terminated by signal 10
--------------------------------------
It's been a long time since I have seen unaligned access messages, but
then I haven't done much development on OSF/1 for quite some time.
AFAIR this is only a warning, so the problem may lie elsewhere.
After some rtfm I found the core file and created a backtrace.
No source unfortunately; I'll have to build it with different compiler
options for that.
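
As an aside, the alignment arithmetic behind the kernel message can be sketched in a few lines of C. This assumes the faulting address reads 0x14019b1bc (that is, that the space inside the printed va value is an artifact); the helper name `is_aligned` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return nonzero if addr is a multiple of align.
 * align must be a nonzero power of two.
 */
int
is_aligned(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}
```

Under that assumption, `is_aligned(0x14019b1bc, 4)` is true and `is_aligned(0x14019b1bc, 8)` is false: the address is 4-byte aligned but not 8-byte aligned, which is consistent with a double being loaded from the middle of an array that only guarantees int alignment.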

dbx version 5.0
Type 'help' for help.
Core file created by program "postgres"

signal Bus error at >*[interval_accum, 0x1201d7848] stt $f0, 8(sp)
(dbx) t
0 interval_accum(0x0, 0x0, 0x0, 0x100000002, 0x1401c1038) [0x1201d7848]
1 (unknown)() [0x12010e67c]
2 ExecAgg(0x1, 0x0, 0x14019abd8, 0x140175000, 0x1200aae50) [0x12010eb6c]
3 ExecProcNode(0x1200aae50, 0x4a2, 0x1201058e8, 0x14019a970,
0x14019a970) [0x12010791c]
4 (unknown)() [0x1201058e4]
5 ExecutorRun(0x14019a970, 0x2, 0x1, 0x0, 0x0) [0x12010491c]
6 ProcessQuery(0x140081ba8, 0x14019af88, 0x0, 0x140078440, 0x0)
[0x120192ba4]
7 pg_exec_query_string(0x1401a1060, 0x140109340, 0x100000002,
0x1401a1588, 0x1401a1588) [0x120190424]
8 PostgresMain(0x11fffaa08, 0x1400e1ba9, 0x100000005, 0x1400deb20, 0x0)
[0x1201922a4]
9 (unknown)() [0x12016063c]
10 (unknown)() [0x12015fb0c]
11 (unknown)() [0x12015e40c]
12 PostmasterMain(0x0, 0x0, 0x0, 0x0, 0x0) [0x12015e054]
13 main(0x1, 0x0, 0x12940, 0x400000006, 0x12004acc0) [0x12012718c]


regards, tom lane

  • Tom Lane at Nov 19, 2001 at 4:31 pm

    "Tegge, Bernd" <tegge@repas-aeg.de> writes:
    Following up on my own message. The version with full debugging support
    works flawlessly. Compiling with partial debug gives:
    0 interval_accum(fcinfo = 0x14019b140) ["timestamp.c":1575, 0x1201d7848]
    Hmm. The "memcpy" at that line is intended specifically to get around
    any possible misaligned-pointer problem. I have a nasty feeling that
    Compaq's compiler is misoptimizing the memcpy into a
    load-and-store-double kind of instruction sequence. Can you pull out
    the generated assembly code for that routine so we can look?
    I have no idea what array code is buggy, but for now all I know is that this
    does not work with Compaq's compiler at -O4. I will try to find out which lower
    optimization level, if any, hides the problem.
    It could indeed be an overoptimization issue. We could certainly reduce
    the -O4 in template/osf (is that the one your system uses?) if that
    helps.

    regards, tom lane
  • Tegge, Bernd at Nov 19, 2001 at 6:58 pm

    At 11:38 19.11.01 -0500, Tom Lane wrote:
    Hmm. The "memcpy" at that line is intended specifically to get around
    any possible misaligned-pointer problem. I have a nasty feeling that
    Compaq's compiler is misoptimizing the memcpy into a
    load-and-store-double kind of instruction sequence.
    I wouldn't be surprised. Memcpy et al. are inlined even at O-level 1.
    The optimizer might notice a data size of 8 and reduce the code to a single
    load/store operation. However, this is usually only a performance problem:
    you get an alignment exception, and the exception handler corrects the
    access by fetching/writing the missing part (besides writing the address
    to stdout).

    BTW: The Linux version of the Compaq compiler is available for download
    from http://www.unix.digital.com/linux/compaq_c/index.html. You can even
    sign up for a test account on a Tru64 or Linux/Alpha system.

    If it is doing that, perhaps it could be convinced not to with
    explicit casts, say

    memcpy((void *) &sumX, (void *) DatumGetIntervalP(transdatums[0]),
    sizeof(Interval));
    memcpy((void *) &N, (void *) DatumGetIntervalP(transdatums[1]),
    sizeof(Interval));
    Nope, this does not fix it. Compiling timestamp.c with the additional
    options "-noinline -nointrinsic" does, but I would still hesitate to
    use this on the whole project.

    (note similar code in interval_avg would also need to be fixed). If
    that works it'd be nicer than making a global reduction in optimization
    level...
    I've got a rather ugly but usable workaround. See attached timestamp.c
    If you can think of anything better, I'm open to suggestions.
  • Tom Lane at Nov 19, 2001 at 7:28 pm

    "Tegge, Bernd" <tegge@repas-aeg.de> writes:
    I've got a rather ugly but usable workaround. See attached timestamp.c
    My, that *is* ugly. Surely there's gotta be something cleaner.

    I don't quite understand how it is that the Compaq compiler works at
    all, if it thinks it can optimize random memcpy operations into
    opcodes that assume aligned addresses. We should be coredumping in a
    lot more places than just this. Since we're not, there's got to be
    some fairly straightforward way of defeating the optimization.
    The extra memcpy looks to me like black magic that doesn't really have
    anything directly to do with the problem.

    I'm surprised that the (void *) cast didn't fix it. Perhaps it would
    work to use DatumGetPointer rather than DatumGetIntervalP --- that is,
    never give the compiler any hint that the source might be considered
    double-aligned in the first place.

    regards, tom lane
  • Tegge, Bernd at Nov 21, 2001 at 8:45 am

    At 14:27 19.11.01 -0500, Tom Lane wrote:
    "Tegge, Bernd" <tegge@repas-aeg.de> writes:
    I've got a rather ugly but usable workaround. See attached timestamp.c
    My, that *is* ugly. Surely there's gotta be something cleaner.

    I don't quite understand how it is that the Compaq compiler works at
    all, if it thinks it can optimize random memcpy operations into
    opcodes that assume aligned addresses.
    Well, if both operands are pointers to double and the compiler/runtime
    system aligns doubles except on explicit request (e.g. #pragma noalign),
    the optimizer probably thought it was safe to replace the memcpy with a
    load/store double operation. It probably should not have done this
    after casting the pointers to void*, but it did ...
    We should be coredumping in a
    lot more places than just this. Since we're not, there's got to be
    some fairly straightforward way of defeating the optimization.
    The extra memcpy looks to me like black magic that doesn't really have
    anything directly to do with the problem.
    I had to use the temp vars after the assignment. Otherwise the compiler
    optimized them away. Sometimes this thing is amazing.

    I'm surprised that the (void *) cast didn't fix it. Perhaps it would
    work to use DatumGetPointer rather than DatumGetIntervalP --- that is,
    never give the compiler any hint that the source might be considered
    double-aligned in the first place.
    Thanks, *that* did it. We should just extend the comment block above
    to say that we can go back to DatumGetIntervalP if the array code ever
    gets fixed.
  • Tom Lane at Nov 21, 2001 at 2:37 pm

    "Tegge, Bernd" <tegge@repas-aeg.de> writes:
    Thanks, *that* did it. We should just extend the comment block above
    to say that we can go back to DatumGetIntervalP if the array code ever
    gets fixed.
    Excellent, I'll apply the patch.

    regards, tom lane
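
The shape of the fix agreed on in the thread can be sketched as follows. This is not the applied patch: `DemoDatum`, `DemoInterval`, `DemoDatumGetPointer`, and `fetch_interval` are simplified, hypothetical stand-ins for the PostgreSQL types and macros, whose real definitions live in the PostgreSQL headers and differ in detail. The only point illustrated is that the memcpy source is obtained as an untyped char pointer rather than as an Interval pointer, so the types give the compiler no reason to assume 8-byte alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t DemoDatum;        /* stand-in for Datum */

typedef struct
{
    double time;
    int    month;
} DemoInterval;                     /* stand-in for Interval */

/* Stand-in for DatumGetPointer: yields a plain char pointer. */
#define DemoDatumGetPointer(d)  ((char *) (d))

/*
 * Fetch an interval from a datum that may point at misaligned storage.
 *
 * The variant that crashed in the thread would correspond to passing
 * (DemoInterval *) DemoDatumGetPointer(d) as the memcpy source, which
 * hands the compiler a pointer type implying double alignment.
 */
DemoInterval
fetch_interval(DemoDatum d)
{
    DemoInterval val;

    assert(d != 0);
    memcpy(&val, DemoDatumGetPointer(d), sizeof(DemoInterval));
    return val;
}
```

Whether a given compiler actually changes its code generation based on the pointer type is compiler-specific; the thread only establishes that it did for Compaq C at -O4.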

group: pgsql-ports
category: postgresql
posted: Nov 19, '01 at 4:21p
active: Nov 21, '01 at 2:37p
posts: 6
users: 2
website: postgresql.org
irc: #postgresql
