On Tue, Sep 6, 2011 at 6:05 PM, Tom Lane wrote:
Robert Haas <email@example.com> writes:
On Tue, Sep 6, 2011 at 5:34 PM, Tom Lane wrote:
And I doubt
that the goal is worth taking risks for.
I am unable to count the number of times that I have had a customer
come to me and say "well, the backend crashed". And I go look at
their logs and I have no idea what happened.
gdb and print debug_query_string?
Surely you're kidding. These are customer systems which I frequently
don't even have access to. They don't always have gdb installed
(sometimes they are Windows systems) and if they do the customer isn't
likely to know how to use it, and even if they do they don't think the
better of us for needing such a tool to troubleshoot a crash. Even if
none of that were an issue, gdb is only going to work if you attach it
before the crash or have a core dump available. Typically you don't
know the crash is going to happen and core dumps aren't enabled.
I don't dispute that this would be nice to have. But I don't think that
it's sane to compromise the postmaster's reliability in order to print
information of doubtful accuracy.
In practice, I think very few crashes will clobber it. A lot of
crashes are going to be caused by a null pointer dereference in some
random part of the program, an assertion failure, the OOM killer, etc.
It's certainly POSSIBLE that it could get clobbered, but it shouldn't
be very likely; and as Marti says, with proper defensive coding, the
worst case scenario if it does happen should be some log garbage.
If you want to do something that doesn't violate the system's basic
design goals, think about setting up a SIGSEGV handler that tries to
print debug_query_string via elog before crashing. It might well crash
too, but it won't be risking taking out more of the database with it.
I don't think that's adequate. You need to trap a lot more than just
SIGSEGV to catch all the crashes - there's also SIGABRT and SIGILL and
a bunch of other ones, including SIGKILL, which can't be caught at all. I think you really, really
need something that executes outside the context of the dying process.
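To illustrate what "outside the context of the dying process" buys you,
here is a minimal POSIX-only sketch (so not applicable to Windows) of a
parent observing how a child died. The function name report_child_exit and
the message format are invented for this example and are not the
postmaster's actual reaping code. The point is only that the parent sees
the terminating signal even when it is SIGKILL, which the child itself could
never have handled.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Wait for child 'pid' and describe in 'msg' how it ended.  Returns the
 * terminating signal number, 0 if the child was not killed by a signal,
 * or -1 if waitpid() failed.
 */
static int
report_child_exit(pid_t pid, char *msg, size_t msglen)
{
    int   status = 0;
    pid_t rc;

    /* Retry if the wait itself is interrupted by a signal. */
    do
    {
        rc = waitpid(pid, &status, 0);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;

    if (WIFSIGNALED(status))
    {
        snprintf(msg, msglen, "child %ld was terminated by signal %d",
                 (long) pid, WTERMSIG(status));
        return WTERMSIG(status);
    }

    snprintf(msg, msglen, "child %ld exited with status %d",
             (long) pid, WIFEXITED(status) ? WEXITSTATUS(status) : -1);
    return 0;
}
```

The parent learns that the child crashed and with which signal, but not what
it was doing at the time, which is why the query text has to live somewhere
the parent can read it.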
TBH, I'm very unclear what could cause the postmaster to go belly-up
copying a bounded amount of data out of shared memory for logging
purposes only. It's surely possible to make the code safe against any
sequence of bytes that might be found there. The only real danger
seems to be that the memory access itself might trigger a segmentation
fault of some sort - but how is that going to happen? The child can't
unmap the address space in the parent, can it? If it's a real danger,
perhaps we could fork off a dedicated child process just to read the
relevant portion of shared memory and emit a log message - but I'm not
seeing what plausible scenario that would guard against.
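
As a sketch of what I mean by "safe against any sequence of bytes":
something along these lines, where the function name, the 1024-byte limit
and the choice of '?' as the replacement character are all made up for
illustration and not taken from any patch. It assumes nothing about the
source bytes except that srclen of them are readable.

```c
#include <stddef.h>

/* Hypothetical cap on how much query text gets logged. */
#define CRASH_QUERY_MAX 1024

/*
 * Copy at most CRASH_QUERY_MAX bytes from src into dst, stopping at the
 * first NUL and replacing every byte that is not printable ASCII with '?'.
 * src may contain arbitrary garbage and need not be NUL-terminated.
 * dst must have room for CRASH_QUERY_MAX + 1 bytes.
 * Returns the number of bytes stored, not counting the terminator.
 */
static size_t
copy_query_for_log(char *dst, const volatile char *src, size_t srclen)
{
    size_t n = srclen < CRASH_QUERY_MAX ? srclen : CRASH_QUERY_MAX;
    size_t i;

    for (i = 0; i < n; i++)
    {
        /* Read each source byte exactly once; it may change under us. */
        unsigned char c = (unsigned char) src[i];

        if (c == '\0')
            break;
        dst[i] = (c >= 0x20 && c < 0x7f) ? (char) c : '?';
    }
    dst[i] = '\0';
    return i;
}
```

The loop bound comes from the caller and the constant, never from the
contents of the buffer, so no byte pattern in shared memory can make it read
or write out of range. The worst case is a log line full of question marks.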