On Fri, 2009-01-16 at 18:43 -0500, Tom Lane wrote:

What is happening is that autovacuum_do_vac_analyze contains

old_cxt = MemoryContextSwitchTo(AutovacMemCxt);
...
vacuum(vacstmt, relids);
...
MemoryContextSwitchTo(old_cxt);

and at the time it is called by process_whole_db, CurrentMemoryContext
points at TopTransactionContext. Which gets destroyed because
vacuum()
internally finishes that transaction and starts a new one. When we
come out of vacuum(), CurrentMemoryContext again points at
TopTransactionContext, but *its not the same one*. The closing
MemoryContextSwitchTo is installing a stale pointer, which then
remains active into CommitTransaction. It's a wonder this code ever
works.
Can we add something to memory contexts to make this fail every time?

Seems like we should be able to detect this class of error. If we can't
it seems likely that a number of similar cases exist with other
contexts.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Search Discussions

  • Tom Lane at Jan 17, 2009 at 4:35 pm

    Simon Riggs writes:
    Can we add something to memory contexts to make this fail every time?
    No, not really. AFAICS the reason for Alvaro not seeing it must be that
    on his machine the new transaction happens to allocate its
    TopTransactionContext control block right in the same place where the
    old one was.

    We could have a debugging mode in which pfree'd space is never recycled
    for reuse (just filled with 0xdeadbeef or whatever and left to sit).
    But it would not be practical for anything except short debugging
    sessions. In fact, for a case like this, only *very* short debugging
    sessions, because you couldn't even recycle after transaction commit.
    So the chance of finding problems in seldom-exercised code paths, as
    this one is, would be pretty small anyway.

    regards, tom lane
  • Simon Riggs at Jan 17, 2009 at 5:00 pm

    On Sat, 2009-01-17 at 11:35 -0500, Tom Lane wrote:
    Simon Riggs <simon@2ndQuadrant.com> writes:
    Can we add something to memory contexts to make this fail every time?
    No, not really. AFAICS the reason for Alvaro not seeing it must be that
    on his machine the new transaction happens to allocate its
    TopTransactionContext control block right in the same place where the
    old one was.
    Can we put a identifier into header of each control block, an ascending
    value that is unlikely duplicate between calls? That way we'd be able to
    tell immediately it wasn't the same block, even if it is in the same
    place. (In assert mode only, of course).

    --
    Simon Riggs www.2ndQuadrant.com
    PostgreSQL Training, Services and Support
  • Tom Lane at Jan 17, 2009 at 6:58 pm

    Simon Riggs writes:
    Can we put a identifier into header of each control block, an ascending
    value that is unlikely duplicate between calls? That way we'd be able to
    tell immediately it wasn't the same block,
    Same block than what? Unless you can somehow hide that ID number in
    a MemoryContext pointer, the staleness of the pointer won't be evident.

    regards, tom lane
  • Alvaro Herrera at Jan 17, 2009 at 5:16 pm

    Tom Lane wrote:
    Simon Riggs <simon@2ndQuadrant.com> writes:
    Can we add something to memory contexts to make this fail every time?
    No, not really. AFAICS the reason for Alvaro not seeing it must be that
    on his machine the new transaction happens to allocate its
    TopTransactionContext control block right in the same place where the
    old one was.
    But freed memory is clobbered, so if we were to have an assert that
    checks the node tag, it should show up. In fact, we do have such an
    assert, but only for compilers other than GCC, because the inline
    version of palloc() cannot have it for lack of infrastructure.

    Maybe we should patch palloc.h so that it only uses the inline version
    when not in assert mode. Something like the attached, except that for
    some reason the test case still fails to report any error for me :-(
    Maybe the cpp boolean logic is fooling me.

    --
    Alvaro Herrera http://www.CommandPrompt.com/
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Tom Lane at Jan 17, 2009 at 7:04 pm

    Alvaro Herrera writes:
    Tom Lane wrote:
    No, not really. AFAICS the reason for Alvaro not seeing it must be that
    on his machine the new transaction happens to allocate its
    TopTransactionContext control block right in the same place where the
    old one was.
    But freed memory is clobbered, so if we were to have an assert that
    checks the node tag, it should show up. In fact, we do have such an
    assert, but only for compilers other than GCC, because the inline
    version of palloc() cannot have it for lack of infrastructure.
    Well, but production installations don't have either memory clobbering
    or Asserts, so fooling with that wouldn't have helped anyway. I suspect
    what really happened here is that the bug was created by some late
    change during 8.1 development, and nobody ever exercised the
    anti-wraparound code path after that in an assert-enabled build :-(
    In a non-assert build there's a fairly good chance that it'd still
    work because the context header would still be there undamaged.

    One thing we could try that would cost a lot less than de-inlining
    palloc is to have MemoryContextDelete intentionally zero the methods
    pointer. That still does nothing for the create-new-context-in-same-
    spot issue, but at least it would catch palloc in a context that had
    been deleted and not yet recycled.

    regards, tom lane
  • Alvaro Herrera at Jan 20, 2009 at 12:23 pm

    Tom Lane wrote:
    Alvaro Herrera <alvherre@commandprompt.com> writes:
    But freed memory is clobbered, so if we were to have an assert that
    checks the node tag, it should show up. In fact, we do have such an
    assert, but only for compilers other than GCC, because the inline
    version of palloc() cannot have it for lack of infrastructure.
    Well, but production installations don't have either memory clobbering
    or Asserts, so fooling with that wouldn't have helped anyway. I suspect
    what really happened here is that the bug was created by some late
    change during 8.1 development, and nobody ever exercised the
    anti-wraparound code path after that in an assert-enabled build :-(
    In a non-assert build there's a fairly good chance that it'd still
    work because the context header would still be there undamaged.
    Well, my builds are all assert-enabled, and I still wasn't able to make
    it crash in any way (the new context being allocated in the same
    position as the old one is the only explanation I have, but I did not
    investigate whether that's what happening). Maybe Greg Stark's idea of
    offsetting pointers returned by palloc could have helped to find the
    problem from the outset.

    --
    Alvaro Herrera http://www.CommandPrompt.com/
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Gregory Stark at Jan 17, 2009 at 5:55 pm

    Tom Lane writes:

    Simon Riggs <simon@2ndQuadrant.com> writes:
    Can we add something to memory contexts to make this fail every time?
    No, not really. AFAICS the reason for Alvaro not seeing it must be that
    on his machine the new transaction happens to allocate its
    TopTransactionContext control block right in the same place where the
    old one was.

    We could have a debugging mode in which pfree'd space is never recycled
    for reuse (just filled with 0xdeadbeef or whatever and left to sit).
    Hm, I wonder how much more practical it would be if we recycled it but offset
    the pointer by maxalign. We would waste 4/8 bytes per palloc/free cycle
    instead of the whole chunk.

    (Whether palloc could actually do this would be another question, there are
    lots of allocator algorithms that wouldn't be able to, I think.)

    If we had a more formalized save_memory_context()/restore_memory_context()
    which gave you more than just a pointer we could do better for this particular
    case. save_memory_context() could hand you a struct with a pointer as well as
    some sanity check values about the context you're saving.

    --
    Gregory Stark
    EnterpriseDB http://www.enterprisedb.com
    Ask me about EnterpriseDB's On-Demand Production Tuning

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedJan 17, '09 at 8:53a
activeJan 20, '09 at 12:23p
posts8
users4
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2021 Grokbase