COPY IN loops in heap_multi_insert() extending the table until it fills the
disk when trying to insert a wide row into a table with a low fill-factor.
Internally, fill-factor is implemented by reserving some space on each
page. For a large enough row and a small enough fill-factor, heap_multi_insert()
cannot fit the row even on a new, empty page, so it keeps allocating new pages
but is never able to place the row. It should always put at least one row on
an empty page.

In the excerpt below, saveFreeSpace is the space reserved by the fill-factor.

while (ndone < ntuples)
{
    ...
    /*
     * Find buffer where at least the next tuple will fit.  If the page is
     * all-visible, this will also pin the requisite visibility map page.
     */
    buffer = RelationGetBufferForTuple(relation, heaptuples[ndone]->t_len,
    ...
    /* Put as many tuples as fit on this page */
    for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
    {
        HeapTuple   heaptup = heaptuples[ndone + nthispage];

        if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
            break;

        RelationPutHeapTuple(relation, buffer, heaptup);
    }
    ... do a bunch of dirtying and logging etc. ...
}
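
To see why that check can never pass for a wide row, here is a back-of-the-envelope
sketch. It assumes the default 8 kB BLCKSZ, the usual 24-byte page header, and the
fill-factor of 10 from the report; the exact byte counts are illustrative, not taken
from the report itself.

#include <stdio.h>

int main(void)
{
    const int blcksz       = 8192;  /* default PostgreSQL block size          */
    const int page_header  = 24;    /* SizeOfPageHeaderData                   */
    const int line_pointer = 4;     /* item pointer consumed by the new tuple */
    const int fillfactor   = 10;    /* the low fill-factor from the report    */

    /* free space PageGetHeapFreeSpace() reports for a brand-new, empty page */
    int empty_page_free = blcksz - page_header - line_pointer;       /* ~8164 */

    /* space reserved by the fill-factor, i.e. saveFreeSpace in the excerpt  */
    int save_free_space = blcksz * (100 - fillfactor) / 100;         /*  7372 */

    /*
     * The loop requires free >= MAXALIGN(t_len) + saveFreeSpace, so any tuple
     * wider than this difference can never be placed, even on an empty page.
     */
    printf("widest tuple that still passes: ~%d bytes\n",
           empty_page_free - save_free_space);                       /*  ~792 */
    return 0;
}

So with fill-factor 10 anything much wider than about 800 bytes trips the
endless-extension behaviour described above.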

This was introduced in 9.2 as part of the bulk insert speedup.

One more point: in the case where we don't insert any rows, we still do all the
dirtying and logging work even though we did not modify the page. I have tried to
skip all this if no rows are added (nthispage == 0), but my access-method foo
is sadly out of date, so someone should take a skeptical look at that.

A test case and patch against 9.2.2 are attached. The patch fixes the problem and
passes make check; most of the diff is just indentation changes. Whoever tries the
test case will want to run it on a small partition by itself.

-dg

--
David Gould 510 282 0869 daveg@sonic.net
If simplicity worked, the world would be overrun with insects.

  • Andres Freund at Dec 12, 2012 at 11:27 am

    On 2012-12-12 03:04:19 -0800, David Gould wrote:
    > COPY IN loops in heap_multi_insert() extending the table until it fills the
    > disk when trying to insert a wide row into a table with a low fill-factor.
    > Internally, fill-factor is implemented by reserving some space on each
    > page. For a large enough row and a small enough fill-factor, heap_multi_insert()
    > cannot fit the row even on a new, empty page, so it keeps allocating new pages
    > but is never able to place the row. It should always put at least one row on
    > an empty page.

    Heh. Nice one. Did you hit that in practice?

    > One more point: in the case where we don't insert any rows, we still do all the
    > dirtying and logging work even though we did not modify the page. I have tried to
    > skip all this if no rows are added (nthispage == 0), but my access-method foo
    > is sadly out of date, so someone should take a skeptical look at that.
    >
    > A test case and patch against 9.2.2 are attached. The patch fixes the problem and
    > passes make check; most of the diff is just indentation changes. Whoever tries the
    > test case will want to run it on a small partition by itself.

    ISTM this would be fixed with a smaller footprint by just changing

        if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)

    into

        if (!PageIsEmpty(page) &&
            PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)

    I think that should work?
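
    (For reference, PageIsEmpty() is only a page-header test, so the extra
    condition is essentially free; its definition in bufpage.h is along these
    lines:)

        #define PageIsEmpty(page) \
            (((PageHeader) (page))->pd_lower <= SizeOfPageHeaderData)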

    Greetings,

    Andres Freund

    --
    Andres Freund http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Heikki Linnakangas at Dec 12, 2012 at 11:56 am

    On 12.12.2012 13:27, Andres Freund wrote:
    > On 2012-12-12 03:04:19 -0800, David Gould wrote:
    > > One more point: in the case where we don't insert any rows, we still do all the
    > > dirtying and logging work even though we did not modify the page. I have tried to
    > > skip all this if no rows are added (nthispage == 0), but my access-method foo
    > > is sadly out of date, so someone should take a skeptical look at that.
    > >
    > > A test case and patch against 9.2.2 are attached. The patch fixes the problem and
    > > passes make check; most of the diff is just indentation changes. Whoever tries the
    > > test case will want to run it on a small partition by itself.
    >
    > ISTM this would be fixed with a smaller footprint by just changing
    >
    >     if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
    >
    > into
    >
    >     if (!PageIsEmpty(page) &&
    >         PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
    >
    > I think that should work?

    Yeah, seems that it should, although PageIsEmpty() is no guarantee that
    the tuple fits, because even though PageIsEmpty() returns true, there
    might be dead line pointers consuming so much space that the tuple at
    hand doesn't fit. However, RelationGetBufferForTuple() won't return such
    a page; it guarantees that the first tuple does indeed fit on the page
    it returns. For the same reason, the later check that at least one tuple
    was actually placed on the page is not necessary.

    I committed a slightly different version, which unconditionally puts the
    first tuple on the page, and only applies the freespace check to the
    subsequent tuples. Since RelationGetBufferForTuple() guarantees that the
    first tuple fits, we can trust that, like heap_insert does.

    --- a/src/backend/access/heap/heapam.c
    +++ b/src/backend/access/heap/heapam.c
    @@ -2172,8 +2172,12 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
          /* NO EREPORT(ERROR) from here till changes are logged */
          START_CRIT_SECTION();

    -     /* Put as many tuples as fit on this page */
    -     for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
    +     /*
    +      * RelationGetBufferForTuple has ensured that the first tuple fits.
    +      * Put that on the page, and then as many other tuples as fit.
    +      */
    +     RelationPutHeapTuple(relation, buffer, heaptuples[ndone]);
    +     for (nthispage = 1; ndone + nthispage < ntuples; nthispage++)
          {
              HeapTuple   heaptup = heaptuples[ndone + nthispage];
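
    For readability, here is roughly how the loop reads with that hunk applied
    (a reconstruction from the diff above, not a verbatim copy of heapam.c):

        /*
         * RelationGetBufferForTuple has ensured that the first tuple fits.
         * Put that on the page, and then as many other tuples as fit.
         */
        RelationPutHeapTuple(relation, buffer, heaptuples[ndone]);
        for (nthispage = 1; ndone + nthispage < ntuples; nthispage++)
        {
            HeapTuple   heaptup = heaptuples[ndone + nthispage];

            if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
                break;

            RelationPutHeapTuple(relation, buffer, heaptup);
        }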


    Thanks for the report!

    - Heikki
  • David Gould at Dec 12, 2012 at 12:23 pm

    On Wed, 12 Dec 2012 13:56:08 +0200 Heikki Linnakangas wrote:

    > However, RelationGetBufferForTuple() won't return such
    > a page; it guarantees that the first tuple does indeed fit on the page
    > it returns. For the same reason, the later check that at least one tuple
    > was actually placed on the page is not necessary.
    >
    > I committed a slightly different version, which unconditionally puts the
    > first tuple on the page, and only applies the freespace check to the
    > subsequent tuples. Since RelationGetBufferForTuple() guarantees that the
    > first tuple fits, we can trust that, like heap_insert does.
    >
    > --- a/src/backend/access/heap/heapam.c
    > +++ b/src/backend/access/heap/heapam.c
    > @@ -2172,8 +2172,12 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
    >       /* NO EREPORT(ERROR) from here till changes are logged */
    >       START_CRIT_SECTION();
    >
    > -     /* Put as many tuples as fit on this page */
    > -     for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
    > +     /*
    > +      * RelationGetBufferForTuple has ensured that the first tuple fits.
    > +      * Put that on the page, and then as many other tuples as fit.
    > +      */
    > +     RelationPutHeapTuple(relation, buffer, heaptuples[ndone]);
    > +     for (nthispage = 1; ndone + nthispage < ntuples; nthispage++)
    >       {
    >           HeapTuple   heaptup = heaptuples[ndone + nthispage];

    I don't know if this is the same thing. At least in the comments I was
    reading while trying to figure this out, there was some concern that someone
    else could change the free space on the page. Does RelationGetBufferForTuple()
    guarantee against this too?

    -dg

    --
    David Gould 510 282 0869 daveg@sonic.net
    If simplicity worked, the world would be overrun with insects.
  • Heikki Linnakangas at Dec 12, 2012 at 12:27 pm

    On 12.12.2012 14:24, David Gould wrote:
    > I don't know if this is the same thing. At least in the comments I was
    > reading while trying to figure this out, there was some concern that someone
    > else could change the free space on the page. Does RelationGetBufferForTuple()
    > guarantee against this too?

    Yeah, RelationGetBufferForTuple grabs a lock on the page before
    returning it. For comparison, plain heap_insert simply does this:

        buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
                                           InvalidBuffer, options, bistate,
                                           &vmbuffer, NULL);

        /* NO EREPORT(ERROR) from here till changes are logged */
        START_CRIT_SECTION();

        RelationPutHeapTuple(relation, buffer, heaptup);
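
    In other words, the sequence looks roughly like this (a sketch based on the
    description above, not the verbatim heapam.c code):

        buffer = RelationGetBufferForTuple(...);   /* page comes back pinned and
                                                    * exclusive-locked, with room
                                                    * for at least this tuple    */
        START_CRIT_SECTION();
        RelationPutHeapTuple(relation, buffer, heaptup);
        /* ... WAL logging etc. elided ... */
        END_CRIT_SECTION();
        UnlockReleaseBuffer(buffer);               /* only now can another backend
                                                    * claim the remaining space   */

    So nothing can change the page's free space between the check inside
    RelationGetBufferForTuple and the point where the tuple is actually placed.
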
    - Heikki
  • David Gould at Dec 12, 2012 at 12:17 pm

    On Wed, 12 Dec 2012 12:27:11 +0100 Andres Freund wrote:
    > On 2012-12-12 03:04:19 -0800, David Gould wrote:
    > > COPY IN loops in heap_multi_insert() extending the table until it fills the
    >
    > Heh. Nice one. Did you hit that in practice?

    Yeah, with a bunch of hosts that run postgres on a ramdisk, and that copy
    happens late in the initial setup script for new hosts. The first batch of
    new hosts to be set up with 9.2 filled the ramdisk, oomed, and fell over
    within a minute. Since the script sets up a lot of stuff we had no idea
    at first what had oomed.

    > ISTM this would be fixed with a smaller footprint by just changing
    >
    >     if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
    >
    > into
    >
    >     if (!PageIsEmpty(page) &&
    >         PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
    >
    > I think that should work?

    I like PageIsEmpty() better (and would have used it if I had known about it), but
    I'm not so crazy about the negation.

    -dg

    --
    David Gould 510 282 0869 daveg@sonic.net
    If simplicity worked, the world would be overrun with insects.
  • Heikki Linnakangas at Dec 12, 2012 at 12:23 pm

    On 12.12.2012 14:17, David Gould wrote:
    > On Wed, 12 Dec 2012 12:27:11 +0100 Andres Freund wrote:
    > > On 2012-12-12 03:04:19 -0800, David Gould wrote:
    > > > COPY IN loops in heap_multi_insert() extending the table until it fills the
    > >
    > > Heh. Nice one. Did you hit that in practice?
    >
    > Yeah, with a bunch of hosts that run postgres on a ramdisk, and that copy
    > happens late in the initial setup script for new hosts. The first batch of
    > new hosts to be set up with 9.2 filled the ramdisk, oomed, and fell over
    > within a minute. Since the script sets up a lot of stuff we had no idea
    > at first what had oomed.

    The bug's been fixed now, but note that huge tuples like this will
    always cause the table to be extended, even if there are completely
    empty pages in the table after a vacuum. Even a completely empty
    existing page is not considered spacious enough in this case, because
    it's still too small when you take fillfactor into account, so the
    insertion will always extend the table. If you regularly run into this
    situation, you might want to raise your fillfactor.

    - Heikki
  • David Gould at Dec 12, 2012 at 1:29 pm

    On Wed, 12 Dec 2012 14:23:12 +0200 Heikki Linnakangas wrote:

    > The bug's been fixed now, but note that huge tuples like this will
    > always cause the table to be extended, even if there are completely
    > empty pages in the table after a vacuum. Even a completely empty
    > existing page is not considered spacious enough in this case, because
    > it's still too small when you take fillfactor into account, so the
    > insertion will always extend the table. If you regularly run into this
    > situation, you might want to raise your fillfactor.

    Actually, we'd like it lower. Ideally, one row per page.

    We lose noticeable performance when we raise fill-factor above 10. Even 20 is
    slower.

    During busy times these hosts sometimes fall into a stable state
    with very high cpu use, mostly in s_lock() and LWLockAcquire() and I think
    PinBuffer, plus very high system cpu in the scheduler (I don't have the perf
    trace in front of me, so take this with a grain of salt). In this mode they
    fall from the normal 7000 queries per second to below 3000. Once in this
    state they tend to stay that way. If we turn down the number of incoming
    requests they go back to normal. Our conjecture is that most requests are
    for only a few keys, so we have multiple sessions contending for a few
    pages and convoying in the buffer manager. The table is under 20k rows, but
    the hot items are probably only a couple hundred different rows. The busy
    processes are doing reads only, but there is some update activity on this
    table too.

    Ah, found an email with the significant part of the perf output:

        ... set number of client threads = number of postgres backends = 70. That way
        all my threads have constant access to a backend and they just spin in a tight
        loop running the same query over and over (with different values). ... this
        seems to have tapped into 9.2's resonant frequency, right now we're spending
        almost all our time spin locking. ...

        762377.00  71.0%  s_lock         /usr/local/bin/postgres
         22279.00   2.1%  LWLockAcquire  /usr/local/bin/postgres
         18916.00   1.8%  LWLockRelease  /usr/local/bin/postgres

    I was trying to resurrect the pthread s_lock() patch to see if that helps,
    but it did not apply at all and I have not had time to pursue it.

    We have tried lots of different process counts and get the best result with
    about ten fewer active postgresql backends than HT cores. The system is 128GB
    with:

    processor : 79
    vendor_id : GenuineIntel
    cpu family : 6
    model : 47
    model name : Intel(R) Xeon(R) CPU E7-L8867 @ 2.13GHz
    stepping : 2
    cpu MHz : 2128.478
    cache size : 30720 KB

    -dg

    --
    David Gould 510 282 0869 daveg@sonic.net
    If simplicity worked, the world would be overrun with insects.
  • Robert Haas at Dec 14, 2012 at 8:39 pm

    On Wed, Dec 12, 2012 at 8:29 AM, David Gould wrote:
    > We lose noticeable performance when we raise fill-factor above 10. Even 20 is
    > slower.

    Whoa.

    > During busy times these hosts sometimes fall into a stable state
    > with very high cpu use, mostly in s_lock() and LWLockAcquire() and I think
    > PinBuffer, plus very high system cpu in the scheduler (I don't have the perf
    > trace in front of me, so take this with a grain of salt). In this mode they
    > fall from the normal 7000 queries per second to below 3000.

    I have seen signs of something similar to this when running pgbench -S
    tests at high concurrency. I've never been able to track down where
    the problem is happening. My belief is that once a spinlock starts to
    be contended, there's some kind of death spiral that can't be arrested
    until the workload eases up. But I haven't had much luck identifying
    exactly which spinlock is the problem or if it even is just one...

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • David Gould at Dec 14, 2012 at 11:37 pm

    On Fri, 14 Dec 2012 15:39:44 -0500 Robert Haas wrote:
    > On Wed, Dec 12, 2012 at 8:29 AM, David Gould wrote:
    > > We lose noticeable performance when we raise fill-factor above 10. Even 20 is
    > > slower.
    >
    > Whoa.

    Any interest in a fill-factor patch to place exactly one row per page? That
    would be the least contended. There are applications where it might help.

    > > During busy times these hosts sometimes fall into a stable state
    > > with very high cpu use, mostly in s_lock() and LWLockAcquire() and I think
    > > PinBuffer, plus very high system cpu in the scheduler (I don't have the perf
    > > trace in front of me, so take this with a grain of salt). In this mode they
    > > fall from the normal 7000 queries per second to below 3000.
    >
    > I have seen signs of something similar to this when running pgbench -S
    > tests at high concurrency. I've never been able to track down where

    I think I may have seen that with pgbench -S too. I did not have time to
    investigate more, but out of a sequence of three-minute runs I got most
    runs at 300k+ qps but a couple were around 200k qps.

    > the problem is happening. My belief is that once a spinlock starts to
    > be contended, there's some kind of death spiral that can't be arrested
    > until the workload eases up. But I haven't had much luck identifying
    > exactly which spinlock is the problem or if it even is just one...

    I agree about the death spiral. I think what happens is that all the backends
    get synchronized by waiting, and they are then more likely to contend again.

    -dg

    --
    David Gould 510 282 0869 daveg@sonic.net
    If simplicity worked, the world would be overrun with insects.
  • Tom Lane at Dec 12, 2012 at 3:37 pm

    Heikki Linnakangas writes:
    > The bug's been fixed now, but note that huge tuples like this will
    > always cause the table to be extended, even if there are completely
    > empty pages in the table after a vacuum. Even a completely empty
    > existing page is not considered spacious enough in this case, because
    > it's still too small when you take fillfactor into account, so the
    > insertion will always extend the table.

    Seems like that's a bug in itself: there's no reason to reject an empty
    existing page.

    regards, tom lane
