Hello,

I'm testing the recent changes to the WAL entries for tuple freezing.
VACUUM FREEZE took more time than before; the cause seems to be the
flushing of WAL buffers.

A vacuuming process returns buffers to the freelist, and buffers on the
freelist are used preferentially at the next buffer allocation. If such a
buffer is dirty, the allocator must write it out before reuse. However,
since the freelist typically holds only a few buffers, recently dirtied
buffers are reused too soon -- the WAL entries for such a buffer have
usually not been flushed yet, so the allocator must flush WAL, write the
buffer, and only then reuse it.
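
To illustrate the mechanism, here is a minimal standalone sketch (not the
PostgreSQL source; all names here are made-up stand-ins) of the
WAL-before-data rule behind the extra flushes: before a dirty page can be
written out and its buffer reused, WAL must be flushed up to that page's
LSN.

#include <stdbool.h>
#include <stdio.h>

typedef unsigned long Lsn;

typedef struct Buffer
{
    bool dirty;
    Lsn  page_lsn;      /* LSN of the last WAL record touching the page */
} Buffer;

static Lsn wal_flushed_upto = 0;    /* how far WAL has been fsync'ed */

static void
flush_wal_upto(Lsn lsn)
{
    if (lsn > wal_flushed_upto)
    {
        printf("XLogFlush to %lu\n", lsn);  /* the expensive fsync */
        wal_flushed_upto = lsn;
    }
}

static void
write_page_for_reuse(Buffer *buf)
{
    flush_wal_upto(buf->page_lsn);  /* WAL must reach disk first */
    buf->dirty = false;             /* now the page itself can be written */
}

int
main(void)
{
    /* A buffer dirtied moments ago: its LSN is ahead of the flush point,
     * so reusing it immediately forces a WAL flush. */
    Buffer just_frozen = { .dirty = true, .page_lsn = 1000 };

    write_page_for_reuse(&just_frozen);
    return 0;
}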


One solution is to always keep some buffers on the freelist. With N
buffers on the freelist, the need for WAL flushing drops to roughly 1/N,
because all pending WAL entries are flushed whenever we flush for one of
them.
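
As a toy model of this claim (standalone C with made-up names, not the
patch itself): when one queued buffer forces a WAL flush, the flush also
covers the WAL of every other buffer waiting on the freelist, so with a
freelist kept N deep only about one reuse in N has to flush.

#include <stdio.h>

#define N 32                        /* freelist depth kept by the patch */

typedef unsigned long Lsn;

static Lsn queued_lsn[N];           /* page LSNs of the queued free buffers */
static int head = 0;                /* oldest entry; reused next */
static Lsn newest_lsn = 0;          /* latest WAL generated so far */
static Lsn wal_flushed_upto = 0;
static int flushes = 0;

static void
reuse_oldest_then_queue_dirtied(void)
{
    newest_lsn++;                   /* dirtying a page generates WAL */

    if (queued_lsn[head] > wal_flushed_upto)
    {
        /* Flushing for one buffer covers all WAL generated so far. */
        wal_flushed_upto = newest_lsn;
        flushes++;
    }
    queued_lsn[head] = newest_lsn;  /* the just-dirtied buffer joins the queue */
    head = (head + 1) % N;
}

int
main(void)
{
    for (int i = 0; i < 10000; i++)
        reuse_oldest_then_queue_dirtied();

    /* Prints roughly 10000/(N+1) flushes: ~300 for N=32, ~5000 for N=1. */
    printf("WAL flushes: %d\n", flushes);
    return 0;
}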

The attached patch is an experimental implementation of this idea.
Keeping 32 buffers seems to be enough when the vacuum runs by itself;
with background jobs running, other values may be better.

 N  | time  | XLogWrite/XLogFlush
----+-------+---------------------
  1 | 68.2s | 25.6%
  8 | 57.4s | 10.8%
 32 | 54.0s |  3.4%

[initial data]
$ pgbench -s 40 -i;
# VACUUM FREEZE
[test]
# UPDATE accounts SET aid=aid WHERE random() < 0.005;
# checkpoint;
# VACUUM FREEZE accounts;


I do not see the above problem with a non-freeze vacuum. The number of
buffers on the freelist grows during the index-vacuuming phase: when the
vacuum finds seldom-used buffers (refcount == 0 and usage_count == 0), it
adds them to the freelist. So the WAL entries generated during the
index-vacuuming or heap-vacuuming phases are not a serious problem. The
entries for FREEZE, however, are generated during the heap-scanning
phase, which comes before index vacuuming.

Are there any better fixes? Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


  • Simon Riggs at Dec 18, 2006 at 3:50 pm

    On Mon, 2006-12-18 at 11:55 +0900, ITAGAKI Takahiro wrote:

    I'm testing the recent changes to the WAL entries for tuple freezing.
    VACUUM FREEZE took more time than before; the cause seems to be the
    flushing of WAL buffers.
    Great thinking.
    A vacuuming process returns buffers to the freelist, and buffers on the
    freelist are used preferentially at the next buffer allocation. If such a
    buffer is dirty, the allocator must write it out before reuse. However,
    since the freelist typically holds only a few buffers, recently dirtied
    buffers are reused too soon -- the WAL entries for such a buffer have
    usually not been flushed yet, so the allocator must flush WAL, write the
    buffer, and only then reuse it.
    I think what you are saying is: VACUUM places blocks so that they are
    immediately reused. This stops shared_buffers from being polluted by
    vacuumed-blocks, but it also means that almost every write becomes a
    backend dirty write when VACUUM is working, bgwriter or not. That also
    means that we flush WAL more often than we otherwise would.
    One solution is to always keep some buffers on the freelist. With N
    buffers on the freelist, the need for WAL flushing drops to roughly 1/N,
    because all pending WAL entries are flushed whenever we flush for one of
    them.
    That sounds very similar to an idea I'd been working on which I'd called
    cache looping. There is a related (but opposite) problem with sequential
    scans - they don't move through the cache fast enough. A solution to
    both issues is to have the Vacuum/SeqScans continually reuse a small
    pool of buffers, rather than request the next one from the buffer
    manager in the normal way.
    The attached patch is an experimental implementation of this idea.
    Keeping 32 buffers seems to be enough when the vacuum runs by itself;
    with background jobs running, other values may be better.

     N  | time  | XLogWrite/XLogFlush
    ----+-------+---------------------
      1 | 68.2s | 25.6%
      8 | 57.4s | 10.8%
     32 | 54.0s |  3.4%
    I think this is good proof; well done.
    From the above, my thinking would be to have a more general implementation:
    Each backend keeps a list of cache buffers to reuse in its local loop,
    rather than using the freelist as a global list. That way the technique
    would work even when we have multiple Vacuums working concurrently. It
    would also then be possible to use this for the SeqScan case as well.

    Cache looping would be implemented by a modified BufferAlloc routine,
    say BufferScanAlloc(), called only after StrategyUseCacheLoop() has
    been invoked for a SeqScan or VacuumScan. strategy_cache_loop would
    replace strategy_hint_vacuum.

    Each backend would have a list of the previous N buffers it touched.
    When N = Nmax, we would link to the oldest buffer to form a ring. Each
    time we need the next buffer, we read from the ring rather than from
    the main clock sweep. If the buffer identified is pinned, we drop it
    from the ring, apply normally for a new buffer, and keep that instead.
    At the end of the scan, we simply forget the buffer ring.
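
    In rough code, the ring could look something like the following
    standalone sketch; every name here is invented and the shared
    allocator is stubbed out:

    #include <stddef.h>

    #define NMAX 32                     /* ring size */

    typedef struct BufferDesc
    {
        int refcount;                   /* pinned if > 0 */
        int buf_id;
    } BufferDesc;

    /* Toy stand-in for the normal shared allocation path (clock sweep). */
    static BufferDesc pool[1024];
    static int next_victim = 0;

    static BufferDesc *
    StrategyGetBuffer(void)
    {
        BufferDesc *buf = &pool[next_victim];

        buf->buf_id = next_victim;
        next_victim = (next_victim + 1) % 1024;
        return buf;
    }

    typedef struct BufferRing
    {
        BufferDesc *slot[NMAX];         /* the previous N buffers touched */
        int nfilled;
        int next;                       /* oldest slot, reused once full */
    } BufferRing;

    /* Next buffer for a SeqScan/VacuumScan using the ring strategy. */
    static BufferDesc *
    BufferScanAlloc(BufferRing *ring)
    {
        BufferDesc *buf;

        if (ring->nfilled < NMAX)
        {
            /* Ring not yet full: allocate normally, remember the buffer. */
            buf = StrategyGetBuffer();
            ring->slot[ring->nfilled++] = buf;
            return buf;
        }

        /* Ring full: reuse the oldest buffer in the ring... */
        buf = ring->slot[ring->next];
        if (buf->refcount > 0)
        {
            /* ...unless it is pinned, in which case drop it from the
             * ring and keep a freshly allocated buffer in its place. */
            buf = StrategyGetBuffer();
            ring->slot[ring->next] = buf;
        }
        ring->next = (ring->next + 1) % NMAX;
        return buf;
    }

    int
    main(void)
    {
        BufferRing ring = { .nfilled = 0, .next = 0 };

        /* At the end of the scan the ring is simply forgotten. */
        for (int i = 0; i < 100; i++)
            (void) BufferScanAlloc(&ring);
        return 0;
    }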

    Another connected thought is the idea of having a FullBufferList - the
    opposite of a free-buffer list. When VACUUM/INSERT/COPY fills a block,
    we notify the buffer manager that the block needs writing ahead of
    other buffers, so that the bgwriter can work more effectively. That
    seems like it would help both with the current patch and with the
    additional thoughts above.
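
    A possible shape for that notification path, again as a standalone
    sketch with invented names: bulk writers queue buffers they have just
    filled, and the bgwriter drains this queue before its normal LRU scan.

    #include <stdio.h>

    #define FULL_LIST_SIZE 64

    static int full_list[FULL_LIST_SIZE];  /* buf_ids awaiting writeout */
    static int full_head = 0;               /* next entry to drain */
    static int full_tail = 0;               /* next entry to fill */

    /* Called by VACUUM/INSERT/COPY after filling a block. */
    static void
    NotifyBufferFull(int buf_id)
    {
        int next = (full_tail + 1) % FULL_LIST_SIZE;

        if (next != full_head)              /* if the queue is full, drop
                                             * the hint; it is only a hint */
        {
            full_list[full_tail] = buf_id;
            full_tail = next;
        }
    }

    /* Called by the bgwriter ahead of its LRU scan; -1 means empty. */
    static int
    PopFullBuffer(void)
    {
        int buf_id;

        if (full_head == full_tail)
            return -1;
        buf_id = full_list[full_head];
        full_head = (full_head + 1) % FULL_LIST_SIZE;
        return buf_id;
    }

    int
    main(void)
    {
        NotifyBufferFull(7);
        NotifyBufferFull(8);
        printf("bgwriter writes buffer %d first\n", PopFullBuffer());
        return 0;
    }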
    [initial data]
    $ pgbench -s 40 -i;
    # VACUUM FREEZE
    [test]
    # UPDATE accounts SET aid=aid WHERE random() < 0.005;
    # checkpoint;
    # VACUUM FREEZE accounts;


    I do not see the above problem with a non-freeze vacuum. The number of
    buffers on the freelist grows during the index-vacuuming phase: when the
    vacuum finds seldom-used buffers (refcount == 0 and usage_count == 0), it
    adds them to the freelist. So the WAL entries generated during the
    index-vacuuming or heap-vacuuming phases are not a serious problem. The
    entries for FREEZE, however, are generated during the heap-scanning
    phase, which comes before index vacuuming.
    This also happens when setting hint bits in normal operation, which
    might occur only once in most test situations. In practice, it can
    occur each time we touch a row and then VACUUM, so we end up re-writing
    the block many times in the way you describe.

    IIRC Heikki was thinking of altering the way VACUUM works so that it
    avoids writing out blocks that it is going to come back to in the
    second phase anyway. That would go some way toward alleviating the
    problem you describe, but wouldn't go as far as the technique you
    suggest.

    --
    Simon Riggs
    EnterpriseDB http://www.enterprisedb.com
  • Tom Lane at Dec 18, 2006 at 4:13 pm

    "Simon Riggs" <simon@2ndquadrant.com> writes:
    I think what you are saying is: VACUUM places blocks so that they are
    immediately reused. This stops shared_buffers from being polluted by
    vacuumed-blocks, but it also means that almost every write becomes a
    backend dirty write when VACUUM is working, bgwriter or not. That also
    means that we flush WAL more often than we otherwise would.
    Do we care? As long as the writes are done by the vacuum process, ISTM
    this is taking load off the foreground query processes, by saving them
    from having to do writes.

    In any case, I'm unclear on why we should add a boatload of complexity
    to improve performance of something that's done as rarely as VACUUM
    FREEZE is. Quite aside from maintainability concerns, even a few extra
    cycles added to the more common code paths would make it a net
    performance loss overall.

    regards, tom lane
  • Simon Riggs at Dec 18, 2006 at 9:35 pm

    On Mon, 2006-12-18 at 11:13 -0500, Tom Lane wrote:
    "Simon Riggs" <simon@2ndquadrant.com> writes:
    I think what you are saying is: VACUUM places blocks so that they are
    immediately reused. This stops shared_buffers from being polluted by
    vacuumed-blocks, but it also means that almost every write becomes a
    backend dirty write when VACUUM is working, bgwriter or not. That also
    means that we flush WAL more often than we otherwise would.
    Do we care? As long as the writes are done by the vacuum process, ISTM
    this is taking load off the foreground query processes, by saving them
    from having to do writes.
    I'm not bothered about speeding up VACUUM FREEZE at all, but the effect
    noted by Itagaki-san is clearly real and so can easily affect other
    processes. I believe it does affect other backends; I had already
    noticed what I thought was that effect myself. If we had better server
    instrumentation, it would be easy to demonstrate either way.
    In any case, I'm unclear on why we should add a boatload of complexity
    to improve performance of something that's done as rarely as VACUUM
    FREEZE is. Quite aside from maintainability concerns, even a few extra
    cycles added to the more common code paths would make it a net
    performance loss overall.
    As I noted, this isn't just VACUUM FREEZE (why would it be?), but all
    VACUUMs - that *is* a common code path on a busy system. VACUUM FREEZE
    simply dirties more blocks and has a more clearly noticeable effect.
    From your comments we clearly need more testing to demonstrate the
    effect on normal backends before we move to a solution.

    --
    Simon Riggs
    EnterpriseDB http://www.enterprisedb.com
  • ITAGAKI Takahiro at Dec 19, 2006 at 8:53 am

    "Simon Riggs" wrote:

    I think what you are saying is: VACUUM places blocks so that they are
    immediately reused. This stops shared_buffers from being polluted by
    vacuumed-blocks, but it also means that almost every write becomes a
    backend dirty write when VACUUM is working, bgwriter or not. That also
    means that we flush WAL more often than we otherwise would.
    That's right. I think it's acceptable for the vacuuming process to
    write the dirty buffers it made itself, because only that process slows
    down; other backends can run undisturbed. However, frequent WAL
    flushing should be avoided.

    I found the problem when I ran VACUUM FREEZE by itself. If some other
    backends were running, the dirty buffers made by VACUUM would be reused
    by those backends, not by the vacuuming process.
    From the above, my thinking would be to have a more general implementation:
    Each backend keeps a list of cache buffers to reuse in its local loop,
    rather than using the freelist as a global list. That way the technique
    would work even when we have multiple Vacuums working concurrently. It
    would also then be possible to use this for the SeqScan case as well.
    Great idea! The trouble is in how SeqScan and VACUUM use buffers: the
    former uses too many buffers and the latter too few. Your cache looping
    would work around both cases.
    Another connected thought is the idea of having a FullBufferList - the
    opposite of a free-buffer list. When VACUUM/INSERT/COPY fills a block,
    we notify the buffer manager that the block needs writing ahead of
    other buffers, so that the bgwriter can work more effectively. That
    seems like it would help both with the current patch and with the
    additional thoughts above.
    Do you mean that the bgwriter should take care of buffers on the
    freelist, not only those at the tail of the LRU? We might also need
    activity control for the bgwriter: buffers are reused rapidly during
    VACUUM or bulk inserts, so the bgwriter cannot keep up if its settings
    stay at their usual values.

    Regards,
    ---
    ITAGAKI Takahiro
    NTT Open Source Software Center
  • Jim C. Nasby at Dec 28, 2006 at 12:13 pm

    On Tue, Dec 19, 2006 at 05:53:06PM +0900, ITAGAKI Takahiro wrote:
    "Simon Riggs" wrote:
    Another connected thought is the idea of having a FullBufferList - the
    opposite of a free-buffer list. When VACUUM/INSERT/COPY fills a block,
    we notify the buffer manager that the block needs writing ahead of
    other buffers, so that the bgwriter can work more effectively. That
    seems like it would help both with the current patch and with the
    additional thoughts above.
    Do you mean that the bgwriter should take care of buffers on the
    freelist, not only those at the tail of the LRU? We might also need
    activity control for the bgwriter: buffers are reused rapidly during
    VACUUM or bulk inserts, so the bgwriter cannot keep up if its settings
    stay at their usual values.
    Actually, if I understand the code, the "LRU" stuff only hits the
    free list. Also, the only thing that runs the clock sweep (which is
    what maintains the LRU-type information) is a backend requesting a page
    and not finding one on the free list.
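
    The allocation order being described is roughly the following
    simplified standalone sketch (not the real freelist.c; the names are
    simplified stand-ins):

    #include <stddef.h>

    #define NBUFFERS 128

    typedef struct BufferDesc
    {
        int refcount;
        int usage_count;
        struct BufferDesc *free_next;   /* link while on the freelist */
    } BufferDesc;

    static BufferDesc buffers[NBUFFERS];
    static BufferDesc *freelist = NULL; /* typically holds few buffers */
    static int clock_hand = 0;

    static BufferDesc *
    StrategyGetBuffer(void)
    {
        BufferDesc *buf;

        /* 1. Freelist first: buffers freed by VACUUM land here. */
        if (freelist != NULL)
        {
            buf = freelist;
            freelist = buf->free_next;
            return buf;
        }

        /* 2. Otherwise run the clock sweep, decaying usage counts as
         * it goes; nothing else advances this bookkeeping. */
        for (;;)
        {
            buf = &buffers[clock_hand];
            clock_hand = (clock_hand + 1) % NBUFFERS;
            if (buf->refcount == 0 && buf->usage_count == 0)
                return buf;
            if (buf->usage_count > 0)
                buf->usage_count--;
        }
    }

    int
    main(void)
    {
        /* Put one buffer on the freelist: it is handed out before the
         * clock sweep ever runs. */
        buffers[5].free_next = NULL;
        freelist = &buffers[5];
        return StrategyGetBuffer() == &buffers[5] ? 0 : 1;
    }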
    --
    Jim Nasby jim@nasby.net
    EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
  • Bruce Momjian at Feb 3, 2007 at 2:09 am
    Is this a TODO item?

    --
    Bruce Momjian bruce@momjian.us
    EnterpriseDB http://www.enterprisedb.com

    + If your life is a hard drive, Christ can be your backup. +
  • Jim Nasby at Feb 6, 2007 at 3:29 am
    I think there's room for improvement in how we track buffer usage in
    general. Seqscans still carry the same weight as any other operation,
    the freelist is of questionable value, and a lot of work is done to
    find a free buffer in the pool, for example.
    On Feb 2, 2007, at 8:08 PM, Bruce Momjian wrote:


    Is this a TODO item?

    --
    Jim Nasby jim@nasby.net
    EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
