I wonder if we have tested the reasoning behind
SKIP_PAGES_THRESHOLD and the magic number of 32 currently assigned to
it. While looking at the code after a long time and doing some
tests, I realized that a manual VACUUM would always scan the first 31
pages of a relation which has not received any write activity since
the last VACUUM. On closer inspection, I realized that we have
deliberately put in this hook to ensure that we use visibility maps
only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
sequential pages, to take advantage of possible OS seq scan
optimizations.

My statistical skills are limited, but wouldn't that mean that for
fairly well-distributed write activity across a large table, if there
are even 3-4% updates/deletes, we would most likely hit a
not-all-visible page for every 32 pages scanned? That would mean that
almost the entire relation will be scanned even if the visibility map
tells us that only 3-4% of pages require scanning? And the probability
will increase with the increase in the percentage of updated/deleted
tuples. Given that the likelihood of anyone calling VACUUM (manually
or through autovac settings) on a table which has less than 3-4%
updates/deletes is very low, I am worried that we might be losing all
the advantages of visibility maps for a fairly common use case.

Do we have any numbers to prove that what we have today is good? Sorry, I
may not have followed the discussions very closely in the past and am not
sure if this has been debated/tested already.
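Pavan's back-of-the-envelope concern can be sanity-checked with a small simulation (a sketch only, not PostgreSQL code: it assumes pages are marked not-all-visible independently with probability p, and that VACUUM skips only runs of at least SKIP_PAGES_THRESHOLD consecutive all-visible pages):

```python
import random

SKIP_PAGES_THRESHOLD = 32

def simulated_skip_fraction(n_pages, p_dirty, seed=0):
    """Fraction of pages VACUUM can skip if it only skips runs of at
    least SKIP_PAGES_THRESHOLD consecutive all-visible pages."""
    rng = random.Random(seed)
    all_visible = [rng.random() >= p_dirty for _ in range(n_pages)]
    skipped = i = 0
    while i < n_pages:
        # Measure the run of all-visible pages starting at i.
        run = 0
        while i + run < n_pages and all_visible[i + run]:
            run += 1
        if run >= SKIP_PAGES_THRESHOLD:
            skipped += run          # long enough: the whole run is skipped
        i += max(run, 1)            # otherwise scan the run (or one dirty page)
    return skipped / n_pages

for p in (0.01, 0.03, 0.05):
    print(p, round(simulated_skip_fraction(500_000, p), 3))
```

Under these assumptions the mean all-visible run length is (1-p)/p, which is right around 32 when p is about 3%, so a large fraction of the relation gets scanned anyway.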

Thanks,
Pavan





--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com


  • Tom Lane at May 27, 2011 at 2:06 pm

    Pavan Deolasee writes:
    My statistical skills are limited, but wouldn't that mean that for a
    fairly well distributed write activity across a large table, if there
    are even 3-4% update/deletes, we would most likely hit a
    not-all-visible page for every 32 pages scanned ?
    Huh? With a typical table density of several dozen tuples per page, an
    update ratio in that range would mean that just about every page would
    have something for VACUUM to do, if the modified tuples are evenly
    distributed. The case where the skip optimization has some use is where
    there are large "cold" sections that have no changes at all.

    Having said that, I don't know how carefully we tested different values
    for SKIP_PAGES_THRESHOLD.
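Tom's density argument is easy to quantify (a sketch; 60 tuples per page is an assumed, illustrative density): a page stays all-visible only if none of its tuples were touched, i.e. with probability (1 - f)^tuples_per_page for a tuple update fraction f.

```python
def p_page_clean(f, tuples_per_page=60):
    """Probability that a page with `tuples_per_page` tuples is left
    untouched when a fraction `f` of all tuples, spread uniformly
    across the table, has been updated or deleted."""
    return (1 - f) ** tuples_per_page

for f in (0.01, 0.03, 0.05):
    print(f, round(p_page_clean(f), 3))
```

At f = 3% only about 16% of pages stay clean, so VACUUM has something to do on roughly five pages out of six, as Tom says.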

    regards, tom lane
  • Pavan Deolasee at May 27, 2011 at 3:07 pm

    On Fri, May 27, 2011 at 7:36 PM, Tom Lane wrote:
    Pavan Deolasee <pavan.deolasee@gmail.com> writes:
    My statistical skills are limited, but wouldn't that mean that for a
    fairly well distributed write activity across a large table, if there
    are even 3-4% update/deletes, we would most likely hit a
    not-all-visible page for every 32 pages scanned ?
    Huh?  With a typical table density of several dozen tuples per page, an
    update ratio in that range would mean that just about every page would
    have something for VACUUM to do, if the modified tuples are evenly
    distributed.  The case where the skip optimization has some use is where
    there are large "cold" sections that have no changes at all.
I was pretty sure I had done my maths wrong :-) So that
means, even for a far lower update ratio, we would pretty much scan
every block and vacuum many of them for typical well-distributed
updates. Hmm. That means the idea of a single-pass vacuum is
interesting even with visibility maps.

    Thanks,
    Pavan

    --
    Pavan Deolasee
    EnterpriseDB     http://www.enterprisedb.com
  • Cédric Villemain at May 27, 2011 at 2:07 pm

    2011/5/27 Pavan Deolasee <pavan.deolasee@gmail.com>:
    I wonder if we have tested the reasoning behind having
    SKIP_PAGES_THRESHOLD and the magic number of 32 assigned to it
    currently. While looking at the code after a long time and doing some
tests, I realized that a manual VACUUM would always scan the first 31
    pages of a relation which has not received any write activity since
    the last VACUUM. On closer inspection, I realized that we have
    deliberately put in this hook to ensure that we use visibility maps
    only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
    sequential pages to take advantage of possible OS seq scan
    optimizations.

    My statistical skills are limited, but wouldn't that mean that for a
    fairly well distributed write activity across a large table, if there
    are even 3-4% update/deletes, we would most likely hit a
    not-all-visible page for every 32 pages scanned ? That would mean that
The page skipping is still based on the VM.
So you are wondering what the chances are of the VM not being up to date when we access it?
    almost entire relation will be scanned even if the visibility map
    tells us that only 3-4% pages require scanning ?  And the probability
    will increase with the increase in the percentage of updated/deleted
    tuples. Given that the likelihood of anyone calling VACUUM (manually
    or through autovac settings) on a table which has less than 3-4%
updates/deletes is very low, I am worried that we might be losing all
    advantages of visibility maps for a fairly common use case.

    Do we have any numbers to prove what we have today is good ? Sorry, I
    may not have followed the discussions very closely in the past and not
    sure if this has been debated/tested already.

    Thanks,
    Pavan





    --
    Pavan Deolasee
    EnterpriseDB     http://www.enterprisedb.com

    --
    Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers


    --
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
  • Cédric Villemain at May 27, 2011 at 2:38 pm

    2011/5/27 Cédric Villemain <cedric.villemain.debian@gmail.com>:
    2011/5/27 Pavan Deolasee <pavan.deolasee@gmail.com>:
    I wonder if we have tested the reasoning behind having
    SKIP_PAGES_THRESHOLD and the magic number of 32 assigned to it
    currently. While looking at the code after a long time and doing some
tests, I realized that a manual VACUUM would always scan the first 31
    pages of a relation which has not received any write activity since
    the last VACUUM. On closer inspection, I realized that we have
    deliberately put in this hook to ensure that we use visibility maps
    only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
    sequential pages to take advantage of possible OS seq scan
    optimizations.

    My statistical skills are limited, but wouldn't that mean that for a
    fairly well distributed write activity across a large table, if there
    are even 3-4% update/deletes, we would most likely hit a
    not-all-visible page for every 32 pages scanned ? That would mean that
The page skipping is still based on the VM.
So you are wondering what the chances are of the VM not being up to date when we access it?
Re-reading the mails and answers, I misunderstood the case you described.
    almost entire relation will be scanned even if the visibility map
    tells us that only 3-4% pages require scanning ?  And the probability
    will increase with the increase in the percentage of updated/deleted
    tuples. Given that the likelihood of anyone calling VACUUM (manually
    or through autovac settings) on a table which has less than 3-4%
updates/deletes is very low, I am worried that we might be losing all
    advantages of visibility maps for a fairly common use case.

    Do we have any numbers to prove what we have today is good ? Sorry, I
    may not have followed the discussions very closely in the past and not
    sure if this has been debated/tested already.

    Thanks,
    Pavan





    --
    Pavan Deolasee
    EnterpriseDB     http://www.enterprisedb.com



    --
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


  • Heikki Linnakangas at May 27, 2011 at 2:11 pm

    On 27.05.2011 16:52, Pavan Deolasee wrote:
    On closer inspection, I realized that we have
    deliberately put in this hook to ensure that we use visibility maps
    only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
    sequential pages to take advantage of possible OS seq scan
    optimizations.
    That, and the fact that if you skip any page, you can't advance
    relfrozenxid.
    My statistical skills are limited, but wouldn't that mean that for a
    fairly well distributed write activity across a large table, if there
    are even 3-4% update/deletes, we would most likely hit a
    not-all-visible page for every 32 pages scanned ? That would mean that
    almost entire relation will be scanned even if the visibility map
    tells us that only 3-4% pages require scanning ? And the probability
    will increase with the increase in the percentage of updated/deleted
    tuples. Given that the likelihood of anyone calling VACUUM (manually
    or through autovac settings) on a table which has less than 3-4%
updates/deletes is very low, I am worried that we might be losing all
    advantages of visibility maps for a fairly common use case.
    Well, as with normal queries, it's usually faster to just seqscan the
    whole table if you need to access more than a few percent of the pages,
    because sequential I/O is so much faster than random I/O. The visibility
    map really only helps if all the updates are limited to some part of the
table. For example, if only recent records are updated frequently,
    and old ones are almost never touched.
    Do we have any numbers to prove what we have today is good ? Sorry, I
    may not have followed the discussions very closely in the past and not
    sure if this has been debated/tested already.
I think that number was chosen quite arbitrarily. When you consider
updating relfrozenxid, it's a bit difficult to decide what the optimal
value would be; if you decide to skip pages you might have to perform an
extra anti-wraparound vacuum somewhere down the line.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Greg Stark at May 27, 2011 at 3:11 pm

    On Fri, May 27, 2011 at 7:11 AM, Heikki Linnakangas wrote:
    Well, as with normal queries, it's usually faster to just seqscan the whole
    table if you need to access more than a few percent of the pages, because
    sequential I/O is so much faster than random I/O.
Well, it's not strictly random access; you're still reading
sequentially, you're just skipping some pages. It'll never be slower
than a sequential scan, it just might not be any faster. In my testing,
reading every 8th page took exactly as long as reading every page,
which makes sense, as the drive still has to seek to every track
exactly as if you were reading sequentially. IIRC, reading less than
every 8th page started seeing a speedup.
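Greg's every-8th-page experiment can be reproduced with something like the following (a rough sketch; the result depends entirely on the disk, readahead settings, and the OS page cache, so it needs a file much larger than RAM and a cold cache to show the effect):

```python
import os

BLOCK = 8192  # PostgreSQL's default page size

def strided_read(path, stride):
    """Read every `stride`-th BLOCK-sized chunk of `path`;
    returns the number of bytes read."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        total = 0
        # Seek-and-read one block per stride; stride=1 is a full
        # sequential read, stride=8 reads every 8th block.
        for off in range(0, size, BLOCK * stride):
            total += len(os.pread(fd, BLOCK, off))
        return total
    finally:
        os.close(fd)
```

Timing `strided_read(f, 1)` against `strided_read(f, 8)` on rotating media should show little difference: the drive passes over every track either way.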
    Do we have any numbers to prove what we have today is good ? Sorry, I
    may not have followed the discussions very closely in the past and not
    sure if this has been debated/tested already.
    I think that number was chosen quite arbitrary. When you consider updating
    relfrozenxid, it's a bit difficult to decide what the optimal value would
    be; if you decide to skip pages you might have to perform an extra
    anti-wraparound somewhere down the line.
It would be nice if the VM had a bit for "all-frozen", but that
wouldn't help much except in the case of truly cold data. We could
perhaps keep the frozen data per segment or per VM page (which covers
a large section of the table), which would at least mean that a fixed
amount of data becomes vacuum-dirty when a tuple is updated, rather
than a whole table, which could be arbitrarily large.

    Separately it's a bit strange that we actually have to visit the
    pages. We have all the information we need in the VM to determine
    whether there's a run of 32 vacuum-clean pages. Why can't we look at
    the next 32 pages and if they're all vacuum-clean then skip looking at
    the heap at all for them. What we do now is do the regular vacuum
    algorithm and only after we've processed 32 pages in a row realize
    that it was a waste of effort.
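Greg's look-ahead suggestion could be sketched like this (the VM is modelled as a plain list of booleans, one per heap page; a real implementation would read VM pages, not a list):

```python
def pages_to_visit(vm_all_visible, threshold=32):
    """Return the heap page numbers VACUUM still has to visit, skipping
    any run of at least `threshold` consecutive all-visible pages
    without ever touching the heap for them."""
    visit = []
    i, n = 0, len(vm_all_visible)
    while i < n:
        # Measure the run of all-visible pages starting at i.
        run = 0
        while i + run < n and vm_all_visible[i + run]:
            run += 1
        if run >= threshold:
            i += run                       # skip the whole run up front
        else:
            # Short run (or a dirty page): visit these pages normally.
            visit.extend(range(i, i + max(run, 1)))
            i += max(run, 1)
    return visit
```

The point is that the skip decision is made from the VM alone, before the heap pages are ever read, instead of discovering after 32 pages that the work was wasted.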



    --
    greg
  • Robert Haas at May 27, 2011 at 6:11 pm

    On Fri, May 27, 2011 at 11:10 AM, Greg Stark wrote:
    It would be nice if the VM had a bit for "all-frozen" but that
    wouldn't help much except in the case of truly cold data. We could
    perhaps keep the frozen data per segment or per VM page (which covers
    a large section of the table) which would at least mean that would
    have a fixed amount of data become vacuum-dirty when a tuple is
    updated rather than a whole table which could be arbitrarily large.
    Instead of just having one bit, it might be useful to have a
    relfrozenxid counter for each, say, 64MB chunk, rather than just one
    for the whole table. At least in theory, that would give us the
    possibility of freezing only portions of the table that were most
urgently in need of it. I'm not sure exactly what algorithm we'd
want to apply, though.
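A minimal sketch of the per-chunk bookkeeping Robert describes (a hypothetical layout, not an existing PostgreSQL structure): one oldest-unfrozen-XID value per 64MB chunk, so freezing can be aimed at the chunks that need it most.

```python
CHUNK_PAGES = 8192  # 64MB worth of 8KB pages

def chunk_frozenxids(page_oldest_xid):
    """Collapse a per-page oldest-unfrozen-XID list into one value per
    64MB chunk (the minimum, i.e. the chunk's relfrozenxid analogue)."""
    return [min(page_oldest_xid[start:start + CHUNK_PAGES])
            for start in range(0, len(page_oldest_xid), CHUNK_PAGES)]
```

An anti-wraparound pass could then sort chunks by this value and freeze only the oldest ones, instead of rescanning an arbitrarily large table.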

    In general, ISTM that the problem with VACUUM is that we don't know
    whether we're "keeping up" or "getting behind". For
    checkpoint_completion_target, we measure how fast we're writing pages
    relative to when the checkpoint needs to be done. We write faster if
    we get behind, where behind can mean either that checkpoint_segments
    is going to expire too soon, or that checkpoint_timeout is going to
    expire too soon. VACUUM has a very similar problem: operations that
    use XIDs or create dead tuples create the need for maintenance which
    VACUUM then performs. We want to vacuum fast enough to keep up with
    the work, but not so fast that we tax the I/O subsystem more than
    necessary. But unlike the checkpoint process, vacuum's
    decision-making is all local: it has no idea whether it's keeping up.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Pavan Deolasee at May 29, 2011 at 4:01 pm

    On Fri, May 27, 2011 at 8:40 PM, Greg Stark wrote:
    Separately it's a bit strange that we actually have to visit the
    pages. We have all the information we need in the VM to determine
    whether there's a run of 32 vacuum-clean pages. Why can't we look at
    the next 32 pages and if they're all vacuum-clean then skip looking at
    the heap at all for them. What we do now is do the regular vacuum
    algorithm and only after we've processed 32 pages in a row realize
    that it was a waste of effort.
Maybe we want visibility map APIs to test whether a chunk of pages
is all-visible. We could then use that API to test every N blocks
(where N is the number at which a continuous sequential scan would
still be better than a sequential scan with gaps) and either read all
of them sequentially or just skip all of them.

    Thanks,
    Pavan

    --
    Pavan Deolasee
    EnterpriseDB     http://www.enterprisedb.com
  • Pavan Deolasee at May 27, 2011 at 3:14 pm

    On Fri, May 27, 2011 at 7:41 PM, Heikki Linnakangas wrote:
    On 27.05.2011 16:52, Pavan Deolasee wrote:

    On closer inspection, I realized that we have
    deliberately put in this hook to ensure that we use visibility maps
    only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
    sequential pages to take advantage of possible OS seq scan
    optimizations.
    That, and the fact that if you skip any page, you can't advance
    relfrozenxid.
Hmm. For a significantly large table, wouldn't it be the case that we
would most likely skip a page somewhere? Would it be better to instead
do a full scan every once in a while, rather than relying on a
not-so-well-understood heuristic?
    My statistical skills are limited, but wouldn't that mean that for a
    fairly well distributed write activity across a large table, if there
    are even 3-4% update/deletes, we would most likely hit a
    not-all-visible page for every 32 pages scanned ? That would mean that
    almost entire relation will be scanned even if the visibility map
    tells us that only 3-4% pages require scanning ?  And the probability
    will increase with the increase in the percentage of updated/deleted
    tuples. Given that the likelihood of anyone calling VACUUM (manually
    or through autovac settings) on a table which has less than 3-4%
updates/deletes is very low, I am worried that we might be losing all
    advantages of visibility maps for a fairly common use case.
    Well, as with normal queries, it's usually faster to just seqscan the whole
    table if you need to access more than a few percent of the pages, because
    sequential I/O is so much faster than random I/O. The visibility map really
    only helps if all the updates are limited to some part of the table.
The vacuum scan is not a completely random scan, so I am not sure how
effective a complete seq scan would be. Maybe we need to run some tests
to measure that too before we choose one over the other.

    Thanks,
    Pavan


    --
    Pavan Deolasee
    EnterpriseDB     http://www.enterprisedb.com
  • Bruce Momjian at Jun 3, 2011 at 7:16 pm

    Heikki Linnakangas wrote:
    On 27.05.2011 16:52, Pavan Deolasee wrote:
    On closer inspection, I realized that we have
    deliberately put in this hook to ensure that we use visibility maps
    only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
    sequential pages to take advantage of possible OS seq scan
    optimizations.
    That, and the fact that if you skip any page, you can't advance
    relfrozenxid.
    My statistical skills are limited, but wouldn't that mean that for a
    fairly well distributed write activity across a large table, if there
    are even 3-4% update/deletes, we would most likely hit a
    not-all-visible page for every 32 pages scanned ? That would mean that
    almost entire relation will be scanned even if the visibility map
    tells us that only 3-4% pages require scanning ? And the probability
    will increase with the increase in the percentage of updated/deleted
    tuples. Given that the likelihood of anyone calling VACUUM (manually
    or through autovac settings) on a table which has less than 3-4%
updates/deletes is very low, I am worried that we might be losing all
    advantages of visibility maps for a fairly common use case.
    Well, as with normal queries, it's usually faster to just seqscan the
    whole table if you need to access more than a few percent of the pages,
    because sequential I/O is so much faster than random I/O. The visibility
    map really only helps if all the updates are limited to some part of the
table. For example, if only recent records are updated frequently,
    and old ones are almost never touched.
    I realize we just read the pages from the kernel to maintain sequential
    I/O, but do we actually read the contents of the page if we know it
    doesn't need vacuuming? If so, do we need to?

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Heikki Linnakangas at Jun 6, 2011 at 7:40 am

On 03.06.2011 22:16, Bruce Momjian wrote:
I realize we just read the pages from the kernel to maintain sequential
I/O, but do we actually read the contents of the page if we know it
doesn't need vacuuming?
Yes.
If so, do we need to?
Not necessarily, but it allows us to freeze old tuples, and doesn't cost
much anyway.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Greg Stark at Jun 7, 2011 at 10:59 am

    On Jun 3, 2011 8:38 PM, "Bruce Momjian" wrote:
    I realize we just read the pages from the kernel to maintain sequential
    I/O, but do we actually read the contents of the page if we know it
    doesn't need vacuuming? If so, do we need to?
I don't follow. What's your question?

Tom's final version does basically the optimal combination of the above, I
think.

Discussion Overview
group: pgsql-hackers
category: postgresql
posted: May 27, '11 at 1:52p
active: Jun 7, '11 at 10:59a
posts: 13
users: 8
website: postgresql.org...
irc: #postgresql
