There has been a lot of recent discussion about the visibility map (for
index-only scans) and hint bits (trying to avoid double-writing a
table).

I wonder if we could fix both of these at the same time. Once the
visibility map is reliable, can we use that to avoid updating the hint
bits on all rows on a page?

For bulk loads, all the pages are going to have the same xid and all be
visible, so instead of writing the entire table, we just write the
visibility map.

I think the problem is that we have the PD_ALL_VISIBLE page flag, which
requires a write of the page as well. Could we get by with only the
visibility bits and remove PD_ALL_VISIBLE?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
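
For concreteness, here is a small standalone sketch of the arithmetic behind
the bulk-load argument. It is not the actual visibilitymap.c code: the
constants are simplified (the real map page also reserves space for a page
header, so its coverage is slightly smaller) and the vm_locate helper is
invented for the example.

    /*
     * Standalone sketch of the visibility-map addressing arithmetic, not the
     * actual visibilitymap.c code (the real map page also reserves space for
     * a page header, so its coverage is slightly smaller).
     */
    #include <stdint.h>
    #include <stdio.h>

    #define BLCKSZ              8192         /* PostgreSQL's default block size */
    #define HEAPBLOCKS_PER_BYTE 8            /* one all-visible bit per heap page */
    #define HEAPBLOCKS_PER_PAGE (BLCKSZ * HEAPBLOCKS_PER_BYTE)

    /* Which visibility-map block, byte, and bit cover a given heap block? */
    static void
    vm_locate(uint32_t heapBlk, uint32_t *mapBlk, uint32_t *mapByte, uint32_t *mapBit)
    {
        *mapBlk  = heapBlk / HEAPBLOCKS_PER_PAGE;
        *mapByte = (heapBlk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
        *mapBit  = heapBlk % HEAPBLOCKS_PER_BYTE;
    }

    int
    main(void)
    {
        uint32_t mapBlk, mapByte, mapBit;

        /* One 8k map page covers 65536 heap pages, i.e. 512 MB of heap. */
        printf("heap covered per map page: %llu MB\n",
               (unsigned long long) HEAPBLOCKS_PER_PAGE * BLCKSZ / (1024 * 1024));

        vm_locate(100000, &mapBlk, &mapByte, &mapBit);
        printf("heap block 100000 -> map block %u, byte %u, bit %u\n",
               (unsigned) mapBlk, (unsigned) mapByte, (unsigned) mapBit);
        return 0;
    }

With one bit per heap page, recording a freshly loaded 512 MB table as
all-visible touches a single 8k map page instead of the 65,536 heap pages
that hint-bit writes would dirty.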


  • Merlin Moncure at May 5, 2011 at 6:11 pm

    On Thu, May 5, 2011 at 11:59 AM, Bruce Momjian wrote:
    There has been a lot of recent discussion about the visibility map (for
    index-only scans) and hint bits (trying to avoid double-writing a
    table).
    I still think a small tqual.c maintained cache of hint bits will
    effectively eliminate hint bit i/o issues surrounding bulk loads. Tom
    fired a shot across the bow regarding the general worthiness of that
    technique though (see:
    http://postgresql.1045698.n5.nabble.com/Process-local-hint-bit-cache-td4270229.html)
    :(. I can rig up a cleaned up version of the patch pretty
    easily...it's a local change and fairly simple.

    I don't think there is any way to remove the hint bits without
    suffering some other problem.

    merlin
  • Bruce Momjian at May 5, 2011 at 6:34 pm

    Merlin Moncure wrote:
    On Thu, May 5, 2011 at 11:59 AM, Bruce Momjian wrote:
    There has been a lot of recent discussion about the visibility map (for
    index-only scans) and hint bits (trying to avoid double-writing a
    table).
    I still think a small tqual.c maintained cache of hint bits will
    effectively eliminate hint bit i/o issues surrounding bulk loads. Tom
    fired a shot across the bow regarding the general worthiness of that
    technique though (see:
    http://postgresql.1045698.n5.nabble.com/Process-local-hint-bit-cache-td4270229.html)
    :(. I can rig up a cleaned up version of the patch pretty
    easily...it's a local change and fairly simple.

    I don't think there is any way to remove the hint bits without
    suffering some other problem.
    Was that the idea that the pages had to fit in the cache and be updated
    with hint bits before being written to disk? Restricting that to the
    size of the buffer cache seemed very limiting.

    One 8k visibility map page can hold bits for 1/2 gig of heap pages, so I
    thought that would be a better all-visible indicator and avoid many
    all-visible page writes in bulk load cases.

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Merlin Moncure at May 5, 2011 at 6:52 pm

    On Thu, May 5, 2011 at 1:34 PM, Bruce Momjian wrote:
    Merlin Moncure wrote:
    On Thu, May 5, 2011 at 11:59 AM, Bruce Momjian wrote:
    There has been a lot of recent discussion about the visibility map (for
    index-only scans) and hint bits (trying to avoid double-writing a
    table).
    I still think a small tqual.c maintained cache of hint bits will
    effectively eliminate hint bit i/o issues surrounding bulk loads.  Tom
    fired a shot across the bow regarding the general worthiness of that
    technique though (see:
    http://postgresql.1045698.n5.nabble.com/Process-local-hint-bit-cache-td4270229.html)
    :(.  I can rig up a cleaned up version of the patch pretty
    easily...it's a local change and fairly simple.

    I don't think there is any way to remove the hint bits without
    suffering some other problem.
    Was that the idea that the pages had to fit in the cache and be updated
    with hint bits before being written to disk?  Restricting that to the
    size of the buffer cache seemed very limiting.

    One 8k visibility map page can hold bits for 1/2 gig of heap pages, so I
    thought that would be a better all-visible indicator and avoid many
    all-visible page writes in bulk load cases.
    No, that was my first idea -- check visibility when you evict. That
    helps a different problem, but not bulk loads. One way it could help
    is for marking PD_ALL_VISIBLE. This might also be a winner, but there
    is some valid skepticism that adding more work for the bgwriter is
    really a good idea.

    The tqual cache idea is such that there is a small cache that
    remembers the commit/cancel status of recently seen transactions. If
    you scan a tuple and its status is known via the cache, you set the
    hint bit but don't mark the page dirty. That way, if you are scanning
    a lot of unhinted tuples with similar xids, you don't need to jam out
    i/o. I think the general concept is clean, but it might need some
    buy-in from Tom and some performance testing for justification.

    The alternate 'cleaner' approach of maintaining a larger transam.c cache
    had some downsides I saw no simple workaround for.

    merlin
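
For illustration, here is a minimal sketch of the kind of small, backend-local
commit-status cache described in the message above. It is hypothetical code,
not Merlin's patch: the names (xid_cache, xidcache_remember_committed,
xidcache_known_committed), the size, and the direct-mapped layout are invented
for the example, and eviction, aborts, and xid wraparound are ignored.

    /*
     * Hypothetical sketch of a small, backend-local cache of transaction
     * commit status, along the lines described in the message above.
     * Not the actual patch: names, sizes, and the direct-mapped layout are
     * invented; eviction, aborts, and xid wraparound are ignored.
     */
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t TransactionId;

    #define XIDCACHE_SLOTS 1024              /* small and fixed: stays cache-resident */

    typedef struct
    {
        TransactionId xid;                   /* cached transaction id */
        bool          committed;             /* true = known committed */
    } XidCacheEntry;

    /* Per-backend storage; no locking needed because it is process-local. */
    static XidCacheEntry xid_cache[XIDCACHE_SLOTS];

    /* Trivial direct-mapped placement: one slot per xid modulo the table size. */
    static inline XidCacheEntry *
    xidcache_slot(TransactionId xid)
    {
        return &xid_cache[xid % XIDCACHE_SLOTS];
    }

    /* Remember that xid committed (only once it is safe to trust the status). */
    static inline void
    xidcache_remember_committed(TransactionId xid)
    {
        XidCacheEntry *e = xidcache_slot(xid);

        e->xid = xid;
        e->committed = true;
    }

    /*
     * On a hit, the caller can treat the tuple's xmin as committed without
     * consulting clog and without dirtying the page for a hint-bit write.
     */
    static inline bool
    xidcache_known_committed(TransactionId xid)
    {
        XidCacheEntry *e = xidcache_slot(xid);

        return e->xid == xid && e->committed;
    }

Because the cache is process-local, a lookup is just a modulo and one compare
with no lock; a real version would also have to decide when an entry may be
trusted at all (for example, only after the commit record is known to be
flushed, which comes up later in the thread).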
  • Kevin Grittner at May 5, 2011 at 7:00 pm

    Merlin Moncure wrote:

    a small cache that remembers the commit/cancel status of recently
    seen transactions.
    How is that different from the head of the clog SLRU?

    -Kevin
  • Merlin Moncure at May 5, 2011 at 7:20 pm

    On Thu, May 5, 2011 at 2:00 PM, Kevin Grittner wrote:
    Merlin Moncure wrote:
    a small cache that remembers the commit/cancel status of recently
    seen transactions.
    How is that different from the head of the clog SLRU?
    several things:
    *) any slru access requires a lock (besides the lock itself, you are
    spending cycles in the critical path)
    *) cache access happens at a different stage of processing in
    HeapTupleSatisfiesMVCC: both TransactionIdIsCurrentTransactionId and
    TransactionIdIsInProgress have to be checked first. Logically, it's an
    extension of the hint bit check itself, not an expansion of the lower
    levels of caching
    *) in tqual.c you can sneak in some small optimizations, like only
    caching the bit if it's known good in the WAL (XLogNeedsFlush). That
    way you don't need to keep checking it over and over for the same
    transaction
    *) slru level accesses happen too late to give much benefit:

    I can't stress enough how tight HeapTupleSatisfiesMVCC is. On my
    workstation VM, each non-inline function call shows up measurably in
    profiling. I think anything you do here has to be inline, hand-rolled,
    and very tight (you can forget anything around dynahash). Delegating
    the cache management to the transam or (even worse) slru level
    penalizes some workloads non-trivially.

    merlin
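
To show where such a check would sit relative to the existing tests, here is a
heavily simplified structural sketch of the xmin side of a visibility test. It
is not the real HeapTupleSatisfiesMVCC: xmax handling, subtransactions, frozen
xids, and the snapshot check are all omitted; the function name
xmin_committed_sketch is invented; the extern declarations stand in for the
backend's real routines plus the hypothetical xidcache_* functions sketched
earlier.

    /*
     * Structural sketch only: shows at which point a backend-local commit
     * cache could be consulted.  NOT the real HeapTupleSatisfiesMVCC.
     */
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t TransactionId;

    extern bool TransactionIdIsCurrentTransactionId(TransactionId xid);
    extern bool TransactionIdIsInProgress(TransactionId xid);
    extern bool TransactionIdDidCommit(TransactionId xid);    /* goes to clog/SLRU */
    extern bool xidcache_known_committed(TransactionId xid);  /* hypothetical cache */
    extern void xidcache_remember_committed(TransactionId xid);

    static bool
    xmin_committed_sketch(TransactionId xmin, bool xmin_hint_committed)
    {
        if (xmin_hint_committed)
            return true;                     /* hint bit already set: cheapest path */

        if (TransactionIdIsCurrentTransactionId(xmin))
            return true;                     /* inserted by our own transaction */

        if (TransactionIdIsInProgress(xmin))
            return false;                    /* still running in another backend */

        /*
         * Only now, after the cheap and mandatory checks, does the cache come
         * in: a hit avoids the locked clog/SLRU lookup and lets the caller
         * proceed as if the hint bit were set, without dirtying the page.
         */
        if (xidcache_known_committed(xmin))
            return true;

        if (TransactionIdDidCommit(xmin))
        {
            /* the real patch would also require the commit to be WAL-flushed */
            xidcache_remember_committed(xmin);
            return true;
        }

        return false;                        /* xmin aborted or crashed */
    }

The point of this ordering is exactly what the message above argues: a cache
hit skips both the clog lookup (and its lock) and the page dirtying that a
hint-bit write would otherwise cause.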
  • Merlin Moncure at May 6, 2011 at 2:42 pm

    On Thu, May 5, 2011 at 2:20 PM, Merlin Moncure wrote:
    On Thu, May 5, 2011 at 2:00 PM, Kevin Grittner
    wrote:
    Merlin Moncure wrote:
    a small cache that remembers the commit/cancel status of recently
    seen transactions.
    How is that different from the head of the clog SLRU?
    several things:
    *) any slru access requires a lock (besides the lock itself, you are
    spending cycles in the critical path)
    *) cache access happens at a different stage of processing in
    HeapTupleSatisfiesMVCC: both TransactionIdIsCurrentTransactionId and
    TransactionIdIsInProgress have to be checked first. Logically, it's an
    extension of the hint bit check itself, not an expansion of the lower
    levels of caching
    *) in tqual.c you can sneak in some small optimizations, like only
    caching the bit if it's known good in the WAL (XLogNeedsFlush). That
    way you don't need to keep checking it over and over for the same
    transaction
    *) slru level accesses happen too late to give much benefit:

    I can't stress enough how tight HeapTupleSatisfiesMVCC is. On my
    workstation VM, each non-inline function call shows up measurably in
    profiling. I think anything you do here has to be inline, hand-rolled,
    and very tight (you can forget anything around dynahash). Delegating
    the cache management to the transam or (even worse) slru level
    penalizes some workloads non-trivially.
    An updated patch is attached. It's still WIP, but I need a little
    guidance before going further.

    What I did:
    *) Added a lot of source level comments that should explain better
    what's happening and why
    *) Fixed a significant number of goofs in the earlier patch.
    *) Reorganized the interaction with HeapTupleSatisfiesMVCC. In
    particular, SetHintBits() now returns whether it actually set the bit,
    because I can use that information.

    What's not done:
    *) Only commit bits are cached, and the caching action only happens
    in HeapTupleSatisfiesMVCC. I'm not sure yet if it's better to store
    invalid bits in the same cache or in a separate one. I'm not sure if
    the other satisfies routines should also be engaging the cache.
    Translated from nerd speak, that means I haven't yet done the research
    to see when they are fired and whether they are bottlenecks :-).

    *) I'd like to reach some sort of consensus with Tom on whether there
    is any point in going further in this direction. Not so much on the
    mechanics of how the cache works, but on whether it belongs at the
    tqual.c level and on the changes to HeapTupleSatisfiesMVCC. In
    particular, I think caching at the transam.c level is a dead end on
    performance grounds regardless of how you implement the cache.

    Some points of note:
    *) Is it acceptable to use a static definition of memory like that? If
    not, should there be a more standard allocation under
    CacheMemoryContext?

    *) Testing for the benefit is simple: just create a bunch of records
    and seqscan the table (select count(*)). Without the patch the first
    scan is slower and does a bunch of i/o. With it, it does not.

    *) The cache overhead is *almost* not measurable. As best I can tell,
    we are looking at maybe 1%-ish overhead in synthetic scan-heavy
    workloads (I think this is a fair price to pay for all the i/o
    savings). The degenerate case of repeated 'rollups' is really
    difficult to generate, even synthetically -- if the cache is
    performing poorly, the regular hint bit action tends to protect it.
    Performance testing under real workloads is going to give better info
    here.

    merlin
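
A sketch of the SetHintBits() change described above might look like the
following. This is hypothetical and simplified, not the patch itself: the real
function works on a HeapTupleHeader and a Buffer and consults XLogNeedsFlush()
against the commit record's LSN, and the names set_hint_bits_sketch,
commit_record_is_flushed, and mark_buffer_for_hint_write are invented
stand-ins.

    /*
     * Sketch of a SetHintBits() variant that reports whether the hint bit
     * was actually applied, so the caller can decide whether it is also
     * safe to remember the xid in the backend-local cache.
     */
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t TransactionId;

    #define HEAP_XMIN_COMMITTED 0x0100       /* infomask value borrowed from htup.h */

    typedef struct
    {
        uint16_t t_infomask;                 /* stand-in for HeapTupleHeaderData */
    } TupleSketch;

    /* Assume this asks WAL whether xid's commit record has reached disk. */
    extern bool commit_record_is_flushed(TransactionId xid);

    /* Stand-in for marking the buffer dirty "for hints only". */
    extern void mark_buffer_for_hint_write(void);

    static bool
    set_hint_bits_sketch(TupleSketch *tup, TransactionId xid, uint16_t infomask)
    {
        /*
         * With asynchronous commit the commit record may not be flushed yet;
         * in that case we must not hint (and should not cache the status).
         */
        if (!commit_record_is_flushed(xid))
            return false;

        tup->t_infomask |= infomask;         /* e.g. HEAP_XMIN_COMMITTED */
        mark_buffer_for_hint_write();
        return true;                         /* bit applied: caching is safe too */
    }

The returned flag is the piece of information the message refers to: if the
hint was actually applied, the caller can also remember the xid in the
backend-local cache; if not (commit record not yet flushed), it skips both.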
  • Robert Haas at May 5, 2011 at 6:45 pm

    On Thu, May 5, 2011 at 12:59 PM, Bruce Momjian wrote:
    I wonder if we could fix both of these at the same time.  Once the
    visibility map is reliable, can we use that to avoid updating the hint
    bits on all rows on a page?
    I don't think so. There are two problems:

    1. If there is a long-running transaction on the system, it will not
    be possible to set PD_ALL_VISIBLE, but hint bits can still be set. So
    there could be a significant performance regression if we don't set
    hint bits in that case.

    2. Making the visibility map crash-safe will mean that setting hint
    bits has to emit XLOG records, so it can't be done on Hot Standby
    servers at all, and it's much more expensive than just setting a hint
    bit on the master.
    For bulk loads, all the pages are going to have the same xid and all be
    visible, so instead of writing the entire table, we just write the
    visibility map.

    I think the problem is that we have the PD_ALL_VISIBLE page flag, which
    requires a write of the page as well.  Could we get by with only the
    visibility bits and remove PD_ALL_VISIBLE?
    In some ways, that would make things much simpler. But to make that
    work, every insert/update/delete to a page would have to pin the
    visibility map page and clear the all-visible bit if appropriate, so it
    might not be good from a performance standpoint, especially in
    high-concurrency workloads. Right now, if PD_ALL_VISIBLE isn't set,
    we don't bother touching the visibility map page, which seems like a
    possibly important optimization.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
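
The optimization Robert refers to can be sketched as follows. This is
hypothetical code with simplified types, not the actual heapam.c or
visibilitymap.c logic; buffer locking and WAL-logging are omitted, and
visibilitymap_pin_sketch/visibilitymap_clear_sketch are invented stand-ins
for the real map operations.

    /*
     * Sketch of why PD_ALL_VISIBLE matters on the clear path: with the
     * page-level flag, the visibility map is only touched when the flag
     * says the page might be marked all-visible there.
     */
    #include <stdint.h>

    typedef uint32_t BlockNumber;

    #define PD_ALL_VISIBLE 0x0004            /* flag value borrowed from bufpage.h */

    typedef struct
    {
        uint16_t pd_flags;                   /* stand-in for the heap page header */
    } PageSketch;

    /* Assume these pin the map page and clear the heap block's bit (WAL-logged). */
    extern void visibilitymap_pin_sketch(BlockNumber heapBlk);
    extern void visibilitymap_clear_sketch(BlockNumber heapBlk);

    /* Today: the map page is left alone unless the page-level flag is set. */
    static void
    clear_visibility_with_flag(PageSketch *page, BlockNumber heapBlk)
    {
        if (page->pd_flags & PD_ALL_VISIBLE)
        {
            page->pd_flags &= ~PD_ALL_VISIBLE;
            visibilitymap_pin_sketch(heapBlk);
            visibilitymap_clear_sketch(heapBlk);
        }
    }

    /* Without the flag: every insert/update/delete pays the map-page visit. */
    static void
    clear_visibility_without_flag(BlockNumber heapBlk)
    {
        visibilitymap_pin_sketch(heapBlk);
        visibilitymap_clear_sketch(heapBlk);
    }

Dropping PD_ALL_VISIBLE forces every modification onto the second path, adding
a visibility map pin (and the associated buffer traffic) even for pages that
were never all-visible.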

Discussion Overview
group: pgsql-hackers
categories: postgresql
posted: May 5, '11 at 4:59p
active: May 6, '11 at 2:42p
posts: 8
users: 4
website: postgresql.org...
irc: #postgresql
