Maybe I'm being overly simplistic or incorrect here, but I was
thinking that there might be a route to reducing hint bit impact to
the main sufferers of the feature without adding too much pain in the
general case. I'm unfortunately convinced there is no getting rid of
them -- in fact their utility will become even more apparent as
storage gets faster and the pendulum of optimization swings back to
the CPU side.

My idea is to reserve a bit in the page header, say PD_ALL_SAME_XMIN,
that indicates all the tuples are from the same transaction. Set it
when the first inserted tuple hits the page, and unset it when a tuple
is added from another xmin or any tuple is touched or deleted. The
point here is to set up a cheap check at the page level that we can
make when a page is getting evicted from the bufmgr. If the bit is
set, we grab the xmin of the first tuple on the page and test it for
visibility (assuming the hint bit is not already set). If we get a
thumbs up on the transaction, we can lock the page and set all tuple
hints during the page evict/sync process. We don't worry about
logging/crash safety on the 'all same' hint because it's only
interesting to this bufmgr check (it can even be cleared when the page
is loaded).

Without this bit, the only way to set hint bits during bufmgr
eviction is to do a visibility check on every tuple, which would
probably be prohibitively expensive. Since OLTP environments would
rarely see this bit, they would not have to pay for the check.
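
A minimal C sketch of the page-level bit described above, under stated assumptions: SketchPage, SketchTuple, the SKETCH_* flag values and the toy commit lookup are all made up for illustration and are not PostgreSQL's page layout or CLOG API. It only shows the bookkeeping: the first insert sets the bit, a tuple from a different xmin or any touch/delete clears it, and eviction does one commit lookup for the whole page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Everything here is a hypothetical stand-in, not PostgreSQL's real structs. */
typedef uint32_t TxnId;

#define SKETCH_PD_ALL_SAME_XMIN 0x0001  /* page-level flag proposed in the post */
#define SKETCH_XMIN_COMMITTED   0x0001  /* per-tuple hint bit */
#define SKETCH_MAX_TUPLES       8

typedef struct SketchTuple {
    TxnId    xmin;
    uint16_t hints;
} SketchTuple;

typedef struct SketchPage {
    uint16_t    flags;
    size_t      ntuples;
    SketchTuple tuples[SKETCH_MAX_TUPLES];
} SketchPage;

/* Insert a tuple and keep the page-level flag up to date. */
static bool sketch_page_insert(SketchPage *page, TxnId xmin)
{
    if (page->ntuples >= SKETCH_MAX_TUPLES)
        return false;
    if (page->ntuples == 0)
        page->flags |= SKETCH_PD_ALL_SAME_XMIN;            /* first tuple sets it */
    else if (page->tuples[0].xmin != xmin)
        page->flags &= (uint16_t) ~SKETCH_PD_ALL_SAME_XMIN; /* other xmin clears it */
    page->tuples[page->ntuples].xmin = xmin;
    page->tuples[page->ntuples].hints = 0;
    page->ntuples++;
    return true;
}

/* Any update or delete on the page clears the flag. */
static void sketch_page_touch(SketchPage *page)
{
    page->flags &= (uint16_t) ~SKETCH_PD_ALL_SAME_XMIN;
}

/* Toy stand-in for a commit-status lookup: only xid 100 has committed. */
static bool sketch_toy_xid_committed(TxnId xid)
{
    return xid == 100;
}

/*
 * Eviction-time check: a single commit lookup covers the whole page.
 * Returns the number of tuples whose hint bit was set.
 */
static size_t sketch_hint_on_evict(SketchPage *page,
                                   bool (*xid_committed)(TxnId))
{
    size_t hinted = 0;

    if (!(page->flags & SKETCH_PD_ALL_SAME_XMIN) || page->ntuples == 0)
        return 0;                       /* flag not set: skip the check */
    if (page->tuples[0].hints & SKETCH_XMIN_COMMITTED)
        return 0;                       /* hints already set */
    if (!xid_committed(page->tuples[0].xmin))
        return 0;                       /* still in progress or aborted */

    for (size_t i = 0; i < page->ntuples; i++) {
        page->tuples[i].hints |= SKETCH_XMIN_COMMITTED;
        hinted++;
    }
    return hinted;
}
```

Pages that mix transactions never reach the lookup, which is the reason an OLTP workload would not pay for the check in this sketch.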

Also, we can maybe tweak the bufmgr to prefer not to evict pages with
this bit set if it's known they are not yet written out to primary
storage. Maybe this is impossible or not logical...just thinking out
loud. Anyways, if this actually works, shared buffers can start to
play a role of mitigating hint bit i/o as long as the transaction
resolves before pages start jumping out into storage. If you couple
this with a facility to do bulk loads that break up transactions on
regular intervals, you have a good shot at getting all your hint bits
written out properly in large load situations.

You might be able to do similar tricks with deletes -- I haven't
thought about that. Also there might be some interplay with vacuum or
some other deal breaker -- curious to see if I have something worth
further thought here.

merlin

  • Jim Nasby at Mar 25, 2011 at 3:35 pm

    On Mar 25, 2011, at 9:52 AM, Merlin Moncure wrote:
    Without this bit, the only way to set hint bits during bufmgr
    eviction is to do a visibility check on every tuple, which would
    probably be prohibitively expensive. Since OLTP environments would
    rarely see this bit, they would not have to pay for the check.
    IIRC one of the biggest costs is accessing the CLOG, but what if the bufmgr.c/bgwriter didn't use the same CLOG lookup mechanism as backends did? Unlike when a backend is inspecting visibility, it's not necessary for something like bgwriter to know exact visibility as long as it doesn't mark something as visible when it shouldn't. If it uses a different CLOG caching/accessing method that lags behind the real CLOG then the worst-case scenario is that there's a delay in setting hint bits. But getting bgwriter to do this would likely still be a huge win over forcing backends to worry about it. It's also possible that the visibility check itself could be simplified.

    BTW, I don't think you want to play these games when a backend is evicting a page because you'll be slowing a real backend down.
    --
    Jim C. Nasby, Database Architect jim@nasby.net
    512.569.9461 (cell) http://jim.nasby.net
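
The lagging-lookup idea in the message above can be sketched in C as follows. This is an illustration under assumptions, not PostgreSQL's CLOG code: SketchClog, the status enum and both functions are hypothetical, and the sketch assumes that a transaction's status never changes once it is recorded as committed. Under that assumption a stale copy can only answer "committed" for transactions that really have committed, so the worst case is a missed hint, never a wrong one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only. */
typedef uint32_t TxnId;

typedef enum {
    SKETCH_XACT_IN_PROGRESS = 0,
    SKETCH_XACT_COMMITTED,
    SKETCH_XACT_ABORTED
} SketchXactStatus;

#define SKETCH_NXIDS 16

/* A tiny status table indexed by transaction id. */
typedef struct SketchClog {
    SketchXactStatus status[SKETCH_NXIDS];
} SketchClog;

/* Occasionally copy the live table; between refreshes the copy lags. */
static void sketch_lagging_refresh(SketchClog *lagging, const SketchClog *live)
{
    memcpy(lagging->status, live->status, sizeof(lagging->status));
}

/*
 * Conservative lookup against the lagging copy: answer true only when the
 * copy already records a commit. Anything else (in progress, aborted,
 * out of range, or committed after the last refresh) yields false, which
 * just means the hint bit is not set this time around.
 */
static bool sketch_lagging_is_committed(const SketchClog *lagging, TxnId xid)
{
    if (xid >= SKETCH_NXIDS)
        return false;
    return lagging->status[xid] == SKETCH_XACT_COMMITTED;
}
```

A commit that lands after the last refresh is simply reported as not yet committed until the next refresh, which matches the "delay in setting hint bits" worst case described above.
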
  • Merlin Moncure at Mar 25, 2011 at 4:40 pm

    On Fri, Mar 25, 2011 at 10:34 AM, Jim Nasby wrote:
    On Mar 25, 2011, at 9:52 AM, Merlin Moncure wrote:
    Without this bit, the only way to set hint bits during bufmgr
    eviction is to do a visibility check on every tuple, which would
    probably be prohibitively expensive.  Since OLTP environments would
    rarely see this bit, they would not have to pay for the check.
    IIRC one of the biggest costs is accessing the CLOG, but what if the bufmgr.c/bgwriter didn't use the same CLOG lookup mechanism as backends did? Unlike when a backend is inspecting visibility, it's not necessary for something like bgwriter to know exact visibility as long as it doesn't mark something as visible when it shouldn't. If it uses a different CLOG caching/accessing method that lags behind the real CLOG then the worst-case scenario is that there's a delay in setting hint bits. But getting bgwriter to do this would likely still be a huge win over forcing backends to worry about it. It's also possible that the visibility check itself could be simplified.

    BTW, I don't think you want to play these games when a backend is evicting a page because you'll be slowing a real backend down.
    Well, I'm not so sure -- as noted above, you only pay for the check
    above when all the records in a page are new, and only once per page,
    not once per tuple. Basically, only when you are bulk jamming records
    through the buffers. The amortized cost of the clog lookup is going
    to be near zero (maybe you could put a fuse in that would get tripped
    if there weren't enough tuples in the page to justify the check).

    If you are bulk loading more data than you have shared buffers, then
    you get zero benefit. However, you might have the makings of a
    strategy for dealing with hint bit i/o in user land (by breaking up
    transactions, tweaking shared buffers, etc).

    merlin
  • Heikki Linnakangas at Mar 25, 2011 at 7:32 pm

    On 25.03.2011 16:52, Merlin Moncure wrote:
    Without this bit, the only way to set hint bits during bufmgr
    eviction is to do a visibility check on every tuple, which would
    probably be prohibitively expensive.
    I don't think the naive approach of scanning all tuples would be too
    bad, actually. The hint bits only need to be set once, and it'd be
    bgwriter shouldering the overhead.

    The problem with setting hint bits when a buffer is evicted is that it
    doesn't help with the bulk load case. The hint bits can't be set for a
    bulk load until the load is finished and the transaction commits.

    Maybe it would still be worthwhile to have bgwriter set hint bits, to
    reduce I/O caused by hint bit updates in an OLTP workload, but that's
    not what people usually complain about.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Merlin Moncure at Mar 25, 2011 at 7:44 pm

    On Fri, Mar 25, 2011 at 2:32 PM, Heikki Linnakangas wrote:
    On 25.03.2011 16:52, Merlin Moncure wrote:

    Without this bit, the only way to set hint bits during bufmgr
    eviction is to do a visibility check on every tuple, which would
    probably be prohibitively expensive.
    I don't think the naive approach of scanning all tuples would be too bad,
    actually. The hint bits only need to be set once, and it'd be bgwriter
    shouldering the overhead.

    The problem with setting hint bits when a buffer is evicted is that it
    doesn't help with the bulk load case. The hint bits can't be set for a bulk
    load until the load is finished and the transaction commits.
    Not the true bulk load case. However, if you can break up a load into
    multiple transactions and sneak out 10-100mb of pages into the buffer
    per transaction, you have a good chance of getting most/all the bits
    out correct before bgwriter eats them up. I was thinking to also
    teach bgwriter to keep xmin flagged pages in a separate lower priority
    pool so that it didn't race to them before the transaction had a
    chance to go in.

    Long term, I'm imagining more direct transaction control in the
    backend, either via autonomous transactions, or stored procedures with
    explicit transaction control, so we don't have to load N gigabytes in
    a single transaction.
    Maybe it would still be worthwhile to have bgwriter set hint bits, to reduce
    I/O caused by hint bit updates in an OLTP workload, but that's not what
    people usually complain about.
    Well, if bgwriter does it, you lose the ability to bail out of the clog check
    via TransactionIdIsCurrentTransactionId, right? If it's done in the
    bufmgr you at least have a chance to not have to go all the way out.
    Either way though, you at least have to teach bgwriter to be more
    cooperative.

    merlin
  • Robert Haas at Mar 25, 2011 at 8:18 pm

    On Fri, Mar 25, 2011 at 3:32 PM, Heikki Linnakangas wrote:
    On 25.03.2011 16:52, Merlin Moncure wrote:

    Without this bit, the only way to set hint bits during bufmgr
    eviction is to do a visibility check on every tuple, which would
    probably be prohibitively expensive.
    I don't think the naive approach of scanning all tuples would be too bad,
    actually. The hint bits only need to be set once, and it'd be bgwriter
    shouldering the overhead.
    I was thinking the same thing. The only thing I'm worried about is
    whether it'd make the bgwriter less responsive; we already have some
    issues in that department.
    The problem with setting hint bits when a buffer is evicted is that it
    doesn't help with the bulk load case. The hint bits can't be set for a bulk
    load until the load is finished and the transaction commits.

    Maybe it would still be worthwhile to have bgwriter set hint bits, to reduce
    I/O caused by hint bit updates in an OLTP workload, but that's not what
    people usually complain about.
    Yeah.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Merlin Moncure at Mar 28, 2011 at 1:48 pm

    On Fri, Mar 25, 2011 at 3:18 PM, Robert Haas wrote:
    On Fri, Mar 25, 2011 at 3:32 PM, Heikki Linnakangas
    wrote:
    On 25.03.2011 16:52, Merlin Moncure wrote:

    Without this bit, the only way to set hint bits during bufmgr
    eviction is to do a visibility check on every tuple, which would
    probably be prohibitively expensive.
    I don't think the naive approach of scanning all tuples would be too bad,
    actually. The hint bits only need to be set once, and it'd be bgwriter
    shouldering the overhead.
    I was thinking the same thing.  The only thing I'm worried about is
    whether it'd make the bgwriter less responsive; we already have some
    issues in that department.
    I'd like to experiment on this and see what comes out. If the
    bgwriter was to be granted the ability to inspect buffers and set
    hints, it needs to be able to peek in and inspect the buffer itself
    which it currently doesn't do FWICT. I was thinking about setting a
    flag in the buffer (BM_HEAP) that gets set by the loader which flags
    the buffer for later inspection. Is there a simpler way to do this?

    It may turn out to be a dud, but I'd still like to play with the all
    visible bit and see how that interacts with data loading, both with
    and without special bgwriter logic (I'm going to kludge in a crude
    mechanism to try to prefer non all visible pages). The reason why I
    like it is the optimization is narrow and the risk of downside is low,
    although it's up a notch on the complexity level. If you do end up
    retooling the bgwriter to set hint bits broadly, there are some tricks
    you can do to reduce the number of useless clog checks you do (that
    is, you fault through to an in progress transaction). They involve
    changing the way the scan works, maybe even organizing buffers into
    multiple priority pools, so it's complicated and has to be done very
    carefully.

    I think you guys are correct: the logic belongs in the bgwriter.
    Generally speaking, it looks like the best route to minimizing hint
    bit pain is, if at all possible, to write them out already set so they
    don't have to be rewritten later (Stephen's approach of leveraging
    in-transaction table creation is another way of attempting to do that).

    merlin
  • Robert Haas at Mar 28, 2011 at 2:10 pm

    On Mon, Mar 28, 2011 at 9:48 AM, Merlin Moncure wrote:
    I'd like to experiment on this and see what comes out.
    Great!
    If the
    bgwriter was to be granted the ability to inspect buffers and set
    hints, it needs to be able to peek in and inspect the buffer itself
    which it currently doesn't do FWICT.
    That matches my understanding.
    I was thinking about setting a
    flag in the buffer (BM_HEAP) that gets set by the loader which flags
    the buffer for later inspection.  Is there a simpler way to do this?
    Hmm. That's slightly crufty, but it might be OK. At least, I don't
    have a better idea.
    I think you guys are correct: the logic belongs in the bgwriter.
    Generally speaking, it looks like the best route to minimizing hint
    bit pain is, if at all possible, to write them out already set so they
    don't have to be rewritten later (Stephen's approach of leveraging
    in-transaction table creation is another way of attempting to do that).
    Yeah.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Mar 28, 2011 at 2:19 pm

    Robert Haas writes:
    On Mon, Mar 28, 2011 at 9:48 AM, Merlin Moncure wrote:
    I was thinking about setting a
    flag in the buffer (BM_HEAP) that gets set by the loader which flags
    the buffer for later inspection.  Is there a simpler way to do this?
    Hmm. That's slightly crufty, but it might be OK. At least, I don't
    have a better idea.
    The major problem with all of this is that the bgwriter has no idea
    which buffers contain heap pages. And I'm not convinced it's a good
    idea to try to let it know that. If we get to the point where bgwriter
    is trying to do catalog accesses, we are in for a world of pain.
    (Can you say "modularity violation"? How about "deadlock"?)

    regards, tom lane
  • Kevin Grittner at Mar 28, 2011 at 2:29 pm

    Tom Lane wrote:

    The major problem with all of this is that the bgwriter has no
    idea which buffers contain heap pages. And I'm not convinced it's
    a good idea to try to let it know that. If we get to the point
    where bgwriter is trying to do catalog accesses, we are in for a
    world of pain. (Can you say "modularity violation"? How about
    "deadlock"?)
    How about having a BackgroundPrepareForWriteFunction variable
    associated with each page the bgwriter might see, which would be a
    pointer to a function to call (if the variable is not NULL) before
    writing? The bgwriter would still have no idea what kind of page it
    was or what the function did....

    -Kevin
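
A rough C sketch of the per-page callback suggested in the message above. The struct layout, the sketch_* names and the toy "prepare" routine are assumptions made for illustration; only the idea of an optional function pointer that the writer calls blindly before writing comes from the post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer type; not PostgreSQL's buffer descriptor. */
typedef struct SketchBuffer SketchBuffer;

/* Optional hook run just before a buffer is written out. */
typedef void (*SketchPrepareForWriteFn)(SketchBuffer *buf);

struct SketchBuffer {
    unsigned char           data[16];
    SketchPrepareForWriteFn prepare_for_write; /* NULL means nothing to do */
    int                     writes;            /* counts simulated writes */
};

/*
 * Example hook a heap access method might register. Setting the low bit
 * of the first byte stands in for "set whatever hint bits can be set".
 */
static void sketch_heap_prepare(SketchBuffer *buf)
{
    buf->data[0] |= 0x01;
}

/*
 * The writer knows nothing about page types: it calls the hook if one is
 * registered and then performs the (simulated) write.
 */
static void sketch_bgwriter_write(SketchBuffer *buf)
{
    if (buf->prepare_for_write != NULL)
        buf->prepare_for_write(buf);
    buf->writes++;
}
```

Whoever loads the page would register the hook, so the writer itself never needs to know which buffers hold heap pages.
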
  • Merlin Moncure at Mar 28, 2011 at 2:49 pm

    On Mon, Mar 28, 2011 at 9:29 AM, Kevin Grittner wrote:
    Tom Lane wrote:
    The major problem with all of this is that the bgwriter has no
    idea which buffers contain heap pages.  And I'm not convinced it's
    a good idea to try to let it know that.  If we get to the point
    where bgwriter is trying to do catalog accesses, we are in for a
    world of pain. (Can you say "modularity violation"?  How about
    "deadlock"?)
    How about having a BackgroundPrepareForWriteFunction variable
    associated with each page the bgwriter might see, which would be a
    pointer to a function to call (if the variable is not NULL) before
    writing?  The bgwriter would still have no idea what kind of page it
    was or what the function did....
    Well, that is much cleaner from an abstraction point of view, but you lose
    the ability to adjust scan priority before flushing out the page...I'm
    assuming by the time this function is called, you've already made the
    decision to write it out. (Maybe priority is necessary and maybe it
    isn't, but I don't like losing the ability to tune at that level.)

    You could, though, put a priority inspection facility behind a similar
    abstraction fence (BackgroundGetWritePriority). Maybe that's more
    trouble than it's worth.

    merlin
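
The priority hook floated in the message above might look roughly like this in C. All names, the priority scale and the xact_in_progress field are hypothetical; the sketch only shows a writer asking an opaque callback how eager it should be to write each buffer, so that pages whose inserting transaction is still running can be left for later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
typedef struct SketchPrioBuffer SketchPrioBuffer;

/* Optional hook: a higher return value means "write me sooner". */
typedef int (*SketchGetWritePriorityFn)(const SketchPrioBuffer *buf);

struct SketchPrioBuffer {
    int                      id;
    bool                     xact_in_progress;   /* toy per-page state */
    SketchGetWritePriorityFn get_write_priority; /* NULL means default */
};

#define SKETCH_DEFAULT_PRIORITY 10

/* Example hook: hold back pages whose transaction has not resolved yet. */
static int sketch_heap_priority(const SketchPrioBuffer *buf)
{
    return buf->xact_in_progress ? 1 : SKETCH_DEFAULT_PRIORITY;
}

/*
 * Pick the buffer to write next: the highest priority wins and ties go to
 * the earliest buffer in the array. Returns -1 when there are no buffers.
 */
static int sketch_pick_next_write(const SketchPrioBuffer *bufs, size_t n)
{
    int best_id = -1;
    int best_prio = -1;

    for (size_t i = 0; i < n; i++) {
        int prio = bufs[i].get_write_priority != NULL
                       ? bufs[i].get_write_priority(&bufs[i])
                       : SKETCH_DEFAULT_PRIORITY;
        if (prio > best_prio) {
            best_prio = prio;
            best_id = bufs[i].id;
        }
    }
    return best_id;
}
```

A linear scan keeps the sketch short; whether a real writer could afford per-buffer callbacks during its scan is exactly the kind of cost the thread leaves open.
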
  • Jim Nasby at Apr 5, 2011 at 3:24 pm

    On Mar 28, 2011, at 9:48 AM, Merlin Moncure wrote:
    On Mon, Mar 28, 2011 at 9:29 AM, Kevin Grittner
    wrote:
    Tom Lane wrote:
    The major problem with all of this is that the bgwriter has no
    idea which buffers contain heap pages. And I'm not convinced it's
    a good idea to try to let it know that. If we get to the point
    where bgwriter is trying to do catalog accesses, we are in for a
    world of pain. (Can you say "modularity violation"? How about
    "deadlock"?)
    How about having a BackgroundPrepareForWriteFunction variable
    associated with each page the bgwriter might see, which would be a
    pointer to a function to call (if the variable is not NULL) before
    writing? The bgwriter would still have no idea what kind of page it
    was or what the function did....
    Well, that is much cleaner from an abstraction point of view, but you lose
    the ability to adjust scan priority before flushing out the page...I'm
    assuming by the time this function is called, you've already made the
    decision to write it out. (Maybe priority is necessary and maybe it
    isn't, but I don't like losing the ability to tune at that level.)

    You could, though, put a priority inspection facility behind a similar
    abstraction fence (BackgroundGetWritePriority). Maybe that's more
    trouble than it's worth.
    Merlin, does your new work on CLOG caching negate anything in this thread? I think there are some ideas here worth further investigation and want to make sure they don't get lost.
    --
    Jim C. Nasby, Database Architect jim@nasby.net
    512.569.9461 (cell) http://jim.nasby.net
  • Merlin Moncure at Apr 5, 2011 at 3:59 pm

    On Tue, Apr 5, 2011 at 9:49 AM, Jim Nasby wrote:
    On Mar 28, 2011, at 9:48 AM, Merlin Moncure wrote:
    On Mon, Mar 28, 2011 at 9:29 AM, Kevin Grittner
    wrote:
    Tom Lane wrote:
    The major problem with all of this is that the bgwriter has no
    idea which buffers contain heap pages.  And I'm not convinced it's
    a good idea to try to let it know that.  If we get to the point
    where bgwriter is trying to do catalog accesses, we are in for a
    world of pain. (Can you say "modularity violation"?  How about
    "deadlock"?)
    How about having a BackgroundPrepareForWriteFunction variable
    associated with each page the bgwriter might see, which would be a
    pointer to a function to call (if the variable is not NULL) before
    writing?  The bgwriter would still have no idea what kind of page it
    was or what the function did....
    Well, that is much cleaner from an abstraction point of view, but you lose
    the ability to adjust scan priority before flushing out the page...I'm
    assuming by the time this function is called, you've already made the
    decision to write it out. (Maybe priority is necessary and maybe it
    isn't, but I don't like losing the ability to tune at that level.)

    You could, though, put a priority inspection facility behind a similar
    abstraction fence (BackgroundGetWritePriority). Maybe that's more
    trouble than it's worth.
    Merlin, does your new work on CLOG caching negate anything in this thread? I think there are some ideas here worth further investigation and want to make sure they don't get lost.
    No, it doesn't -- and I plan to work on this independently.

    The performance tradeoffs here are much more complicated and will
    require extensive benchmarking to analyze. A process local clog
    cache, if it can be made to work (and that's by no means certain), is
    going to affect how this is put together. In particular, I'd be even
    more disinclined to adjust scan priority or do anything fancy like that
    -- and more amenable to checking every tuple. I'm particularly
    interested in setting the PD_ALL_VISIBLE bit at eviction time if it's
    available to be set and the page is already dirty.

    merlin
  • Robert Haas at Mar 28, 2011 at 2:44 pm

    On Mon, Mar 28, 2011 at 10:19 AM, Tom Lane wrote:
    Robert Haas <robertmhaas@gmail.com> writes:
    On Mon, Mar 28, 2011 at 9:48 AM, Merlin Moncure wrote:
    I was thinking about setting a
    flag in the buffer (BM_HEAP) that gets set by the loader which flags
    the buffer for later inspection.  Is there a simpler way to do this?
    Hmm.  That's slightly crufty, but it might be OK.  At least, I don't
    have a better idea.
    The major problem with all of this is that the bgwriter has no idea
    which buffers contain heap pages.  And I'm not convinced it's a good
    idea to try to let it know that.  If we get to the point where bgwriter
    is trying to do catalog accesses, we are in for a world of pain.
    (Can you say "modularity violation"?  How about "deadlock"?)
    Well, that's why Merlin was suggesting having the backends that read
    the buffers in flag the heap pages as BM_HEAP. Then the background
    writer can just examine that bit.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
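
The BM_HEAP suggestion discussed above can be sketched in C like this. The descriptor struct, the SKETCH_BM_* bit values and both functions are invented for illustration and do not reflect PostgreSQL's actual buffer flags; the point is only that the backend tags the buffer when it loads a heap page, and the background writer later filters on that bit without any catalog access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the values are arbitrary. */
#define SKETCH_BM_DIRTY 0x01u
#define SKETCH_BM_HEAP  0x02u  /* set by whoever loaded a heap page */

typedef struct SketchBufferDesc {
    uint32_t flags;
    int      buf_id;
} SketchBufferDesc;

/* Called by the loading backend, which knows it is reading a heap page. */
static void sketch_mark_heap_buffer(SketchBufferDesc *buf)
{
    buf->flags |= SKETCH_BM_HEAP;
}

/*
 * Called by the writer: collect ids of buffers that are both dirty and
 * tagged as heap. Only the flag word is inspected, never the relation.
 * Returns the number of ids stored in out (at most out_cap).
 */
static size_t sketch_collect_hint_candidates(const SketchBufferDesc *bufs,
                                             size_t nbufs,
                                             int *out, size_t out_cap)
{
    const uint32_t want = SKETCH_BM_DIRTY | SKETCH_BM_HEAP;
    size_t n = 0;

    for (size_t i = 0; i < nbufs && n < out_cap; i++) {
        if ((bufs[i].flags & want) == want)
            out[n++] = bufs[i].buf_id;
    }
    return n;
}
```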

Discussion Overview
group: pgsql-hackers
categories: postgresql
posted: Mar 25, '11 at 2:52p
active: Apr 5, '11 at 3:59p
posts: 14
users: 6
website: postgresql.org...
irc: #postgresql
