Hi,

after having briefly discussed $subject over dinner yesterday, while I
should have been preparing the slides for my talk, I noticed that there
might be a rather easy way to get rid of freezing.

I think that the existence of hint bits and the crash safe visibility
maps should provide sufficient tooling to make freezing unnecessary
without losing much information for debugging if we modify the way
vacuum works a bit.

Currently, aside from recovery, we only set all-visible in vacuum.

vacuumlazy.c's lazy_scan_heap currently works like:

for (blkno = 0; blkno < nblocks; blkno++)
{
    /* partial vacuum: skip pages the visibility map says are all-visible */
    if (!scan_all && all_visible_according_to_vm)
        continue;

    /* cannot lock buffer immediately */
    if (!ConditionalLockBufferForCleanup(buf))
    {
        if (!scan_all)
            continue;

        /* don't block if we don't need freezing */
        if (!lazy_check_needs_freeze(buf))
            continue;

        /* now wait for cleanup lock */
        LockBufferForCleanup(buf);
    }

    for (tuple in all_tuples)
    {
        cleanup_tuple();
    }

    if (nfrozen > 0)
        log_heap_freeze();

    if (all_visible)
    {
        PageSetAllVisible(page);
        visibilitymap_set(page);
    }
}

In other words, if we don't need to make sure there aren't any old
tuples, we only scan the parts of the relation that are not marked
all-visible. If we are doing a freeze vacuum, we scan the whole relation,
waiting for a cleanup lock on each buffer if necessary.

We currently need to make sure we scanned the whole relation and have
frozen everything to have a sensible relfrozenxid for a relation.

So, what I propose instead is basically:
1) only vacuum non-all-visible pages, even when doing it for
    anti-wraparound
2) When we can set all-visible, guarantee that all tuples on the page are
    fully hinted. During recovery do the same, so we don't need to log
    all hint bits.
    We can do this with only an exclusive lock on the buffer; we don't
    need a cleanup lock.
3) When we cannot mark a page all-visible or we cannot get the cleanup
    lock, remember the oldest xmin on that page. We could set all visible
    in the former case, but we want the page to be cleaned up sometime
    soonish.
4) If we can get the cleanup lock, purge dead tuples from the page and
    the indexes, just as today. Set the page as all-visible.

That way we know that any page that is all-visible doesn't ever need to
look at xmin/xmax since we are sure to have set all relevant hint
bits.
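
A minimal sketch of the per-page logic in steps 1) to 4), assuming a toy
in-memory page model. ToyTuple, ToyPage, toy_track_oldest_xmin and
toy_vacuum_page are hypothetical names for illustration only, not
vacuumlazy.c code; index cleanup, WAL and xid wraparound are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_TUPLES 8

typedef uint32_t Xid;            /* toy xid; wraparound is ignored here */

typedef struct {
    Xid  xmin;
    bool hinted;                 /* stands in for "all relevant hint bits set" */
    bool dead;                   /* removable once we hold a cleanup lock */
    bool visible_to_all;         /* committed and older than every snapshot */
} ToyTuple;

typedef struct {
    ToyTuple tuples[TOY_MAX_TUPLES];
    size_t   ntuples;
    bool     all_visible;
} ToyPage;

/* Step 3: remember the oldest xmin on a page we could not finish. */
void toy_track_oldest_xmin(const ToyPage *page, Xid *oldest_xmin)
{
    for (size_t i = 0; i < page->ntuples; i++)
        if (page->tuples[i].xmin < *oldest_xmin)
            *oldest_xmin = page->tuples[i].xmin;
}

/*
 * Process one page that is not yet all-visible (step 1: callers skip
 * all-visible pages). Returns true if the page ended up all-visible.
 */
bool toy_vacuum_page(ToyPage *page, bool got_cleanup_lock, Xid *oldest_xmin)
{
    size_t kept = 0;
    bool   all_visible = true;

    assert(page != NULL && oldest_xmin != NULL);
    assert(page->ntuples <= TOY_MAX_TUPLES);

    /* Step 3: no cleanup lock, so leave the page for a later vacuum. */
    if (!got_cleanup_lock)
    {
        toy_track_oldest_xmin(page, oldest_xmin);
        return false;
    }

    /* Step 4: purge dead tuples by compacting the survivors. */
    for (size_t i = 0; i < page->ntuples; i++)
        if (!page->tuples[i].dead)
            page->tuples[kept++] = page->tuples[i];
    page->ntuples = kept;

    for (size_t i = 0; i < page->ntuples; i++)
        if (!page->tuples[i].visible_to_all)
            all_visible = false;

    /* Step 3 again: page cannot be marked all-visible yet. */
    if (!all_visible)
    {
        toy_track_oldest_xmin(page, oldest_xmin);
        return false;
    }

    /* Step 2: fully hint every tuple before setting all-visible. */
    for (size_t i = 0; i < page->ntuples; i++)
        page->tuples[i].hinted = true;
    page->all_visible = true;
    return true;
}
```

The oldest xmin collected over all unfinished pages is what would bound
how far pg_clog could be truncated in this toy model.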

We don't even necessarily need to log the hint bits for all items since
the redo for all_visible could make sure all items are hinted. The only
problem is knowing up to where we can truncate pg_clog...

Makes sense?

Greetings,

Andres Freund

--
  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

  • Andres Freund at May 23, 2013 at 6:11 pm

    On 2013-05-23 19:51:48 +0200, Andres Freund wrote:
    > I think that the existence of hint bits and the crash safe visibility
    > maps should provide sufficient tooling to make freezing unnecessary
    > without losing much information for debugging if we modify the way
    > vacuum works a bit.

    > That way we know that any page that is all-visible doesn't ever need to
    > look at xmin/xmax since we are sure to have set all relevant hint
    > bits.

    One case that would make this problematic is row-level locks on
    tuples. We would need to unset all-visible for them, otherwise we might
    do the wrong thing when looking at xmax...

    Greetings,

    Andres Freund

  • Andres Freund at May 23, 2013 at 7:03 pm

    On 2013-05-23 19:51:48 +0200, Andres Freund wrote:
    > We currently need to make sure we scanned the whole relation and have
    > frozen everything to have a sensible relfrozenxid for a relation.
    >
    > So, what I propose instead is basically:
    > 1) only vacuum non-all-visible pages, even when doing it for
    >    anti-wraparound
    > 2) When we can set all-visible, guarantee that all tuples on the page are
    >    fully hinted. During recovery do the same, so we don't need to log
    >    all hint bits.
    >    We can do this with only an exclusive lock on the buffer; we don't
    >    need a cleanup lock.
    > 3) When we cannot mark a page all-visible or we cannot get the cleanup
    >    lock, remember the oldest xmin on that page. We could set all visible
    >    in the former case, but we want the page to be cleaned up sometime
    >    soonish.
    > 4) If we can get the cleanup lock, purge dead tuples from the page and
    >    the indexes, just as today. Set the page as all-visible.
    >
    > That way we know that any page that is all-visible doesn't ever need to
    > look at xmin/xmax since we are sure to have set all relevant hint
    > bits.

    Heikki noticed that I made quite the omission here, which is that you
    would need to mark tuples as all-visible as well. I was thinking about
    using HEAP_MOVED_OFF | HEAP_MOVED_IN as a hint for that.

    Greetings,

    Andres Freund

  • Hannu Krosing at May 24, 2013 at 3:50 am

    On 05/23/2013 10:03 PM, Andres Freund wrote:
    > On 2013-05-23 19:51:48 +0200, Andres Freund wrote:
    >> We currently need to make sure we scanned the whole relation and have
    >> frozen everything to have a sensible relfrozenxid for a relation.
    >>
    >> So, what I propose instead is basically:
    >> 1) only vacuum non-all-visible pages, even when doing it for
    >>    anti-wraparound
    >> 2) When we can set all-visible, guarantee that all tuples on the page are
    >>    fully hinted. During recovery do the same, so we don't need to log
    >>    all hint bits.
    >>    We can do this with only an exclusive lock on the buffer; we don't
    >>    need a cleanup lock.
    >> 3) When we cannot mark a page all-visible or we cannot get the cleanup
    >>    lock, remember the oldest xmin on that page. We could set all visible
    >>    in the former case, but we want the page to be cleaned up sometime
    >>    soonish.
    >> 4) If we can get the cleanup lock, purge dead tuples from the page and
    >>    the indexes, just as today. Set the page as all-visible.
    >>
    >> That way we know that any page that is all-visible doesn't ever need to
    >> look at xmin/xmax since we are sure to have set all relevant hint
    >> bits.
    >
    > Heikki noticed that I made quite the omission here, which is that you
    > would need to mark tuples as all-visible as well. I was thinking about
    > using HEAP_MOVED_OFF | HEAP_MOVED_IN as a hint for that.

    We could have a "vacuum_less=true" mode where, instead of marking tuples
    all-visible here, you actually freeze them, that is, set the xid to
    frozen. You would get less forensic capability in exchange for less
    vacuuming.

    Maybe also add an "early_freeze" hint bit to mark this situation.

    Or maybe set the tuples' xids to frozen when un-marking the page as
    all-visible, to delay the effects a little?

    Hannu

    > Greetings,
    >
    > Andres Freund
  • Robert Haas at May 24, 2013 at 2:09 am

    On Thu, May 23, 2013 at 1:51 PM, Andres Freund wrote:
    > So, what I propose instead is basically:
    > 1) only vacuum non-all-visible pages, even when doing it for
    >    anti-wraparound

    Check. We might want an option to force a scan of the whole relation.

    > 2) When we can set all-visible, guarantee that all tuples on the page are
    >    fully hinted. During recovery do the same, so we don't need to log
    >    all hint bits.
    >    We can do this with only an exclusive lock on the buffer; we don't
    >    need a cleanup lock.

    I don't think this works. Emitting XLOG_HEAP_VISIBLE for a heap page
    does not emit an FPI for the heap page, only (if needed) for the
    visibility map page. So a subsequent crash that tears the page could
    keep XLOG_HEAP_VISIBLE but lose other changes on the page - i.e. the
    hint bits.

    > 3) When we cannot mark a page all-visible or we cannot get the cleanup
    >    lock, remember the oldest xmin on that page. We could set all visible
    >    in the former case, but we want the page to be cleaned up sometime
    >    soonish.

    I think you mean "in the latter case" not "in the former case". If
    not, then I'm confused.

    > 4) If we can get the cleanup lock, purge dead tuples from the page and
    >    the indexes, just as today. Set the page as all-visible.
    >
    > That way we know that any page that is all-visible doesn't ever need to
    > look at xmin/xmax since we are sure to have set all relevant hint
    > bits.
    >
    > We don't even necessarily need to log the hint bits for all items since
    > the redo for all_visible could make sure all items are hinted. The only
    > problem is knowing up to where we can truncate pg_clog...

    The redo for all_visible cannot make sure all items are hinted.
    Again, there's no FPI on the heap page. The heap page could in fact
    contain dead tuples at the time we mark it all-visible. Consider, for
    example:

    0. Checkpoint.
    1. The buffer becomes all visible.
    2. A tuple is inserted, making the buffer not-all-visible.
    3. The page is written by the OS.
    4. Crash.

    Now, recovery will first find the record marking the buffer
    all-visible, and will mark it all-visible. Now the all-visible bit on
    the page is flat-out wrong, but it doesn't matter because we haven't
    reached consistency. Next we'll find the heap-insert record, which
    will have an FPI, since it's the first WAL-logged change to the buffer
    since the last checkpoint. Now the FPI fixes everything and we're
    back in a sane state.

    Now in this particular case it wouldn't hurt anything if the redo
    routine that set the all-visible bit also hinted all the tuples,
    because the FPI is going to overwrite it anyway. But suppose in lieu
    of steps (3) and (4) we write half of the page and then crash, leaving
    behind a torn page. Now it's pretty crazy to think about trying to
    hint tuples; the page may be in a completely insane state.
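
    The torn-page variant of the scenario can be sketched with a toy model,
    assuming a page whose two halves reach disk separately. ToyPage,
    ToyWalRecord, toy_torn_write and toy_redo are hypothetical names for
    illustration, not PostgreSQL functions. The sketch only shows that a
    redo routine without an FPI sees a page it cannot trust, while the
    later FPI restores a sane state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page: each half records which write produced it. */
typedef struct {
    int  first_half_version;
    int  second_half_version;
    bool all_visible;
} ToyPage;

typedef enum { TOY_REC_VISIBLE, TOY_REC_INSERT_FPI } ToyRecType;

typedef struct {
    ToyRecType type;
    ToyPage    image;   /* full-page image; only used by TOY_REC_INSERT_FPI */
} ToyWalRecord;

/* A page is sane only if both halves come from the same write. */
bool toy_page_is_consistent(const ToyPage *page)
{
    assert(page != NULL);
    return page->first_half_version == page->second_half_version;
}

/* Crash mid-write: only the first half of the newer page reaches disk. */
void toy_torn_write(ToyPage *disk, const ToyPage *newer)
{
    assert(disk != NULL && newer != NULL);
    disk->first_half_version = newer->first_half_version;
}

/* Replay one WAL record against the on-disk page. */
void toy_redo(ToyPage *page, const ToyWalRecord *rec)
{
    assert(page != NULL && rec != NULL);
    switch (rec->type)
    {
        case TOY_REC_VISIBLE:
            /* No FPI: only flip the flag; tuple contents can't be trusted. */
            page->all_visible = true;
            break;
        case TOY_REC_INSERT_FPI:
            /* The full-page image overwrites whatever was on disk. */
            *page = rec->image;
            break;
    }
}
```

    Between the two redo calls the page is still torn, which is the window
    in which hinting tuples from the redo routine would be unsafe.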

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Andres Freund at May 24, 2013 at 2:53 pm

    On 2013-05-23 22:09:02 -0400, Robert Haas wrote:
    > On Thu, May 23, 2013 at 1:51 PM, Andres Freund wrote:
    >> So, what I propose instead is basically:
    >> 1) only vacuum non-all-visible pages, even when doing it for
    >>    anti-wraparound
    >
    > Check. We might want an option to force a scan of the whole relation.

    Yea, thought of that as well. VACUUM (DEEP) ;).

    >> 3) When we cannot mark a page all-visible or we cannot get the cleanup
    >>    lock, remember the oldest xmin on that page. We could set all visible
    >>    in the former case, but we want the page to be cleaned up sometime
    >>    soonish.
    >
    > I think you mean "in the latter case" not "in the former case". If
    > not, then I'm confused.

    Uh. Yes.

    >> We don't even necessarily need to log the hint bits for all items since
    >> the redo for all_visible could make sure all items are hinted. The only
    >> problem is knowing up to where we can truncate pg_clog...

    [all-visible cannot restore hint bits without FPI because of torn pages]

    I haven't thought about this sufficiently yet. I think we might have
    a chance of working around this, let me ponder a bit.

    But even if that means needing a full page write via the usual mechanism
    for all-visible if any hint bits needed to be set, we are still far
    ahead of the current state imo.
    * cleanup would quite possibly do an FPI shortly after in vacuum
       anyway. If we do it for all-visible, it possibly does not need to be
       done for cleanup.
    * freezing would almost certainly FPI since we do it so much
       later.
    * Not having to rescan the whole heap will be a bigger cost saving...

    Greetings,

    Andres Freund

  • Robert Haas at May 24, 2013 at 3:29 pm

    On Fri, May 24, 2013 at 10:53 AM, Andres Freund wrote:
    > [all-visible cannot restore hint bits without FPI because of torn pages]
    >
    > I haven't thought about this sufficiently yet. I think we might have
    > a chance of working around this, let me ponder a bit.

    Yeah. I too feel like there might be a solution. But I don't
    have something specific in mind, yet anyway.

    > But even if that means needing a full page write via the usual mechanism
    > for all-visible if any hint bits needed to be set, we are still far
    > ahead of the current state imo.
    > * cleanup would quite possibly do an FPI shortly after in vacuum
    >   anyway. If we do it for all-visible, it possibly does not need to be
    >   done for cleanup.
    > * freezing would almost certainly FPI since we do it so much
    >   later.
    > * Not having to rescan the whole heap will be a bigger cost saving...

    The basic problem is that if the data is going to be removed before it
    would have gotten frozen, then the extra FPIs are just overhead. In
    effect, we're just deciding to freeze a lot sooner. And while that
    might well be beneficial in some use cases (e.g. the data's already in
    cache) it might also not be so beneficial (the table is larger than
    cache and would have been dropped before freezing kicked in).

  • Andres Freund at May 24, 2013 at 3:52 pm

    On 2013-05-24 11:29:10 -0400, Robert Haas wrote:
    >> But even if that means needing a full page write via the usual mechanism
    >> for all-visible if any hint bits needed to be set, we are still far
    >> ahead of the current state imo.
    >> * cleanup would quite possibly do an FPI shortly after in vacuum
    >>   anyway. If we do it for all-visible, it possibly does not need to be
    >>   done for cleanup.
    >> * freezing would almost certainly FPI since we do it so much
    >>   later.
    >> * Not having to rescan the whole heap will be a bigger cost saving...
    >
    > The basic problem is that if the data is going to be removed before it
    > would have gotten frozen, then the extra FPIs are just overhead. In
    > effect, we're just deciding to freeze a lot sooner.

    Well, freezing without removing information for debugging.

    > And while that
    > might well be beneficial in some use cases (e.g. the data's already in
    > cache) it might also not be so beneficial (the table is larger than
    > cache and would have been dropped before freezing kicked in).

    Not sure how caching comes into play here? At this point we know the
    page to be in cache already since vacuum is looking at it anyway?

    I think it's not really comparable since in those situations we a)
    already do an XLogInsert() and b) already dirty the page, so the only
    change is that we possibly write an additional full page image. If
    there is actually near-future DML write activity that would make the
    all-visible superfluous, that would likely have to FPI anyway.

    Greetings,

    Andres Freund

  • Robert Haas at May 24, 2013 at 4:12 pm

    On Fri, May 24, 2013 at 11:52 AM, Andres Freund wrote:
    >> The basic problem is that if the data is going to be removed before it
    >> would have gotten frozen, then the extra FPIs are just overhead. In
    >> effect, we're just deciding to freeze a lot sooner.
    >
    > Well, freezing without removing information for debugging.

    Sure, but what I'm trying to avoid is incurring the WAL cost of
    freezing. If we didn't mind paying that sooner, we could just drop
    vacuum_freeze_min/table_age. But we do mind that.

    >> And while that
    >> might well be beneficial in some use cases (e.g. the data's already in
    >> cache) it might also not be so beneficial (the table is larger than
    >> cache and would have been dropped before freezing kicked in).
    >
    > Not sure how caching comes into play here? At this point we know the
    > page to be in cache already since vacuum is looking at it anyway?

    OK, true.

    > I think it's not really comparable since in those situations we a)
    > already do an XLogInsert() and b) already dirty the page, so the only
    > change is that we possibly write an additional full page image. If
    > there is actually near-future DML write activity that would make the
    > all-visible superfluous, that would likely have to FPI anyway.

    Well, if there's near-future write activity, then freezing is pretty
    worthless anyway. What I'm trying to avoid is adding WAL overhead in
    the case where there *isn't* any near-future write activity, like
    inserting 100MB of data into an existing table.

  • Robert Haas at May 24, 2013 at 4:00 pm

    On Fri, May 24, 2013 at 11:29 AM, Robert Haas wrote:
    > On Fri, May 24, 2013 at 10:53 AM, Andres Freund wrote:
    >> [all-visible cannot restore hint bits without FPI because of torn pages]
    >>
    >> I haven't thought about this sufficiently yet. I think we might have
    >> a chance of working around this, let me ponder a bit.
    >
    > Yeah. I too feel like there might be a solution. But I don't
    > have something specific in mind, yet anyway.

    One thought I had is that it might be beneficial to freeze when a page
    ceases to be all-visible, rather than when it becomes all-visible.
    Any operation that makes the page not-all-visible is going to emit an
    FPI anyway, so we don't have to worry about torn pages in that case.
    Under such a scheme, we'd have to enforce the rule that xmin and xmax
    are ignored for any page that is all-visible; and when a page ceases
    to be all-visible, we have to go back and really freeze the
    pre-existing tuples. I think we might be able to use the existing
    all_visible_cleared/new_all_visible_cleared flags to trigger this
    behavior, without adding anything new to WAL at all.
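
    A minimal sketch of the "freeze when the page ceases to be all-visible"
    rule, assuming a toy page model. ToyPage, toy_insert and TOY_FROZEN_XID
    are hypothetical names for illustration; the value 2 mirrors
    PostgreSQL's FrozenTransactionId. WAL logging and xmax handling are not
    modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_TUPLES 8
#define TOY_FROZEN_XID ((Xid) 2)   /* same value as FrozenTransactionId */

typedef uint32_t Xid;

typedef struct {
    Xid xmin;
} ToyTuple;

typedef struct {
    ToyTuple tuples[TOY_MAX_TUPLES];
    size_t   ntuples;
    bool     all_visible;   /* while set, readers ignore xmin on this page */
} ToyPage;

/*
 * Insert a tuple. If the page was all-visible, the insert is the event
 * that clears the flag, so the pre-existing tuples are really frozen
 * first. Returns false if the page is full.
 */
bool toy_insert(ToyPage *page, Xid inserter_xid)
{
    assert(page != NULL);
    assert(page->ntuples <= TOY_MAX_TUPLES);

    if (page->ntuples == TOY_MAX_TUPLES)
        return false;

    if (page->all_visible)
    {
        /* xmin was ignored until now; make that permanent by freezing. */
        for (size_t i = 0; i < page->ntuples; i++)
            page->tuples[i].xmin = TOY_FROZEN_XID;
        page->all_visible = false;
    }

    page->tuples[page->ntuples++].xmin = inserter_xid;
    return true;
}
```

    A second insert into the same page finds the flag already cleared and
    freezes nothing, so the extra work is paid once per all-visible period.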

  • Hannu Krosing at May 25, 2013 at 6:23 am

    On 05/24/2013 07:00 PM, Robert Haas wrote:
    > On Fri, May 24, 2013 at 11:29 AM, Robert Haas wrote:
    >> On Fri, May 24, 2013 at 10:53 AM, Andres Freund wrote:
    >>> [all-visible cannot restore hint bits without FPI because of torn pages]
    >>>
    >>> I haven't thought about this sufficiently yet. I think we might have
    >>> a chance of working around this, let me ponder a bit.
    >>
    >> Yeah. I too feel like there might be a solution. But I don't
    >> have something specific in mind, yet anyway.
    >
    > One thought I had is that it might be beneficial to freeze when a page
    > ceases to be all-visible, rather than when it becomes all-visible.

    That's what I aimed to describe in my mail earlier, but your
    description is much clearer :)

    > Any operation that makes the page not-all-visible is going to emit an
    > FPI anyway, so we don't have to worry about torn pages in that case.
    > Under such a scheme, we'd have to enforce the rule that xmin and xmax
    > are ignored for any page that is all-visible;

    Agreed. We already rely on all-visible pages enough that we
    can trust it to be correct. Making that a universal rule should not
    add any risks.
    The rule "page all-visible ==> assume all tuples frozen" would
    also enable VACUUM FREEZE to work only on the
    non-all-visible pages.

    > and when a page ceases
    > to be all-visible, we have to go back and really freeze the
    > pre-existing tuples.

    We can do this unconditionally or, in a milder case, use
    vacuum_freeze_min_age if we want to retain xids for forensic purposes.

    > I think we might be able to use the existing
    > all_visible_cleared/new_all_visible_cleared flags to trigger this
    > behavior, without adding anything new to WAL at all.

    This seems to be the easiest.

    --
    Hannu Krosing
    PostgreSQL Consultant
    Performance, Scalability and High Availability
    2ndQuadrant Nordic OÜ
  • Simon Riggs at May 25, 2013 at 10:14 am

    On 24 May 2013 17:00, Robert Haas wrote:
    > On Fri, May 24, 2013 at 11:29 AM, Robert Haas wrote:
    >> On Fri, May 24, 2013 at 10:53 AM, Andres Freund wrote:
    >>> [all-visible cannot restore hint bits without FPI because of torn pages]
    >>>
    >>> I haven't thought about this sufficiently yet. I think we might have
    >>> a chance of working around this, let me ponder a bit.
    >>
    >> Yeah. I too feel like there might be a solution. But I don't
    >> have something specific in mind, yet anyway.
    >
    > One thought I had is that it might be beneficial to freeze when a page
    > ceases to be all-visible, rather than when it becomes all-visible.
    > Any operation that makes the page not-all-visible is going to emit an
    > FPI anyway, so we don't have to worry about torn pages in that case.
    > Under such a scheme, we'd have to enforce the rule that xmin and xmax
    > are ignored for any page that is all-visible; and when a page ceases
    > to be all-visible, we have to go back and really freeze the
    > pre-existing tuples. I think we might be able to use the existing
    > all_visible_cleared/new_all_visible_cleared flags to trigger this
    > behavior, without adding anything new to WAL at all.

    I like the idea but it would mean we'd have to freeze in the
    foreground path rather than in a background path.

    Have we given up on the double buffering idea to remove FPIs
    completely? If we did that, then this wouldn't work.

    Anyway, I take it the direction of this idea is that "we don't need a
    separate freezemap, just use the vismap". That seems to be forcing
    ideas down a particular route we may regret. I'd rather just keep
    those things separate, even if we manage to merge the WAL actions for
    most of the time.


    Some other related thoughts:

    ISTM that if we really care about keeping xids for debug purposes that
    it could be a parameter. For the mainline, we just freeze blocks at
    the same time we do page pruning.

    I think the right way is actually to rethink and simplify all this
    complexity of Freezing/Pruning/Hinting/Visibility.

    --
      Simon Riggs http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Hannu Krosing at May 26, 2013 at 12:15 pm

    On 05/25/2013 01:14 PM, Simon Riggs wrote:
    > On 24 May 2013 17:00, Robert Haas wrote:
    >> On Fri, May 24, 2013 at 11:29 AM, Robert Haas wrote:
    >>> On Fri, May 24, 2013 at 10:53 AM, Andres Freund wrote:
    >>>> [all-visible cannot restore hint bits without FPI because of torn pages]
    >>>>
    >>>> I haven't thought about this sufficiently yet. I think we might have
    >>>> a chance of working around this, let me ponder a bit.
    >>>
    >>> Yeah. I too feel like there might be a solution. But I don't
    >>> have something specific in mind, yet anyway.
    >>
    >> One thought I had is that it might be beneficial to freeze when a page
    >> ceases to be all-visible, rather than when it becomes all-visible.
    >> Any operation that makes the page not-all-visible is going to emit an
    >> FPI anyway, so we don't have to worry about torn pages in that case.
    >> Under such a scheme, we'd have to enforce the rule that xmin and xmax
    >> are ignored for any page that is all-visible; and when a page ceases
    >> to be all-visible, we have to go back and really freeze the
    >> pre-existing tuples. I think we might be able to use the existing
    >> all_visible_cleared/new_all_visible_cleared flags to trigger this
    >> behavior, without adding anything new to WAL at all.
    >
    > I like the idea but it would mean we'd have to freeze in the
    > foreground path rather than in a background path.
    >
    > Have we given up on the double buffering idea to remove FPIs
    > completely? If we did that, then this wouldn't work.
    >
    > Anyway, I take it the direction of this idea is that "we don't need a
    > separate freezemap, just use the vismap". That seems to be forcing
    > ideas down a particular route we may regret. I'd rather just keep
    > those things separate, even if we manage to merge the WAL actions for
    > most of the time.
    >
    > Some other related thoughts:
    >
    > ISTM that if we really care about keeping xids for debug purposes that
    > it could be a parameter. For the mainline, we just freeze blocks at
    > the same time we do page pruning.
    >
    > I think the right way is actually to rethink and simplify all this
    > complexity of Freezing/Pruning/Hinting/Visibility.

    I think that this xmin, xmax business is mainly a leftover from the time
    when PostgreSQL was a full history database. If we are happy to decide
    that we do not want to resurrect this feature, at least not the same
    way, then freezing at the earliest or most convenient possibility seems
    the way to go.

    The "forensic" part has always been just a nice side effect of this
    design and not the main design consideration.

  • Robert Haas at May 28, 2013 at 2:15 pm

    On Sat, May 25, 2013 at 6:14 AM, Simon Riggs wrote:
    >> One thought I had is that it might be beneficial to freeze when a page
    >> ceases to be all-visible, rather than when it becomes all-visible.
    >> Any operation that makes the page not-all-visible is going to emit an
    >> FPI anyway, so we don't have to worry about torn pages in that case.
    >> Under such a scheme, we'd have to enforce the rule that xmin and xmax
    >> are ignored for any page that is all-visible; and when a page ceases
    >> to be all-visible, we have to go back and really freeze the
    >> pre-existing tuples. I think we might be able to use the existing
    >> all_visible_cleared/new_all_visible_cleared flags to trigger this
    >> behavior, without adding anything new to WAL at all.
    >
    > I like the idea but it would mean we'd have to freeze in the
    > foreground path rather than in a background path.

    That's true, but I think with this approach it would be really cheap.
    The overhead of setting a few bits in a page is very small compared to
    the overhead of emitting a WAL record. We'd have to test it, but I
    wouldn't be surprised to find the cost is too small to measure.

    > Have we given up on the double buffering idea to remove FPIs
    > completely? If we did that, then this wouldn't work.

    I don't see why those things are mutually exclusive. What is the relationship?

    > Anyway, I take it the direction of this idea is that "we don't need a
    > separate freezemap, just use the vismap". That seems to be forcing
    > ideas down a particular route we may regret. I'd rather just keep
    > those things separate, even if we manage to merge the WAL actions for
    > most of the time.

    Hmm. To me it seems highly desirable to merge those things, because
    they're basically the same thing. The earliest time at which we can
    freeze a tuple is when it's all-visible, and the only argument I've
    ever heard for waiting longer is to preserve the original xmin for
    forensic purposes, which I think we can do anyway. I have posted a
    patch for that on another thread. I don't like having two separate
    concepts where one will do; I think the fact that it is structured
    that way today is mostly an artifact of one setting being page-level
    and the other tuple-level, which is a thin excuse for so much
    complexity.

    > I think the right way is actually to rethink and simplify all this
    > complexity of Freezing/Pruning/Hinting/Visibility.

    I agree, but I think that's likely to have to wait until we get a
    pluggable storage API, and then a few years beyond that for someone to
    develop the technology to enable the new and better way. In the
    meantime, if we can eliminate or even reduce the impact of freezing in
    the near term, I think that's worth doing.

  • Simon Riggs at Jun 1, 2013 at 5:56 pm

    On 28 May 2013 15:15, Robert Haas wrote:
    > On Sat, May 25, 2013 at 6:14 AM, Simon Riggs wrote:
    >> I think the right way is actually to rethink and simplify all this
    >> complexity of Freezing/Pruning/Hinting/Visibility.
    >
    > I agree, but I think that's likely to have to wait until we get a
    > pluggable storage API, and then a few years beyond that for someone to
    > develop the technology to enable the new and better way. In the
    > meantime, if we can eliminate or even reduce the impact of freezing in
    > the near term, I think that's worth doing.

    I think we can do better more quickly than that.

    Andres' basic idea of skipping freeze completely was a valuable one
    and is the right way forwards. And it looks like the epoch-based
    approach that Heikki and I have come up with seems likely to end up
    somewhere workable.

  • Josh Berkus at May 24, 2013 at 7:49 pm
    Andres,

    If I understand your solution correctly, though, this doesn't really
    help the pathological case for freezing, which is the time-oriented
    append-only table. For data which isn't being used, allvisible won't be
    set either because it won't have been read, no? Is it still cheaper to
    set allvisible than vacuum freeze even in that case?

    Don't get me wrong, I'm in favor of this if it fixes the other (more
    common) cases. I just want to be clear on the limitations.

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Andres Freund at May 24, 2013 at 8:03 pm

    On 2013-05-24 15:49:31 -0400, Josh Berkus wrote:
    > If I understand your solution correctly, though, this doesn't really
    > help the pathological case for freezing, which is the time-oriented
    > append-only table. For data which isn't being used, allvisible won't be
    > set either because it won't have been read, no? Is it still cheaper to
    > set allvisible than vacuum freeze even in that case?

    All-visible is only set in vacuum, and it determines which parts of a
    table will be scanned in a non-full-table vacuum. So, since we won't
    regularly start vacuum in the insert-only case, there will still be a
    batch of work at once. But nearly all of that work is *already*
    performed. We would just keep the details of that around for a
    bit. *But* since we now would only need to vacuum the non-all-visible
    part, that would get noticeably cheaper as well.

    I think for that case we should run vacuum more regularly for
    insert-only tables, since we currently don't do so regularly enough,
    which a) increases the amount of work needed at once and b) prevents
    index-only scans from working there.

    Greetings,

    Andres Freund

  • Josh Berkus at May 25, 2013 at 10:37 pm
    Andres,
    all visible is only set in vacuum and it determines which parts of a
    table will be scanned in a non full table vacuum. So, since we won't
    regularly start vacuum in the insert only case there will still be a
    batch of work at once. But nearly all of that work is *already*
    performed. We would just what the details of that around for a
    bit. *But* since we now would only need to vacuum the non all-visible
    part that would get noticeably cheaper as well.
    Yeah, I can see that. Seems worthwhile, then.
    I think for that case we should run vacuum more regularly for insert
    only tables since we currently don't do regularly enough which a) increases
    the amount of work needed at once and b) prevents index only scans from
    working there.
    Yes. I'm not sure how we would set this though; I think it's another
    example of how autovacuum's parameters for when to vacuum etc. are too
    simple-minded for the real world. Doing an all-visible scan on an
    insert-only table, for example, should be based on XID age and not on %
    inserted, no?
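
The contrast can be sketched roughly as follows. The first function follows the documented autovacuum rule (dead tuples compared against `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples`); the second is purely hypothetical, and both function names as well as the `age_limit` parameter are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Size-relative trigger: vacuum when dead tuples exceed
 * base_thresh + scale_factor * reltuples.
 * An insert-only table accumulates no dead tuples, so this never fires.
 */
bool
needs_vacuum_by_dead_tuples(double reltuples, double dead_tuples,
                            double base_thresh, double scale_factor)
{
    double vacthresh = base_thresh + scale_factor * reltuples;

    return dead_tuples > vacthresh;
}

/*
 * Hypothetical age-based trigger (an assumption, not an existing setting):
 * schedule an all-visible scan once the table's oldest unfrozen XID is
 * more than age_limit transactions old, however few rows changed.
 */
bool
needs_visibility_scan_by_age(uint32_t xid_age, uint32_t age_limit)
{
    return xid_age > age_limit;
}
```

With the default threshold of 50 and scale factor of 0.2, a million-row table with zero dead tuples never satisfies the first test, which is the insert-only problem being discussed.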

    Speaking of which, I need to get on revamping the math for autoanalyze.

    Mind you, in the real-world insert-only table case, this does create
    extra IO -- real insert-only tables often have a few rows ( < 5% ) which
    are updated/deleted. Vacuum would see these and want to clean the pages
    up, which would create much more substantial IO. It might still be a
    good tradeoff, but we should be aware of it.

    Unless we want a special VACUUM ALL VISIBLE mode. I vote no, unless we
    demonstrate some really convincing case for it.

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Josh Berkus at May 27, 2013 at 2:22 am
    Andres,

    I was talking this over with Jeff on the plane, and we wanted to be
    clear on your goals here: are you looking to eliminate the *write* cost
    of freezing, or just the *read* cost of re-reading already frozen pages?

    If just the latter, what about just adding a bit to the visibility map
    to indicate that the page is frozen? That seems simpler than what
    you're proposing.
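
A rough sketch of what such a bit could look like, assuming two bits per heap page packed into a byte array. The flag names, the layout and every function here are illustrative assumptions, not the visibility map's actual format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-page flags, two bits per heap page. */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02
#define BITS_PER_PAGE  2
#define PAGES_PER_BYTE (8 / BITS_PER_PAGE)

/* Set the given flags for heap page blkno. */
void
vm_set(uint8_t *map, size_t blkno, uint8_t flags)
{
    size_t   byte = blkno / PAGES_PER_BYTE;
    unsigned shift = (unsigned) (blkno % PAGES_PER_BYTE) * BITS_PER_PAGE;

    map[byte] |= (uint8_t) (flags << shift);
}

/* Return the two flag bits stored for heap page blkno. */
uint8_t
vm_get(const uint8_t *map, size_t blkno)
{
    size_t   byte = blkno / PAGES_PER_BYTE;
    unsigned shift = (unsigned) (blkno % PAGES_PER_BYTE) * BITS_PER_PAGE;

    return (uint8_t) ((map[byte] >> shift) & 0x03);
}

/* Any modification of the page must clear both bits. */
void
vm_clear(uint8_t *map, size_t blkno)
{
    size_t   byte = blkno / PAGES_PER_BYTE;
    unsigned shift = (unsigned) (blkno % PAGES_PER_BYTE) * BITS_PER_PAGE;

    map[byte] &= (uint8_t) ~(0x03u << shift);
}

/* A freeze scan could skip a page only when it is known to be frozen. */
bool
freeze_scan_can_skip(const uint8_t *map, size_t blkno)
{
    return (vm_get(map, blkno) & VM_ALL_FROZEN) != 0;
}
```

Note that this only addresses the read cost: the page still has to be written once to freeze its tuples before the frozen bit can be set.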

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Andres Freund at May 28, 2013 at 2:18 pm

    On 2013-05-26 16:58:58 -0700, Josh Berkus wrote:
    I was talking this over with Jeff on the plane, and we wanted to be
    clear on your goals here: are you looking to eliminate the *write* cost
    of freezing, or just the *read* cost of re-reading already frozen pages?
    Both. The latter is what I have seen causing more hurt, but the former
    alone is painful enough.

    Greetings,

    Andres Freund

    --
      Andres Freund http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Josh Berkus at May 28, 2013 at 11:11 pm

    On 05/28/2013 07:17 AM, Andres Freund wrote:
    On 2013-05-26 16:58:58 -0700, Josh Berkus wrote:
    I was talking this over with Jeff on the plane, and we wanted to be
    clear on your goals here: are you looking to eliminate the *write* cost
    of freezing, or just the *read* cost of re-reading already frozen pages?
    Both. The latter is what I have seen causing more hurt, but the former
    alone is painful enough.
    I guess I don't see how your proposal is reducing the write cost for
    most users then?

    - for users with frequently, randomly updated data, pdallvisible would
    not be ever set, so they still need to be rewritten to freeze
    - for users with append-only tables, allvisible would never be set since
    those pages don't get vacuumed
    - it would prevent us from getting rid of allvisible, which has a
    documented and known write overhead

    This means that your optimization would benefit only users whose pages
    get updated occasionally (enough to trigger vacuum) but not too
    frequently (which would unset allvisible). While we lack statistics,
    intuition suggests that this is a minority of databases.

    If we just wanted to reduce read cost, why not just take a simpler
    approach and give the visibility map a "isfrozen" bit? Then we'd know
    which pages didn't need rescanning without nearly as much complexity.
    That would also make it more effective to do precautionary vacuum freezing.

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Andres Freund at May 28, 2013 at 11:22 pm

    On 2013-05-28 09:29:26 -0700, Josh Berkus wrote:
    On 05/28/2013 07:17 AM, Andres Freund wrote:
    On 2013-05-26 16:58:58 -0700, Josh Berkus wrote:
    I was talking this over with Jeff on the plane, and we wanted to be
    clear on your goals here: are you looking to eliminate the *write* cost
    of freezing, or just the *read* cost of re-reading already frozen pages?
    Both. The latter is what I have seen causing more hurt, but the former
    alone is painful enough.
    I guess I don't see how your proposal is reducing the write cost for
    most users then?

    - for users with frequently, randomly updated data, pdallvisible would
    not be ever set, so they still need to be rewritten to freeze
    If they update all data they simply never need to get frozen since they
    are not old enough.
    - for users with append-only tables, allvisible would never be set since
    those pages don't get vacuumed
    They do get vacuumed at least every autovacuum_freeze_max_age even
    now. And we should vacuum them more often to make index only scan work
    without manual intervention.
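
The forced vacuum mentioned above can be sketched like this, assuming XIDs are plain 32-bit counters compared modulo 2^32. The function names are invented for illustration, and the real system's handling of the reserved special XIDs is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Age of relfrozenxid in transactions. Unsigned subtraction wraps
 * modulo 2^32, so the result stays correct after the counter wraps.
 * Simplification (assumption): reserved special XIDs are ignored.
 */
uint32_t
xid_age(uint32_t current_xid, uint32_t relfrozenxid)
{
    return current_xid - relfrozenxid;
}

/*
 * True when the table is old enough that autovacuum must process it
 * regardless of how few rows were updated or deleted.
 */
bool
forced_vacuum_due(uint32_t current_xid, uint32_t relfrozenxid,
                  uint32_t freeze_max_age)
{
    return xid_age(current_xid, relfrozenxid) > freeze_max_age;
}
```

With `autovacuum_freeze_max_age` at its default of 200 million, an append-only table is therefore visited at least once per 200 million transactions even if nothing else triggers a vacuum.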
    - it would prevent us from getting rid of allvisible, which has a
    documented and known write overhead
    Aha.
    This means that your optimization would benefit only users whose pages
    get updated occasionally (enough to trigger vacuum) but not too
    frequently (which would unset allvisible). While we lack statistics,
    intuition suggests that this is a minority of databases.
    I don't think that follows.
    If we just wanted to reduce read cost, why not just take a simpler
    approach and give the visibility map a "isfrozen" bit? Then we'd know
    which pages didn't need rescanning without nearly as much complexity.
    That would also make it more effective to do precautionary vacuum freezing.
    Because we would still write/dirty/xlog the changes three times?

    Greetings,

    Andres Freund

    --
      Andres Freund http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Robert Haas at May 28, 2013 at 11:52 pm

    On Tue, May 28, 2013 at 12:29 PM, Josh Berkus wrote:
    On 05/28/2013 07:17 AM, Andres Freund wrote:
    On 2013-05-26 16:58:58 -0700, Josh Berkus wrote:
    I was talking this over with Jeff on the plane, and we wanted to be
    clear on your goals here: are you looking to eliminate the *write* cost
    of freezing, or just the *read* cost of re-reading already frozen pages?
    Both. The latter is what I have seen causing more hurt, but the former
    alone is painful enough.
    I guess I don't see how your proposal is reducing the write cost for
    most users then?

    - for users with frequently, randomly updated data, pdallvisible would
    not be ever set, so they still need to be rewritten to freeze
    Do these users never run vacuum? As of 9.3, vacuum phase 2 will
    typically set PD_ALL_VISIBLE on each relevant page. The only time
    that this WON'T happen is if an insert, update, or delete hits the
    page after phase 1 of vacuum and before phase 2 of vacuum. I don't
    think that's going to be the common case.
    - for users with append-only tables, allvisible would never be set since
    those pages don't get vacuumed
    There's no good solution for append-only tables. Eventually, they
    will get vacuumed, and when that happens, PD_ALL_VISIBLE will be set,
    and freezing will also happen. I don't think anything that is being
    proposed here is going to make that a whole lot better, but it
    shouldn't make it any worse than it is now, either. Since it's
    probably not solvable without a rewrite of the heap AM, I'm not going
    to feel too bad about that.
    - it would prevent us from getting rid of allvisible, which has a
    documented and known write overhead
    Again, I think this is going to be much less of an issue with 9.3, for
    the reason explained above. In 9.2 and prior, we'd scan a page with
    dead tuples, prune them to line pointers, vacuum the indexes, and then
    mark the dead pointers as unused. Then, the NEXT vacuum would revisit
    the same page and dirty it again ONLY to mark it all-visible. But in
    9.3, the first vacuum will mark the page all-visible at the same time
    it marks the dead line pointers unused. So the write overhead of
    PD_ALL_VISIBLE should basically be gone. If it's not, it would be
    good to know why.
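
A toy model of the difference described above, counting how often a single page gets dirtied. `PageState` and both functions are invented for this sketch, and it assumes no insert, update, or delete touches the page between vacuum runs.

```c
#include <stdbool.h>

/* Minimal per-page state for the model (hypothetical struct). */
typedef struct PageState
{
    bool has_dead_tuples;
    bool all_visible;
    int  times_dirtied;
} PageState;

/*
 * 9.2-and-earlier behaviour as described: one vacuum removes the dead
 * tuples, and only a later vacuum dirties the page again to mark it
 * all-visible.
 */
void
vacuum_pass_old(PageState *p)
{
    if (p->has_dead_tuples)
    {
        p->has_dead_tuples = false;
        p->times_dirtied++;
    }
    else if (!p->all_visible)
    {
        p->all_visible = true;
        p->times_dirtied++;
    }
}

/*
 * 9.3 behaviour as described: the same vacuum that marks the dead line
 * pointers unused also sets all-visible, so the page is dirtied once.
 */
void
vacuum_pass_new(PageState *p)
{
    if (p->has_dead_tuples || !p->all_visible)
    {
        p->has_dead_tuples = false;
        p->all_visible = true;
        p->times_dirtied++;
    }
}
```

Running two passes of each on a page that starts with dead tuples leaves the old variant at two writes and the new variant at one, which is the overhead Robert expects to have disappeared.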
    If we just wanted to reduce read cost, why not just take a simpler
    approach and give the visibility map a "isfrozen" bit? Then we'd know
    which pages didn't need rescanning without nearly as much complexity.
    That would break pg_upgrade, which would have to remove visibility map
    forks when upgrading. More importantly, it would require another
    round of complex changes to the write-ahead logging in this area.
    It's not obvious to me that we'd end up ahead of where we are today,
    although perhaps I am a pessimist.
    That would also make it more effective to do precautionary vacuum freezing.
    But wouldn't it be a whole lot nicer if we just didn't have to do
    vacuum freezing AT ALL? The point here is to absorb freezing into
    some other operation that we already have to do.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Jeff Davis at May 29, 2013 at 5:11 pm

    On Tue, 2013-05-28 at 19:51 -0400, Robert Haas wrote:
    If we just wanted to reduce read cost, why not just take a simpler
    approach and give the visibility map a "isfrozen" bit? Then we'd know
    which pages didn't need rescanning without nearly as much complexity.
    That would break pg_upgrade, which would have to remove visibility map
    forks when upgrading. More importantly, it would require another
    round of complex changes to the write-ahead logging in this area.
    It's not obvious to me that we'd end up ahead of where we are today,
    although perhaps I am a pessimist.
    If we removed PD_ALL_VISIBLE, then this would be very simple, right? We
    would just follow normal logging rules for setting the visible or frozen
    bit.

    Regards,
      Jeff Davis
  • Jeff Davis at May 29, 2013 at 5:18 pm

    On Tue, 2013-05-28 at 09:29 -0700, Josh Berkus wrote:
    - it would prevent us from getting rid of allvisible, which has a
    documented and known write overhead
    It would? I don't think these proposals are necessarily in conflict.
    It's not entirely clear to me how they fit together in detail, but it
    seems like it may be possible -- it may even simplify things.

    Regards,
      Jeff Davis
  • Jim Nasby at May 24, 2013 at 4:53 pm

    On 5/24/13 9:53 AM, Andres Freund wrote:
    We don't even necessarily need to log the hint bits for all items since
    the redo for all_visible could make sure all items are hinted. The only
    problem is knowing up to where we can truncate pg_clog...
    [all-visible cannot restore hint bits without FPI because of torn pages]
    I haven't thought about this sufficiently yet. I think we might have
    a chance of working around this, let me ponder a bit.

    But even if that means needing a full page write via the usual mechanism
    for all visible if any hint bits needed to be set we are still out far
    ahead of the current state imo.
    * cleanup would quite possibly do an FPI shortly after in vacuum
    anyway. If we do it for all visible, it possibly does not need to be
    done for it.
    * freezing would almost certainly cause an FPI, since we do it so much
    later.
    * Not having to rescan the whole heap will be a bigger cost saving...
    Would we only set all the hint bits within vacuum? If so I don't think the WAL hit matters at all, because vacuum is almost always a background, throttled process.
    --
    Jim C. Nasby, Data Architect jim@nasby.net
    512.569.9461 (cell) http://jim.nasby.net

Discussion Overview
group: pgsql-hackers
categories: postgresql
posted: May 23, '13 at 5:51p
active: Jun 1, '13 at 5:56p
posts: 26
users: 8
website: postgresql.org...
irc: #postgresql
