FAQ

[PostgreSQL-Hackers] visibility map and reltuples

Ned T. Crigler
Dec 13, 2008 at 11:05 pm
It appears that the visibility map patch is causing pg_class.reltuples to be
set improperly after a vacuum. For example, it is set to 0 if the map
indicated that no pages in the heap needed to be scanned.

Perhaps reltuples should not be updated unless every page was scanned during
the vacuum?

--
Ned T. Crigler
reply

Search Discussions

7 responses

  • Heikki Linnakangas at Dec 14, 2008 at 11:15 am

    Ned T. Crigler wrote:
    It appears that the visibility map patch is causing pg_class.reltuples to be
    set improperly after a vacuum. For example, it is set to 0 if the map
    indicated that no pages in the heap needed to be scanned.

    Perhaps reltuples should not be updated unless every page was scanned during
    the vacuum?
    Yeah, vacuum shouldn't overwrite reltuples if it hasn't scanned all pages.

    The interplay of vacuum and analyze in VACUUM ANALYZE needs to be
    changed too. Currently, the analyze after vacuum doesn't overwrite
    reltuples, because the one calculated by vacuum is based on scanning all
    pages, and is thus more accurate than the one estimated from the sample
    (which is not true anymore, as you pointed out). I think the vacuum
    needs to somehow tell analyze whether it updated reltuples or not, so
    that analyze can update reltuples if the vacuum didn't scan all pages.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Heikki Linnakangas at Dec 15, 2008 at 9:01 am

    Heikki Linnakangas wrote:
    Ned T. Crigler wrote:
    It appears that the visibility map patch is causing pg_class.reltuples
    to be
    set improperly after a vacuum. For example, it is set to 0 if the map
    indicated that no pages in the heap needed to be scanned.

    Perhaps reltuples should not be updated unless every page was scanned
    during
    the vacuum?
    Yeah, vacuum shouldn't overwrite reltuples if it hasn't scanned all pages.
    Because we use reltuples divided by relpages in the planner, we probably
    shouldn't update relpages either if we don't update reltuples.
    Otherwise, if the table has grown a lot since we last updated reltuples,
    the reltuples / relpages ratio would be less, not more, accurate, if
    relpages is updated to a new higher value but reltuples is not.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Greg Stark at Dec 15, 2008 at 1:18 pm
    I wonder if we should switch to keeping reltuplesperpage instead. Then
    a partial vacuum could update it by taking the average number of
    tuples per page forbthe pages it saw. Perhaps adjusting it to the
    weights average between the old value and the new value based on how
    many pages were seen.

    I suppose there's no reason we can't update reltuples using that same
    logic though it would be a big opaque.

    --
    Greg

    On 15 Dec 2008, at 04:01, Heikki Linnakangas wrote:

    Heikki Linnakangas wrote:
    Ned T. Crigler wrote:
    It appears that the visibility map patch is causing
    pg_class.reltuples to be
    set improperly after a vacuum. For example, it is set to 0 if the
    map
    indicated that no pages in the heap needed to be scanned.

    Perhaps reltuples should not be updated unless every page was
    scanned during
    the vacuum?
    Yeah, vacuum shouldn't overwrite reltuples if it hasn't scanned all
    pages.
    Because we use reltuples divided by relpages in the planner, we
    probably shouldn't update relpages either if we don't update
    reltuples. Otherwise, if the table has grown a lot since we last
    updated reltuples, the reltuples / relpages ratio would be less, not
    more, accurate, if relpages is updated to a new higher value but
    reltuples is not.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com

    --
    Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers
  • Heikki Linnakangas at Dec 15, 2008 at 1:24 pm

    Greg Stark wrote:
    I wonder if we should switch to keeping reltuplesperpage instead. Then a
    partial vacuum could update it by taking the average number of tuples
    per page forbthe pages it saw. Perhaps adjusting it to the weights
    average between the old value and the new value based on how many pages
    were seen.
    The pages scanned by a partial vacuum isn't a random sample of pages in
    the table. That would bias the reltuplesperpage value towards those
    pages that are updated more.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Tom Lane at Dec 15, 2008 at 3:56 pm

    Heikki Linnakangas writes:
    Greg Stark wrote:
    I wonder if we should switch to keeping reltuplesperpage instead. Then a
    partial vacuum could update it by taking the average number of tuples
    per page forbthe pages it saw. Perhaps adjusting it to the weights
    average between the old value and the new value based on how many pages
    were seen.
    The pages scanned by a partial vacuum isn't a random sample of pages in
    the table. That would bias the reltuplesperpage value towards those
    pages that are updated more.
    Yeah ... and it's highly likely that repeatedly-updated pages would have
    more dead space than never-updated ones, so there'd be a systematic
    creep towards underestimation of the total tuple count.

    I think your previous sketch is right: suppress update of reltuples (and
    relpages) from a partial vacuum scan, and ensure that the analyze phase
    is allowed to do it instead if it happens during VACUUM ANALYZE.

    regards, tom lane
  • Heikki Linnakangas at Dec 17, 2008 at 9:16 am

    Tom Lane wrote:
    I think your previous sketch is right: suppress update of reltuples (and
    relpages) from a partial vacuum scan, and ensure that the analyze phase
    is allowed to do it instead if it happens during VACUUM ANALYZE.
    We also mustn't reset n_live_tuples in pgstat in partial vacuum.
    Committed a patch to do that.

    Thanks for the report!

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Greg Smith at Dec 15, 2008 at 6:00 pm

    On Mon, 15 Dec 2008, Greg Stark wrote:

    I wonder if we should switch to keeping reltuplesperpage instead.
    It would be preferrable to not touch the user side of reltuples if
    possible, since it's the only instant way to get a good estimate of the
    number of rows in a table right now. That's been a regular application
    technique for at least two years now, since
    http://www.varlena.com/GeneralBits/120.php popularized it.

    --
    * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Related Discussions