FAQ
I am trying to identify tables with significant diskspace "leakage" due to
in appropriately low max_fsm_pages settings. I can see the results of
VACUUM ANALYZE VERBOSE output counts of tuples and unused tuples, and
understand that (1 - (tuples/unused)) is the amount of diskspace available
to be reclaimed with a VACUUM FULL or dump/reload.

Is there a way to identify the numbers of unused tuples without performing a
VACUUM? Is it stored in a system table anywhere? Other ideas on how to
identify disk bloat short of forcing downtime?

TIA.

Search Discussions

  • Jeffrey W. Baker at May 14, 2004 at 5:47 pm

    On Fri, 2004-05-14 at 10:10, Ed L. wrote:
    I am trying to identify tables with significant diskspace "leakage" due to
    in appropriately low max_fsm_pages settings. I can see the results of
    VACUUM ANALYZE VERBOSE output counts of tuples and unused tuples, and
    understand that (1 - (tuples/unused)) is the amount of diskspace available
    to be reclaimed with a VACUUM FULL or dump/reload.

    Is there a way to identify the numbers of unused tuples without performing a
    VACUUM? Is it stored in a system table anywhere? Other ideas on how to
    identify disk bloat short of forcing downtime?
    You can calculate the number of bytes per row, multiply by the number of
    live tuples (count(1) from table), and subtract that from the actual #
    of bytes in the on-disk representation. The difference is wasted space.

    -jwb
  • Ed L. at May 14, 2004 at 10:54 pm

    On Friday May 14 2004 11:47, Jeffrey W. Baker wrote:
    Is there a way to identify the numbers of unused tuples without
    performing a VACUUM? Is it stored in a system table anywhere? Other
    ideas on how to identify disk bloat short of forcing downtime?
    You can calculate the number of bytes per row, multiply by the number of
    live tuples (count(1) from table), and subtract that from the actual #
    of bytes in the on-disk representation. The difference is wasted space.
    That works, but with umpteen clusters to manage, I'm really hoping for a
    SQL-based check so it can be done remotely and non-interactively. Maybe it
    is too much to keep track of, but it would be cool if VACUUM updated a
    system table with the same info it spits out during verbose mode. That
    would be very helpful in auto-identifying leakage and also a recent case
    where the cpu:real time ratio during vacuum went thru the roof due to I/O
    overload from leakage.
  • Ed L. at May 16, 2004 at 5:41 pm
    Here's an attempt at a query to estimate diskspace leakage. This
    leakage might occur when max_fsm_pages and/or max_fsm_relations are
    set too low. Not sure which of the two approaches below (leak1 or
    leak2) is more accurate? Is there a better way via SQL?

    The query uses the 'dbsize' project from contrib. Dbsize has a
    function called relation_size() which performs a 'stat' to get
    actual disk usage for a database and/or table. I use the column
    pg_class.reltuples instead of actually counting rows because I
    suspect that would essentially flush our OS cache of useful pages,
    degrading performance. This query assumes you're keeping stats
    updated.

    SELECT c.relname,
    SUM(s.avg_width) as width,
    CAST(c.reltuples as BIGINT) AS tuples,
    CAST(SUM(s.avg_width) * c.reltuples/1048576 AS INTEGER) AS tupdu,
    c.relpages AS pages,
    CAST(c.relpages * 8192 / 1048576 AS INTEGER) AS pgdu,
    relation_size(s.tablename)/1048576 AS reldu,
    CAST((relation_size(s.tablename)
    - SUM(s.avg_width) * c.reltuples)/1048576 AS INTEGER) AS leak1,
    CAST((relation_size(s.tablename)
    - c.relpages * 8192) / 1048576 AS INTEGER) AS leak2
    FROM pg_stats s, pg_class c
    WHERE c.relname NOT LIKE 'pg_%'
    AND c.relname = s.tablename
    GROUP BY c.oid, s.tablename, c.reltuples, c.relpages, pgdu
    ORDER BY tupdu;

    relname | width | tuples | tupdu | pages | pgdu | reldu | leak1 | leak2
    ---------------+-------+---------+-------+--------+------+-------+-------+-------
    table_1766485 | 27 | 198 | 0 | 12 | 0 | 0 | 0 | 0
    table_1766443 | 186 | 0 | 0 | 9317 | 72 | 72 | 73 | 0
    table_1766439 | 83 | 0 | 0 | 10 | 0 | 0 | 0 | 0
    table_1766435 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0
    table_1766437 | 30 | 0 | 0 | 0 | 0 | 0 | 0 | 0
    table_1766421 | 23 | 2 | 0 | 1 | 0 | 0 | 0 | 0
    table_1766451 | 30 | 189822 | 5 | 1754 | 13 | 13 | 8 | 0
    table_1766396 | 48 | 278781 | 13 | 3185 | 24 | 24 | 12 | 0
    table_1766391 | 74 | 200826 | 14 | 3271 | 25 | 25 | 11 | 0
    table_1766446 | 36 | 504594 | 17 | 4881 | 38 | 38 | 21 | 0
    table_1766426 | 149 | 2241719 | 319 | 55555 | 434 | 434 | 116 | 0
    table_1766456 | 888 | 390657 | 331 | 637949 | 887 | 4983 | 4653 | 4096
    table_1766399 | 596 | 732708 | 416 | 41876 | 327 | 327 | -89 | 0
    (13 rows)

    The basic column definitions are:

    tupdu(MB) = avg_width * reltuples
    pgdu(MB) = relpages * 8K/page
    reldu(MB) = relation_size(tablename) (src/contrib/dbsize)
    leak1 = reldu - tupdu
    leak2 = reldu - pgdu

    Not sure how we ended up with a couple of cases where the number of
    mb on disk was less than the estimated size; maybe we had some
    deletions after the last update of pg_stats?
  • Ed L. at May 17, 2004 at 6:17 pm
    [reposting...original seems to have been lost in ether...]

    Here's an attempt at a query to estimate diskspace leakage. This
    leakage might occur when max_fsm_pages and/or max_fsm_relations are
    set too low. Not sure which of the two approaches below (leak1 or
    leak2) is more accurate? Is there a better way via SQL?

    The query uses the 'dbsize' project from contrib. Dbsize has a
    function called relation_size() which performs a 'stat' to get
    actual disk usage for a database and/or table. I use the column
    pg_class.reltuples instead of actually counting rows because I
    suspect that would essentially flush our OS cache of useful pages,
    degrading performance. This query assumes you're keeping stats
    updated.

    SELECT c.relname,
    SUM(s.avg_width) as width,
    CAST(c.reltuples as BIGINT) AS tuples,
    CAST(SUM(s.avg_width) * c.reltuples/1048576 AS INTEGER) AS tupdu,
    c.relpages AS pages,
    CAST(c.relpages * 8192 / 1048576 AS INTEGER) AS pgdu,
    relation_size(s.tablename)/1048576 AS reldu,
    CAST((relation_size(s.tablename)
    - SUM(s.avg_width) * c.reltuples)/1048576 AS INTEGER) AS leak1,
    CAST((relation_size(s.tablename)
    - c.relpages * 8192) / 1048576 AS INTEGER) AS leak2
    FROM pg_stats s, pg_class c
    WHERE c.relname NOT LIKE 'pg_%'
    AND c.relname = s.tablename
    GROUP BY c.oid, s.tablename, c.reltuples, c.relpages, pgdu
    ORDER BY tupdu;

    relname | width | tuples | tupdu | pages | pgdu | reldu | leak1 |
    leak2
    ---------------+-------+---------+-------+--------+------+-------+-------+-------
    table_1766485 | 27 | 198 | 0 | 12 | 0 | 0 | 0 |

    table_1766443 | 186 | 0 | 0 | 9317 | 72 | 72 | 73 |

    table_1766439 | 83 | 0 | 0 | 10 | 0 | 0 | 0 |

    table_1766435 | 27 | 0 | 0 | 0 | 0 | 0 | 0 |

    table_1766437 | 30 | 0 | 0 | 0 | 0 | 0 | 0 |

    table_1766421 | 23 | 2 | 0 | 1 | 0 | 0 | 0 |

    table_1766451 | 30 | 189822 | 5 | 1754 | 13 | 13 | 8 |

    table_1766396 | 48 | 278781 | 13 | 3185 | 24 | 24 | 12 |

    table_1766391 | 74 | 200826 | 14 | 3271 | 25 | 25 | 11 |

    table_1766446 | 36 | 504594 | 17 | 4881 | 38 | 38 | 21 |

    table_1766426 | 149 | 2241719 | 319 | 55555 | 434 | 434 | 116 |

    table_1766456 | 888 | 390657 | 331 | 637949 | 887 | 4983 | 4653 |
    4096
    table_1766399 | 596 | 732708 | 416 | 41876 | 327 | 327 | -89 |

    (13 rows)

    The basic column definitions are:

    tupdu(MB) = avg_width * reltuples
    pgdu(MB) = relpages * 8K/page
    reldu(MB) = relation_size(tablename) (src/contrib/dbsize)
    leak1 = reldu - tupdu
    leak2 = reldu - pgdu

    Not sure how we ended up with a couple of cases where the number of
    mb on disk was less than the estimated size; maybe we had some
    deletions after the last update of pg_stats?

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-general @
categoriespostgresql
postedMay 14, '04 at 5:13p
activeMay 17, '04 at 6:17p
posts5
users2
websitepostgresql.org
irc#postgresql

2 users in discussion

Ed L.: 4 posts Jeffrey W. Baker: 1 post

People

Translate

site design / logo © 2022 Grokbase