Hi,

Autovacuum has been enabled by default for some releases now and, up to now,
there is no easy way to estimate the required number of autovacuum workers. I
tune it by observing whether the worker slots are saturated for a period of
time. I'm having a hard time trying to do this. I want to add a LOG message
such as

LOG: maximum number of autovacuum workers reached
HINT: Consider increasing autovacuum_max_workers (currently 5).

And also a view (say pg_stat_autovacuum) to expose some autovacuum information
such as (i) the number of autovacuum workers, (ii) the number of tables that
need analyze/vacuum and are scheduled for it, (iii) the autovacuum count, and
(iv) the autoanalyze count. While I am on this topic, it would be nice to
expose the analyze/vacuum count and threshold per table. This information
could go into pg_stat_*_tables, but that view already has too many fields.
Maybe it is time to split autovacuum information into a separate statistics
view?

Comments?


--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


  • Magnus Hagander at Jun 8, 2011 at 10:10 pm

    On Jun 9, 2011 12:01 AM, "Euler Taveira de Oliveira" wrote:
    Hi,

    Autovacuum has been enabled by default for some releases now and, up to
    now there is no easy way to estimate the required number of autovacuum
    workers. I tune it by observing whether the worker slots are saturated
    for a period of time. I'm having a hard time trying to do this. I want to
    add a LOG message such as
    LOG: maximum number of autovacuum workers reached
    HINT: Consider increasing autovacuum_max_workers (currently 5).
    That would be very useful.
    And also a view (say pg_stat_autovacuum) to expose some autovacuum
    information such as (i) the number of autovacuum workers (ii) the number
    of tables that need analyze/vacuum and are scheduled for it (iii) the

    Part of that is on my personal todo already, so I'd be happy to review that
    :)
    autovacuum count (iv) the autoanalyze count. While I am on this topic, it
    would be nice to expose the analyze/vacuum count and threshold per table.
    This information could go into pg_stat_*_tables but that view already has
    too many fields. Maybe it is time to split autovacuum information into a
    separate statistics view?

    That is configuration information and not statistics, so IMHO it does not
    belong in pg_stat_*.

    And if relation parameters are to be exposed more than they are now it
    should be done for all, not just autovacuum.

    /Magnus
  • Robert Haas at Jun 8, 2011 at 11:35 pm

    On Wed, Jun 8, 2011 at 5:54 PM, Euler Taveira de Oliveira wrote:
    LOG: maximum number of autovacuum workers reached
    HINT: Consider increasing autovacuum_max_workers (currently 5).

    Comments?
    Is the hint correct? I mean, what if there were 100 small tables that
    needed vacuuming all at the same time? We'd hit this limit no matter
    how high you set autovacuum_max_workers, but it wouldn't be right to
    set it to 101 just because every once in a blue moon you might trip
    over the limit.

    I think it'd be really useful to expose some more data in this area
    though. One random idea is - remember the time at which a table was
    first observed to need vacuuming. Clear the timestamp when it gets
    vacuumed. Then you can do:

    SELECT blahblah FROM wumpity
    WHERE time_at_which_we_first_noticed_it_needed_vacuuming < now() - '1 hour'::interval;

    ...or something of the sort. That way you can alert if autovacuum
    starts to fall too far behind, but you get to pick the definition of
    "too far behind".

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Jun 9, 2011 at 1:50 am

    Robert Haas writes:
    I think it'd be really useful to expose some more data in this area
    though. One random idea is - remember the time at which a table was
    first observed to need vacuuming. Clear the timestamp when it gets
    vacuumed. Then you can do:
    As far as I recall that logic, there is no delay between when we know
    that a table needs vacuumed and when we do it. I don't see the point of
    introducing any such delay, either.

    regards, tom lane
  • Robert Haas at Jun 9, 2011 at 2:15 am

    On Wed, Jun 8, 2011 at 9:50 PM, Tom Lane wrote:
    Robert Haas <robertmhaas@gmail.com> writes:
    I think it'd be really useful to expose some more data in this area
    though.  One random idea is - remember the time at which a table was
    first observed to need vacuuming. Clear the timestamp when it gets
    vacuumed.  Then you can do:
    As far as I recall that logic, there is no delay between when we know
    that a table needs vacuumed and when we do it.  I don't see the point of
    introducing any such delay, either.
    Well, if there are more tables that need vacuuming than there are
    workers available at any given time, there will be a delay. We
    probably don't keep track of that delay at present, but we could.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Greg Smith at Jun 9, 2011 at 3:53 pm

    Robert Haas wrote:
    Well, if there are more tables that need vacuuming than there are
    workers available at any given time, there will be a delay. We
    probably don't keep track of that delay at present, but we could.
    There are at least four interesting numbers to collect each time
    autovacuum runs:

    1) This one, when was the threshold crossed. I believe one of the AV
    workers would have to pause periodically to update these if they're all
    busy doing work.
    2) What time did the last autovacuum start at
    3) How many dead rows were there at the point when it started
    4) When did the last autovacuum end (currently the only value stored)

    There may be a 5th piece of state worth exposing/saving that I haven't
    looked at yet, something related to how much work was skipped by the
    partial vacuum logic introduced in 8.4. I haven't looked at that code
    enough to know which metric is the right one to measure its effectiveness
    by, but I have this gut feeling it's eventually going to be critical for
    distinguishing between the various common types of vacuum-heavy
    workloads that show up.

    All of these need to be stored in a system table/view, so that an admin
    can run a query to answer questions like:

    -What is AV doing right now?
    -How far behind is AV on tables it needs to clean but hasn't even
    started on?
    -How long is the average AV taking on my big tables?
    -As I change the AV parameters, what does it do to the runtimes against
    my big tables?
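The four numbers listed above amount to a small per-table record. Here is a
purely illustrative Python sketch of the bookkeeping they imply (field names
invented; the real implementation would live in the stats system, not in
Python):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TableVacuumStats:
    """Per-table autovacuum bookkeeping, mirroring the four numbers above."""
    threshold_crossed_at: Optional[float] = None    # (1) when the vacuum threshold was crossed
    last_vacuum_started_at: Optional[float] = None  # (2) when the last autovacuum started
    dead_tuples_at_start: int = 0                   # (3) dead rows at the point it started
    last_vacuum_ended_at: Optional[float] = None    # (4) when it ended (the only value stored today)

    def queue_delay(self) -> Optional[float]:
        """Seconds the table waited between needing vacuum and being vacuumed."""
        if self.threshold_crossed_at is None or self.last_vacuum_started_at is None:
            return None
        return self.last_vacuum_started_at - self.threshold_crossed_at
```

The queue_delay here is exactly the "how far behind is AV" measure that the
questions above ask about.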

    As someone who is sought out by a lot of people whose problems revolve
    around databases with heavy writes or update churn, limitations in the
    current state of tracking what autovacuum does have been moving way up my
    priority list over the last year. I now have someone who is always running
    autovacuum on the same table, 24x7. It finishes every two days, and by the
    time it does, the 20% threshold has already been crossed for it to start
    again. The "wait until a worker was available" problem isn't there, but I
    need a good way to track all of the other three things to have a hope of
    improving their situation. Right now, getting the data I could use takes
    parsing log file output and periodic dumps of pg_stat_user_tables, then
    stitching the whole mess together.

    You can't run a heavily updated database in the TB+ range and make sense
    of what autovacuum is doing without a large effort matching output from
    log_autovacuum_min_duration and the stats that are visible in
    pg_stat_user_tables. It must get easier than that to support the sort
    of bigger tables it's possible to build now. And if this data starts
    getting tracked, we can start to move toward AV parameters that are
    actually aiming at real-world units, too.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
  • Bernd Helmle at Jun 9, 2011 at 8:49 pm

    --On 9 June 2011 11:53:22 -0400 Greg Smith wrote:

    There are at least four interesting numbers to collect each time autovacuum
    runs:

    1) This one, when was the threshold crossed. I believe one of the AV workers
    would have to pause periodically to update these if they're all busy doing
    work.
    2) What time did the last autovacuum start at
    3) How many dead rows were there at the point when it started
    4) When did the last autovacuum end (currently the only value stored)
    This is what I've found myself thinking about repeatedly in the past, too.
    I'd go further and expose the information reported by VACUUM VERBOSE in
    the view, too, at least the number of pages visited and cleaned (or dead
    but not yet cleaned). Customers are heavily interested in these numbers,
    and I've found pgfouine's reporting of them very useful. Having this
    information in a view would make building monitoring infrastructure much
    easier (depending on how easy or expensive tracking this information is;
    I didn't look into the code yet).

    --
    Thanks

    Bernd
  • Greg Smith at Jun 9, 2011 at 9:36 pm

    On 06/09/2011 04:43 PM, Bernd Helmle wrote:
    I'd go further and expose the info or details issued by VACUUM VERBOSE
    into the view, too, at least the number of pages visited and cleaned
    (or dead but not yet cleaned). Customers are heavily interested in
    these numbers and I've found pgfouine's reporting of them very
    useful.
    Agreed there. The fact that VACUUM VERBOSE reports them suggests
    they're not too terribly difficult to track either.

    What we'd probably need to do with those is handle them like the other
    stats in the system: store a total number for visited/cleaned/dead for
    each relation, then increment the total as each vacuum finishes. That
    way, you could point a standard monitoring system at it and see trends.
    Just saving the last snapshot of data there isn't as useful.

    I'm seeing these as being like the counters in pg_stat_bgwriter; while
    it's easy to think of VACUUM "what work happened?" data as info you just
    want the last snapshot of, a continuous incrementing counter can do that
    and a lot of other things too. Anyone who is extracting useful data
    from pg_stat_bgwriter can use the same logic to track this data, even if
    it only moves forward in big chunks as vacuum completes. And it may be
    feasible to update it in the middle, too.

    Stepping into implementation for a second, the stats that are showing up
    in pg_stat_user_tables are driven by a PgStat_MsgVacuum message sent to
    the stats collector when vacuum finishes. While that's the
    obvious place to put some more stuff, that's not necessarily the right
    way to build a better monitoring infrastructure. Two things to consider:

    -It's not really aimed at being called multiple times for one operation
    ("needs vacuum", "started vacuum", "finished vacuum").
    -There is a mix of things that make sense as long-term counters and
    things that update as snapshots--the timestamps are the main thing there.

    I haven't thought about it enough to have a real opinion on whether you
    can squeeze everything into the existing message by adding more fields,
    or if another type of message is necessary. Just pointing out that it's
    not trivially obvious which approach is better.

    What is unambiguous is that all this new data is really going to need a
    new view for it, pg_stat_vacuum or something like that. The fields that
    are already in pg_stat_user_tables can stay there as deprecated for a
    while, but this all wants to be in its own new view.

    This would really be a nice medium sized feature that DBAs would love,
    and it would help adoption on big sites. I have some ideas on how to
    get some funding to develop it because I keep running into this, but if
    someone wants to run with the idea I'd be happy to just help instead.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
  • Tom Lane at Jun 9, 2011 at 9:42 pm

    Greg Smith writes:
    What we'd probably need to do with those is handle them like the other
    stats in the system: store a total number for visited/cleaned/dead for
    each relation, then increment the total as each vacuum finishes.
    As Robert said, we're already seeing scalability problems with the
    pg_stats subsystem. I'm not eager to add a bunch more per-table
    counters, at least not without some prior work to damp down the ensuing
    performance hit.

    regards, tom lane
  • Greg Smith at Jun 9, 2011 at 10:29 pm

    On 06/09/2011 05:41 PM, Tom Lane wrote:
    As Robert said, we're already seeing scalability problems with the
    pg_stats subsystem. I'm not eager to add a bunch more per-table
    counters, at least not without some prior work to damp down the ensuing
    performance hit.
    That's fair. Anyone who is running into the sort of autovacuum issues
    prompting this discussion would happily pay the overhead to get better
    management of that; it's one of the easiest things to justify more
    per-table stats on IMHO. Surely the per-tuple counters are vastly more
    of a problem than these messages could ever be.

    But concerns about stats overload are why I was highlighting issues
    around sending multiple messages per vacuum, and why incremental updates
    as it runs are unlikely to work out. Balancing that trade-off, getting
    enough data to help but not so much that the overhead is obnoxious, is the
    non-obvious tricky part of the design here.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
  • Tom Lane at Jun 9, 2011 at 10:37 pm

    Greg Smith writes:
    That's fair. Anyone who is running into the sort of autovacuum issues
    prompting this discussion would happily pay the overhead to get better
    management of that; it's one of the easiest things to justify more
    per-table stats on IMHO. Surely the per-tuple counters are vastly more
    of a problem than these messages could ever be.
    No, it's the total number of counters that I'm concerned about, not
    so much when they get updated.

    regards, tom lane
  • Alvaro Herrera at Jun 9, 2011 at 4:25 pm

    Excerpts from Tom Lane's message of Wed Jun 08 21:50:22 -0400 2011:
    Robert Haas <robertmhaas@gmail.com> writes:
    I think it'd be really useful to expose some more data in this area
    though. One random idea is - remember the time at which a table was
    first observed to need vacuuming. Clear the timestamp when it gets
    vacuumed. Then you can do:
    As far as I recall that logic, there is no delay between when we know
    that a table needs vacuumed and when we do it. I don't see the point of
    introducing any such delay, either.
    Autovacuum checks each table twice. When it first connects to a
    database it grabs a complete list of relations needing vacuum. Then it
    starts vacuuming, and before processing each relation, it rechecks.

    So there *is* a delay, which corresponds to how long it took to process
    the tables that preceded it in the list. Robert's suggestion would seem
    to make sense. I'm not sure how to implement it: do we want some more
    (highly volatile) data points in pgstat? Do we need some other
    mechanism? This seems like a use case for pg_class_nt (see
    http://archives.postgresql.org/pgsql-patches/2006-06/msg00114.php)


    In any case, given the "rebalancing" feature of vacuum_cost_delay (which
    increases the delay the more workers there are), the only "solution" to
    the problem of falling behind is reducing the delay parameter. If you
    just add more workers, they start working more slowly.
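The rebalancing described above can be illustrated with a toy model
(hypothetical Python, not the actual autovac_balance_cost code): the
configured cost limit is effectively shared across the active workers, so
each extra worker slows the others down rather than adding throughput:

```python
def per_worker_cost_limit(total_cost_limit: int, n_active_workers: int) -> int:
    # Simplified model of autovacuum cost rebalancing: the configured
    # I/O budget is divided among the active workers, so adding workers
    # does not add total vacuuming throughput.
    return max(1, total_cost_limit // n_active_workers)

# With a cost limit of 200, four workers each get a quarter of the
# budget a single worker would have had.
```

This is why reducing the delay parameter, rather than adding workers, is the
lever that actually makes autovacuum go faster overall.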

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Tom Lane at Jun 9, 2011 at 4:40 pm

    Alvaro Herrera writes:
    In any case, given the "rebalancing" feature of vacuum_cost_delay (which
    increases the delay the more workers there are), the only "solution" to
    the problem of falling behind is reducing the delay parameter. If you
    just add more workers, they start working more slowly.
    Yeah. Note also that if you're not running a pretty recent minor
    release, you're exposed to this bug:

    Author: Tom Lane <tgl@sss.pgh.pa.us>
    Branch: master [b58c25055] 2010-11-19 22:28:20 -0500
    Branch: REL9_0_STABLE Release: REL9_0_2 [b5efc0940] 2010-11-19 22:28:25 -0500
    Branch: REL8_4_STABLE Release: REL8_4_6 [fab2af30d] 2010-11-19 22:28:30 -0500
    Branch: REL8_3_STABLE Release: REL8_3_13 [6cb9d5113] 2010-11-19 22:28:35 -0500

    Fix leakage of cost_limit when multiple autovacuum workers are active.

    When using default autovacuum_vac_cost_limit, autovac_balance_cost relied
    on VacuumCostLimit to contain the correct global value ... but after the
    first time through in a particular worker process, it didn't, because we'd
    trashed it in previous iterations. Depending on the state of other autovac
    workers, this could result in a steady reduction of the effective
    cost_limit setting as a particular worker processed more and more tables,
    causing it to go slower and slower. Spotted by Simon Poole (bug #5759).
    Fix by saving and restoring the GUC variables in the loop in do_autovacuum.

    In passing, improve a few comments.

    Back-patch to 8.3 ... the cost rebalancing code has been buggy since it was
    put in.


    regards, tom lane
  • Robert Haas at Jun 9, 2011 at 4:56 pm

    On Thu, Jun 9, 2011 at 12:25 PM, Alvaro Herrera wrote:
    Excerpts from Tom Lane's message of Wed Jun 08 21:50:22 -0400 2011:
    Robert Haas <robertmhaas@gmail.com> writes:
    I think it'd be really useful to expose some more data in this area
    though.  One random idea is - remember the time at which a table was
    first observed to need vacuuming. Clear the timestamp when it gets
    vacuumed.  Then you can do:
    As far as I recall that logic, there is no delay between when we know
    that a table needs vacuumed and when we do it.  I don't see the point of
    introducing any such delay, either.
    Autovacuum checks each table twice.  When it first connects to a
    database it grabs a complete list of relations needing vacuum.  Then it
    starts vacuuming, and before processing each relation, it rechecks.

    So there *is* a delay, which corresponds to how long it took to process
    the tables that preceded it in the list.
    There's that, plus as of 9.1 autovacuum will skip tables that it can't
    immediately take ShareUpdateExclusiveLock on (except in case of
    impending wraparound). So in the worst case we might have tables left
    over from a previous run of the autovacuum worker that never got
    processed. And, of course, in any version, it's also possible for
    autovacuum to get booted off the table by a conflicting lock request
    that arrives midway through; the system might get shut down in the
    middle of the run; autovacuum might error out in medias res; etc.
    Robert's suggestion would seem
    to make sense.  I'm not sure how to implement it: do we want some more
    (highly volatile) data points in pgstat?  Do we need some other
    mechanism?  This seems like a use case for pg_class_nt (see
    http://archives.postgresql.org/pgsql-patches/2006-06/msg00114.php)
    I haven't looked at the implementation, but I like the concept. It's
    tempting to just shove everything in pgstat, but we already have
    scalability limits there.
    In any case, given the "rebalancing" feature of vacuum_cost_delay (which
    increases the delay the more workers there are), the only "solution" to
    the problem of falling behind is reducing the delay parameter.  If you
    just add more workers, they start working more slowly.
    Unless, of course, you already have the delay set to zero, in which
    case starting more workers might be all you can do. The case where
    this really matters is with big tables, I think. If you have #(big
    tables) > #(autovacuum workers), it seems like you could end up in a
    situation where the big tables pin down all the workers and no small
    tables can get vacuumed for a long time. Perhaps we can tune the
    algorithm to compensate for that in some way, but I think we need more
    raw data to work with first.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Euler Taveira de Oliveira at Jun 9, 2011 at 3:02 am

    On 08-06-2011 20:35, Robert Haas wrote:
    Is the hint correct? I mean, what if there were 100 small tables that
    needed vacuuming all at the same time. We'd hit this limit no matter
    how high you set autovacuum_max_workers, but it wouldn't be right to
    set it to 101 just because every once in a blue moon you might trip
    over the limit.
    I think so. You are picturing a scenario with only one message. It is the
    same as the too-frequent-checkpoint messages; i.e., you should check
    whether those messages show some periodicity.
    I think it'd be really useful to expose some more data in this area
    though. One random idea is - remember the time at which a table was
    first observed to need vacuuming. Clear the timestamp when it gets
    vacuumed. Then you can do:
    Hmmm. But this fine-grained information alone doesn't help tune the number
    of autovacuum workers. I consider counters easier to implement and simpler
    to analyze. But the timestamp idea has its merit because we already have a
    similar statistic (the last timestamp a table was vacuumed or analyzed).


    --
    Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
    PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
  • Robert Haas at Jun 9, 2011 at 4:20 am

    On Wed, Jun 8, 2011 at 10:55 PM, Euler Taveira de Oliveira wrote:
    On 08-06-2011 20:35, Robert Haas wrote:
    Is the hint correct?  I mean, what if there were 100 small tables that
    needed vacuuming all at the same time.  We'd hit this limit no matter
    how high you set autovacuum_max_workers, but it wouldn't be right to
    set it to 101 just because every once in a blue moon you might trip
    over the limit.
    I think so. You are picturing a scenario with only one message. It is the
    same as the too-frequent-checkpoint messages; i.e., you should check
    whether those messages show some periodicity.
    Yeah, maybe. I'm just not sure there would be an easy way for users
    to judge when they should or should not make a change.
    I think it'd be really useful to expose some more data in this area
    though.  One random idea is - remember the time at which a table was
    first observed to need vacuuming. Clear the timestamp when it gets
    vacuumed.  Then you can do:
    Hmmm. But this fine-grained information alone doesn't help tune the number
    of autovacuum workers. I consider counters easier to implement and simpler
    to analyze. But the timestamp idea has its merit because we already have a
    similar statistic (the last timestamp a table was vacuumed or analyzed).
    Well, it won't directly tell you how many you need. But certainly if
    you see things getting further and further behind, you know you need
    more.

    Or, alternatively, you need to reduce vacuum_cost_delay. IME, that's
    actually the most common cause of this problem.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Bruce Momjian at Oct 14, 2011 at 4:45 pm

    Robert Haas wrote:
    On Wed, Jun 8, 2011 at 10:55 PM, Euler Taveira de Oliveira
    wrote:
    On 08-06-2011 20:35, Robert Haas wrote:
    Is the hint correct? I mean, what if there were 100 small tables that
    needed vacuuming all at the same time? We'd hit this limit no matter
    how high you set autovacuum_max_workers, but it wouldn't be right to
    set it to 101 just because every once in a blue moon you might trip
    over the limit.
    I think so. You are picturing a scenario with only one message. It is the
    same as the too-frequent-checkpoint messages; i.e., you should check
    whether those messages show some periodicity.
    Yeah, maybe. I'm just not sure there would be an easy way for users
    to judge when they should or should not make a change.
    I think it'd be really useful to expose some more data in this area
    though. One random idea is - remember the time at which a table was
    first observed to need vacuuming. Clear the timestamp when it gets
    vacuumed. Then you can do:
    Hmmm. But this fine-grained information alone doesn't help tune the number
    of autovacuum workers. I consider counters easier to implement and simpler
    to analyze. But the timestamp idea has its merit because we already have a
    similar statistic (the last timestamp a table was vacuumed or analyzed).
    Well, it won't directly tell you how many you need. But certainly if
    you see things getting further and further behind, you know you need
    more.

    Or, alternatively, you need to reduce vacuum_cost_delay. IME, that's
    actually the most common cause of this problem.
    This thread from June died because there was concern about the overhead
    of additional autovacuum statistics, and I have to say I am not
    super-excited about it either because most users will not use them.

    Ideally we would have something like checkpoint_warning that warns users
    in the log when there are too few autovacuum workers and cleanup is
    being delayed.

    The big trick is how to accurately measure this. The amount of time
    that a table waits to be vacuumed probably isn't relevant enough --- it
    might have been days since it was last vacuumed, and waiting 10 minutes
    isn't a big deal, so it is hard to say what _scale_ we would give users
    for that warning that would make sense. We could compare it to the time
    since the last autovacuum, but if the table is suddenly heavily
    modified, that doesn't help either.

    I think it has to drive off of the 'n_dead_tuples' statistic value for the
    table. I was toying with the idea of comparing the n_dead_tuples value
    at the time the table is first scanned for autovacuum consideration, and
    the value at the time an autovacuum worker actually starts scanning the
    table.

    The problem there is that if someone does a massive DELETE in that time
    interval, or does an UPDATE on all the rows, it would think that
    autovacuum should have been there to mark some dead rows, but it was
    not. In the case of DELETE, having autovacuum work earlier would not
    have helped, but it would have helped in the UPDATE case.

    We could look at table size growth during that period. If the
    autovacuum had run earlier, we would have reused that dead space, but it
    wasn't recorded by autovacuum yet; again, it seems vague.
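The n_dead_tuples comparison described above could be sketched like this
(hypothetical Python; the 2x growth threshold is invented for illustration,
and as noted a burst DELETE or whole-table UPDATE can trip it even when more
workers would not have helped):

```python
def vacuum_lag_warning(dead_at_notice: int, dead_at_start: int,
                       warn_ratio: float = 2.0) -> bool:
    # Compare dead tuples at the moment the table was first considered
    # for autovacuum against dead tuples when a worker actually started
    # on it. Large growth in between suggests cleanup is falling behind.
    if dead_at_notice <= 0:
        return False
    return dead_at_start / dead_at_notice >= warn_ratio
```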

    Ideas?

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Josh Berkus at Oct 14, 2011 at 5:11 pm

    Ideally we would have something like checkpoint_warning that warns users
    in the log when there are too few autovacuum workers and cleanup is
    being delayed.
    I don't think that any table-stats based approach is going to work. I
    think you need to measure the queue of tables which need autovacuuming.
    So you do something like:

    If > 10% of tables and > 10 tables need autovac/autoanalyze for more
    than one polling interval in a row, then emit a warning.

    Note that there are solutions other than adding workers; the user could
    also lower the polling interval, decrease vacuum_delay, or do other
    things to make autovac faster.

    This would require tracking stats about the size of the autovac queue.
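The rule above can be written down directly (illustrative Python; the
constants are the ones proposed, not anything PostgreSQL implements):

```python
def should_warn(n_tables_total: int, n_tables_queued: int,
                consecutive_intervals: int) -> bool:
    # Warn when more than 10% of tables AND more than 10 tables have
    # needed autovacuum/autoanalyze for more than one polling interval
    # in a row.
    return (n_tables_queued > 10
            and n_tables_queued > 0.10 * n_tables_total
            and consecutive_intervals > 1)
```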

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Robert Haas at Oct 14, 2011 at 5:31 pm

    On Fri, Oct 14, 2011 at 12:59 PM, Josh Berkus wrote:
    Ideally we would have something like checkpoint_warning that warns users
    in the log when there are too few autovacuum workers and cleanup is
    being delayed.
    I don't think that any table-stats based approach is going to work.  I
    think you need to measure the queue of tables which need autovacuuming.
    So you do something like:

    If > 10% of tables and > 10 tables need autovac/autoanalyze for more
    than one polling interval in a row, then emit a warning.

    Note that there are solutions other than adding workers; the user could
    also lower the polling interval, decrease vacuum_delay, or do other
    things to make autovac faster.

    This would require tracking stats about the size of the autovac queue.
    Right. It's my feeling that that's exactly what we need to do. It's
    similar to what we already do for checkpoint spreading, except applied
    to a different system maintenance activity. What would be really neat
    is if we could not just detect the problem, but actually adjust the
    cost delay on the fly to try to fix it - again, like we do with
    checkpoints.
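An adaptive scheme of that sort might look like this toy feedback loop
(hypothetical Python; nothing like this exists in PostgreSQL, it just
illustrates the checkpoint-style pacing idea):

```python
def adjust_cost_delay(current_delay_ms: float, queue_depth: int,
                      target_queue: int = 0, step_ms: float = 2.0,
                      min_ms: float = 0.0, max_ms: float = 100.0) -> float:
    # Shrink the cost delay while autovacuum is falling behind (tables
    # queued beyond the target), and relax it back once the queue drains,
    # analogous to how checkpoint spreading adapts its write pacing.
    if queue_depth > target_queue:
        return max(min_ms, current_delay_ms - step_ms)
    return min(max_ms, current_delay_ms + step_ms)
```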

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company

Discussion Overview
group: pgsql-hackers @ postgresql
posted: Jun 8, '11 at 10:01p
active: Oct 14, '11 at 5:31p
posts: 19
users: 9
