FAQ

Really really slow select count(*)

Felix
Feb 4, 2011 at 2:46 pm
I am having huge performance problems with a table. Performance deteriorates
every day and I have to run REINDEX and ANALYZE on it every day. auto
vacuum is on. yes, I am reading the other thread about count(*) :)

but obviously I'm doing something wrong here


explain analyze select count(*) from fastadder_fastadderstatus;

Aggregate  (cost=62458.73..62458.74 rows=1 width=0) (actual time=77130.000..77130.000 rows=1 loops=1)
  ->  Seq Scan on fastadder_fastadderstatus  (cost=0.00..61701.18 rows=303018 width=0) (actual time=50.000..76930.000 rows=302479 loops=1)
Total runtime: *77250.000 ms*

directly after REINDEX and ANALYZE:

Aggregate  (cost=62348.70..62348.71 rows=1 width=0) (actual time=15830.000..15830.000 rows=1 loops=1)
  ->  Seq Scan on fastadder_fastadderstatus  (cost=0.00..61613.16 rows=294216 width=0) (actual time=30.000..15570.000 rows=302479 loops=1)
Total runtime: 15830.000 ms

still very bad for a 300k row table

a similar table:

explain analyze select count(*) from fastadder_fastadderstatuslog;

Aggregate  (cost=8332.53..8332.54 rows=1 width=0) (actual time=1270.000..1270.000 rows=1 loops=1)
  ->  Seq Scan on fastadder_fastadderstatuslog  (cost=0.00..7389.02 rows=377402 width=0) (actual time=0.000..910.000 rows=377033 loops=1)
Total runtime: 1270.000 ms


It gets updated quite a bit each day, and this is perhaps the problem.
To me it doesn't seem like that many updates

100-500 rows inserted per day
no deletes

10k-50k updates per day
mostly of this sort: set priority=1 where id=12345

is it perhaps this that is causing the performance problem ?

I could rework the app to be more efficient and do updates using batches
where id IN (1,2,3,4...)

I assume that means a more efficient index update compared to individual
updates.
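
For illustration, the batched form might look something like this (a sketch
only; apart from 12345 the id values are made up):

UPDATE fastadder_fastadderstatus
   SET priority = 1
 WHERE id IN (12345, 12346, 12350, 12361);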

There is one routine that updates position_in_queue using a lot (too many)
update statements.
Is that likely to be the culprit ?

*What else can I do to investigate ?*


                     Table "public.fastadder_fastadderstatus"
      Column       |           Type           |                        Modifiers
-------------------+--------------------------+------------------------------------------------------------------------
 id                | integer                  | not null default nextval('fastadder_fastadderstatus_id_seq'::regclass)
 apt_id            | integer                  | not null
 service_id        | integer                  | not null
 agent_priority    | integer                  | not null
 priority          | integer                  | not null
 last_validated    | timestamp with time zone |
 last_sent         | timestamp with time zone |
 last_checked      | timestamp with time zone |
 last_modified     | timestamp with time zone | not null
 running_status    | integer                  |
 validation_status | integer                  |
 position_in_queue | integer                  |
 sent              | boolean                  | not null default false
 built             | boolean                  | not null default false
 webid_suffix      | integer                  |
 build_cache       | text                     |
Indexes:
    "fastadder_fastadderstatus_pkey" PRIMARY KEY, btree (id)
    "fastadder_fastadderstatus_apt_id_key" UNIQUE, btree (apt_id, service_id)
    "fastadder_fastadderstatus_agent_priority" btree (agent_priority)
    "fastadder_fastadderstatus_apt_id" btree (apt_id)
    "fastadder_fastadderstatus_built" btree (built)
    "fastadder_fastadderstatus_last_checked" btree (last_checked)
    "fastadder_fastadderstatus_last_validated" btree (last_validated)
    "fastadder_fastadderstatus_position_in_queue" btree (position_in_queue)
    "fastadder_fastadderstatus_priority" btree (priority)
    "fastadder_fastadderstatus_running_status" btree (running_status)
    "fastadder_fastadderstatus_service_id" btree (service_id)
Foreign-key constraints:
    "fastadder_fastadderstatus_apt_id_fkey" FOREIGN KEY (apt_id)
        REFERENCES nsproperties_apt(id) DEFERRABLE INITIALLY DEFERRED
    "fastadder_fastadderstatus_service_id_fkey" FOREIGN KEY (service_id)
        REFERENCES fastadder_fastadderservice(id) DEFERRABLE INITIALLY DEFERRED


thanks !

  • Hubert depesz lubaczewski at Feb 4, 2011 at 2:49 pm

    On Fri, Feb 04, 2011 at 03:46:35PM +0100, felix wrote:
    directly after REINDEX and ANALYZE:

    Aggregate (cost=62348.70..62348.71 rows=1 width=0) (actual
    time=15830.000..15830.000 rows=1 loops=1)
    -> Seq Scan on fastadder_fastadderstatus (cost=0.00..61613.16
    rows=294216 width=0) (actual time=30.000..15570.000 rows=302479 loops=1)
    Total runtime: 15830.000 ms
    Do run VACUUM on the table. REINDEX doesn't matter for seq scans, and
    ANALYZE, while it can help choose a different plan, will not help here
    anyway.

    Best regards,

    depesz
  • Greg Smith at Feb 4, 2011 at 2:56 pm

    felix wrote:
    explain analyze select count(*) from fastadder_fastadderstatus;

    Aggregate (cost=62458.73..62458.74 rows=1 width=0) (actual
    time=77130.000..77130.000 rows=1 loops=1)
    -> Seq Scan on fastadder_fastadderstatus (cost=0.00..61701.18
    rows=303018 width=0) (actual time=50.000..76930.000 rows=302479 loops=1)
    Total runtime: *77250.000 ms*
    PostgreSQL version? If you're running on 8.3 or earlier, I would be
    suspicious that your Free Space Map has been overrun.

    What you are seeing is that the table itself is much larger on disk than
    it's supposed to be. That can be caused by frequent UPDATEs: if you
    don't have vacuum cleanup working effectively, you'll get lots of dead
    sections left behind from UPDATEs in the middle. The best way to fix
    all this is to run CLUSTER on the table. That will introduce a bit of
    downtime while it holds a lock on the table (only a few minutes based on
    what you've shown here), but the copy you'll have afterwards won't be
    spread all over disk anymore.
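
    For illustration, the one-off cleanup could look roughly like this (a
    sketch only, using the primary key index from the schema above; CLUSTER
    takes an exclusive lock while it rewrites the table):

    -- rewrite the table in primary-key order, then refresh planner stats
    ALTER TABLE fastadder_fastadderstatus
        CLUSTER ON fastadder_fastadderstatus_pkey;
    CLUSTER fastadder_fastadderstatus;
    ANALYZE fastadder_fastadderstatus;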

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
    "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
  • Shaun Thomas at Feb 4, 2011 at 3:03 pm

    On 02/04/2011 08:56 AM, Greg Smith wrote:

    PostgreSQL version? If you're running on 8.3 or earlier, I would be
    suspicious that your Free Space Map has been overrun.
    That's my first inclination. If he says autovacuum is running, there's
    no way it should be bloating the table that much.

    Felix, If you're running a version before 8.4, what is your
    max_fsm_pages setting? If it's too low, autovacuum won't save you, and
    your tables will continue to grow daily unless you vacuum full
    regularly, and I wouldn't recommend that to my worst enemy. ;)

    --
    Shaun Thomas
    OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
    312-676-8870
    sthomas@peak6.com

  • Shaun Thomas at Feb 4, 2011 at 4:35 pm

    On 02/04/2011 10:03 AM, felix wrote:

    max_fsm_pages     | 153600 | Sets the maximum number of disk pages for which free space is tracked.
    max_fsm_relations | 1000   | Sets the maximum number of tables and indexes for which free space is tracked.

    how do I determine the best size or if that's the problem ?
    Well, the best way is to run:

    vacuumdb -a -v -z &>vacuum.log

    And at the end of the log, it'll tell you how many pages it wants, and
    how many pages were available.

    From the sounds of your database, 150k is way too small. If a single
    table is getting 10-50k updates per day, there's a good chance a ton of
    other tables are getting similar traffic. With max_fsm_pages at that
    setting, any update beyond 150k effectively gets forgotten, and
    forgotten rows aren't reused by new inserts or updates.
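
    As a rough check of how bloated that one table already is, you can compare
    its row count to its size on disk (a sketch; relpages and reltuples are
    the estimates kept by ANALYZE):

    SELECT relname, relpages, reltuples,
           pg_size_pretty(pg_relation_size(oid)) AS on_disk
    FROM pg_class
    WHERE relname = 'fastadder_fastadderstatus';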

    Your database has probably been slowly expanding for months without you
    realizing it. The tables that get the most turnover will be hit the
    hardest, which sounds like what happened here.

    You can stop the bloating by setting the right max_fsm_pages setting,
    but you'll either have to go through and VACUUM FULL every table in your
    database, or dump/restore to regain all the lost space and performance
    (the latter would actually be faster). Before I even touch an older
    PostgreSQL DB, I set it to some value over 3-million just as a starting
    value to be on the safe side. A little used memory is a small price to
    pay for stopping gradual expansion.

    Your reindex was a good idea. Indexes do sometimes need that. But your
    base tables need work too. Unless you're on 8.4 or above, autovacuum
    isn't enough.

    Just to share an anecdote, I was with a company about five years ago and
    they also used the default max_fsm_pages setting. Their DB had expanded
    to 40GB and was filling their disk, only a couple weeks before
    exhausting it. I set the max_fsm_pages setting to 2-million, set up a
    bunch of scripts to vacuum-full the tables from smallest to largest (to
    make enough space for the larger tables, you see) and the database ended
    up at less than 20GB.

    --
    Shaun Thomas
    OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
    312-676-8870
    sthomas@peak6.com

  • Felix at Feb 4, 2011 at 5:38 pm

    On Fri, Feb 4, 2011 at 5:35 PM, Shaun Thomas wrote:

    vacuumdb -a -v -z &>vacuum.log

    And at the end of the log, it'll tell you how many pages it wants, and how
    many pages were available.
    this is the dev, not live. but this is after it gets done with that table:

    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO: analyzing "public.fastadder_fastadderstatus"
    INFO: "fastadder_fastadderstatus": scanned 2492 of 2492 pages, containing
    154378 live rows and 0 dead rows; 30000 rows in sample, 154378 estimated
    total rows

    and there's nothing at the end of the whole vacuum output about pages

    actual command:

    vacuumdb -U postgres -W -v -z djns4 &> vacuum.log

    I tried it with all databases too

    ?

    thanks
  • Scott Marlowe at Feb 4, 2011 at 5:40 pm

    On Fri, Feb 4, 2011 at 10:38 AM, felix wrote:
    On Fri, Feb 4, 2011 at 5:35 PM, Shaun Thomas wrote:


    vacuumdb -a -v -z &>vacuum.log

    And at the end of the log, it'll tell you how many pages it wants, and how
    many pages were available.
    this is the dev, not live. but this is after it gets done with that table:
    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO:  analyzing "public.fastadder_fastadderstatus"
    INFO:  "fastadder_fastadderstatus": scanned 2492 of 2492 pages, containing
    154378 live rows and 0 dead rows; 30000 rows in sample, 154378 estimated
    total rows
    and there's nothing at the end of the whole vacuum output about pages
    actual command:
    vacuumdb -U postgres -W -v -z djns4 &> vacuum.log
    I tried it with all databases too
    I believe you have to run it on the whole db to get that output.
  • Felix at Feb 4, 2011 at 5:45 pm
    vacuumdb -a -v -z -U postgres -W &> vacuum.log

    that's all, isn't it ?

    it did each db

    8.3 in case that matters

    the very end:

    There were 0 unused item pointers.
    0 pages are entirely empty.
    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO: analyzing "public.seo_partnerlinkcategory"
    INFO: "seo_partnerlinkcategory": scanned 0 of 0 pages, containing 0 live
    rows and 0 dead rows; 0 rows in sample, 0 estimated total rows


    On Fri, Feb 4, 2011 at 6:40 PM, Scott Marlowe wrote:

    I tried it with all databases too
    I believe you have to run it on the whole db to get that output.
  • Felix at Feb 4, 2011 at 6:29 pm
    vacuumdb -a -v -z -U postgres -W &> vacuum.log
    Password:
    Password:
    Password:
    Password:
    Password:
    Password:
    Password:
    Password:
    Password:
    Password:
    Password:
    cruxnu:nsbuildout crucial$

    do you think its possible that it just doesn't have anything to complain
    about ?
    or the password is affecting it ?

    In any case I'm not sure I want to run this even at night on production.

    what is the downside to estimating max_fsm_pages too high ?

    3000000 should be safe
    its certainly not 150k

    I have one very large table (10m) that is being analyzed before I warehouse
    it.
    that could've been the monster that ate the free map.
    I think today I've learned that even unused tables affect postgres
    performance.


    and do you agree that I should turn CLUSTER ON ?
    I have no problem to stop all tasks to this table at night and just reload
    it


    On Fri, Feb 4, 2011 at 6:47 PM, Shaun Thomas wrote:

    On 02/04/2011 11:44 AM, felix wrote:

    the very end:
    There were 0 unused item pointers.
    0 pages are entirely empty.
    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO: analyzing "public.seo_partnerlinkcategory"
    INFO: "seo_partnerlinkcategory": scanned 0 of 0 pages, containing 0 live
    rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
    That looks to me like it didn't finish. Did you fork it off with '&' or run
    it and wait until it gave control back to you?

    It really should be telling you how many pages it wanted, and are in use.
    If not, something odd is going on.


    --
    Shaun Thomas
    OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
    312-676-8870
    sthomas@peak6.com

  • Scott Marlowe at Feb 4, 2011 at 6:35 pm
    You can run vacuum verbose on just the postgres database and get the
    global numbers at the end. gotta be a superuser as well.

    # \c postgres postgres
    postgres=# vacuum verbose;
    .... lots deleted.
    DETAIL: A total of 7664 page slots are in use (including overhead).
    7664 page slots are required to track all free space.
    Current limits are: 1004800 page slots, 5000 relations, using 6426 kB.
  • Shaun Thomas at Feb 4, 2011 at 6:36 pm

    On 02/04/2011 12:14 PM, felix wrote:

    do you think its possible that it just doesn't have anything to
    complain about ? or the password is affecting it ?
    Why is it asking for the password over and over again? It shouldn't be
    doing that. And also, are you running this as a user with superuser
    privileges? You might want to think about setting up a .pgpass file, or
    setting up local trust for the postgres user so you can run maintenance
    without having to manually enter a password.
    In any case I'm not sure I want to run this even at night on
    production.
    You should be. Even with auto vacuum turned on, all of our production
    systems get a nightly vacuum over the entire list of databases. It's non
    destructive, and about the only thing that happens is disk IO. If your
    app has times where it's not very busy, say 3am, it's a good time.

    This is especially true since your free space map is behind.

    We actually turn off autovacuum because we have a very transactionally
    intense DB, and if autovacuum launches on a table in the middle of the
    day, our IO totally obliterates performance. We only run a nightly
    vacuum over all the databases when very few users or scripts are using
    anything.
    what is the downside to estimating max_fsm_pages too high ?
    Nothing really. It uses more memory to track it, but on modern servers,
    it's not a concern. The only risk is that you don't know what the real
    setting should be, so you may not completely stop your bloating.
    and do you agree that I should turn CLUSTER ON ?
    Cluster isn't really something you turn on, but something you do. It's
    like vacuum full, in that it basically rebuilds the table and all
    indexes from scratch. The major issue you'll run into is that it
    reorders the table by the index you chose, so you'd best select the
    primary key unless you have reasons to use something else. And you have
    to do it table by table, which will really suck since we already know
    your whole db has bloated, not just one or two tables.

    You're going to be doing some scripting, buddy. :) Well, unless you just
    do a dump/restore and start over with sane postgresql.conf settings.
    I have no problem to stop all tasks to this table at night and just
    reload it
    That will work for this table. Just keep in mind all your tables have
    been suffering since you installed this database. Tables with the
    highest turnover were hit hardest, but they all have non-ideal sizes
    compared to what they would be if your maintenance was working.

    --
    Shaun Thomas
    OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
    312-676-8870
    sthomas@peak6.com

  • Felix at Feb 4, 2011 at 7:26 pm

    On Fri, Feb 4, 2011 at 7:34 PM, Shaun Thomas wrote:

    Why is it asking for the password over and over again? It shouldn't be
    doing that.
    because I asked it to: -W
    on the production server I need to enter password and I'm testing on dev
    first.

    I just sudo tried it but still no report


    and do you agree that I should turn CLUSTER ON ?
    Cluster isn't really something you turn on, but something you do.

    djns4=# cluster fastadder_fastadderstatus;
    ERROR: there is no previously clustered index for table
    "fastadder_fastadderstatus"

    http://www.postgresonline.com/journal/archives/10-How-does-CLUSTER-ON-improve-index-performance.html

    djns4=# alter table fastadder_fastadderstatus CLUSTER ON fastadder_fastadderstatus_pkey;
    ALTER TABLE
    djns4=# CLUSTER fastadder_fastadderstatus;
    CLUSTER

    ok, that's why I figured I was turning something on. the table has been
    altered.

    it will be pk ordered, new entries always at the end and no deletes

    but this means I have to manually run cluster from time to time, right ? not
    that there will be much or any reordering. or it should be fine going
    forward with vacuum and enlarging the free space memory map.


    It's like vacuum full, in that it basically rebuilds the table and all
    indexes from scratch. The major issue you'll run into is that it reorders
    the table by the index you chose, so you'd best select the primary key
    unless you have reasons to use something else. And you have to do it table
    by table, which will really suck since we already know your whole db has
    bloated, not just one or two tables.
    do we know that ? many of the tables are fairly static.

    only this one is seriously borked, and yet other related tables seem to be
    fine.



    You're going to be doing some scripting, buddy. :) Well, unless you just do
    a dump/restore and start over with sane postgresql.conf settings.

    well who knew the defaults were unsane ? :)

    scripting this is trivial, I already have the script

    I have made the mistake of doing VACUUM FULL in the past. in fact on this
    table, and it had to be killed because it took down my entire website !
    that may well be the major borking event. a credit to postgres that the
    table still functions if that's the case.

    scott marlowe:

    begin;
    select * into temporaryholdingtable from oldtable order by somefield;
    truncate oldtable;
    insert into oldtable select * from temporaryholdingtable;
    commit;

    that sounds like a good approach.

    gentlemen, 300,000 + thanks for your generous time !
    (a small number, I know)

    -felix
  • Scott Marlowe at Feb 4, 2011 at 7:35 pm

    On Fri, Feb 4, 2011 at 12:26 PM, felix wrote:
    I just sudo tried it but still no report
    It's not about who you are in Unix / Linux, it's about who you are in
    PostgreSQL. \du will show you who is a superuser. psql -U username
    will let you connect as that user.
  • Felix at Feb 4, 2011 at 7:59 pm
    ah right, duh.
    yes, I did it as -U postgres, verified as a superuser

    just now did it from inside psql as postgres

    \c djns4
    vacuum verbose analyze;

    still no advice on the pages


    On Fri, Feb 4, 2011 at 8:34 PM, Scott Marlowe wrote:
    On Fri, Feb 4, 2011 at 12:26 PM, felix wrote:
    I just sudo tried it but still no report
    It's not about who you are in Unix / Linux, it's about who you are in
    PostgreSQL. \du will show you who is a superuser. psql -U username
    will let you connect as that user.
  • Shaun Thomas at Feb 4, 2011 at 8:00 pm

    On 02/04/2011 01:59 PM, felix wrote:


    still no advice on the pages
    I think it just hates you.

    --
    Shaun Thomas
    OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
    312-676-8870
    sthomas@peak6.com

  • Felix at Feb 4, 2011 at 8:14 pm
    it probably has good reason to hate me.



    ns=> SELECT n.nspname AS schema_name, c.relname AS table_name,
    ns-> c.reltuples AS row_count,
    ns-> c.relpages*8/1024 AS mb_used,
    ns-> pg_total_relation_size(c.oid)/1024/1024 AS total_mb_used
    ns-> FROM pg_class c
    ns-> JOIN pg_namespace n ON (n.oid=c.relnamespace)
    ns-> WHERE c.relkind = 'r'
    ns-> ORDER BY total_mb_used DESC
    ns-> LIMIT 20;
    schema_name |            table_name            |  row_count  | mb_used | total_mb_used
    ------------+----------------------------------+-------------+---------+---------------
    public      | django_session                   | 1.47843e+07 |    4122 |         18832
    public      | traffic_tracking2010             | 9.81985e+06 |     811 |          1653
    public      | mailer_mailingmessagelog         | 7.20214e+06 |     441 |          1082
    public      | auth_user                        | 3.20077e+06 |     572 |           791
    public      | fastadder_fastadderstatus        |      302479 |     458 |           693
    public      | registration_registrationprofile | 3.01345e+06 |     248 |           404
    public      | reporting_dp_6c93734c            |  1.1741e+06 |      82 |           224
    public      | peoplez_contact                  |       79759 |      18 |           221
    public      | traffic_tracking201101           | 1.49972e+06 |     163 |           204
    public      | reporting_dp_a3439e2a            | 1.32739e+06 |      82 |           187
    public      | nsproperties_apthistory          |       44906 |      69 |           126
    public      | nsproperties_apt                 |       30780 |      71 |           125
    public      | clients_showingrequest           |       85175 |      77 |           103
    public      | reporting_dp_4ffe04ad            |      330252 |      26 |            63
    public      | fastadder_fastadderstatuslog     |      377402 |      28 |            60
    public      | nsmailings_officememotoagent     |      268345 |      15 |            52
    public      | celery_taskmeta                  |        5041 |      12 |            32
    public      | mailer_messagelog                |      168298 |      24 |            32
    public      | datapoints_job                   |        9167 |      12 |            23
    public      | fastadder_fastadderstatus_errors |      146314 |       7 |            21

    oh and there in the footnotes to django they say "don't forget to run the
    delete expired sessions management every once in a while". thanks guys.

    it won't run now because its too big, I can delete them from psql though

    well just think how sprightly my website will run tomorrow once I fix these.



    On Fri, Feb 4, 2011 at 9:00 PM, Shaun Thomas wrote:

    On 02/04/2011 01:59 PM, felix wrote:


    still no advice on the pages
    I think it just hates you.


    --
    Shaun Thomas
    OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
    312-676-8870
    sthomas@peak6.com

  • Shaun Thomas at Feb 4, 2011 at 8:38 pm

    On 02/04/2011 02:14 PM, felix wrote:

    oh and there in the footnotes to django they say "don't forget to run
    the delete expired sessions management every once in a while".
    thanks guys.
    Oh Django... :)
    it won't run now because its too big, I can delete them from psql though
    You might be better off deleting the inverse. You know, start a
    transaction, select all the sessions that *aren't* expired, truncate the
    table, insert them back into the session table, and commit.

    BEGIN;
    CREATE TEMP TABLE foo_1 AS
    SELECT * FROM django_session WHERE date_expired >= CURRENT_DATE;
    TRUNCATE django_session;
    INSERT INTO django_session SELECT * from foo_1;
    COMMIT;

    Except I don't actually know what the expired column is. You can figure
    that out pretty quick, I assume. That'll also have the benefit of
    cleaning up the indexes and the table all at once. If you just do a
    delete, the table won't change at all, except that it'll have less
    active records.
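
    If it's a stock Django sessions table, the expiry column is expire_date,
    so the keep-only-the-live-rows version would be roughly (a sketch; verify
    the column name on your schema first):

    BEGIN;
    CREATE TEMP TABLE live_sessions AS
        SELECT * FROM django_session WHERE expire_date >= now();
    TRUNCATE django_session;
    INSERT INTO django_session SELECT * FROM live_sessions;
    COMMIT;
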
    well just think how sprightly my website will run tomorrow once I fix
    these.
    Maybe. :)

    --
    Shaun Thomas
    OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
    312-676-8870
    sthomas@peak6.com

  • Pierre C at Feb 5, 2011 at 9:13 am

    On Fri, 04 Feb 2011 21:37:56 +0100, Shaun Thomas wrote:
    On 02/04/2011 02:14 PM, felix wrote:

    oh and there in the footnotes to django they say "don't forget to run
    the delete expired sessions management every once in a while".
    thanks guys.
    Oh Django... :)
    it won't run now because its too big, I can delete them from psql though
    You might be better off deleting the inverse. You know, start a
    transaction, select all the sessions that *aren't* expired, truncate the
    table, insert them back into the session table, and commit.
    Note that for a session table that is updated very often, you can use
    Postgres' HOT feature, which will create a lot fewer dead rows. Look it
    up in the docs.
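
    Roughly what that means in practice (a sketch, assuming the Django session
    table name): HOT only applies when an UPDATE touches no indexed column and
    the page has spare room, so leaving some free space per page helps:

    -- leave ~30% free space per page so updates can stay on the same page
    ALTER TABLE django_session SET (fillfactor = 70);

    Existing pages only pick the new fillfactor up once the table is rewritten
    (e.g. by CLUSTER).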
  • Scott Marlowe at Feb 6, 2011 at 7:02 pm

    On Fri, Feb 4, 2011 at 1:14 PM, felix wrote:
    schema_name |            table_name            |  row_count  | mb_used |
    total_mb_used
    -------------+----------------------------------+-------------+---------+---------------
    public      | django_session                   | 1.47843e+07 |    4122 |
    18832
    So does this table still have 15M rows in it? Any old ones you can
    delete, then run cluster on the table?
  • Shaun Thomas at Feb 4, 2011 at 7:40 pm

    On 02/04/2011 01:26 PM, felix wrote:

    because I asked it to: -W on the production server I need to enter
    password and I'm testing on dev first.
    Right. I'm just surprised it threw up the prompt so many times.
    I just sudo tried it but still no report
    Nono... you have to run the vacuum command with the -U for a superuser
    in the database. Like the postgres user.
    but this means I have to manually run cluster from time to time, right ?
    not that there will be much or any reordering. or it should be fine
    going forward with vacuum and enlarging the free space memory map.
    It should be fine going forward. You only need to re-cluster if you want
    to force the table to remain in the order you chose, since it doesn't
    maintain the order for updates and new inserts. Since you're only doing
    it as a cleanup, that's not a concern for you.
    do we know that ? many of the tables are fairly static. only this
    one is seriously borked, and yet other related tables seem to be
    fine.
    Probably not in your case. I just mean that any non-static table is
    going to have this problem. If you know what those are, great. I don't
    usually have that luxury, so I err on the side of assuming the whole DB
    is borked. :)

    Also, here's a query you may find useful in the future. It reports the
    top 20 tables by size, but also reports the row counts and what not.
    It's a good way to find possibly bloated tables, or tables you could
    archive:

    SELECT n.nspname AS schema_name, c.relname AS table_name,
    c.reltuples AS row_count,
    c.relpages*8/1024 AS mb_used,
    pg_total_relation_size(c.oid)/1024/1024 AS total_mb_used
    FROM pg_class c
    JOIN pg_namespace n ON (n.oid=c.relnamespace)
    WHERE c.relkind = 'r'
    ORDER BY total_mb_used DESC
    LIMIT 20;

    The total_mb_used column is the table + all of the indexes and toast
    table space. The mb_used is just for the table itself. This will also
    help you see index bloat, or if a table has too much toasted data.
    well who knew the defaults were unsane ? :)
    Not really "unsane," but for any large database, they're not ideal. This
    also goes for the default_statistics_target setting. If you haven't
    already, you may want to bump this up to 100 from the default of 10. Not
    enough stats can make the planner ignore indexes and other bad things,
    and it sounds like your DB is big enough to benefit from that.

    Later versions have made 100 the default, so you'd just be catching up. :)
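
    The global change goes in postgresql.conf, but as a sketch you can also
    raise it per column and re-analyze, which is handy for testing (the column
    here is just an example from the schema posted earlier):

    ALTER TABLE fastadder_fastadderstatus
        ALTER COLUMN priority SET STATISTICS 100;
    ANALYZE fastadder_fastadderstatus;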

    --
    Shaun Thomas
    OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
    312-676-8870
    sthomas@peak6.com

  • Greg Smith at Feb 4, 2011 at 6:38 pm

    felix wrote:
    and do you agree that I should turn CLUSTER ON ?
    I have no problem to stop all tasks to this table at night and just
    reload it
    You don't turn it on; it's a one time operation that does a cleanup. It
    is by far the easiest way to clean up the mess you have right now.
    Moving forward, if you have max_fsm_pages set to an appropriate number,
    you shouldn't end up back in this position again. But VACUUM alone
    won't get you out of there, and VACUUM FULL is always a worse way to
    clean this up than CLUSTER.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
    "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
  • Scott Marlowe at Feb 4, 2011 at 7:01 pm

    On Fri, Feb 4, 2011 at 11:38 AM, Greg Smith wrote:
    You don't turn it on; it's a one time operation that does a cleanup.  It is
    by far the easiest way to clean up the mess you have right now.  Moving
    forward, if you have max_fsm_pages set to an appropriate number, you
    shouldn't end up back in this position again.  But VACUUM alone won't get
    you out of there, and VACUUM FULL is always a worse way to clean this up
    than CLUSTER.
    note that for large, randomly ordered tables, cluster can be pretty
    slow, and you might want to do the old:

    begin;
    select * into temporaryholdingtable from oldtable order by somefield;
    truncate oldtable;
    insert into oldtable select * from temporaryholdingtable;
    commit;

    for fastest performance. I've had CLUSTER take hours to do what the
    above does in 1/4th the time.
  • Shaun Thomas at Feb 4, 2011 at 7:18 pm

    On 02/04/2011 01:01 PM, Scott Marlowe wrote:

    begin;
    select * into temporaryholdingtable from oldtable order by somefield;
    truncate oldtable;
    insert into oldtable select * from temporaryholdingtable;
    commit;
    That's usually how I do it, except for larger tables, I also throw in a
    DROP INDEX for all the indexes on the table before the insert, and
    CREATE INDEX statements afterwards.

    Which actually brings up a question I've been wondering to myself that I
    may submit to [HACKERS]: Can we add a parallel option to the reindexdb
    command? We added one to pg_restore, so we already know it works.

    I have a bunch of scripts that get all the indexes in the database and
    order them by size (so they're distributed evenly), round-robin them
    into separate REINDEX sql files, and launch them all in parallel
    depending on how many threads you want, but that's so hacky I feel dirty
    every time I use it.

    --
    Shaun Thomas
    OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
    312-676-8870
    sthomas@peak6.com

  • Shaun Thomas at Feb 4, 2011 at 5:45 pm

    On 02/04/2011 11:38 AM, felix wrote:

    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO: analyzing "public.fastadder_fastadderstatus"
    INFO: "fastadder_fastadderstatus": scanned 2492 of 2492 pages,
    containing 154378 live rows and 0 dead rows; 30000 rows in sample,
    154378 estimated total rows

    and there's nothing at the end of the whole vacuum output about pages
    I'm not sure if it gives it to you if you pick a single DB, but if you
    use -a for all, you should get something at the very end like this:

    INFO: free space map contains 1365918 pages in 1507 relations
    DETAIL: A total of 1326656 page slots are in use (including overhead).
    1326656 page slots are required to track all free space.
    Current limits are: 3000000 page slots, 3500 relations, using 38784 kB.
    VACUUM

    That's on our dev system. Your dev table seems properly sized, but prod
    probably isn't. If you run an all-database vacuum after-hours, you'll
    see the stuff at the end. And if your 'page slots are required' is
    greater than your 'page slots are in use,' you've got a problem.
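
    As another cross-check from SQL, the 8.3 stats views track estimated dead
    rows per table, which is a quick way to spot the worst offenders (a
    sketch):

    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;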

    --
    Shaun Thomas
    OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
    312-676-8870
    sthomas@peak6.com

  • Felix at Feb 6, 2011 at 10:49 am
    BRUTAL


    http://www.postgresql.org/docs/8.3/interactive/runtime-config-resource.html
    max_fsm_pages

    See Section 17.4.1
    (http://www.postgresql.org/docs/8.3/interactive/kernel-resources.html#SYSVIPC)
    for information on how to adjust those parameters, if necessary.

    I see absolutely nothing in there about how to set those parameters.

    several hours later (
    where is my data directory ? 8.4 shows it in SHOW ALL; 8.3 does not.
    conf files ? "in the data directory" no, its in /etc/postgres/8.3/main
    where is pg_ctl ?
    what user do I need to be ? postgres
    then why was it installed in the home dir of a user that does not have
    permissions to use it ??
    )


    cd /home/crucial/bin

    /home/crucial/bin/pg_ctl -D /var/lib/postgresql/8.3/main reload

    reload does not reset max_fsm_pages, I need to actually restart the server.

    postgres@nestseekers:/home/crucial/bin$ /home/crucial/bin/pg_ctl -D
    /var/lib/postgresql/8.3/main restart
    waiting for server to shut
    down............................................................... failed
    pg_ctl: server does not shut down


    OK, my mistake. probably I have to disconnect all clients. I don't want
    to do a "planned maintenance" right now.

    so I go to sleep

    the server restarts itself an hour later.

    but no, it fails to restart because this memory setting you recommend is not
    possible without reconfiguring the kernel.


    postgres@nestseekers:/home/crucial/bin$ 2011-02-06 05:18:00 EST LOG: could
    not load root certificate file "root.crt": No such file or directory
    2011-02-06 05:18:00 EST DETAIL: Will not verify client certificates.
    2011-02-06 05:18:00 EST FATAL: could not create shared memory segment:
    Invalid argument
    2011-02-06 05:18:00 EST DETAIL: Failed system call was shmget(key=5432001,
    size=35463168, 03600).
    2011-02-06 05:18:00 EST HINT: This error usually means that PostgreSQL's
    request for a shared memory segment exceeded your kernel's SHMMAX parameter.
    You can either reduce the request size or reconfigure the kernel with
    larger SHMMAX. To reduce the request size (currently 35463168 bytes),
    reduce PostgreSQL's shared_buffers parameter (currently 3072) and/or its
    max_connections parameter (currently 103).
    If the request size is already small, it's possible that it is less than
    your kernel's SHMMIN parameter, in which case raising the request size or
    reconfiguring SHMMIN is called for.
    The PostgreSQL documentation contains more information about shared memory
    configuration.
    ^C

    *and the website is down for the next 6 hours while I sleep.*

    total disaster

    after a few tries I get it to take a max_fsm_pages of 300k

    postgres@nestseekers:/home/crucial/bin$ 2011-02-06 05:19:26 EST LOG: could
    not load root certificate file "root.crt": No such file or directory
    2011-02-06 05:19:26 EST DETAIL: Will not verify client certificates.
    2011-02-06 05:19:26 EST LOG: database system was shut down at 2011-02-06
    00:07:41 EST
    2011-02-06 05:19:27 EST LOG: autovacuum launcher started
    2011-02-06 05:19:27 EST LOG: database system is ready to accept connections
    ^C



    2011-02-06 05:33:45 EST LOG: checkpoints are occurring too frequently (21
    seconds apart)
    2011-02-06 05:33:45 EST HINT: Consider increasing the configuration
    parameter "checkpoint_segments".


    ??


    From my perspective: the defaults for postgres 8.3 result in a database that
    does not scale and fails dramatically after 6 months. changing that default
    is brutally difficult and can only really be done by adjusting something in
    the kernel.


    I have clustered that table, its still unbelievably slow.
    I still don't know if this bloat due to the small free space map has
    anything to do with why the table is performing like this.

    On Fri, Feb 4, 2011 at 5:35 PM, Shaun Thomas wrote:


    You can stop the bloating by setting the right max_fsm_pages setting,




    but you'll either have to go through and VACUUM FULL every table in your
    database, or dump/restore to regain all the lost space and performance (the
    latter would actually be faster). Before I even touch an older PostgreSQL DB,
    I set it to some value over 3-million just as a starting value to be on the
    safe side. A little used memory is a small price to pay for stopping gradual
    expansion.
  • Scott Marlowe at Feb 6, 2011 at 3:23 pm

    On Sun, Feb 6, 2011 at 3:48 AM, felix wrote:
    BRUTAL SNIP
    OK, my mistake.   probably I have to disconnect all clients.  I don't want
    to do a "planned maintenance" right now.
    so I go to sleep
    the server restarts itself an hour later.
    but no, it fails to restart because this memory setting you recommend is not
    possible without reconfiguring the kernel. SNIP
    and the website is down for the next 6 hours while I sleep.
    total disaster
    Let's review:
    1: No test or staging system used before production
    2: DB left in an unknown state (trying to shut down, not able)
    3: No monitoring software to tell you when the site is down
    4: I'm gonna just go ahead and guess no backups were taken either, or
    are regularly taken.

    This website can't be very important, if that's the way you treat it.
    Number 1 up there becomes even worse because it was your first time
    trying to make this particular change in Postgresql. If it is
    important, you need to learn how to start treating it that way. Even
    the most junior of sys admins or developers I work with know we test
    it a couple times outside of production before just trying it there.
    And my phone starts complaining a minute after the site stops
    responding if something does go wrong the rest of the time. Do not
    lay this at anyone else's feet.
    From my perspective: the defaults for postgres 8.3 result in a database that
    does not scale and fails dramatically after 6 months.
    Agreed. Welcome to using shared memory and the ridiculously low
    defaults on most flavors of unix or linux.
    changing that default
    is brutally difficult and can only really be done by adjusting something in
    the kernel.
    Please, that's a gross exaggeration. The sum total of changing them is:

    run sysctl -a|grep shm
    copy out proper lines to change
    edit sysctl.conf
    put new lines in there with changes
    sudo sysctl -p # applies changes
    edit the appropriate postgresql.conf, make changes
    sudo /etc/init.d/postgresql-8.3 stop
    sudo /etc/init.d/postgresql-8.3 start
    I have clustered that table, its still unbelievably slow.
    Did you actually delete the old entries before clustering it? if it's
    still got 4G of old sessions or whatever in it, clustering ain't gonna
    help.
    I still don't know if this bloat due to the small free space map has
    anything to do with why the table is performing like this.
    Since you haven't shown us what changes, if any, have happened to the
    table, neither do we :)
  • Pierre C at Feb 6, 2011 at 7:19 pm

    I have clustered that table, its still unbelievably slow.
    Did you actually delete the old entries before clustering it? if it's
    still got 4G of old sessions or whatever in it, clustering ain't gonna
    help.
    Also, IMHO it is a lot better to store sessions in something like
    memcached, rather than imposing this rather large load on the main
    database...

    PS : if your site has been down for 6 hours, you can TRUNCATE your
    sessions table...
  • Scott Marlowe at Feb 6, 2011 at 7:24 pm

    On Sun, Feb 6, 2011 at 12:19 PM, Pierre C wrote:
    I have clustered that table, its still unbelievably slow.
    Did you actually delete the old entries before clustering it?  if it's
    still got 4G of old sessions or whatever in it, clustering ain't gonna
    help.
    Also, IMHO it is a lot better to store sessions in something like memcached,
    rather than imposing this rather large load on the main database...

    PS : if your site has been down for 6 hours, you can TRUNCATE your sessions
    table...
    Agreed. When I started where I am sessions were on pg and falling
    over all the time. Because I couldn't change it at the time, I was
    forced to make autovac MUCH more aggressive. I didn't have to crank
    up fsm a lot really but did a bit. Then just ran a vacuum full /
    reindex across the sessions table and everything was fine after that.
    But we could handle 100x the load for sessions with memcached I
    bet.
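
    For reference, the knobs involved can at least be inspected from SQL (a
    sketch; the persistent changes go in postgresql.conf):

    SELECT name, setting
    FROM pg_settings
    WHERE name LIKE 'autovacuum%' OR name LIKE 'max_fsm%';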
  • Felix at Feb 7, 2011 at 1:56 am
    yeah, it already uses memcached with db save. nothing important in session
    anyway

    the session table is not the issue

    and I never clustered that one or ever will

    thanks for the tip, also the other one about HOT

    On Sun, Feb 6, 2011 at 8:19 PM, Pierre C wrote:


    I have clustered that table, its still unbelievably slow.
    Did you actually delete the old entries before clustering it? if it's
    still got 4G of old sessions or whatever in it, clustering ain't gonna
    help.
    Also, IMHO it is a lot better to store sessions in something like
    memcached, rather than imposing this rather large load on the main
    database...

    PS : if your site has been down for 6 hours, you can TRUNCATE your sessions
    table...
  • Felix at Feb 7, 2011 at 1:52 am

    On Sun, Feb 6, 2011 at 4:23 PM, Scott Marlowe wrote:

    Let's review:

    1: No test or staging system used before production
    >

    no, I do not have a full ubuntu machine replicating the exact memory and
    application load of the production server.

    this was changing one configuration parameter. something I was advised to
    do, read about quite a bit, tested on my development server (mac) and then
    proceeded to do at 6 am on Sunday morning, our slowest time.


    2: DB left in an unknown state (trying to shut down, not able)
    >

    what ?

    I checked the site, everything was normal. I went in via psql and tried
    some queries for about half an hour and continued to monitor the site. then
    I went to bed at 7am (EU time).

    Why did it shut down so much later ?

    I have never restarted postgres before, so this was all new to me. I
    apologize that I wasn't born innately with such knowledge.

    So is it normal for postgres to report that it failed to shut down, operate
    for an hour and then go ahead and restart itself ?

    3: No monitoring software to tell you when the site is down
    >

    of course I have monitoring software. both external and internal. but it
    doesn't come and kick me out of bed. yes, I need an automated cell phone
    call. that was the first thing I saw to afterwards.


    4: I'm gonna just go ahead and guess no backups were taken either, or
    are regularly taken.
    WTF ? of course I have backups. I just went through a very harsh down
    period event. I fail to see why it is now necessary for you to launch such
    an attack on me.

    Perhaps the tone of my post sounded like I was blaming you, or at least you
    felt that way. Why do you feel that way ?

    Why not respond with: "ouch ! did you check this ... that...." say
    something nice and helpful. correct my mistakes



    This website can't be very important, if that's the way you treat it.
    just to let you know, that is straight up offensive

    This is a high-traffic real estate site. Downtime is unacceptable. I had
    less downtime than this when I migrated to the new platform.

    I spent rather a large amount of time reading and questioning here. I asked
    many questions for clarification and didn't do ANYTHING until I was sure it
    was the correct solution. I didn't just pull some shit off a blog and start
    changing settings at random.

    I double checked opinions against different people and I searched for more
    docs on that param. Amazingly none of the ones I found commented on the
    shared memory issue and I didn't even understand the docs discussing shared
    memory because it didn't seem to apply to what I was doing. that's my
    misunderstanding. I come here to share my misunderstanding.



    And my phone starts complaining a minute after the site stops
    responding if something does go wrong the rest of the time. Do not
    lay this at anyone else's feet.
    I didn't. There is not even the slightest hint of that in my post.

    I came here and posted the details of where I went wrong and what confused
    me about the documentation that I followed. That's so other people can
    follow it and so somebody here can comment on it.


    changing that default
    is brutally difficult and can only really be done by adjusting something in
    the kernel.
    Please, that's a gross exaggeration. The sum total of changing them is:

    run sysctl -a|grep shm
    copy out proper lines to change
    edit sysctl.conf
    put new lines in there with changes
    sudo sysctl -p # applies changes
    edit the appropriate postgresql.conf, make changes
    sudo /etc/init.d/postgresql-8.3 stop
    sudo /etc/init.d/postgresql-8.3 start
    Considering how splendidly the experiment with changing max_fsm_pages went,
    I think you can understand that I have no desire to experiment with kernel
    settings.

    It is easy for you because you ALREADY KNOW everything involved. I am not a
    sysadmin and we don't have one. My apologies for that.

    so does the above mean that I don't have to restart the entire server, just
    postgres ? I assumed that changing kernel settings means rebooting the
    server.


    I have clustered that table, its still unbelievably slow.

    Did you actually delete the old entries before clustering it? if it's
    still got 4G of old sessions or whatever in it, clustering ain't gonna
    help.
    its a different table. the problem one has only 300k rows

    the problem is not the size, the problem is the speed is catastrophic


    I still don't know if this bloat due to the small free space map has
    anything to do with why the table is performing like this.
    Since you haven't show us what changes, if any, have happened to the
    table, neither do we :)
    sorry, it didn't seem to be the most important topic when I got out of bed
  • Craig Ringer at Feb 7, 2011 at 3:05 am

    On 07/02/11 09:52, felix wrote:

    So is it normal for postgres to report that it failed to shut down,
    operate for an hour and then go ahead and restart itself ?
    That's pretty wacky. Did you shut it down via pg_ctl or using an init
    script / "service" command in your OS?

    It shouldn't matter, but it'd be good to know. If the problem is with an
    init script, then knowing which OS and version you're on would help. If
    it was with pg_ctl directly, that's something that can be looked into.
    this was changing one configuration parameter. something I was advised
    to do, read about quite a bit, tested on my development server (mac) and
    then proceeded to do at 6 am on Sunday morning, our slowest time.
    System V shared memory is awful - but it's really the only reasonable
    alternative for a multi-process (rather than multi-threaded) server.

    PostgreSQL could use mmap()ed temp files, but that'd add additional
    overheads and they'd potentially get flushed from main memory unless the
    memory was mlock()ed. As mlock() has similar limits and configuration
    methods to system V shared memory, you get back to the same problem in a
    slightly different form.

    What would possibly help would be if Pg could fall back to lower
    shared_buffers automatically, screaming about it in the logs but still
    launching. OTOH, many people don't check the logs, so they'd think their
    new setting had taken effect and it hadn't - you've traded one usability
    problem for another. Even if Pg issued WARNING messages to each client
    that connected, lots of (non-psql) clients don't display them, so many
    users would never know.

    Do you have a suggestion about how to do this better? The current
    approach is known to be rather unlovely, but nobody's come up with a
    better one that works reasonably and doesn't trample on other System V
    shared memory users that may exist on the system.
    so does the above mean that I don't have to restart the entire server,
    just postgres ? I assumed that changing kernel settings means rebooting
    the server.
    Nope. sysctl settings like shmmax may be changed on the fly.

    --
    System & Network Administrator
    POST Newspapers
  • Marti Raudsepp at Feb 7, 2011 at 10:37 am

    On Mon, Feb 7, 2011 at 05:03, Craig Ringer wrote:
    What would possibly help would be if Pg could fall back to lower
    shared_buffers automatically, screaming about it in the logs but still
    launching. OTOH, many people don't check the logs, so they'd think their
    new setting had taken effect and it hadn't - you've traded one usability
    problem for another. Even if Pg issued WARNING messages to each client
    that connected, lots of (non-psql) clients don't display them, so many
    users would never know.

    Do you have a suggestion about how to do this better? The current
    approach is known to be rather unlovely, but nobody's come up with a
    better one that works reasonably and doesn't trample on other System V
    shared memory users that may exist on the system.
    We could do something similar to what Apache does -- provide distros
    with a binary to check the configuration file in advance. This check
    program is launched before the "restart" command, and if it fails, the
    server is not restarted.

    Regards,
    Marti
  • Felix at Feb 7, 2011 at 3:05 pm
    +1

    this is exactly what I was looking for at the time: a -t (configtest)
    option to pg_ctl

    and I think it should fall back to lower shared buffers and log it.

    SHOW ALL; would show the used value


    On Mon, Feb 7, 2011 at 11:30 AM, Marti Raudsepp wrote:
    On Mon, Feb 7, 2011 at 05:03, Craig Ringer wrote:
    What would possibly help would be if Pg could fall back to lower
    shared_buffers automatically, screaming about it in the logs but still
    launching. OTOH, many people don't check the logs, so they'd think their
    new setting had taken effect and it hadn't - you've traded one usability
    problem for another. Even if Pg issued WARNING messages to each client
    that connected, lots of (non-psql) clients don't display them, so many
    users would never know.

    Do you have a suggestion about how to do this better? The current
    approach is known to be rather unlovely, but nobody's come up with a
    better one that works reasonably and doesn't trample on other System V
    shared memory users that may exist on the system.
    We could do something similar to what Apache does -- provide distros
    with a binary to check the configuration file in advance. This check
    program is launched before the "restart" command, and if it fails, the
    server is not restarted.

    Regards,
    Marti
  • Scott Marlowe at Feb 7, 2011 at 6:42 pm

    On Mon, Feb 7, 2011 at 8:05 AM, felix wrote:
    +1
    this is exactly what I was looking for at the time:  a -t (configtest)
    option to pg_ctl
    and I think it should fall back to lower shared buffers and log it.
    SHOW ALL; would show the used value
    however, much like apache, this might not have gotten caught. In
    order to catch it we'd have to see how much shared mem was available,
    and I think you have to actually allocate it to find out if you can.
    Since pg is already running, allocating shared_buffers / fsm twice
    might fail when allocating it once would succeed.
  • Craig Ringer at Feb 7, 2011 at 11:49 pm

    On 02/07/2011 06:30 PM, Marti Raudsepp wrote:
    On Mon, Feb 7, 2011 at 05:03, Craig Ringer wrote:
    What would possibly help would be if Pg could fall back to lower
    shared_buffers automatically, screaming about it in the logs but still
    launching. OTOH, many people don't check the logs, so they'd think their
    new setting had taken effect and it hadn't - you've traded one usability
    problem for another. Even if Pg issued WARNING messages to each client
    that connected, lots of (non-psql) clients don't display them, so many
    users would never know.

    Do you have a suggestion about how to do this better? The current
    approach is known to be rather unlovely, but nobody's come up with a
    better one that works reasonably and doesn't trample on other System V
    shared memory users that may exist on the system.
    We could do something similar to what Apache does -- provide distros
    with a binary to check the configuration file in advance. This check
    program is launched before the "restart" command, and if it fails, the
    server is not restarted.
    That would work for config file errors (and would probably be a good
    idea) but won't help with bad shared memory configuration. When Pg is
    already running, it's usually not possible for a test program to claim
    the amount of shared memory the config file says to allocate, because Pg
    is already using it. Nonetheless, Pg will work fine when restarted.

    --
    Craig Ringer
  • Greg Smith at Feb 7, 2011 at 7:06 pm

    Craig Ringer wrote:
    What would possibly help would be if Pg could fall back to lower
    shared_buffers automatically, screaming about it in the logs but still
    launching.
    This is exactly what initdb does when it produces an initial setting for
    shared_buffers that goes into the postgresql.conf file. It wouldn't be
    hard to move that same logic into a loop that executed when startup
    failed to allocate enough memory.

    There are two problems here, one almost solved, the other more
    philosophical. It used to be that max_fsm_pages and wal_buffers could be
    large enough components to the allocation that reducing them might
    actually be a necessary fix, too. With the removal of the former and a
    method to automatically set the latter now available, the remaining
    components to the shared memory sizing computation are probably possible
    to try and fix automatically if the kernel limits are too low.

    But it's unclear whether running in a degraded mode, where performance
    might be terrible, with only a log message is preferable to stopping
    and forcing the DBA's attention toward the mistake that was made
    immediately. Log files get rotated out, and it's not hard to imagine
    this problem coming to haunt someone only a month or two later--by which
    time the change to shared_buffers is long forgotten, and the log message
    complaining about it lost too. Accordingly I would expect any serious
    attempt to add some auto-reduction behavior to be beset with argument,
    and I'd never consider writing such a thing as a result. Too many
    non-controversial things I could work on instead.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
    "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
  • Craig Ringer at Feb 7, 2011 at 11:56 pm

    On 02/08/2011 03:05 AM, Greg Smith wrote:

    Accordingly I would expect any serious
    attempt to add some auto-reduction behavior to be beset with argument,
    and I'd never consider writing such a thing as a result. Too many
    non-controversial things I could work on instead.
    Yep. I expressed my own doubts in the post I suggested that in.

    If Pg did auto-correct down, it'd be necessary to scream about it
    angrily and continuously, not just once during startup. Given that it's
    clear many people never even look at the logs ("what logs? where are
    they?") I think Pg would also have to send notices to the client.
    Problem is, many clients don't process notices/warnings, so particularly
    slack admins won't see that either.

    I'm not particularly excited about the idea.

    --
    Craig Ringer
  • Scott Marlowe at Feb 7, 2011 at 3:14 am

    On Sun, Feb 6, 2011 at 6:52 PM, felix wrote:
    On Sun, Feb 6, 2011 at 4:23 PM, Scott Marlowe wrote:

    Let's review:

    1: No test or staging system used before production
    no, I do not have a full ubuntu machine replicating the exact memory and
    application load of the production server.
    this was changing one configuration parameter. something I was advised to
    do, read about quite a bit, tested on my development server (mac) and then
    proceeded to do at 6 am on Sunday morning, our slowest time.
    I would strongly suggest you at least test these changes out
    elsewhere. It doesn't have to exactly match, but if you had a machine
    that was even close to test on you'd have known what to expect.
    Virtual machines are dirt simple to set up now, so not having one is
    inexcusable.
    2: DB left in an unknown state (trying to shut down, not able)
    what ?
    You told it to restart, which is a stop and a start. It didn't stop.
    It was in an unknown state. With settings in its config file you
    didn't know whether or not they worked because you hadn't tested them
    already on something similar.
    Why did it shutdown so much later ?
    Because that's when the last connection that was open from before you
    told it to shut down / restart finally closed.
    I have never restarted postgres before, so this was all new to me.
    Which is why you use a virtual machine to build a test lab so you CAN
    make these changes somewhere other than production.
    I apologize that I wasn't born innately with such knowledge.
    Guess what!? Neither was I! I do however know how to setup a test
    system so I don't test things on my production machine.
    So is it normal for postgres to report that it failed to shut down, operate
    for an hour and then go ahead and restart itself ?
    Yes. It eventually finished your restart you told it to do.
    3: No monitoring software to tell you when the site is down
    of course I have monitoring software.  both external and internal.  but it
    doesn't come and kick me out of bed.  yes, I need an automated cel phone
    call.  that was the first thing I saw to afterwards.
    Monitoring software that can't send you emails when things break is in
    need of having that feature enabled.
    4: I'm gonna just go ahead and guess no backups were taken either, or
    are regularly taken.
    WTF ?   of course I have backups.  I just went through a very harsh down
    period event.  I fail to see why it is now necessary for you to launch such
    an attack on me.
    No, it just seemed like your admin skills were pretty sloppy, so a
    lack of a backup wouldn't surprise me.
    Perhaps the tone of my post sounded like I was blaming you, or at least you
    felt that way.
    It felt more like you were blaming PostgreSQL for being overly
    complex, but I wasn't taking it all that personally.
    Why do you feel that way ? I don't.
    Why not respond with:  "ouch !  did you check this ... that...."  say
    something nice and helpful.  correct my mistakes
    I'd be glad to, but your message wasn't looking for help. go back and
    read it. It's one long complaint.
    This website can't be very important, if that's the way you treat it.
    just to let you know, that is straight up offensive
    Really? I'd say performing maintenance with no plan or pre-testing is
    far more offensive.
    This is high traffic real estate site.  Downtime is unacceptable.  I had
    less downtime than this when I migrated to the new platform.
    I expect you did more planning and testing?
    I spent rather a large amount of time reading and questioning here.  I asked
    many questions for clarification and didn't do ANYTHING until I was sure it
    was the correct solution.  I didn't just pull some shit off a blog and start
    changing settings at random.
    But yet you failed to test it on even the simplest similar system
    setup. And so you lacked the practical knowledge of how to make this
    change in production safely.
    I double checked opinions against different people and I searched for more
    docs on that param.  Amazingly none of the ones I found commented on the
    shared memory issue and I didn't even understand the docs discussing shared
    memory because it didn't seem to apply to what I was doing.  that's my
    misunderstanding.  I come here to share my misunderstanding.
    Well, that's useful. And I can see where there could be some changes
    made to the docs or a user friendly howto on how to increase shared
    memory and fsm and all that.
    Please, that's a gross exaggeration.  The sum total of changing them is:

    run sysctl -a|grep shm
    copy out the proper lines to change
    edit sysctl.conf
    put new lines in there with changes
    sudo sysctl -p  # applies changes
    edit the appropriate postgresql.conf, make changes
    sudo /etc/init.d/postgresql-8.3 stop
    sudo /etc/init.d/postgresql-8.3 start
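    As a concrete sketch, the sysctl.conf side typically looks like this --
    the numbers are illustrative only and must be sized for your own RAM:
    # example /etc/sysctl.conf additions:
    #   kernel.shmmax = 4294967296   # bytes; must exceed shared_buffers plus overhead
    #   kernel.shmall = 1048576      # pages (4 kB each); >= shmmax / page size
    sudo sysctl -p                   # apply the new limits without a reboot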
    Considering how splendidly the experiment with changing fsm_max_pages went,
    I think you can understand that I have no desire to experiment with kernel
    settings.
    Experimenting is what you do on a test machine, not a production server.
    It is easy for you because you ALREADY KNOW everything involved.
    But this is important, it was NOT EASY the first time, and I certainly
    didn't try to make changes on a production server the first time.
    I am not a
    sysadmin and we don't have one.  My apologies for that.
    No need to apologize. Learn the skills needed to fill that role, or
    hire someone.
    so does the above mean that I don't have to restart the entire server, just
    postgres ?  I assumed that changing kernel settings means rebooting the
    server.
    Exactly. Just pgsql. You use sysctl -p to make the changes take effect.
    Did you actually delete the old entries before clustering it?  if it's
    still got 4G of old sessions or whatever in it, clustering ain't gonna
    help.
    its a different table.  the problem one has only 300k rows
    the problem is not the size, the problem is the speed is catastrophic
    Well, is it bloated? Which table in that previous post is it?
    sorry, it didn't seem to be the most important topic when I got out of bed
    If it's not coffee, it's not an important topic when I get out of bed.
  • Shaun Thomas at Feb 7, 2011 at 5:05 am

    I checked the site, everything was normal. I went in via psql and tried some
    queries for about half an hour and continued to monitor the site. then I went
    to bed at 7am (EU time).

    Why did it shutdown so much later ?
    That’s one of the things I talked about. To be safe, PG will start to shut down but disallow new connections, and *that’s all*. Old connections are grandfathered in until they disconnect, and when they all go away, it shuts down gracefully.

    pg_ctl -D /my/pg/dir stop -m fast
    pg_ctl -D /my/pg/dir start

    Is what you wanted.
    I have never restarted postgres before, so this was all new to me. I apologize
    that I wasn't born innately with such knowledge.
    Forget about it. But you need to learn your tools. Restarting the DB server is something you’ll need to do occasionally. Just like restarting your Django proxy or app. You need to be fully knowledgeable about every part of your tool-chain, or at least the parts you’re responsible for.
    I double checked opinions against different people and I searched for more docs
    on that param. Amazingly none of the ones I found commented on the shared
    memory issue and I didn't even understand the docs discussing shared memory
    because it didn't seem to apply to what I was doing.
    That’s no coincidence. I’ve seen that complaint if you increase shared_buffers, but not for max_fsm_pages. I guess I’m so used to bumping up shmmax and shmall that I forget how low default systems leave those values. But you do need to increase them. Every time. They’re crippling your install in more ways than just postgres.

    So far as your Django install, have you activated the memcache contrib. module? Your pages should be lazy-caching and rarely depend on the DB, if they can. You should also rarely be doing count(*) on a 300k row table, even if everything is cached and speedy. 300k row tables have nasty habits of becoming 3M row tables (or more) after enough time, and no amount of cache will save you from counting that. It’ll take 1 second or more every time eventually, and then you’ll be in real trouble. That’s an application design issue you need to address before it’s too late, or you have to rush and implement a hasty fix.

    I suggest setting your log_min_duration to 1000, so every query that takes longer than 1 second to execute is logged in your postgres logs. You can use that to track down trouble spots before they get really bad. That’s normally aggressive enough to catch the real problem queries without flooding your logs with too much output.
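    A minimal sketch of that change, assuming a Debian-style config path --
    the full parameter name is log_min_duration_statement, and a reload
    (not a restart) is enough to pick it up:
    echo "log_min_duration_statement = 1000   # log anything taking >= 1 s" \
        | sudo tee -a /etc/postgresql/8.3/main/postgresql.conf
    sudo /etc/init.d/postgresql-8.3 reload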

    Being a DBA sucks sometimes. ☺


  • Felix at Feb 8, 2011 at 3:17 am

    On Mon, Feb 7, 2011 at 6:05 AM, Shaun Thomas wrote:
    That’s one of the things I talked about. To be safe, PG will start to shut
    down but disallow new connections, and **that’s all**. Old connections are
    grandfathered in until they disconnect, and when they all go away, it shuts
    down gracefully.

    Well.... it said "Failed to shutdown ..............." and then returned
    control.
    and then proceeded to run for about an hour.

    I'm not sure how graceful that is.

    I generally take programs at their word. "Failed" is clearly past tense.



    So far as your Django install, have you activated the memcache contrib.
    module? Your pages should be lazy-caching and rarely depend on the DB, if
    they can.
    yes thanks my web app is very finely tuned and is working splendidly.
    I've been working on very large sites since 1998 and this client has
    been with me for 10 years already. its a fairly high traffic site.

    I've only been using postgres since we migrated in May

    but it is one particular table on postgres that has shit the sock drawer.



    You should also rarely be doing count(*) on a 300k row table, even if
    everything is cached and speedy.
    I'm not

    this is a test query that is obviously way out of bounds for acceptable
    response.

    there is something very very wrong with this table and I need to solve it
    ASAP.
    other tables that have less updates but similar sizes are not having this
    problem.

    there are foreign keys pointing to this table so its a bit tricky to just
    refill it, but I can think of one way. I'll have to do that.

    its only conjecture that the issue is file space bloat or free map problems.
    those are overall issues that I will get to as soon as I can. but this is
    table specific.


    That’s an application design issue you need to address before it’s too
    late, or you have to rush and implement a hasty fix.
    it is not an application design issue, though there are always improvements
    being made.

    Being a DBA sucks sometimes. J
    >

    I am not a DBA, I'm just trying to query a 300k row table.

    though I am happy to learn more. I know an awful lot about a lot of things.
    but you can't specialize in everything
  • Scott Marlowe at Feb 8, 2011 at 4:25 am

    On Mon, Feb 7, 2011 at 8:17 PM, felix wrote:
    On Mon, Feb 7, 2011 at 6:05 AM, Shaun Thomas wrote:

    That’s one of the things I talked about. To be safe, PG will start to shut
    down but disallow new connections, and *that’s all*. Old connections are
    grandfathered in until they disconnect, and when they all go away, it shuts
    down gracefully.
    Well.... it said "Failed to shutdown ..............."  and then returned
    control.
    and then proceeded to run for about an hour.
    I'm not sure how graceful that is.
    I generally take programs at their word.  "Failed" is clearly past tense.
    I agree that here what pg_ctl said and what it actually did aren't
    exactly the same thing.
    but it is one particular table on postgres that has shit the sock drawer.
    What queries are running slow, and what does explain analyze have to
    say about them?
    You should also rarely be doing count(*) on a 300k row table, even if
    everything is cached and speedy.
    I'm not
    this is a test query that is obviously way out of bounds for acceptable
    response.
    there is something very very wrong with this table and I need to solve it
    ASAP.
    other tables that have less updates but similar sizes are not having this
    problem.
    Is this the same problem you had at the beginning and were trying to
    fix with clustering and increasing fsm, or is this now a different
    table and a different problem?
    there are foreign keys pointing to this table so its a bit tricky to just
    refill it, but I can think of one way.  I'll have to do that.
    its only conjecture that the issue is file space bloat or free map problems.
    those are overall issues that I will get to as soon as I can. but this is
    table specific.
    What does the query you ran before that shows bloat show on this table now?
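    Even something rough would help -- a sketch, assuming psql is pointed at
    the right database:
    psql -c "SELECT relname, n_live_tup, n_dead_tup,
                    pg_size_pretty(pg_relation_size(relid)) AS on_disk
             FROM pg_stat_user_tables
             WHERE relname = 'fastadder_fastadderstatus';"
    # ~300k live rows taking hundreds of MB, or a dead-tuple count rivalling
    # the live count, would point at bloat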
    That’s an application design issue you need to address before it’s too
    late, or you have to rush and implement a hasty fix.
    it is not an application design issue, though there are always improvements
    being made.
    If your application is doing select count(*) with either no where
    clause or with a very non-selective one, then it is somewhat of a
    design issue, and there are ways to make that faster. If it's a
    different query, show us what it and its explain analyze look like.
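    One common trick, when an approximate number is good enough, is to read
    the planner's estimate instead of counting (a sketch):
    psql -c "SELECT reltuples::bigint AS approx_rows
             FROM pg_class
             WHERE relname = 'fastadder_fastadderstatus';"
    # kept roughly current by ANALYZE / autovacuum; an exact count needs either
    # a selective WHERE clause or a trigger-maintained counter table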
    Being a DBA sucks sometimes. J
    I am not a DBA, I'm just trying to query a 300k row table.
    though I am happy to learn more. I know an awful lot about a lot of things.
    but you can't specialize in everything
    Well the good news is that there's a LOT less arcana involved in keeping
    pgsql happy than there is in keeping something like Oracle happy.
  • Shaun Thomas at Feb 8, 2011 at 2:23 pm

    On 02/07/2011 09:17 PM, felix wrote:

    Well.... it said "Failed to shutdown ..............." and then
    returned control. and then proceeded to run for about an hour. I'm
    not sure how graceful that is.
    Ah, but that was just the control script that sends the database the
    command to shut down. The 'graceful' part is that the database is being
    nice to everyone trying to do things with the data inside.

    The control script has a timeout. So it'll send the command, wait a few
    seconds to see if the database responds, and then gives up. At that
    point, you can use a fast shutdown to tell the database not to be so
    nice, and it'll force disconnect all users and shut down as quickly as
    possible while maintaining data integrity.

    The easiest way to see this in action is to take a look at the postgres
    log files. In most default installs, this is in /your/pg/dir/pg_log and
    the files follow a postgresql-YYYY-MM-DD_HHMMSS.log format and generally
    auto-rotate. If not, set redirect_stderr to on, and make sure
    log_directory and log_filename are both set. Those are in your
    postgresql.conf, by the way. :)
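    A sketch of those settings (note that in 8.3 the old redirect_stderr
    option is spelled logging_collector) and where the files usually end up
    -- the data-directory path here is a guess:
    # postgresql.conf:
    #   logging_collector = on
    #   log_directory = 'pg_log'
    #   log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
    tail -f /var/lib/postgresql/8.3/main/pg_log/postgresql-*.log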
    I've only been using postgres since we migrated in May
    Aha. Yeah... relatively new installs tend to have the worst growing
    pains. Once you shake this stuff out, you'll be much better off.
    its only conjecture that the issue is file space bloat or free map
    problems. those are overall issues that I will get to as soon as I can.
    but this is table specific.
    With 300k rows, count(*) isn't a good test, really. That's just on the
    edge of big-enough that it could be > 1-second to fetch from the disk
    controller, even if the table is fully vacuumed. And in your case, that
    table really will likely come from the disk controller, as your
    shared_buffers are set way too low. The default settings are not going
    to cut it for a database of your size, with the volume you say it's getting.

    But you need to put in those kernel parameters I suggested. And I know
    this sucks, but you also have to raise your shared_buffers and possibly
    your work_mem and then restart the DB. But this time, use pg_ctl to invoke a
    fast stop, and then use the init script in /etc/init.d to restart it.
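    Roughly -- only a sketch, and both the sizes and the data-directory path
    are guesses to adjust for your machine:
    # postgresql.conf, e.g.:
    #   shared_buffers = 512MB
    #   work_mem = 16MB
    sudo -u postgres pg_ctl -D /var/lib/postgresql/8.3/main stop -m fast
    sudo /etc/init.d/postgresql-8.3 start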
    I am not a DBA,
    You are now. :) You're administering a database, either as part of your
    job description, or because you have no choice because your company
    doesn't have an official DBA. Either way, you'll need to know this
    stuff. Which is why we're helping out.

    --
    Shaun Thomas
    OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
    312-676-8870
    sthomas@peak6.com

  • Maciek Sakrejda at Feb 8, 2011 at 4:23 pm

    Well.... it said "Failed to shutdown ..............." and then
    returned control. and then proceeded to run for about an hour. I'm
    not sure how graceful that is.
    Ah, but that was just the control script that sends the database the command
    to shut down. The 'graceful' part, is that the database is being nice to
    everyone trying to do things with the data inside.

    The control script has a timeout. So it'll send the command, wait a few
    seconds to see if the database responds, and then gives up.
    For what it's worth, I think that's the not-so-graceful part. The
    control script gives up, but the actual shutdown still occurs
    eventually, after all current connections have ended. I think most
    users will take pg_ctl at its word, and assume "Failed to shutdown"
    means "I couldn't shut down with this command, maybe you should try
    something else", and not "I couldn't shut down right now, although
    I'll get to it as soon as everyone disconnects.".

    ---
    Maciek Sakrejda | System Architect | Truviso

    1065 E. Hillsdale Blvd., Suite 215
    Foster City, CA 94404
    (650) 242-3500 Main
    www.truviso.com
  • Kevin Grittner at Feb 8, 2011 at 4:36 pm

    Maciek Sakrejda wrote:
    Well.... it said "Failed to shutdown ..............." and then
    returned control. and then proceeded to run for about an hour.
    I'm not sure how graceful that is.
    Ah, but that was just the control script that sends the database
    the command to shut down. The 'graceful' part, is that the
    database is being nice to everyone trying to do things with the
    data inside.

    The control script has a timeout. So it'll send the command, wait
    a few seconds to see if the database responds, and then gives up.
    For what it's worth, I think that's the not-so-graceful part. The
    control script gives up, but the actual shutdown still occurs
    eventually, after all current connections have ended. I think most
    users will take pg_ctl at its word, and assume "Failed to
    shutdown" means "I couldn't shut down with this command, maybe you
    should try something else", and not "I couldn't shut down right
    now, although I'll get to it as soon as everyone disconnects.".
    Yeah, current behavior with that shutdown option is the opposite of
    smart for any production environment I've seen. (I can see where it
    would be handy in development, though.) What's best in production
    is the equivalent of the fast option with escalation to immediate if
    necessary to ensure shutdown within the time limit.
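    Scripted by hand, that escalation looks something like this (a sketch,
    assuming PGDATA points at the cluster's data directory):
    pg_ctl -D "$PGDATA" stop -m fast -t 30 \
        || pg_ctl -D "$PGDATA" stop -m immediate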

    In my world, telling PostgreSQL to shut down PostgreSQL is most
    often because in a few minutes someone is going to pull the plug to
    move the server, an electrician is going to flip the circuit off to
    do some wiring, or (in one recent event) the building is on fire and
    the fire department is about to cut electrical power. In such
    situations, patiently waiting for a long-running query to complete
    is a Very Bad Idea, much less waiting for a connection pool to cycle
    all connections out. Telling the user that the shutdown failed,
    when what is really happening is that it will block new connections
    and keep waiting around indefinitely, with an actual shutdown at
    some ill-defined future moment is adding insult to injury.

    In my view, anyway....

    -Kevin
  • Marti Raudsepp at Feb 8, 2011 at 4:58 pm

    On Tue, Feb 8, 2011 at 18:36, Kevin Grittner wrote:
    Yeah, current behavior with that shutdown option is the opposite of
    smart for any production environment I've seen.  (I can see where it
    would be handy in development, though.)  What's best in production
    is the equivalent of the fast option with escalation to immediate if
    necessary to ensure shutdown within the time limit.
    +1, we should call it "dumb" :)

    Not accepting new connections with "the database system is shutting
    down" makes it even worse -- it means you can't log in to the server
    to inspect who's querying it or call pg_terminate_backend() on them.

    I couldn't find any past discussions about changing the default to "fast".
    Are there any reasons why that cannot be done in a future release?

    Regards,
    Marti
  • Scott Marlowe at Feb 8, 2011 at 5:31 pm

    On Tue, Feb 8, 2011 at 9:50 AM, Marti Raudsepp wrote:
    On Tue, Feb 8, 2011 at 18:36, Kevin Grittner
    wrote:
    Yeah, current behavior with that shutdown option is the opposite of
    smart for any production environment I've seen.  (I can see where it
    would be handy in development, though.)  What's best in production
    is the equivalent of the fast option with escalation to immediate if
    necessary to ensure shutdown within the time limit.
    +1, we should call it "dumb" :)

    Not accepting new connections with "the database system is shutting
    down" makes it even worse -- it means you can't log in to the server
    to inspect who's querying it or call pg_terminate_backend() on them.

    I couldn't find any past discussions about changing the default to "fast".
    Are there any reasons why that cannot be done in a future release?
    Or at least throw a hint the user's way that -m fast might be needed.
  • Maciek Sakrejda at Feb 8, 2011 at 5:58 pm

    I couldn't find any past discussions about changing the default to "fast".
    Are there any reasons why that cannot be done in a future release?
    Or at least throw a hint the user's way that -m fast might be needed.
    I think there are several issues here:

    1. Does pg_ctl give a clear indication of the outcome of a failed
    "smart" mode shutdown?
    2. Is the current "smart" shutdown mode behavior useful?
    3. Should the default shutdown mode be changed to "fast"?

    I think felix mainly complained about (1), and that's what I was
    talking about as well. The current message (I have only an 8.3 handy,
    but I don't imagine this has changed much) is:

    pg_ctl stop -t5
    waiting for server to shut down........ failed
    pg_ctl: server does not shut down

    This leaves out crucial information (namely, "but it will stop
    accepting new connections and shut down when all current connections
    are closed"). It seems like something along those lines should be
    added to the error message, or perhaps at least to pg_ctl
    documentation. Currently, the docs page (
    http://www.postgresql.org/docs/current/static/app-pg-ctl.html ) only
    hints at this, and pg_ctl --help does not really mention this at all.

    Of the two other issues, (3) seems reasonable (I have no strong
    feelings there either way), and (2) is probably a moot point (the
    behavior won't change in a backward-incompatible manner now, and if
    it's dethroned as default, that doesn't really matter).

    ---
    Maciek Sakrejda | System Architect | Truviso

    1065 E. Hillsdale Blvd., Suite 215
    Foster City, CA 94404
    (650) 242-3500 Main
    www.truviso.com
  • Kevin Grittner at Feb 8, 2011 at 7:00 pm

    Marti Raudsepp wrote:

    I couldn't find any past discussions about changing the default to
    "fast".
    It's not entirely unrelated to the "Linux LSB init script" discussion in
    August and September of 2009:

    http://archives.postgresql.org/pgsql-hackers/2009-08/msg01843.php

    http://archives.postgresql.org/pgsql-hackers/2009-09/msg01963.php

    -Kevin
  • Greg Smith at Feb 8, 2011 at 8:09 pm

    Marti Raudsepp wrote:
    I couldn't find any past discussions about changing the default to "fast".
    Are there any reasons why that cannot be done in a future release?
    Well, it won't actually help as much as you might think. It's possible
    for clients to be in a state where fast shutdown doesn't work, either.
    You either have to kill them manually or use an immediate shutdown.

    Kevin and I both suggested a "fast plus timeout then immediate" behavior
    is what many users seem to want. My comments were at
    http://archives.postgresql.org/pgsql-hackers/2009-09/msg01145.php ; for
    an example of how fast shutdown can fail see
    http://archives.postgresql.org/pgsql-bugs/2009-03/msg00062.php

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
    "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
  • Marti Raudsepp at Feb 8, 2011 at 9:10 pm

    On Tue, Feb 8, 2011 at 22:09, Greg Smith wrote:
    Kevin and I both suggested a "fast plus timeout then immediate" behavior is
    what many users seem to want.  My comments were at
    http://archives.postgresql.org/pgsql-hackers/2009-09/msg01145.php ; for an
    example of how fast shutdown can fail see
    http://archives.postgresql.org/pgsql-bugs/2009-03/msg00062.php
    True, I've hit that a few times too.

    Seems that a better solution would be implementing a new -m option
    that does this transparently?

    Regards,
    Marti
  • Kevin Grittner at Feb 8, 2011 at 9:20 pm

    Marti Raudsepp wrote:
    Greg Smith wrote:
    Kevin and I both suggested a "fast plus timeout then immediate"
    behavior is what many users seem to want.
    Seems that a better solution would be implementing a new -m option
    that does this transparently?
    Maybe. Another option might be to use -t or some new switch (or -t
    in combination with some new switch) as a time limit before
    escalating to the next shutdown mode.

    -Kevin
