I'm trying to troubleshoot a performance issue on an application ported
from Oracle to postgres. Now, I know the best way to get help is to post
the schema, explain analyze output, etc, etc -- unfortunately I can't do
that at the moment. However, maybe someone can point me in the right
direction to figure this out on my own. That said, here are a few details...

PostgreSQL 7.4.1
bash-2.03$ uname -a
SunOS col65 5.8 Generic_108528-27 sun4u sparc SUNW,Sun-Fire-280R

The problem is this: the application runs an insert, that fires off a
trigger, that cascades into a fairly complex series of functions, that
do a bunch of calculations, inserts, updates, and deletes. Immediately
after a postmaster restart, the first insert or two take about 1.5
minutes (undoubtedly this could be improved, but it isn't the main
issue). However by the second or third insert, the time increases to 7 -
9 minutes. Restarting the postmaster causes the cycle to repeat, i.e.
the first one or two inserts are back to the 1.5 minute range.

Any ideas spring to mind? I don't have much experience with Postgres on
Solaris -- could it be related to that somehow?

Thanks for any insights.

Joe

  • Marty Scholes at Mar 13, 2004 at 2:51 am
    Six days ago I installed Pg 7.4.1 on Sparc Solaris 8 also. I am hopeful
    that we as well can migrate a bunch of our apps from Oracle.

    After doing some informal benchmarks and performance testing for the
    past week I am becoming more and more impressed with what I see.

    I have seen similar results to what you are describing.

    I found that running a full vacuum:

    vacuumdb -fza

    followed by a checkpoint makes it run fast again.

    Try timing the update with and without a full vacuum.
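
    For example, something along these lines (the database name and the
    insert script are only illustrative):

    #!/bin/sh
    # time the slow insert before any maintenance
    time psql -d yourdb -f insert_test.sql

    # full vacuum of all databases, with analyze (-f full, -z analyze, -a all)
    vacuumdb -fza

    # force a checkpoint, then time the same insert again
    psql -d yourdb -c "CHECKPOINT"
    time psql -d yourdb -f insert_test.sql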

    I can't help but wonder if a clean shutdown includes some vacuuming.

    Obviously, in a production database this would be an issue.

    Please post back what you learn.

    Sincerely,
    Marty

    Joe Conway wrote:
    I'm trying to troubleshoot a performance issue on an application ported
    from Oracle to postgres. Now, I know the best way to get help is to post
    the schema, explain analyze output, etc, etc -- unfortunately I can't do
    that at the moment. However, maybe someone can point me in the right
    direction to figure this out on my own. That said, here are a few
    details...

    PostgreSQL 7.4.1
    bash-2.03$ uname -a
    SunOS col65 5.8 Generic_108528-27 sun4u sparc SUNW,Sun-Fire-280R

    The problem is this: the application runs an insert, that fires off a
    trigger, that cascades into a fairly complex series of functions, that
    do a bunch of calculations, inserts, updates, and deletes. Immediately
    after a postmaster restart, the first insert or two take about 1.5
    minutes (undoubtedly this could be improved, but it isn't the main
    issue). However by the second or third insert, the time increases to 7 -
    9 minutes. Restarting the postmaster causes the cycle to repeat, i.e.
    the first one or two inserts are back to the 1.5 minute range.

    Any ideas spring to mind? I don't have much experience with Postgres on
    Solaris -- could it be related to that somehow?

    Thanks for any insights.

    Joe

  • Joe Conway at Mar 13, 2004 at 4:11 pm

    Marty Scholes wrote:
    I have seen similar results to what you are describing.

    I found that running a full vacuum:

    vacuumdb -fza

    followed by a checkpoint makes it run fast again.

    Try timing the update with and without a full vacuum.
    Will do. I'll let you know how it goes.

    Thanks for the reply.

    Joe
  • Tom Lane at Mar 13, 2004 at 4:02 am

    Joe Conway writes:
    The problem is this: the application runs an insert, that fires off a
    trigger, that cascades into a fairly complex series of functions, that
    do a bunch of calculations, inserts, updates, and deletes. Immediately
    after a postmaster restart, the first insert or two take about 1.5
    minutes (undoubtedly this could be improved, but it isn't the main
    issue). However by the second or third insert, the time increases to 7 -
    9 minutes. Restarting the postmaster causes the cycle to repeat, i.e.
    the first one or two inserts are back to the 1.5 minute range.
    I realize this question might take some patience to answer, but what
    does the performance curve look like beyond three trials? Does it level
    off or continue to get worse? If it doesn't level off, does the
    degradation seem linear in the number of trials, or worse than linear?

    I have no ideas in mind, just trying to gather data ...

    regards, tom lane
  • Joe Conway at Mar 13, 2004 at 3:55 pm

    Tom Lane wrote:
    I realize this question might take some patience to answer, but what
    does the performance curve look like beyond three trials? Does it level
    off or continue to get worse? If it doesn't level off, does the
    degradation seem linear in the number of trials, or worse than linear?
    I'll try to gather some data during the weekend and report back.

    Thanks,

    Joe
  • Matthew T. O'Connor at Mar 15, 2004 at 8:40 pm

    Joe Conway wrote:

    A few pg_autovacuum questions came out of this:

    First, the default vacuum scaling factor is 2, which I think implies
    the big table should only get vacuumed every 56 million or so changes.
    I didn't come anywhere near that volume in my tests, yet the table did
    get vacuumed more than once (I was watching the pg_autovacuum log
    output). Do I misunderstand this setting?

    I think you understand correctly. A table with 1,000,000 rows should
    get vacuumed approx every 2,000,000 changes (assuming default values for
    -V). FYI, an insert and a delete each count as one change, but an update
    counts as two.
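
    As a rough sketch of the arithmetic (ignoring the small base threshold
    that gets added on top of this):

    # vacuum threshold ~= scaling factor (-V, default 2) * rows in the table
    echo $((2 * 1000000))     # ~2,000,000 changes for a 1,000,000-row table
    echo $((2 * 28000000))    # ~56,000,000 changes for a 28,000,000-row table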

    Unfortunately, running with -d2 would show the numbers that
    pg_autovacuum is using to decide if and when it should vacuum or
    analyze. Also, are you sure that it vacuumed more than once and
    wasn't doing analyzes most of the time?

    Also, I'm not sure if 2 is a good default value for the scaling factor
    but I erred on the side of not vacuuming too often.
    Second, Matthew requested pg_autovacuum run with -d2; I found that
    with -d2 set, pg_autovacuum would immediately exit on start. -d0 and
    -d1 work fine however.

    That's unfortunate as that is the detail we need to see what
    pg_autovacuum thinks is really going on. We had a similar sounding
    crash on FreeBSD due to some uninitialized variables that were being
    printed out by the debug code, however that was fixed a long time ago.
    Any chance you can look into this?
    That's all I can think of at the moment. I'd like to try the 7.4 patch
    that makes vacuum sleep every few pages -- can anyone point me to the
    latest and greatest that will apply to 7.4?

    Yes I would be very curious to see the results with the vacuum delay
    patch installed (is that patch applied to HEAD?)
  • Joe Conway at Mar 16, 2004 at 5:00 am

    Matthew T. O'Connor wrote:
    I think you understand correctly. A table with 1,000,000 rows should
    get vacuumed approx every 2,000,000 changes (assuming default values for
    -V). FYI, an insert and a delete each count as one change, but an update
    counts as two.

    Unfortunately, running with -d2 would show the numbers that
    pg_autovacuum is using to decide if and when it should vacuum or
    analyze. Also, are you sure that it vacuumed more than once and
    wasn't doing analyzes most of the time?
    Yeah, I'm sure. Snippets from the log:

    [...lots-o-tables...]
    [2004-03-14 12:44:48 PM] added table: specdb."public"."parametric_states"
    [2004-03-14 12:49:48 PM] Performing: VACUUM ANALYZE "public"."transaction_data"
    [2004-03-14 01:29:59 PM] Performing: VACUUM ANALYZE "public"."transaction_data"
    [2004-03-14 02:08:26 PM] Performing: ANALYZE "public"."out_of_spec"
    [2004-03-14 02:08:26 PM] Performing: VACUUM ANALYZE "public"."transaction_data"
    [2004-03-14 02:22:44 PM] Performing: VACUUM ANALYZE "public"."spc_graphs"
    [2004-03-14 03:06:45 PM] Performing: VACUUM ANALYZE "public"."out_of_spec"
    [2004-03-14 03:06:45 PM] Performing: VACUUM ANALYZE "public"."transaction_data"
    [2004-03-14 03:19:51 PM] Performing: VACUUM ANALYZE "public"."spc_graphs"
    [2004-03-14 03:21:09 PM] Performing: ANALYZE "public"."parametric_states"
    [2004-03-14 03:54:57 PM] Performing: ANALYZE "public"."out_of_spec"
    [2004-03-14 03:54:57 PM] Performing: VACUUM ANALYZE "public"."transaction_data"
    [2004-03-14 04:07:52 PM] Performing: VACUUM ANALYZE "public"."spc_graphs"
    [2004-03-14 04:09:33 PM] Performing: ANALYZE "public"."equip_status_history"
    [2004-03-14 04:09:33 PM] Performing: VACUUM ANALYZE "public"."parametric_states"
    [2004-03-14 04:43:46 PM] Performing: VACUUM ANALYZE "public"."out_of_spec"
    [2004-03-14 04:43:46 PM] Performing: VACUUM ANALYZE "public"."transaction_data"
    [2004-03-14 04:56:35 PM] Performing: VACUUM ANALYZE "public"."spc_graphs"
    [2004-03-14 04:58:32 PM] Performing: ANALYZE "public"."parametric_states"
    [2004-03-14 05:28:58 PM] added database: specdb

    This is the entire period of the first test, with default autovac
    settings. The table "public"."transaction_data" is the one with 28
    million active rows. The entire test run inserts about 600 x 600 =
    360,000 rows, out of which roughly two-thirds are later deleted.
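
    Rough change count for one test run, using the counting rule above
    (each insert or delete is one change):

    echo $((600 * 600))                  # 360,000 rows inserted
    echo $((360000 + 360000 * 2 / 3))    # ~600,000 changes including the deletes
    # which is nowhere near 2 * 28,000,000 = 56,000,000 changes
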
    That's unfortunate as that is the detail we need to see what
    pg_autovacuum thinks is really going on. We had a similar sounding
    crash on FreeBSD due to some uninitialized variables that were being
    printed out by the debug code, however that was fixed a long time ago.
    Any chance you can look into this?
    I can try. The server belongs to another department, and they are under
    the gun to get back on track with their testing. Also, they compiled
    without debug symbols, so I need to get permission to recompile.
    Yes I would be very curious to see the results with the vacuum delay
    patch installed (is that patch applied to HEAD?)
    Any idea where I can get my hands on the latest version? I found the
    original post from Tom, but I thought there was a later version with
    both number of pages and time to sleep as knobs.

    Thanks,

    Joe
  • Tom Lane at Mar 16, 2004 at 5:25 am

    Joe Conway writes:
    Any idea where I can get my hands on the latest version? I found the
    original post from Tom, but I thought there was a later version with
    both number of pages and time to sleep as knobs.
    That was as far as I got. I think Jan posted a more complex version
    that would still be reasonable to apply to 7.4.

    regards, tom lane
  • Joe Conway at Mar 16, 2004 at 5:39 am

    Tom Lane wrote:
    Joe Conway <mail@joeconway.com> writes:
    Any idea where I can get my hands on the latest version? I found the
    original post from Tom, but I thought there was a later version with
    both number of pages and time to sleep as knobs.
    That was as far as I got. I think Jan posted a more complex version
    that would still be reasonable to apply to 7.4.
    I thought that too, but was having trouble finding it. I'll look again.

    Thanks,

    Joe
  • Joe Conway at Mar 17, 2004 at 4:50 am

    Tom Lane wrote:
    Joe Conway <mail@joeconway.com> writes:
    Any idea where I can get my hands on the latest version? I found the
    original post from Tom, but I thought there was a later version with
    both number of pages and time to sleep as knobs.
    That was as far as I got. I think Jan posted a more complex version
    that would still be reasonable to apply to 7.4.
    I have tested Tom's original patch now. The good news -- it works great
    in terms of reducing the load imposed by vacuum -- almost to the level
    of being unnoticeable. The bad news -- in a simulation test which loads
    an hour's worth of data, even with delay set to 1 ms, vacuum of the
    large table exceeds two hours (vs 12-14 minutes with delay = 0). Since
    that hourly load is expected 7 x 24, this obviously isn't going to work.

    The problem with Jan's more complex version of the patch (at least the
    one I found - perhaps not the right one) is it includes a bunch of other
    experimental stuff that I'd not want to mess with at the moment. Would
    changing the input units (for the original patch) from milli-secs to
    micro-secs be a bad idea? If so, I guess I'll get to extracting what I
    need from Jan's patch.

    Thanks,

    Joe
  • Matthew T. O'Connor at Mar 17, 2004 at 5:14 am

    On Tue, 2004-03-16 at 23:49, Joe Conway wrote:
    I have tested Tom's original patch now. The good news -- it works great
    in terms of reducing the load imposed by vacuum -- almost to the level
    of being unnoticeable. The bad news -- in a simulation test which loads
    an hour's worth of data, even with delay set to 1 ms, vacuum of the
    large table exceeds two hours (vs 12-14 minutes with delay = 0). Since
    that hourly load is expected 7 x 24, this obviously isn't going to work.
    If memory serves, the problem is that you actually sleep 10ms even when
    you set it to 1. One of the things changed in Jan's later patch was the
    ability to specify how many pages to work on before sleeping, rather
    than how long to sleep in between every page. You might be able to do
    a quick hack and have it do 10 pages or so before sleeping.

    Matthew
  • Joe Conway at Mar 17, 2004 at 5:20 am

    Matthew T. O'Connor wrote:
    If memory serves, the problem is that you actually sleep 10ms even when
    you set it to 1. One of the things changed in Jan's later patch was the
    ability to specify how many pages to work on before sleeping, rather
    than how long to sleep in between every page. You might be able to do
    a quick hack and have it do 10 pages or so before sleeping.
    I thought I remembered something about that.

    It turned out to be less difficult than I first thought to extract the
    vacuum delay stuff from Jan's performance patch. I haven't yet tried it
    out, but it's attached in case you are interested. I'll report back once
    I have some results.

    Joe
  • Tom Lane at Mar 17, 2004 at 5:17 am

    Joe Conway writes:
    I have tested Tom's original patch now. The good news -- it works great
    in terms of reducing the load imposed by vacuum -- almost to the level
    of being unnoticeable. The bad news -- in a simulation test which loads
    an hour's worth of data, even with delay set to 1 ms, vacuum of the
    large table exceeds two hours (vs 12-14 minutes with delay = 0). Since
    that hourly load is expected 7 x 24, this obviously isn't going to work.
    That turns the dial down a bit too far, then ...
    The problem with Jan's more complex version of the patch (at least the
    one I found - perhaps not the right one) is it includes a bunch of other
    experimental stuff that I'd not want to mess with at the moment. Would
    changing the input units (for the original patch) from milli-secs to
    micro-secs be a bad idea?
    Unlikely to be helpful; on most kernels the minimum sleep delay is 1 or
    10 msec, so asking for a few microsec is the same as asking for some
    millisec. I think what you need is a knob of the form "sleep N msec
    after each M pages of I/O". I'm almost certain that Jan posted such a
    patch somewhere between my original and the version you refer to above.
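
    Back of the envelope, from the timings above (the page count is
    inferred from the added run time, not measured):

    # ~2 hours of added delay at an effective 10 msec sleep per page implies
    echo $((2 * 3600 * 1000 / 10))    # ~720,000 pages visited by the vacuum
    # sleeping only after each group of M pages cuts the number of sleeps,
    # and hence the added time, by roughly a factor of M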

    regards, tom lane
  • Arthur Ward at Mar 17, 2004 at 3:40 pm

    The problem with Jan's more complex version of the patch (at least the
    one I found - perhaps not the right one) is it includes a bunch of other
    experimental stuff that I'd not want to mess with at the moment. Would
    changing the input units (for the original patch) from milli-secs to
    micro-secs be a bad idea? If so, I guess I'll get to extracting what I
    need from Jan's patch.
    Jan's vacuum-delay-only patch that nobody can find is here:

    http://archives.postgresql.org/pgsql-hackers/2003-11/msg00518.php

    I've been using it in testing & production without any problems.
  • Joe Conway at Mar 17, 2004 at 7:40 pm

    Arthur Ward wrote:
    Jan's vacuum-delay-only patch that nobody can find is here:

    http://archives.postgresql.org/pgsql-hackers/2003-11/msg00518.php

    I've been using it in testing & production without any problems.
    Great to know -- many thanks.

    I've hacked my own vacuum-delay-only patch from Jan's all_performance
    patch. It looks like the only difference is that it uses usleep()
    instead of select(). So far the tests look promising.

    Thanks,

    Joe
  • Matthew T. O'Connor at Mar 16, 2004 at 5:32 am

    Joe Conway wrote:

    Yeah, I'm sure. Snippets from the log:

    [...lots-o-tables...]
    [2004-03-14 12:44:48 PM] added table: specdb."public"."parametric_states"
    [2004-03-14 12:49:48 PM] Performing: VACUUM ANALYZE "public"."transaction_data"
    [2004-03-14 01:29:59 PM] Performing: VACUUM ANALYZE "public"."transaction_data"
    [2004-03-14 02:08:26 PM] Performing: ANALYZE "public"."out_of_spec"
    [2004-03-14 02:08:26 PM] Performing: VACUUM ANALYZE "public"."transaction_data"
    [2004-03-14 02:22:44 PM] Performing: VACUUM ANALYZE "public"."spc_graphs"
    [2004-03-14 03:06:45 PM] Performing: VACUUM ANALYZE "public"."out_of_spec"
    [2004-03-14 03:06:45 PM] Performing: VACUUM ANALYZE "public"."transaction_data"
    [2004-03-14 03:19:51 PM] Performing: VACUUM ANALYZE "public"."spc_graphs"
    [2004-03-14 03:21:09 PM] Performing: ANALYZE "public"."parametric_states"
    [2004-03-14 03:54:57 PM] Performing: ANALYZE "public"."out_of_spec"
    [2004-03-14 03:54:57 PM] Performing: VACUUM ANALYZE "public"."transaction_data"
    [2004-03-14 04:07:52 PM] Performing: VACUUM ANALYZE "public"."spc_graphs"
    [2004-03-14 04:09:33 PM] Performing: ANALYZE "public"."equip_status_history"
    [2004-03-14 04:09:33 PM] Performing: VACUUM ANALYZE "public"."parametric_states"
    [2004-03-14 04:43:46 PM] Performing: VACUUM ANALYZE "public"."out_of_spec"
    [2004-03-14 04:43:46 PM] Performing: VACUUM ANALYZE "public"."transaction_data"
    [2004-03-14 04:56:35 PM] Performing: VACUUM ANALYZE "public"."spc_graphs"
    [2004-03-14 04:58:32 PM] Performing: ANALYZE "public"."parametric_states"
    [2004-03-14 05:28:58 PM] added database: specdb

    Yeah, you're right.....
    This is the entire period of the first test, with default autovac
    settings. The table "public"."transaction_data" is the one with 28
    million active rows. The entire test run inserts about 600 x 600 =
    360,000 rows, out of which roughly two-thirds are later deleted.

    Strange... I wonder if this is some integer overflow problem. There was
    one reported recently and fixed as of CVS HEAD yesterday; you might try
    that. However, without the -d2 output I'm only guessing at why
    pg_autovacuum is vacuuming so much / so often.
    I can try. The server belongs to another department, and they are
    under the gun to get back on track with their testing. Also, they
    compiled without debug symbols, so I need to get permission to recompile.

    Good luck, I hope you can get permission. It would be nice to fix this
    little crash.
    Yes I would be very curious to see the results with the vacuum delay
    patch installed (is that patch applied to HEAD?)

    Any idea where I can get my hands on the latest version? I found the
    original post from Tom, but I thought there was a later version with
    both number of pages and time to sleep as knobs.

    I think Jan posted one a while back.... [searches archives...] But I
    must say I'm at a loss to find it in the archives. Anyone know where a
    good delay patch is for 7.4? If we can't find one, any chance you can
    do some testing with CVS HEAD just to see if that works any better? I
    know there has been a fair amount of work done to improve this situation
    (not just vacuum delay, but ARC etc...)
  • Joe Conway at Mar 16, 2004 at 5:49 am

    Matthew T. O'Connor wrote:
    Strange... I wonder if this is some integer overflow problem. There was
    one reported recently and fixed as of CVS HEAD yesterday; you might try
    that. However, without the -d2 output I'm only guessing at why
    pg_autovacuum is vacuuming so much / so often.
    I'll see what I can do tomorrow to track it down.

    I have already recommended to the program manager that they switch to
    7.4.2 plus the autovacuum patch. Not sure they will be willing to make
    any changes at this stage in their release process though.
    If we can't find one, any chance you can
    do some testing with CVS HEAD just to see if that works any better? I
    know there has been a fair amount of work done to improve this situation
    (not just vacuum delay, but ARC etc...)
    I might do that, but not likely on Solaris. I can probably get a copy of
    the current database and testing scripts, and give it a try on one of my
    own machines (all Linux, either RHAS3, RH9, or Fedora).

    Joe
  • Tom Lane at Mar 13, 2004 at 3:39 pm

    Joe Conway writes:
    ... Immediately
    after a postmaster restart, the first insert or two take about 1.5
    minutes (undoubtedly this could be improved, but it isn't the main
    issue). However by the second or third insert, the time increases to 7 -
    9 minutes. Restarting the postmaster causes the cycle to repeat, i.e.
    the first one or two inserts are back to the 1.5 minute range.
    Just to be clear on this: you have to restart the postmaster to bring
    the time back down? Simply starting a fresh backend session doesn't do
    it?

    Are you using particularly large values for shared_buffers or any of the
    other resource parameters?

    regards, tom lane
  • Joe Conway at Mar 13, 2004 at 4:07 pm

    Tom Lane wrote:
    Just to be clear on this: you have to restart the postmaster to bring
    the time back down? Simply starting a fresh backend session doesn't do
    it?
    Yes, a full postmaster restart is needed. It is a command line script
    that does the insert, so each one is a new backend.
    Are you using particularly large values for shared_buffers or any of the
    other resource parameters?
    I'll have to look at this again (I have to VPN in to the company LAN,
    which kills all my current connections) -- the server and application
    belong to another department at my employer.

    IIRC, shared buffers was reasonable, maybe 128MB. One thing that is
    worthy of note is that they are using pg_autovacuum and a very low
    vacuum_mem setting (1024). But I also believe that max_fsm_relations and
    max_fsm_pages have been bumped up from default (something like 10000 &
    200000).

    I'll post the non-default postgresql.conf settings shortly. The extended
    tests discussed in the nearby post will take a bit more time to get.
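
    For reference, the settings in question can be pulled with SHOW (the
    database name is illustrative; in 7.4 shared_buffers is reported as a
    count of 8 kB buffers):

    psql -d yourdb -c "SHOW shared_buffers"
    psql -d yourdb -c "SHOW vacuum_mem"
    psql -d yourdb -c "SHOW max_fsm_relations"
    psql -d yourdb -c "SHOW max_fsm_pages"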

    Thanks,

    Joe
  • Josh Berkus at Mar 13, 2004 at 4:52 pm
    Joe,
    IIRC, shared buffers was reasonable, maybe 128MB. One thing that is
    worthy of note is that they are using pg_autovacuum and a very low
    vacuum_mem setting (1024). But I also believe that max_fsm_relations and
    max_fsm_pages have been bumped up from default (something like 10000 &
    200000).
    pg_autovacuum may be your problem. Imagine this:

    1) The chain of updates and inserts called by the procedures makes enough
    changes, on its own, to trigger pg_autovacuum.
    2) Because they have a big database, and a low vacuum_mem, a vacuum of the
    largest table takes noticeable time, like several minutes.
    3) This means that the vacuum is still running during the second and
    succeeding events ....

    Something to check by watching the process list.
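
    For example (the backends advertise the running command in their ps
    display; pg_stat_activity works too if stats_command_string is on; the
    database name is illustrative):

    # look for a VACUUM or ANALYZE backend while the insert is running slow
    ps -ef | grep postgres
    psql -d yourdb -c "SELECT procpid, current_query FROM pg_stat_activity"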

    FWIW, I don't use pg_autovacuum for databases which have frequent large batch
    updates; I find it results in uneven performance.

    Feel free to phone me if you're still stuck!

    --
    -Josh Berkus
    Aglio Database Solutions
    San Francisco
  • Matthew T. O'Connor at Mar 13, 2004 at 5:31 pm

    Joe Conway wrote:

    Tom Lane wrote:
    Just to be clear on this: you have to restart the postmaster to bring
    the time back down? Simply starting a fresh backend session doesn't do
    it?

    IIRC, shared buffers was reasonable, maybe 128MB. One thing that is
    worthy of note is that they are using pg_autovacuum and a very low
    vacuum_mem setting (1024). But I also believe that max_fsm_relations
    and max_fsm_pages have been bumped up from default (something like
    10000 & 200000).
    pg_autovacuum could be a problem if it's vacuuming too often. Have you
    looked to see if a vacuum or analyze is running while the server is
    slow? If so, have you played with the pg_autovacuum default vacuum and
    analyze thresholds? If it appears that it is related to pg_autovacuum,
    please send me the command options used to run it and a logfile of its
    output running at a debug level of -d2.
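
    Something like the following, plus whatever connection options your
    setup needs (the log path is just an example):

    # run pg_autovacuum in the foreground at debug level 2 and capture its output
    pg_autovacuum -d2 > /tmp/pg_autovacuum.log 2>&1 &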


    Matthew
  • Andrew Sullivan at Mar 17, 2004 at 6:11 pm
    Sorry I haven't had a chance to reply to this sooner.
    On Fri, Mar 12, 2004 at 05:38:37PM -0800, Joe Conway wrote:
    The problem is this: the application runs an insert, that fires off a
    trigger, that cascades into a fairly complex series of functions, that
    do a bunch of calculations, inserts, updates, and deletes. Immediately
    after a postmaster restart, the first insert or two take about 1.5
    minutes (undoubtedly this could be improved, but it isn't the main
    issue). However by the second or third insert, the time increases to 7 -
    9 minutes. Restarting the postmaster causes the cycle to repeat, i.e.
    the first one or two inserts are back to the 1.5 minute range.
    The vacuum delay stuff that you're working on may help, but I can't
    really believe it's your salvation if this is happening after only a
    few minutes. No matter how much you're doing inside those functions,
    you surely can't be causing so many dead tuples that a vacuum is
    necessary that soon. Did you try not vacuuming for a little while to
    see if it helps?

    I didn't see it anywhere in this thread, but are you quite sure that
    you're not swapping? Note that vmstat on multiprocessor Solaris
    machines is notoriously not useful. You may want to have a look at
    what the example stuff in the SE Toolkit tells you, or what you get
    from sar. I believe you have to use a special kernel setting on
    Solaris to mark shared memory as being ineligible for swap.
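
    For example, on Solaris 8 (intervals and counts are arbitrary):

    vmstat -S 5       # si/so columns should stay at zero if nothing is swapping
    sar -g 5 5        # page-out activity
    swap -s           # summary of swap space allocated and reserved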

    A

    --
    Andrew Sullivan | ajs@crankycanuck.ca
    This work was visionary and imaginative, and goes to show that visionary
    and imaginative work need not end up well.
    --Dennis Ritchie
  • Joe Conway at Mar 17, 2004 at 7:21 pm

    Andrew Sullivan wrote:
    Sorry I haven't had a chance to reply to this sooner.
    The vacuum delay stuff that you're working on may help, but I can't
    really believe it's your salvation if this is happening after only a
    few minutes. No matter how much you're doing inside those functions,
    you surely can't be causing so many dead tuples that a vacuum is
    necessary that soon. Did you try not vacuuming for a little while to
    see if it helps?
    I discussed it later in the thread, but we're adding about 400K rows per
    hour and deleting most of them after processing (note this is a
    commercial app, written and maintained by another department -- I can
    recommend changes, but this late into their release cycle they are very
    reluctant to change the app). This is 7 x 24 data collection from
    equipment, so there is no "slow" time to use as a maintenance window.

    But since the server in question is a test machine, I was able to shut
    everything off long enough to do a full vacuum -- it took about 12 hours.
    I didn't see it anywhere in this thread, but are you quite sure that
    you're not swapping? Note that vmstat on multiprocessor Solaris
    machines is notoriously not useful. You may want to have a look at
    what the example stuff in the SE Toolkit tells you, or what you get
    from sar. I believe you have to use a special kernel setting on
    Solaris to mark shared memory as being ineligible for swap.
    I'm (reasonably) sure there is no swapping. Minimum free memory (from
    top) is about 800 MB, and "vmstat -S" shows no swap-in or swap-out.

    I've been playing with a version of Jan's performance patch in the past
    few hours. Based on my simulations, it appears that a 1 ms delay every
    10 pages is just about right. The performance hit is negligible (based
    on overall test time, and cpu % used by the vacuum process). I still
    have a bit more analysis to do, but this is looking pretty good. More
    later...

    Joe
  • Matthew T. O'Connor at Mar 17, 2004 at 8:57 pm

    Andrew Sullivan wrote:
    The vacuum delay stuff that you're working on may help, but I can't
    really believe it's your salvation if this is happening after only a
    few minutes. No matter how much you're doing inside those functions,
    you surely can't be causing so many dead tuples that a vacuum is
    necessary that soon. Did you try not vacuuming for a little while to
    see if it helps?
    Some of this thread was taken offline, so I'm not sure it was mentioned
    on the list, but a big part of the problem was that Joe was running into
    the same bug that Cott Lang ran into a while ago, which caused the vacuum
    threshold to get set far too low, resulting in vacuums far too often.
    This has been fixed and the patch has been committed; unfortunately it
    didn't make it into 7.4.2, but it will be in 7.4.3 / 7.5.
    I didn't see it anywhere in this thread, but are you quite sure that
    you're not swapping? Note that vmstat on multiprocessor Solaris
    machines is notoriously not useful. You may want to have a look at
    what the example stuff in the SE Toolkit tells you, or what you get
    from sar. I believe you have to use a special kernel setting on
    Solaris to mark shared memory as being ineligible for swap.
    I haven't heard from Joe how things are going with the fixed
    pg_autovacuum, but that in combination with the vacuum delay stuff should
    work well.

    Matthew
