Recently two systems here have suffered severely from excessive
temporary file creation during query execution. In one case it could
have been avoided by more stringent QA before application code release,
whereas the other is an ad-hoc system, and err...yes.

In both cases it would have been great to be able to constrain the
amount of temporary file space a query could use. In theory you can sort
of do this with the various ulimits, but that seems pretty impractical,
as at that level all files look the same and you'd be just as likely to
unexpectedly cripple the entire db a few weeks later when a table grows...

I got to wondering how hard this would be to do in Postgres, and attached
is my (WIP) attempt. It provides a GUC (max_temp_files_size) to limit
the size of all temp files for a backend, and amends fd.c to cancel
execution if the total size of temporary files exceeds this.
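
As a rough sketch of the intended usage (an illustration only, assuming
the GUC keeps this name, is settable per session like work_mem, and
accepts the usual kB/MB/GB suffixes):

    -- hypothetical usage example, not taken from the patch itself
    SET max_temp_files_size = '2GB';  -- cancel this session's queries once
                                      -- their temp files exceed ~2GB
    RESET max_temp_files_size;        -- back to the server-wide default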

This is a WIP; it does seem to work OK, but some areas/choices I'm not
entirely clear about are mentioned in the patch itself. Mainly:

- name of the GUC... better suggestions welcome
- datatype for the GUC - real would be good, but at the moment the nice
KB/MB/GB parsing business only works for int

regards

Mark


  • Robert Haas at Feb 18, 2011 at 1:34 pm

    On Thu, Feb 17, 2011 at 10:17 PM, Mark Kirkwood wrote:
    This is a WIP; it does seem to work OK, but some areas/choices I'm not
    entirely clear about are mentioned in the patch itself. Mainly:

    - name of the GUC... better suggestions welcome
    - datatype for the GUC - real would be good, but at the moment the nice
    KB/MB/GB parsing business only works for int
    Please add this to the next CommitFest:

    https://commitfest.postgresql.org/action/commitfest_view/open

    With respect to the datatype of the GUC, int seems clearly correct.
    Why would you want to use a float?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Mark Kirkwood at Feb 18, 2011 at 9:19 pm

    On 19/02/11 02:34, Robert Haas wrote:
    Please add this to the next CommitFest:

    https://commitfest.postgresql.org/action/commitfest_view/open

    With respect to the datatype of the GUC, int seems clearly correct.
    Why would you want to use a float?
    Added. With respect to the datatype, using int with KB units means the
    largest temp size is approx 2047GB - I know that seems like a lot now...
    but maybe someone out there wants (say) their temp files limited to
    4096GB :-)
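
    (For reference, assuming the int GUC is interpreted in kB, that ceiling
    comes from 2^31 - 1 kB = 2147483647 kB, which is roughly 2047 GB, i.e.
    just under 2 TiB.)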

    Cheers

    Mark
  • Tom Lane at Feb 18, 2011 at 9:41 pm

    Mark Kirkwood writes:
    Added. With respect to the datatype, using int with KB units means the
    largest temp size is approx 2047GB - I know that seems like a lot now...
    but maybe someone out there wants (say) their temp files limited to
    4096GB :-)
    [ shrug... ] Sorry, I can't imagine a use case for this parameter where
    the value isn't a *lot* less than that. Maybe if it were global, but
    not if it's per-backend.

    regards, tom lane
  • Josh Berkus at Feb 18, 2011 at 7:41 pm
    Mark,
    I got to wondering how hard this would be to do in Postgres, and attached
    is my (WIP) attempt. It provides a GUC (max_temp_files_size) to limit
    the size of all temp files for a backend, and amends fd.c to cancel
    execution if the total size of temporary files exceeds this.
    First, are we just talking about pgsql_tmp here, or the pg_temp
    tablespace? That is, just sort/hash files, or temporary tables as well?

    Second, the main issue with these sorts of macro-counters has generally
    been their locking effect on concurrent activity. Have you been able to
    run any tests which try to run lots of small externally-sorted queries
    at once on a multi-core machine, and checked the effect on throughput?

    --
    -- Josh Berkus
    PostgreSQL Experts Inc.
    http://www.pgexperts.com
  • Robert Haas at Feb 18, 2011 at 7:44 pm

    On Fri, Feb 18, 2011 at 2:41 PM, Josh Berkus wrote:
    Second, the main issue with these sorts of macro-counters has generally
    been their locking effect on concurrent activity.  Have you been able to
    run any tests which try to run lots of small externally-sorted queries
    at once on a multi-core machine, and checked the effect on throughput?
    Since it's apparently a per-backend limit, that doesn't seem relevant.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Josh Berkus at Feb 18, 2011 at 7:48 pm

    On 2/18/11 11:44 AM, Robert Haas wrote:
    On Fri, Feb 18, 2011 at 2:41 PM, Josh Berkus wrote:
    Second, the main issue with these sorts of macro-counters has generally
    been their locking effect on concurrent activity. Have you been able to
    run any tests which try to run lots of small externally-sorted queries
    at once on a multi-core machine, and checked the effect on throughput?
    Since it's apparently a per-backend limit, that doesn't seem relevant.
    Oh! I missed that.

    What good would a per-backend limit do, though?

    And what happens with queries which exceed the limit? Error message? Wait?


    --
    -- Josh Berkus
    PostgreSQL Experts Inc.
    http://www.pgexperts.com
  • Robert Haas at Feb 18, 2011 at 8:37 pm

    On Fri, Feb 18, 2011 at 2:48 PM, Josh Berkus wrote:
    On 2/18/11 11:44 AM, Robert Haas wrote:
    On Fri, Feb 18, 2011 at 2:41 PM, Josh Berkus wrote:
    Second, the main issue with these sorts of macro-counters has generally
    been their locking effect on concurrent activity.  Have you been able to
    run any tests which try to run lots of small externally-sorted queries
    at once on a multi-core machine, and checked the effect on throughput?
    Since it's apparently a per-backend limit, that doesn't seem relevant.
    Oh!  I missed that.

    What good would a per-backend limit do, though?

    And what happens with queries which exceed the limit?  Error message?  Wait?
    Well I have not RTFP, but I assume it'd throw an error. Waiting isn't
    going to accomplish anything.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Mark Kirkwood at Feb 18, 2011 at 9:34 pm

    On 19/02/11 08:48, Josh Berkus wrote:
    On 2/18/11 11:44 AM, Robert Haas wrote:
    On Fri, Feb 18, 2011 at 2:41 PM, Josh Berkus wrote:
    Second, the main issue with these sorts of macro-counters has generally
    been their locking effect on concurrent activity. Have you been able to
    run any tests which try to run lots of small externally-sorted queries
    at once on a multi-core machine, and checked the effect on throughput?
    Since it's apparently a per-backend limit, that doesn't seem relevant.
    Oh! I missed that.

    What good would a per-backend limit do, though?

    And what happens with queries which exceed the limit? Error message? Wait?
    By "temp files" I mean those in pgsql_tmp. LOL - A backend limit will
    have the same sort of usefulness as work_mem does - i.e stop a query
    eating all your filesystem space or bringing a server to its knees with
    io load. We have had this happen twice - I know of other folks who have too.

    Obviously you need to do the same sort of arithmetic as you do with
    work_mem to decide on a reasonable limit to cope with multiple users
    creating temp files. Conservative DBAs might want to set it to (free
    disk)/max_connections etc. Obviously for ad-hoc systems it is a bit more
    challenging - but having a per-backend limit is way better than having
    what we have now, which is ... errr... nothing.
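
    (Purely illustrative numbers for that rule of thumb: with 500 GB of free
    temp space and max_connections = 100, a conservative setting would be
    500 GB / 100 = 5 GB per backend.)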

    As an example, I'd find it useful to avoid badly written queries causing
    too much IO load on the db backend of (say) a web system (i.e. such a
    system should not *have* queries that want to use that much resource).

    To answer the other question: what happens when the limit is exceeded is
    modeled on statement_timeout, i.e. the query is canceled and a message says
    why (exceeded temp file size).

    Cheers

    Mark
  • Josh Berkus at Feb 18, 2011 at 9:38 pm

    Obviously you need to do the same sort of arithmetic as you do with
    work_mem to decide on a reasonable limit to cope with multiple users
    creating temp files. Conservative DBAs might want to set it to (free
    disk)/max_connections etc. Obviously for ad-hoc systems it is a bit more
    challenging - but having a per-backend limit is way better than having
    what we have now, which is ... errr... nothing.
    Agreed.
    To answer the other question: what happens when the limit is exceeded is
    modeled on statement_timeout, i.e. the query is canceled and a message says
    why (exceeded temp file size).
    When does this happen? When you try to allocate the file, or when it
    does the original tape sort estimate?

    The disadvantage of the former is that the user waited for minutes in
    order to have their query cancelled. The disadvantage of the latter is
    that the estimate isn't remotely accurate.

    --
    -- Josh Berkus
    PostgreSQL Experts Inc.
    http://www.pgexperts.com
  • Mark Kirkwood at Feb 18, 2011 at 9:50 pm

    On 19/02/11 10:38, Josh Berkus wrote:
    To answer the other question: what happens when the limit is exceeded is
    modeled on statement_timeout, i.e. the query is canceled and a message says
    why (exceeded temp file size).
    When does this happen? When you try to allocate the file, or when it
    does the original tape sort estimate?

    The disadvantage of the former is that the user waited for minutes in
    order to have their query cancelled. The disadvantage of the latter is
    that the estimate isn't remotely accurate.
    Neither - it checks each write (I think this is pretty cheap - it adds a
    couple of int/double additions plus a division and a comparison to
    FileWrite). If the check shows you've written more than the limit, the
    query is canceled. So you can exceed the limit by at most one buffer's worth.
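
    Roughly, as a standalone sketch of the accounting idea (not the actual
    fd.c change - the names and the "no limit" convention here are invented
    for illustration; in the patch the check sits inside FileWrite() and
    cancels the query rather than returning an error):

        /* Standalone sketch; hypothetical names, not the patch itself. */
        #include <stdio.h>

        /* pretend GUC value, in kB; <= 0 taken here to mean "no limit" */
        static long max_temp_files_size_kb = 1024;  /* 1 MB for the demo */

        /* running total of temp-file bytes written by this "backend" */
        static double temp_bytes_written = 0;

        /* called once per write of 'amount' bytes to a temp file */
        static int
        temp_write_check(int amount)
        {
            temp_bytes_written += amount;
            if (max_temp_files_size_kb > 0 &&
                temp_bytes_written / 1024.0 > (double) max_temp_files_size_kb)
                return -1;    /* over limit: caller cancels the query */
            return 0;
        }

        int
        main(void)
        {
            int     i;

            for (i = 1; i <= 200; i++)
            {
                /* each write is one 8 kB buffer, much like a BLCKSZ page */
                if (temp_write_check(8192) != 0)
                {
                    /* the limit can be overshot by up to one buffer */
                    printf("write %d: temp file size exceeds limit (%ld kB)\n",
                           i, max_temp_files_size_kb);
                    return 1;
                }
            }
            printf("all writes within the limit\n");
            return 0;
        }

    (Run as-is this reports the overrun on the 129th 8 kB write, i.e. just
    past the 1024 kB demo limit.)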

    Yeah, the disadvantage is that (like statement timeout) it is a 'bottom
    of the cliff' type of protection. The advantage is there are no false
    positives...

    Cheers

    Mark
  • Josh Berkus at Feb 18, 2011 at 10:30 pm

    Yeah, the disadvantage is that (like statement timeout) it is a 'bottom
    of the cliff' type of protection. The advantage is there are no false
    positives...
    Yeah, just trying to get a handle on the proposed feature. I have no
    objections; it seems like a harmless limit for most people, and useful
    to a few.

    --
    -- Josh Berkus
    PostgreSQL Experts Inc.
    http://www.pgexperts.com
  • Mark Kirkwood at Feb 18, 2011 at 10:41 pm

    On 19/02/11 11:30, Josh Berkus wrote:
    Yeah, the disadvantage is that (like statement timeout) it is a 'bottom
    of the cliff' type of protection. The advantage is there are no false
    positives...
    Yeah, just trying to get a handle on the proposed feature. I have no
    objections; it seems like a harmless limit for most people, and useful
    to a few.
    No worries and sorry, I should have used the "per backend" phrase in the
    title to help clarify what was intended.

    Cheers

    Mark
  • Mark Kirkwood at Mar 9, 2011 at 11:17 pm
    New version:

    - adds documentation
    - adds category RESOURCES_DISK
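
    For illustration, a hedged sketch of how the setting might then appear
    in the disk-resources section of postgresql.conf (the default shown and
    the "-1 means no limit" convention are assumptions, not taken from the
    patch):

        # - Disk -
        #max_temp_files_size = -1    # in kB, or with kB/MB/GB suffixes;
                                     # -1 assumed here to mean "no limit"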
