Hello,

Please let me ask you some questions about RelCache/SysCache/CatCache
design. I know I should post this to pgsql-general, but I decided to post
here because the content includes design questions.

<<Background>>
My customer is facing an "out of memory" problem during a batch job. I'd like
to know the cause and possible solutions. The PostgreSQL version is 8.2.7
(32-bit on Linux).

The batch job consists of two steps in a single psql session:

1. call some PL/pgSQL function (say "somefunc" here)
2. VACUUM tables (at this time, maintenance_work_mem=256MB)

Step 2 emitted the following messages in syslog:

ERROR: out of memory
DETAIL: Failed on request of size 268435452.
STATEMENT: VACUUM some_table_name

somefunc copies rows from a single table to 100,000 tables (table_1 -
table_100000) as follows:

[somefunc]
FOR id IN 1 .. 100000 LOOP
    check if the table "table_${id}" exists by searching pg_class
    if the table exists
        INSERT INTO table_${id} SELECT * FROM some_table
            WHERE pk = id;
    else /* the table does not exist */
        CREATE TABLE table_${id} AS SELECT * FROM some_table
            WHERE pk = id;
END LOOP;
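
For concreteness, here is a minimal PL/pgSQL sketch of the above logic. The
table and column names (some_table, pk, table_N) are taken from the pseudocode;
the customer's real function may of course differ in details:

CREATE OR REPLACE FUNCTION somefunc() RETURNS void AS $$
BEGIN
    FOR id IN 1 .. 100000 LOOP
        -- look the target table up in pg_class
        PERFORM 1 FROM pg_class
         WHERE relname = 'table_' || id AND relkind = 'r';
        IF FOUND THEN
            EXECUTE 'INSERT INTO table_' || id ||
                    ' SELECT * FROM some_table WHERE pk = ' || id;
        ELSE
            EXECUTE 'CREATE TABLE table_' || id ||
                    ' AS SELECT * FROM some_table WHERE pk = ' || id;
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;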

Before somefunc starts, the virtual memory of the backend postgres process is
1.6GB, as reported in the "VIRT" column of top. When somefunc completes, it has
grown to 2.6GB. As a result, VACUUM cannot allocate its 256MB of
maintenance_work_mem because the virtual address space is nearly full.

This is all the information I have now. I have asked the customer to collect
the PostgreSQL server log, so that the memory context statistics are available
when "out of memory" occurs, and also for the output of "SHOW ALL" and a
minimal procedure to reproduce the problem. However, I'd like to ask for your
opinions rather than wait for the problem to happen again.
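
(For reference, instead of the full "SHOW ALL" output, the memory-related
settings could also be pulled with a query like the following; the list of
names is only my guess at the relevant ones:)

SELECT name, setting
  FROM pg_settings
 WHERE name IN ('shared_buffers', 'work_mem',
                'maintenance_work_mem', 'max_connections');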


<<Question>>
I suspect that CacheMemoryContext may be using a lot of memory, because
somefunc accesses as many as 100,000 tables. However, I do not understand the
RelCache/SysCache implementation yet.

Q1: When are RelCache/SysCache entries removed from CacheMemoryContext?
Are they removed only when the corresponding relations are dropped? If so,
does that mean "many tables and indexes" is an unfriendly workload for the
current PostgreSQL?

Q2: somefunc increased virtual memory by 1GB after accessing 100,000 tables,
which works out to roughly 10KB of local memory per table.
Is it common for RelCache/SysCache or other control information to use this
much memory?
Does the number of attributes in a table significantly affect local memory
usage?
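
(For illustration: each table contributes one pg_class row plus one
pg_attribute row per column, and a few more for system columns, all of which
can end up cached in the backend once the table is touched. Assuming table_1
exists, a rough count of the user-column rows would be:)

SELECT c.relname, count(*) AS user_columns
  FROM pg_class c
  JOIN pg_attribute a ON a.attrelid = c.oid
 WHERE c.relname = 'table_1'
   AND a.attnum > 0
 GROUP BY c.relname;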

Q3: I think one solution is to run VACUUM in a separate psql session.
Are there any other solutions you can think of?
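
For example, a minimal variant would be to reconnect inside the same batch
script, so that the VACUUM runs in a brand-new backend with empty caches
("mydb" below is just a placeholder database name):

-- after somefunc() has finished
\connect mydb
VACUUM table_1;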

Q4: The customer reports one strange thing: if the 100,000 tables already
exist before somefunc starts (i.e., somefunc only copies records), the virtual
memory of the backend does not increase.
Is there any explanation for this behavior?

Regards
MauMau

  • Merlin Moncure at Sep 28, 2011 at 8:29 pm

    2011/9/28 MauMau <maumau307@gmail.com>:
    Hello,

    Please let me ask you some questions about RelCache/SysCache/CatCache
    design. I know I should post this to pgsql-general, but I decided to post
    here because the content includes design questions.

    <<Background>>
    My customer is facing an "out of memory" problem during a batch job. I'd like
    to know the cause and solutions. PostgreSQL version is 8.2.7 (32-bit on
    Linux).

    The batch job consists of two steps in a single psql session:

    1. call some PL/pgSQL function (say "somefunc" here)
    2. VACUUM tables (at this time, maintenance_work_mem=256MB)

    The step 2 emitted the following messages in syslog.

    ERROR:  out of memory
    DETAIL:  Failed on request of size 268435452.
    STATEMENT:  VACUUM some_table_name

    somefunc copies rows from a single table to 100,000 tables (table_1 -
    table_100000) as follows:

    [somefunc]
    FOR id in 1 .. 100000 LOOP
    check if the table "table_${id}" exists by searching pg_class
    if the table exists
    INSERT INTO table_${id} SELECT * FROM some_table
    WHERE pk = id;
    else /* the table does not exist */
    CREATE TABLE table_${id} AS SELECT * FROM some_table
    WHERE pk = id;
    END LOOP;

    Before starting somefunc, the virtual memory of the backend postgres is
    1.6GB, as reported by top command as "VIRT" column. When somefunc completes,
    it becomes 2.6GB. So, VACUUM cannot allocate 256MB because the virtual
    memory space is full.

    This is all the information I have now. I requested the customer to collect
    PostgreSQL server log so that memory context statistics can be obtained when
    "out of memory" occurs. Plus, I asked for the result of "SHOW ALL" and the
    minimal procedure to reproduce the problem. However, I'd like to ask your
    opinions rather than waiting for the problem to happen again.


    <<Question>>
    I'm guessing that CacheMemoryContext might be using much memory, because
    somefunc accesses as many as 100,000 tables. But I don't understand
    RelCache/SysCache implementation yet.

    Q1: When are the RelCache/SysCache entries removed from CacheMemoryContext?
    Are they removed only when the corresponding relations are deleted? If so,
    "many tables and indexes" is not friendly for the current PostgreSQL?

    Q2: somefunc increased 1GB of virtual memory after accessing 100,000 tables.
    This means that one table uses 10KB of local memory.
    Is it common that this much memory is used for RelCache/SysCache or other
    control information?
    Does the number of attributes in a table affect local memory usage much?

    Q3: I think one solution is to run VACUUM in a separate psql session.
    Are there any other solutions you can think of?

    Q4: The customer says one strange thing. If the 100,000 tables exist before
    somefunc starts (i.e., somefunc just copy records), the virtual memory of
    postgres does not increase.
    Is there anything to reason about his comment?
    can we see all of your memory settings plus physical memory? the
    solution is probably going to be reducing shared buffers and/or adding
    physical memory.

    merlin
  • MauMau at Sep 29, 2011 at 12:22 pm
    From: "Merlin Moncure" <mmoncure@gmail.com>
    can we see all of your memory settings plus physical memory? the
    solution is probably going to be reducing shared buffers and/or adding
    physical memory.

    Thank you for your response.

    The amount of physical memory is 8GB, which is enough for the workload. I
    asked the customer for the output of "SHOW ALL", but I haven't received it
    yet. However, shared_buffers should be less than 1.6GB because, as I wrote
    in the previous mail, top command showed 1.6GB in "VIRT" column before
    executing somefunc() PL/pgSQL function.

    The direct cause of "out of memory" is that the virtual memory became full.
    32-bit Linux can allocate 3GB of user space in the virtual address space of
    each process. somefunc() used 1.0GB, which led to 2.6GB of virtual memory.
    After somefunc(), VACUUM tried to allocate 256MB of maintenance_work_mem.
    That allocation failed because the virtual address space was almost full.

    As you mentioned, decreasing shared_buffers will be one of the solutions.
    However, we want to know why somefunc() uses so much memory. Therefore, the
    following is the core question. Q2 and Q3 are supplementary ones. It is just
    my guess that RelCache/SysCache may be the cause.

    2011/9/28 MauMau <maumau307@gmail.com>:
    Q1: When are the RelCache/SysCache entries removed from
    CacheMemoryContext?
    Are they removed only when the corresponding relations are deleted? If so,
    "many tables and indexes" is not friendly for the current PostgreSQL?
    Regards
    MauMau
  • Alvaro Herrera at Sep 29, 2011 at 12:53 pm

    Excerpts from MauMau's message of Thu Sep 29 09:23:48 -0300 2011:

    The amount of physical memory is 8GB, which is enough for the workload. I
    asked the customer for the output of "SHOW ALL", but I haven't received it
    yet. However, shared_buffers should be less than 1.6GB because, as I wrote
    in the previous mail, top command showed 1.6GB in "VIRT" column before
    executing somefunc() PL/pgSQL function.
    You don't really know this; some operating systems (Linux in particular)
    do not show shared memory as in use by a process until it is accessed.
    It may very well have well over 1.6 GB of shared_buffers, yet not show
    that in VIRT.

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • MauMau at Sep 29, 2011 at 1:10 pm
    From: "Alvaro Herrera" <alvherre@commandprompt.com>
    You don't really know this; some operating systems (Linux in particular)
    do not show shared memory as in use by a process until it is accessed.
    It may very well have well over 1.6 GB of shared_buffers, yet not show
    that in VIRT.
    Oh, really? When I started psql just after setting shared_buffers to 2500MB
    and running pg_ctl start, "ps -o vsz -p postgres_PID" showed about 2500MB
    plus a little more. ps's vsz is also the amount of virtual memory. In any
    case, I want to know the customer's actual shared_buffers setting.
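
    If the customer can run even a single statement, the following from any
    session would settle it while I wait for the full "SHOW ALL" output:

    SHOW shared_buffers;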

    Anyway, I'd appreciate it if anyone could tell me about RelCache/SysCache.
    As far as I have read the code, PostgreSQL seems to use memory for
    RelCache/SysCache without limit until the relations are dropped.

    Regards
    MauMau
  • Tom Lane at Sep 29, 2011 at 2:34 pm

    "MauMau" <maumau307@gmail.com> writes:
    Anyway, I'd appreciate it if anyone could tell me about RelCache/SysCache.
    As far as I have read the code, PostgreSQL seems to use memory for
    RelCache/SysCache without limit until the relations are dropped.
    That's correct. We used to have a limit on the size of catcache
    (if memory serves, it was something like 5000 entries). We got rid of
    it after observing that performance fell off a cliff as soon as you had
    a working set larger than the cache limit. Trust me, if we had a limit,
    you'd still be here complaining, the complaint would just take a
    different form ;-)

    I concur with Merlin's advice to rethink your schema. 100000 tables is
    far beyond what any sane design could require, and is costing you on
    many levels (I'm sure the OS and filesystem aren't that happy with it
    either).

    regards, tom lane
  • MauMau at Sep 29, 2011 at 3:02 pm
    From: "Tom Lane" <tgl@sss.pgh.pa.us>
    That's correct. We used to have a limit on the size of catcache
    (if memory serves, it was something like 5000 entries). We got rid of
    it after observing that performance fell off a cliff as soon as you had
    a working set larger than the cache limit. Trust me, if we had a limit,
    you'd still be here complaining, the complaint would just take a
    different form ;-)
    Yes, I can imagine. Now I understand that caching catalog entries in local
    memory without bound is one of PostgreSQL's performance refinements, and
    64-bit computing makes that approach legitimate. Oracle avoids duplicate
    catalog entries by storing them in shared memory, but that must necessitate
    some kind of locking when the shared catalog entries are accessed.
    PostgreSQL's approach, which does not require such locking, is better for
    many-core environments.
    I concur with Merlin's advice to rethink your schema. 100000 tables is
    far beyond what any sane design could require, and is costing you on
    many levels (I'm sure the OS and filesystem aren't that happy with it
    either).
    I agree. I'll suggest that to the customer, too. Thank you very much.

    Regards
    MauMau
  • Merlin Moncure at Sep 29, 2011 at 1:39 pm

    On Thu, Sep 29, 2011 at 7:23 AM, MauMau wrote:
    From: "Merlin Moncure" <mmoncure@gmail.com>
    can we see all of your memory settings plus physical memory?  the
    solution is probably going to be reducing shared buffers and/or adding
    physical memory.

    Thank you for your response.

    The amount of physical memory is 8GB, which is enough for the workload. I
    asked the customer for the output of "SHOW ALL", but I haven't received it
    yet. However, shared_buffers should be less than 1.6GB because, as I wrote
    in the previous mail, top command showed 1.6GB in "VIRT" column before
    executing somefunc() PL/pgSQL function.

    The direct cause of "out of memory" is that the virtual memory became full.
    32-bit Linux can allocate 3GB of user space in the virtual address space of
    each process. somefunc() used 1.0GB, which led to 2.6GB of virtual memory.
    After somefunc(), VACUUM tried to allocate 256MB of maintenance_work_mem.
    That allocation failed because the virtual address space was almost full.

    As you mentioned, decreasing shared_buffers will be one of the solutions.
    However, we want to know why somefunc() uses so much memory. Therefore, the
    following is the core question. Q2 and Q3 are supplementary ones. It is just
    my guess that RelCache/SysCache may be the cause.
    Oh -- I missed earlier that this was a 32-bit o/s. Well, I'd consider
    drastically reducing shared buffers, down to say the 256-512MB range.
    Postgres function plans and various other structures (tables,
    attributes) are indeed cached and can use up a considerable amount of
    memory in pathological cases; this largely depends on the number of
    tables/views, the number of functions, and the number of connections.
    I briefly looked at the relcache etc. a little while back on a related
    complaint, and the takeaway is that the caching is heavy-handed and
    fairly brute force, but legit and a huge win for most cases. This stuff
    lives in the cache memory context, and a couple of users (not that
    many) have bumped into high memory usage. Solutions tend to include:

    *) not rely on implementation that requires 100000 tables
    *) use connection pooler
    *) reset connections
    *) go to 64 bit o/s
    *) reduce shared_buffers for leaner memory profile (especially in 32 bit os)
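
    As a rough illustration of that last point, the postgresql.conf change
    might look something like this (example values only; shared_buffers
    requires a restart, and 8.2 accepts the MB suffix):

    shared_buffers = 256MB            # leaner profile for a 32-bit backend
    maintenance_work_mem = 128MB      # a smaller VACUUM allocation also
                                      # leaves more free address space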


    Like I said, this doesn't really come up that often, but the 'real'
    solution in terms of postgres is probably some kind of upper bound on
    the amount of cache memory used, plus some intelligence in the cache
    implementation. This is tricky stuff, though, and so far no credible
    proposals have been made and the demand for the feature is not very
    high.

    merlin
  • Robert Haas at Sep 29, 2011 at 1:59 pm

    On Thu, Sep 29, 2011 at 9:39 AM, Merlin Moncure wrote:
    Like I said, this doesn't really come up this often but the 'real'
    solution in terms of postgres is probably some kind of upper bound in
    the amount of cache memory used plus some intelligence in the cache
    implementation.  This is tricky stuff though and so far no credible
    proposals have been made and the demand for the feature is not very
    high.
    We (i.e. $EMPLOYER) have a customer who ran into this problem (i.e.
    relcache/syscache memory usage shooting through the roof) in testing,
    so I'm somewhat motivated to see if we can't come up with a fix. I am
    fairly sure that was on a 64-bit build, so the issue wasn't just that
    they didn't have enough address space. It seems that we used to have
    some kind of LRU algorithm to prevent excessive memory usage, but we
    ripped it out because it was too expensive (see commit
    8b9bc234ad43dfa788bde40ebf12e94f16556b7f). I don't have a brilliant
    idea at the moment, but I wonder if we could come up with something
    that's cheap enough to manage that it doesn't materially affect
    performance in normal cases, but just kicks in when things get really
    out of control.

    A trivial algorithm would be - if you're about to run out of memory,
    flush all the caches; or evict 10% of the entries at random. Of
    course, the problem with anything like this is that it's hard to know
    when you're about to run out of memory before you actually do, and any
    hard-coded limit you care to set will sometimes be wrong. So maybe
    that's not the right approach. At the same time, I don't think that
    simply hoping the user has enough memory is an adequate answer.

    One thing to consider is that in some cases a user may plan to do
    something like touch every table in the database exactly once and then
    exit. In that case, if we knew in advance what the user's intentions
    were, we'd want to use an MRU eviction algorithm rather than LRU.
    Again, we don't know that in advance. But in such a use case it's
    reasonable for the user to expect that the amount of backend-private
    memory used for caching will not grow without bound.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Merlin Moncure at Sep 29, 2011 at 2:24 pm

    On Thu, Sep 29, 2011 at 8:59 AM, Robert Haas wrote:
    On Thu, Sep 29, 2011 at 9:39 AM, Merlin Moncure wrote:
    Like I said, this doesn't really come up this often but the 'real'
    solution in terms of postgres is probably some kind of upper bound in
    the amount of cache memory used plus some intelligence in the cache
    implementation.  This is tricky stuff though and so far no credible
    proposals have been made and the demand for the feature is not very
    high.
    We (i.e. $EMPLOYER) have a customer who ran into this problem (i.e.
    relcache/syscache memory usage shooting through the roof) in testing,
    so I'm somewhat motivated to see if we can't come up with a fix.  I am
    fairly sure that was on a 64-bit build, so the issue wasn't just that
    they didn't have enough address space.  It seems that we used to have
    some kind of LRU algorithm to prevent excessive memory usage, but we
    ripped it out because it was too expensive (see commit
    8b9bc234ad43dfa788bde40ebf12e94f16556b7f).  I don't have a brilliant
    idea at the moment, but I wonder if we could come up with something
    that's cheap enough to manage that it doesn't materially affect
    performance in normal cases, but just kicks in when things get really
    out of control.

    A trivial algorithm would be - if you're about to run out of memory,
    flush all the caches; or evict 10% of the entries at random.  Of
    course, the problem with anything like this is that it's hard to know
    when you're about to run out of memory before you actually do, and any
    hard-coded limit you care to set will sometimes be wrong.  So maybe
    that's not the right approach.  At the same time, I don't think that
    simply hoping the user has enough memory is an adequate answer.

    One thing to consider is that in some cases a user may plan to do
    something like touch every table in the database exactly once and then
    exit.  In that case, if we knew in advance what the user's intentions
    were, we'd want to use an MRU eviction algorithm rather than LRU.
    Again, we don't know that in advance.  But in such a use case it's
    reasonable for the user to expect that the amount of backend-private
    memory used for caching will not grow without bound.
    I think this (cache memory usage) is a reasonable setting for a GUC.
    Maybe if you keep it very simple, say only activating cache cleanup when
    the limit is exceeded, you have more freedom to dump cache entries using
    fancier methods like a calculated benefit. You'd probably have to
    expose another knob to guarantee a maximum cache sweep runtime, though.
    Perhaps even user-visible cache management features (an extension of
    DISCARD?) could be exposed...

    Hm, what might make this complicated is that you'd probably want all
    the various caches to live under the same umbrella with a central
    authority making decisions about what stays and what goes.
    On Thu, Sep 29, 2011 at 9:22 AM, MauMau wrote:
    * reduce shared_buffers
    * run somefunc() and VACUUM in different psql sessions
    * process 100,000 tables in multiple psql sessions
    that's a start. don't be afraid to reset the connection after
    somefunc() and at appropriate times from the 'processors'.

    merlin
  • Tom Lane at Sep 29, 2011 at 2:45 pm

    Robert Haas writes:
    ... It seems that we used to have
    some kind of LRU algorithm to prevent excessive memory usage, but we
    ripped it out because it was too expensive (see commit
    8b9bc234ad43dfa788bde40ebf12e94f16556b7f).
    Not only was it too expensive, but performance fell off a cliff as soon
    as you had a catalog working set large enough to cause the code to
    actually do something. I'm not in favor of putting anything like that
    back in --- people who have huge catalogs will just start complaining
    about something different, i.e., why did their apps get so much slower.

    The short answer here is "if you want a database with 100000 tables,
    you'd better be running it on more than desktop-sized hardware".

    regards, tom lane
  • Robert Haas at Sep 29, 2011 at 3:23 pm

    On Thu, Sep 29, 2011 at 10:45 AM, Tom Lane wrote:
    Robert Haas <robertmhaas@gmail.com> writes:
    ... It seems that we used to have
    some kind of LRU algorithm to prevent excessive memory usage, but we
    ripped it out because it was too expensive (see commit
    8b9bc234ad43dfa788bde40ebf12e94f16556b7f).
    Not only was it too expensive, but performance fell off a cliff as soon
    as you had a catalog working set large enough to cause the code to
    actually do something,  ...
    Sure, a big working set is going to cause a performance problem if you
    start flushing cache entries that are being regularly used. But the
    point is that just because you have, at some time, accessed 100,000
    tables during a session does not mean that your working set is that
    large. The working set is the set of things that you are actually using
    regularly, not the things you've *ever* accessed.

    In addition to the problem of blowing out memory, there are a number
    of other things about the current code that don't seem well-suited to
    dealing with large numbers of tables. For example, catcache hash
    tables can't be resized, so for very large numbers of entries you can
    potentially have to walk a very long chain. And, you can exhaust the
    shared memory space for the primary lock table, leading to, for
    example, inability to back up the database using pg_dump (ouch!).
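
    For reference, that shared lock table is sized from
    max_locks_per_transaction * (max_connections + max_prepared_transactions),
    so the usual stopgap when pg_dump hits the limit is to raise that setting
    in postgresql.conf and restart, e.g. (example value only):

    max_locks_per_transaction = 256    # default is 64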

    I can't really explain why people seem to keep wanting to create
    hundreds of thousands or even millions of tables, but it's not like
    MauMau's customer is the first one to try to do this, and I'm sure
    they won't be the last. I don't want to de-optimize the more common
    (and sensible) cases too much, but "slow" still trumps "fails
    outright".

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Merlin Moncure at Sep 29, 2011 at 3:43 pm

    On Thu, Sep 29, 2011 at 10:22 AM, Robert Haas wrote:
    I can't really explain why people seem to keep wanting to create
    hundreds of thousands or even millions of tables, but it's not like
    MauMau's customer is the first one to try to do this, and I'm sure
    they won't be the last.  I don't want to de-optimize the more common
    (and sensible) cases too much, but "slow" still trumps "fails
    outright".
    Yeah -- maybe baby steps in the right direction would be to track cache
    memory usage and add instrumentation so the user could get a readout
    on usage -- this would also help us diagnose memory issues in the
    field. Also, thinking about it more, a DISCARD-based cache flush
    (DISCARD CACHES TO xyz) wrapping a monolithic LRU sweep could help
    users deal with these cases without us having to figure out how to make
    an implementation that pleases everyone.

    merlin
  • MauMau at Sep 29, 2011 at 2:19 pm
    From: "Merlin Moncure" <mmoncure@gmail.com>
    --------------------------------------------------
    Oh -- I missed earlier that this was 32 bit o/s. Well, I'd consider
    drastically reducing shared buffers, down to say 256-512mb range.
    Postgres function plans and various other structures, tables,
    attributes are indeed cached and can use up a considerable amount of
    memory in pathological cases -- this is largely depending on the
    number of tables/views, number of functions and number of connections.
    I briefly looked at the relcache etc a little while back on a related
    complaint and the takeaway is that the caching is heavy handed and
    fairly brute force but legit and a huge win for most cases. This stuff
    lives in the cache memory context and a couple of users (not that
    many) have bumped into high memory usage. Solutions tend to include:

    *) not rely on implementation that requires 100000 tables
    *) use connection pooler
    *) reset connections
    *) go to 64 bit o/s
    *) reduce shared_buffers for leaner memory profile (especially in 32 bit os)


    Like I said, this doesn't really come up this often but the 'real'
    solution in terms of postgres is probably some kind of upper bound in
    the amount of cache memory used plus some intelligence in the cache
    implementation. This is tricky stuff though and so far no credible
    proposals have been made and the demand for the feature is not very
    high.
    --------------------------------------------------


    Thank you very much. I'm relieved that I now understand the reason. I will
    report it to the customer and ask him to consider taking the following
    measures:

    * reduce shared_buffers
    * run somefunc() and VACUUM in different psql sessions
    * process 100,000 tables in multiple psql sessions

    Regards
    MauMau
