Killer SQL and PGA
We had a problem with our PGA growing larger than physical memory, dragging the system down until we were forced to restart the database. We recently upgraded from 9i to 11gR2 on Solaris with 16 GB of physical memory hosting an OLTP application. 4 GB is used for the SGA and PGA_AGGREGATE_TARGET is set to 400 MB. AMM and ASMM have not been enabled. WORKAREA_SIZE_POLICY is set to AUTO. This is a simple database: no RAC, no shared servers, no parallel processing.

One of our analysts launched what appeared to be a simple ad hoc query: select * from deniedhist where icn in ('1', '2', '3', ... , 'x').
Unfortunately, the analyst had over 400 ICNs listed. DENIEDHIST is a UNION ALL view over a faux-partitioned array of 129 tables, each with its own unique index on ICN. The execution plan (my comments included) shows that it probes each underlying table once per IN-list value, using the unique index each time. This is the plan from a test version of the SQL with only 5 ICNs; the rest of the plan iterates through the other 128 underlying tables (other views of the plan show the unique table and index names).
1422 SELECT STATEMENT REMOTE CHOOSE
1421 VIEW POS.DENIEDHIST
1420 UNION-ALL
11 CONCATENATION -- This is table 1 of 129 in the view
2 TABLE ACCESS BY INDEX ROWID POS.DENIED_HISTORY
1 INDEX UNIQUE SCAN UNIQUE POS.DENIED_HISTORY_ICN
4 TABLE ACCESS BY INDEX ROWID POS.DENIED_HISTORY
3 INDEX UNIQUE SCAN UNIQUE POS.DENIED_HISTORY_ICN
6 TABLE ACCESS BY INDEX ROWID POS.DENIED_HISTORY
5 INDEX UNIQUE SCAN UNIQUE POS.DENIED_HISTORY_ICN
8 TABLE ACCESS BY INDEX ROWID POS.DENIED_HISTORY
7 INDEX UNIQUE SCAN UNIQUE POS.DENIED_HISTORY_ICN
10 TABLE ACCESS BY INDEX ROWID POS.DENIED_HISTORY
9 INDEX UNIQUE SCAN UNIQUE POS.DENIED_HISTORY_ICN
22 CONCATENATION -- This is table 2 of 129 in the view

...
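For readers without the schema in front of them, the shape of the view and query is roughly this (table and column names are invented for illustration; the real view has 129 branches):

```sql
-- Illustrative sketch only; names are hypothetical.
CREATE TABLE denied_history_001 (icn VARCHAR2(20), claim_data VARCHAR2(100));
CREATE UNIQUE INDEX denied_history_001_icn ON denied_history_001 (icn);

CREATE TABLE denied_history_002 (icn VARCHAR2(20), claim_data VARCHAR2(100));
CREATE UNIQUE INDEX denied_history_002_icn ON denied_history_002 (icn);

CREATE OR REPLACE VIEW deniedhist AS
  SELECT icn, claim_data FROM denied_history_001
  UNION ALL
  SELECT icn, claim_data FROM denied_history_002;
  -- ... UNION ALL branches for the remaining 127 tables

-- With N literals in the IN list, each UNION ALL branch expands into a
-- CONCATENATION of N unique index probes: 129 tables x 400+ ICNs here.
SELECT * FROM deniedhist WHERE icn IN ('1', '2', '3');
```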

I estimated that the original version probably had over 120,000 steps in its execution plan. It ran for nearly an hour until memory reached a critical point, almost 20 GB allocated and paging like mad, which forced us to abort the database and restart. I can run the full original query on a standby machine which is physically identical but is still running 9i. It finishes in 5 minutes and the PGA only grows to 500 MB. The execution plans, at least for the small test version of the query, are the same on both machines.

I understand the problems with large 'IN' lists. There was a conversation on that topic here at the end of October. Educating the analyst and redesigning the query are secondary concerns. I also realize, after reading a bunch of Tom Kyte posts, that PGA_AGGREGATE_TARGET will not constrain the growth of the PGA.
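As an aside, the usual workaround for this pattern (a sketch only, with a hypothetical table name) is to load the values into a global temporary table and join, so each UNION ALL branch is visited once for the join rather than once per literal:

```sql
-- Sketch of the common large-IN-list rewrite (hypothetical names).
CREATE GLOBAL TEMPORARY TABLE icn_list (icn VARCHAR2(20))
  ON COMMIT DELETE ROWS;

INSERT INTO icn_list VALUES ('1');
INSERT INTO icn_list VALUES ('2');
-- ... one row per ICN

SELECT d.*
  FROM deniedhist d
  JOIN icn_list l ON l.icn = d.icn;
```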

My real question is why the 11g memory management, without AMM or ASMM, would allow the PGA to grow so large. In 15 years of operations, there must have been similar bad queries against the database. This happened a day after the 11g upgrade. Any suggestions?
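One way to watch for a runaway process before it becomes fatal is to poll the standard dynamic views (these views and columns exist in both 9i and 11g; the queries below are a sketch, not a tuned monitoring script):

```sql
-- Per-process PGA usage, largest consumers first (columns are in bytes).
SELECT p.spid, s.sid, s.username,
       ROUND(p.pga_used_mem  / 1024 / 1024) AS used_mb,
       ROUND(p.pga_alloc_mem / 1024 / 1024) AS alloc_mb,
       ROUND(p.pga_max_mem   / 1024 / 1024) AS max_mb
  FROM v$process p
  JOIN v$session s ON s.paddr = p.addr
 ORDER BY p.pga_alloc_mem DESC;

-- Instance-wide view of the target versus actual allocation.
SELECT name, ROUND(value / 1024 / 1024) AS mb
  FROM v$pgastat
 WHERE name IN ('aggregate PGA target parameter',
                'total PGA allocated',
                'maximum PGA allocated');
```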

Bob Laverty
Molina Medicaid Solutions

IMPORTANT NOTICE TO RECIPIENT: This email is meant only for the intended recipient of the transmission. In addition, this email may be a communication that is privileged by law. If you received this email in error, any review, use, disclosure, distribution, or copying of this email is strictly prohibited. Please notify us immediately of the error by return email, and please delete this email from your system. Thank you for your cooperation.


  • Peter Hitchman at Jan 31, 2012 at 9:38 am
    Hi,
    There is a note on MOS, 460424.1, about Solaris 10 issues with PGA
    memory allocation. Not sure whether it fits your problem or not.
    Have you refreshed statistics on the 11.2 database? If not, I would try
    that, assuming the issue is not down to contention between Solaris 10
    and Oracle memory management.

    Regards
    Pete
  • Mark W. Farnham at Jan 31, 2012 at 2:18 pm
    ulimit?

    The tradeoff here is whether to utilize virtual memory to allow gargantuan
    jobs to theoretically be processed at all. Seymour Cray nailed that in one
    when he was asked why his systems did not support virtual memory.

    It will always be possible to concoct queries and problems that cannot be
    solved within the limits of any arbitrary amount of real memory.

    And of course if you put ulimit limits in place and then exceed them, the
    violating process gets killed as soon as the kernel notices, within some
    timeslice or interrupt. The side effects of the operating system killing
    Oracle processes are too complex to analyze in the general case.
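    For reference, the knobs in question look roughly like this (a sketch;
    exact flags vary by shell and OS release, so check man ulimit before
    relying on them):

```shell
# Show all current per-process resource limits for this shell.
ulimit -a

# Data segment size limit in KB ("unlimited" on most default installs).
ulimit -d

# Where the shell supports -v, a virtual/address-space limit caps how
# large a single server process can grow: allocations fail outright
# instead of paging the whole box to death.
# ulimit -v 2097152   # example value only, deliberately commented out
```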

    You mentioned that you had recently moved from 9i to 11gR2, but it is not
    clear to me whether your "analyst" had previously run the same query in 9i
    successfully. If so, *probably* it used an entirely different plan. Then
    again, I've often seen folks test limits on an upgraded system, for example
    in this context, by supplying a longer "in" list until something breaks.
    Those are the folks you hope to identify to play games on a "conference room
    pilot" or quality assurance test upgrade before you go live.

    Likewise, proper education to help them avoid unleashing experimental loads
    on the production system is usually part of the solution.

    Regards,

    mwf

    -----Original Message-----
    From: oracle-l-bounce@freelists.org
    On Behalf Of Robert Laverty
    Sent: Monday, January 30, 2012 6:13 PM
    To: oracle-l@freelists.org
    Subject: Killer SQL and PGA


    --
    http://www.freelists.org/webpage/oracle-l


  • Greg Rahn at Jan 31, 2012 at 7:19 pm
    If you have a reproducible test case, open an SR and file a bug. Even
    though PGA memory management is controlled via a "target", not a "limit",
    overshooting it by such a significant margin is clearly not the expected
    behavior.
    Be sure to include a test case builder archive for the problem statement
    and a SQL Monitor report. See:
    http://blogs.oracle.com/optimizer/entry/oracle_keeps_closing_my_tar_because_i_cannot_provide_a_testcase_can_you_help
    https://raw.github.com/grahn/oracle_scripts/master/tcb.sh
    http://structureddata.org/2008/01/06/oracle-11g-real-time-sql-monitoring-using-dbms_sqltunereport_sql_monitor/
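    For reference, the test case builder and SQL Monitor calls look roughly
    like this in 11g (a sketch; the directory name and sql_id below are
    placeholders, not values from this thread):

```sql
-- Sketch: capture a test case for the problem statement (11g).
-- 'TCB_DIR' must be an existing directory object; the sql_id is a placeholder.
DECLARE
  tc CLOB;
BEGIN
  DBMS_SQLDIAG.EXPORT_SQL_TESTCASE(
    directory => 'TCB_DIR',
    sql_id    => 'abcd1234efgh5',
    testcase  => tc);
END;
/

-- Sketch: generate a SQL Monitor report for the same statement.
SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(
         sql_id => 'abcd1234efgh5',
         type   => 'TEXT') AS report
  FROM dual;
```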

    On Mon, Jan 30, 2012 at 3:13 PM, Robert Laverty wrote:

    --
    Regards,
    Greg Rahn | blog <http://bit.ly/u9N0i8> | twitter <http://bit.ly/v733dJ> |
    linkedin <http://linkd.in/gregrahn>



Discussion Overview
group: oracle-l
categories: oracle
posted: Jan 30, 2012 at 11:14 PM
active: Jan 31, 2012 at 7:19 PM
posts: 4
users: 4
website: oracle.com
