In one of our reporting environments (OSEE 10.2.0.2) we have
optimizer_dynamic_sampling cranked up to 5. We have a case where dynamic
sampling is pushing the CBO to choose a poor plan. Our situation highlights
Level 4, specifically "tables that have single-table predicates that
reference 2 or more columns" (from the documentation:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/stats.htm#sthref1491).

Looking at a 10046 trace, I see an OPT_DYN_SAMP query that counts a sample of
rows matching variations on the predicate. I have found that a 10046 trace is
always good grounds for finding more questions than it answers. *grin*

To save bandwidth, I will show just the relevant parts of the sampling
query:
SELECT /* OPT_DYN_SAMP */
...
  FROM
       (SELECT /* ... */
               1 AS C1,
               CASE WHEN "RCRAPP1"."RCRAPP1_AIDY_CODE"=:bind1
                     AND "RCRAPP1"."RCRAPP1_INFC_CODE"=:bind2
                     AND "RCRAPP1"."RCRAPP1_CURR_REC_IND"=:bind3
                     AND "RCRAPP1"."RCRAPP1_VPDI_CODE"=:bind4
                    THEN 1 ELSE 0 END AS C2,
               CASE WHEN "RCRAPP1"."RCRAPP1_AIDY_CODE"=:bind1
                     AND "RCRAPP1"."RCRAPP1_VPDI_CODE"=:bind4
                    THEN 1 ELSE 0 END AS C3,
               CASE WHEN "RCRAPP1"."RCRAPP1_AIDY_CODE"=:bind1
                     AND "RCRAPP1"."RCRAPP1_VPDI_CODE"=:bind4
                    THEN 1 ELSE 0 END AS C4
          FROM ... SAMPLE BLOCK (.037704, 1) SEED (1) "RCRAPP1"
       ) SAMPLESUB
/

1) Why the predicate variations? I am assuming that sometimes C4 is different
from C3, so I am going to ignore C4 for now.
2) How does the Optimizer use these sample counts to influence the costs and
subsequent access/join methods?

I believe what is happening in our case is that the sampling is "just unlucky"
and happens to come up with an inaccurate picture of how many rows match
(an order of magnitude too many). I can prove this by running the sampling
query directly with different sample sizes.
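
For reference, the kind of hand-run check I mean looks roughly like this
(the 1% sample size is arbitrary, and you would supply the real bind values):

SELECT COUNT(*) AS sample_rows,
       SUM(CASE WHEN rcrapp1_aidy_code = :bind1
                 AND rcrapp1_vpdi_code = :bind4
                THEN 1 ELSE 0 END) AS matching_rows
  FROM rcrapp1 SAMPLE BLOCK (1) SEED (1);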

I am hoping that by understanding dynamic sampling a little better, I can
work out how to resolve this issue. I realize that one can add the
dynamic_sampling query hint, and perhaps that is the "best" solution for
this. Aside from adjusting dynamic sampling (whether at the system
level or via a hint), is there any other way to address this situation?
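
For the record, the hint form I have in mind is the per-statement, per-table
one, roughly as below (the level of 2 is only an example, and the column list
is trimmed):

SELECT /*+ dynamic_sampling(r 2) */ *
  FROM rcrapp1 r
 WHERE rcrapp1_aidy_code    = :bind1
   AND rcrapp1_infc_code    = :bind2
   AND rcrapp1_curr_rec_ind = :bind3
   AND rcrapp1_vpdi_code    = :bind4;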

TIA,


  • David Aldridge at Dec 1, 2007 at 10:05 pm
    I would guess that this might be to do with the optimizer looking at different access paths to the table. For example, it might not be much more selective to access the table via


    "RCRAPP1"."RCRAPP1_AIDY_CODE"=:bind1
    AND "RCRAPP1"."RCRAPP1_INFC_CODE"=:bind2
    AND "RCRAPP1"."RCRAPP1_CURR_REC_IND"=:bind3
    AND "RCRAPP1"."RCRAPP1_VPDI_CODE"=:bind4

    than it is to access it with:


    "RCRAPP1"."RCRAPP1_AIDY_CODE"=:bind1
    AND "RCRAPP1"."RCRAPP1_VPDI_CODE"=:bind4

    If there is one index on all four columns and one index on just the two columns, then accessing via the two-column index might be more efficient than using the four-column one.


    Just speculation though.
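
    To make that concrete, the kind of index layout I have in mind would be
    something like this (index names and column order are purely invented):

    create index rcrapp1_ix_4col on rcrapp1
        (rcrapp1_aidy_code, rcrapp1_infc_code, rcrapp1_curr_rec_ind, rcrapp1_vpdi_code);

    create index rcrapp1_ix_2col on rcrapp1
        (rcrapp1_aidy_code, rcrapp1_vpdi_code);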

  • Charles Schultz at Dec 1, 2007 at 10:34 pm

    On Dec 1, 2007 4:05 PM, David Aldridge wrote:

    The interesting part is that if I calc stats (estimate_percent => null), the
    CBO chooses an index for C3. We do not have an index on all four columns, but
    under certain circumstances the CBO will choose that other index for C2,
    using three of the four columns in an access path and CURR_REC_IND as a
    filter. Those of you with astute eyes will notice the VPD column and
    correctly assume we are using FGAC as well. That ends up being the optimal
    path, and I am trying to understand why. Obviously, there is a whole lot
    more information I could share, but first I wanted to understand how
    dynamic sampling was playing a part in this.
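
    (By "calc stats" I mean a full compute, roughly along these lines; the
    method_opt and cascade settings shown are only illustrative, not our exact
    call:)

    begin
      dbms_stats.gather_table_stats(
        ownname          => user,
        tabname          => 'RCRAPP1',
        estimate_percent => null,   -- null = compute over all rows
        method_opt       => 'for all columns size auto',
        cascade          => true);
    end;
    /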

    Rajeev, to answer your questions:

    1) I have tried with 1 bucket, the default, and 254.
    2) exact
    3) Yes, all the time, every day. =) The parameters depend on what I am
    trying to do. I like to calc stats in these cases to give the CBO as much
    info as possible.

    Not sure how those questions/answers help, but there you go.

    Thanks all,
  • Rajeev Prabhakar at Dec 2, 2007 at 12:29 am
    Charles

    Unless you have already tried it, could you try the following
    and let us know the results:

    a) gather stats with COMPUTE or a high sample percentage (not AUTO).
    b) use dbms_stats.set_column_stats for the relevant column (sketch below).
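
    For b), a minimal sketch of the call I have in mind (the column and the
    statistic values are placeholders you would replace with your real figures):

    begin
      dbms_stats.set_column_stats(
        ownname => user,
        tabname => 'RCRAPP1',
        colname => 'RCRAPP1_AIDY_CODE',   -- placeholder column
        distcnt => 20,                    -- placeholder NDV
        density => 0.05,                  -- typically 1/NDV with no histogram
        nullcnt => 0);                    -- placeholder
    end;
    /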

    -Rajeev
  • Rajeev Prabhakar at Dec 2, 2007 at 12:48 am
    One more thing, can you try deleting histograms and give it a try.
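
    What I mean is regathering with a single bucket so the histograms go away,
    something like this (parameters illustrative):

    begin
      dbms_stats.gather_table_stats(
        ownname    => user,
        tabname    => 'RCRAPP1',
        method_opt => 'for all columns size 1',   -- one bucket = no histograms
        cascade    => true);
    end;
    /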
  • Rajeev Prabhakar at Dec 1, 2007 at 10:14 pm
    Charles

    Do you have histograms collected for the relevant column(s)?
    What is the value of the cursor_sharing parameter?
    Do you use dbms_stats? If yes, what parameters/options do you use?
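
    For the first question, a quick way to check is the dictionary view below
    (the owner is a placeholder):

    select column_name, num_distinct, num_buckets, histogram
      from dba_tab_col_statistics
     where owner = 'SOME_OWNER'
       and table_name = 'RCRAPP1';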

    -Rajeev
  • Alberto Dell'Era at Dec 2, 2007 at 1:18 am

    2) How does the Optimizer use these sample counts to influence the costs and
    subsequent access/join methods?
    If you try this scenario, and collect a 10053 trace:

    exec dbms_random.seed(0);

    create table t (x int, y int, z int);

    create index t_x_idx on t(x);
    create index t_y_idx on t(y);
    create index t_x_y_idx on t(x,y);

    insert into t (x,y,z)
    select mod(rownum-1,10) x, mod(rownum-1,10) y, mod(rownum-1,10) z from
    dual connect by level <= 10000
    order by dbms_random.random;

    alter session set optimizer_dynamic_sampling=5;
    -- just to have pretty round numbers:
    alter session set "_optimizer_cost_model"=io;
    alter session set events '10053 trace name context forever, level 1';
    set autotrace traceonly explain
    select /*+ dynamic_sampling */ * from t where x = 1 and y = 1 and z = 1;
    set autotrace off
    alter session set events '10053 trace name context off';
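
    The 10053 trace file ends up in user_dump_dest, which you can locate with:

    select value from v$parameter where name = 'user_dump_dest';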

    The sampling query (reformatted for clarity) is, in 10.2.0.3:

    SELECT /* OPT_DYN_SAMP */ ...

    FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T") FULL("T")

    NO_PARALLEL_INDEX("T") */

    1 AS C1,
    CASE WHEN "T"."X"=1 AND "T"."Y"=1 AND "T"."Z"=1 THEN 1 ELSE 0 END AS C2,
    CASE WHEN "T"."Y"=1 THEN 1 ELSE 0 END AS C3,
    CASE WHEN "T"."Y"=1 AND "T"."X"=1 THEN 1 ELSE 0 END AS C4,
    CASE WHEN "T"."X"=1 THEN 1 ELSE 0 END AS C5

    C1 estimates num_rows, C2 the query's filtered cardinality, and the other
    columns estimate how many index keys would be visited if the corresponding
    index were chosen, and hence the index selectivity - one CASE expression
    per index eligible to be used.

    Executed dynamic sampling query:
    level : 5
    sample pct. : 100.000000
    actual sample size : 10000
    filtered sample card. : 1000 -- Alberto: from C2
    filtered sample card. (index T_Y_IDX): 1000 -- Alberto: from C3
    filtered sample card. (index T_X_Y_IDX): 1000 -- Alberto: from C4
    filtered sample card. (index T_X_IDX): 1000 -- Alberto: from C5
    orig. card. : 82
    block cnt. table stat. : 20
    block cnt. for sampling: 20
    max. sample block cnt. : 64
    sample block cnt. : 20
    min. sel. est. : 0.00000100
    index T_Y_IDX selectivity est.: 0.10000000
    index T_X_Y_IDX selectivity est.: 0.10000000
    index T_X_IDX selectivity est.: 0.10000000

    Of course the numbers returned by the query are in general scaled up
    by the sampling factor - if you sampled 1%, they are multiplied
    by 100. Here the sample was 100%.

    The numbers are then plugged into the usual formulae; for example,
    the estimated cost of the index access for T_X_Y_IDX is

    Access Path: index (AllEqRange)
    Index: T_X_Y_IDX
    resc_io: 85.00 resc_cpu: 0
    ix_sel: 0.1 ix_sel_with_filters: 0.1
    Cost: 85.00 Resp: 85.00 Degree: 1

    From Jonathan's "Cost Based Oracle" or Wolfgang's famous paper,
    we get that the usual formula is

    blevel + ix_sel * leaf_blocks + ix_sel_with_filters * clustering_factor

    that is, blevel blocks are visited going from the root to the leaves,
    ix_sel * leaf_blocks leaf blocks are visited by the range scan,
    and ix_sel_with_filters * clustering_factor table blocks are then visited.

    For this scenario, ix_sel = ix_sel_with_filters, since all index columns
    are used for access, none for (pre)filtering before following the rowid.

    blevel and clustering_factor are taken from the index stats (here
    clustering_factor is defaulted to 800):

    Index: T_X_Y_IDX Col#: 1 2 (NOT ANALYZED)
    LVLS: 1 #LB: 25 #DK: 100 LB/K: 1.00 DB/K: 1.00 CLUF: 800.00

    leaf_blocks is, AFAIK, taken from the index segment header
    and set equal to the number of index blocks (in an index, almost
    all blocks are leaves):

    Dynamic sampling updated index stats.: T_X_Y_IDX, blocks=43

    so 1 + 0.1 * 43 + 0.1 * 800 = 85.3

    Notice that the clustering factor is not dynamically estimated, and since it
    is very frequently the major contributor, it is well worth having
    the index stats in place.
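
    If the index has never been analyzed, something as simple as the following
    puts blevel, leaf_blocks and clustering_factor in place (a minimal sketch):

    begin
      dbms_stats.gather_index_stats(
        ownname => user,
        indname => 'T_X_Y_IDX');
    end;
    /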

    Caveat: this is how I understand it; I haven't made any exhaustive
    investigation. In fact most of this test case comes from my fuzzy
    recollection of a discussion I had with Wolfgang eons ago.
    It should be enough to answer much of your question anyway ...

    HTH

    Alberto

    --
    Alberto Dell'Era
    "the more you know, the faster you go"
