FAQ
I am working on 10.2.0.4 on Linux. I have a table with a highly skewed set
of data.

166067 rows, 52999 distinct values in an indexed column.
For 3000+ values have 3 rows each, about 60 values have over 100 rows each.
So I thought this is a good candidate for a histogram.

When I gather stats for all columns size auto I do not get a histogram on
this column.
I have to force gather_table_stats to create a histogram on this column
(I've tried various number of buckets from 25 to 254)

However, the optimizer doesn't seem to use the histogram in calculating the
cost
(That is causing problems in a more complex query, because the is causing
the plan not to change).

select /*+GATHER_PLAN_STATISTICS*/ count(*) from psroleuser where roleuser
='HVA_FUNC'
Id | Operation | Name | Starts | E-Rows | A-Rows |
A-Time | Buffers |
----------------------------------------------------------------------------
-----------------
1 | SORT AGGREGATE | | 1 | 1 | 1
00:00:00.01 | 4 |
* 2 | INDEX RANGE SCAN| PS_PSROLEUSER | 1 | 3 | 169
00:00:00.01 | 4 |
----------------------------------------------------------------------------
-----------------

I have another version of this table on a test database, but the volume is
slightly smaller (26000 rows, 3300 distinct values)
That uses the histogram exactly as I would expect, the estimated number of
rows is close to reality, and the execution plan of the complex query
changes.

I get the feeling I am missing something. Has anyone seen this before?

The compatible and OPTIMIZER_FEATURES are both set to 10.2.0.4.
Cursor Sharing is set to EXACT

regards

David Kurtz
Go-Faster Consultancy Ltd.
mailto:david.kurtz_at_go-faster.co.uk
web: www.go-faster.co.uk
Book: PeopleSoft for the Oracle DBA: http://www.psftdba.com
DBA Blogs: PeopleSoft: http://blog.psftdba.com, Oracle:
http://blog.go-faster.co.uk
PeopleSoft DBA Forum: http://groups.yahoo.com/group/psftdba

Search Discussions

  • Alberto Dell'Era at Jun 10, 2009 at 11:17 pm
    Assuming SIZE=254 and estimate_percent = null

    ndv=52999 > size=254 => height-balanced histogram

    Only if 'HVA_FUNC' is recorded as a popular value the histogram
    data will be used; if either 'HVA_FUNC' is not popular or it is not
    recorded in the histogram, num_rows*density is used.

    To build the histogram, rows are sorted and then sampled over
    an grid spaced approx num_rows / size = 166067 / 254 = 653.

    So a value most occur at least 2*653 times to be recorded as
    popular (= sampled at least two times), it might be recorded as
    popular if it occurs at least 653 times (depending on the exact sampling grid),
    it is never recorded as popular if it occurs less than 653 times.

    In the other case 26000 / 254 = 102, hence it is more likely that some
    values get recorded as popular (since you mention "60 values have over
    100 rows each",
    some of them might be recorded as popular).

    hth
    Alberto
    On Thu, Jun 11, 2009 at 00:20, David Kurtz wrote:
    I am working on 10.2.0.4 on Linux.  I have a table with a highly skewed set
    of data.

    166067 rows, 52999 distinct values in an indexed column.
    For 3000+ values have 3 rows each, about 60 values have over 100 rows each.
    So I thought this is a good candidate for a histogram. (snip)
    However, the optimizer doesn't seem to use the histogram in calculating the
    cost
    (That is causing problems in a more complex query, because the is causing
    the plan not to change).

    select /*+GATHER_PLAN_STATISTICS*/ count(*) from psroleuser where roleuser
    ='HVA_FUNC'
    ---------------------------------------------------------------------------------------------
    Id  | Operation         | Name          | Starts | E-Rows | A-Rows |
    A-Time   | Buffers |
    ---------------------------------------------------------------------------------------------
       1 |  SORT AGGREGATE   |               |      1 |      1 |      1
    00:00:00.01 |       4 |
    *  2 |   INDEX RANGE SCAN| PS_PSROLEUSER |      1 |      3 |    169
    00:00:00.01 |       4 |
    ---------------------------------------------------------------------------------------------

    I have another version of this table on a test database, but the volume is
    slightly smaller (26000 rows, 3300 distinct values)

    That uses the histogram exactly as I would expect, the estimated number of
    rows is close to reality, and the execution plan of the complex query
    changes.
    --
    Alberto Dell'Era
    "the more you know, the faster you go"
    --
    http://www.freelists.org/webpage/oracle-l
  • Allen, Brandon at Jun 10, 2009 at 11:19 pm
    I wouldn't consider "over 100 rows" vs "3 rows" to be "highly skewed" in relation to a total of 166,067 rows. You're talking about .0006% of the table vs. .0018% of the table - a very small percentage either way, and .0018%, or 100 rows, probably isn't nearly enough to warrant a full scan instead of an index scan if that's what you have in mind. Is that what you had in mind?

    You might want to test with a FULL() hint to see just how much worse the performance is with a full scan.

    Regards,
    Brandon

    Privileged/Confidential Information may be contained in this message or attachments hereto. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of this company shall be understood as neither given nor endorsed by it.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouporacle-l @
categoriesoracle
postedJun 10, '09 at 10:20p
activeJun 10, '09 at 11:19p
posts3
users3
websiteoracle.com

People

Translate

site design / logo © 2023 Grokbase