FAQ
Christian,

I was reading your document "CBO - A Configuration Roadmap" .

You write :
"Histograms are essential for all columns referenced in WHERE clauses
that contain
skewed data. Notice that they are useful on non-indexed columns as
well! For simplicity
use SIZE SKEWONLY. If it takes too much time try SIZE AUTO6. If it's
still too slow or
the chosen number of buckets is not good (or the needed histogram
isn't created at all),
manually specify the list of columns."

Can you explain the second sentence about Histograms on non-indexed
columns as well ?
How would they be useful ? I would understand Histograms on
non-indexed columns
as providing _me_ information about the data in those columns and
which could allow
me to make a judgement call as to whether I should index the
columns. How does the
optimizer use Histograms on non-indexed columns ?

If you do not mind [as you do post frequently at ORACLE-L], I have
CC'd this to ORACLE-L
trusting that your reply might also be of interest to others on the list.

Hemant K Chitale
http://web.singnet.com.sg/~hkchital

Search Discussions

  • Mladen Gogala at Dec 28, 2006 at 4:31 pm
    Hemant, histograms provide input to CBO, which creates an execution
    plan. For what decision can histogram on an unidexed column be useful?
    Such histogram can be used to decide between sort/merge join or hash
    join. The resulting difference in execution speed can be dramatic.

    Mladen Gogala
    Sr. Oracle DBA
    1500 Broadway, 6th floor
    New York, NY 10036
    (212) 329-5201
    www.vmsinfo.com

    The Leader in Integrated Media Intelligence Solutions
    -----Original Message-----
    From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-
    bounce@freelists.org] On Behalf Of Hemant K Chitale
    Sent: Thursday, December 28, 2006 11:12 AM
    To: christian.antognini_at_trivadis.com
    Cc: oracle-l_at_freelists.org
    Subject: re "CBO - A Configuration Roadmap" -- Histograms on Non-
    Indexed Columns


    Christian,

    I was reading your document "CBO - A Configuration Roadmap" .

    You write :
    "Histograms are essential for all columns referenced in WHERE
    clauses
    that contain
    skewed data. Notice that they are useful on non-indexed columns as
    well! For simplicity
    use SIZE SKEWONLY. If it takes too much time try SIZE AUTO6. If it's
    still too slow or
    the chosen number of buckets is not good (or the needed histogram
    isn't created at all),
    manually specify the list of columns."

    Can you explain the second sentence about Histograms on non-indexed
    columns as well ?
    How would they be useful ? I would understand Histograms on
    non-indexed columns
    as providing _me_ information about the data in those columns and
    which could allow
    me to make a judgement call as to whether I should index the
    columns. How does the
    optimizer use Histograms on non-indexed columns ?

    If you do not mind [as you do post frequently at ORACLE-L], I have
    CC'd this to ORACLE-L
    trusting that your reply might also be of interest to others on the
    list.


    Hemant K Chitale
    http://web.singnet.com.sg/~hkchital


    --
    http://www.freelists.org/webpage/oracle-l
    --
    http://www.freelists.org/webpage/oracle-l
  • Wolfgang Breitling at Dec 28, 2006 at 4:44 pm

    At 09:12 AM 12/28/2006, Hemant K Chitale wrote:
    Can you explain the second sentence about Histograms on non-indexed
    columns as well ?
    How would they be useful ? I would understand Histograms on
    non-indexed columns
    as providing _me_ information about the data in those columns and
    which could allow
    me to make a judgement call as to whether I should index the
    columns. How does the
    optimizer use Histograms on non-indexed columns ?
    Column statistics give the CBO information about the selectivity of
    predicates on that column. Predicate selectivity in turn gives rise
    to cardinality estimates. Differences in selectivity, e.g. by having
    a histogram, lead to differences in cardinality estimates which in
    turn can, and does, lead to different execution plans and peformance.
    For a very simple example see my paper on "Histograms - Myths and facts".

    Regards

    Wolfgang Breitling
    Centrex Consulting Corporation
    www.centrexcc.com

    This email has been scanned by the MessageLabs Email Security System.
    For more information please visit http://www.messagelabs.com/email
  • Hemant K Chitale at Dec 29, 2006 at 2:45 pm
    Christian, Wolfgang, Mladen,

    I knew about cardinality driving execution plan if it is used to
    determine the access method against the table. However, if there was
    no index on the column, the only access on that column _standalone_
    would be a full table scan. {exactly the first paragraph's in
    the section on Histograms on Non-Indexed columns in
    Wolfgang's paper, eh ?|
    Now, from your emails and Christian's short example and
    Wolfgang's paper, I take the explanation that this cardinality would
    be used to determine the JOIN method (sort-merge, nested-loop or hash).

    Thank you.

    [[BTW : I read Christian's paper after encountering an issue with
    a query on a single table [no join] where a column in the query
    IS quite skewed but I thought that Christian's statement was a generalization
    and began wondering how the Histogram would help in my query
    {single table, no join}. I guess I was too quick to make that assumption. ]]

    So, now, I should also consider skewness in a column if it is in a Join.

    Hemant K Chitale
    http://web.singnet.com.sg/~hkchital
  • Wolfgang Breitling at Dec 29, 2006 at 3:07 pm

    At 07:45 AM 12/29/2006, Hemant K Chitale wrote:
    Christian, Wolfgang, Mladen, [snip]
    Now, from your emails and Christian's short example and
    Wolfgang's paper, I take the explanation that this cardinality would
    be used to determine the JOIN method (sort-merge, nested-loop or hash).
    and/or the join order

    Regards

    Wolfgang Breitling
    Centrex Consulting Corporation
    www.centrexcc.com

    This email has been scanned by the MessageLabs Email Security System.
    For more information please visit http://www.messagelabs.com/email
  • John Kanagaraj at Dec 29, 2006 at 8:00 pm
    Hemant,
    You should be able to use SYS.COL_USAGE$ to work out which Columns are
    being used in Join predicates using the following SQL:

    select r.name owner, o.name table, c.name column,
    equality_preds, equijoin_preds, nonequijoin_preds, range_preds,
    like_preds, null_preds, timestamp
    from sys.col_usage$ u, sys.obj$ o, sys.col$ c, sys.user$ r
    where o.obj# = u.obj# and c.obj# = u.obj# and c.col# = u.intcol#
    and o.owner# = r.user# and (u.equijoin_preds > 0 or u.nonequijoin_preds > 0);

    A MINUS against DBA_IND_COLUMNS should show up which columns *might*
    need Histograms....

    Titbit: As per Oracle kernel developers that I spoke to at OOW 06, the
    data in COL_USAGE is never "flushed". However, they stated that only
    rows with TIMESTAMP > 6 months are considered.

    --
    John Kanagaraj <><
    DB Soft Inc
    Phone: 408-970-7002 (W)

    The opinions and facts contained in this message are entirely mine
    and do not reflect those of my employer or customers **
    --
    http://www.freelists.org/webpage/oracle-l
  • Christian Antognini at Dec 29, 2006 at 6:35 am
    Hi Hemant

    You probably already got enough information from Mladen and especially
    from Wolfgang... Anyway, let me show you the demo I eventually show when
    I present that paper...
    Can you explain the second sentence about Histograms on non-indexed
    columns as well ?
    How would they be useful ? I would understand Histograms on
    non-indexed columns
    as providing _me_ information about the data in those columns and
    which could allow
    me to make a judgement call as to whether I should index the
    columns. How does the
    optimizer use Histograms on non-indexed columns ?
    First of all let's create a table with a primary key (ID) and two
    additional columns (COL1 and COL2). For the purpose of this demo
    especially COL1 is interesting... In fact it contains one time all the
    values from 1 up to 500 and 500 times the value 666.

    SQL> CREATE TABLE t (id, col1, col2, CONSTRAINT t1_pk PRIMARY KEY (id))
      2 AS
      3 SELECT rownum, CASE WHEN rownum>500 THEN 666 ELSE rownum END,
    rownum
      4 FROM dba_objects WHERE rownum <= 1000;

    Then let's gather statistics without histograms.

    SQL> BEGIN

      2 dbms_stats.gather_table_stats(

    3 ownname=>user,
    4 tabname=>'T',
    5 cascade=>TRUE,
    6 method_opt=>'for all columns size 1');

      7 END;

      8 /

    Now let's check the query optimizer estimations when COL1 is referenced
    in the WHERE clause... Since no histograms are available and that the
    query optimizer "only" knows min/max value and number of distinct
    values, its estimation for the predicate "t2.col1 = 666" (see line 2 of
    the execution plan) is that two rows are returned. Why two? Basically
    because it multiply the number of rows store in the table (1000) by the
    selectivity of the predicate (1/num_distinct = 1/501 = 0.002). The
    interesting thing to notice is that this estimation, i.e. the high
    selectivity, leads to a nested loop.

    SQL> SET AUTOTRACE TRACE EXP

    SQL> SELECT * FROM t t1 JOIN t t2 USING (id) WHERE t2.col1 = 666;

    Execution Plan

    Plan hash value: 2076359606
    Id | Operation | Name | Rows |
    ------------------------------------------------------
    0 | SELECT STATEMENT | | 2 |
    1 | NESTED LOOPS | | 2 |
    * 2 | TABLE ACCESS FULL | T | 2 |
    3 | TABLE ACCESS BY INDEX ROWID| T | 1 |
    * 4 | INDEX UNIQUE SCAN | T1_PK | 1 |
    ------------------------------------------------------

    Predicate Information (identified by operation id):

       2 - filter("T2"."COL1"=666)
       4 - access("T1"."ID"="T2"."ID")

    Then let's gather a histogram on COL1...

    SQL> BEGIN

      2 dbms_stats.gather_table_stats(

    3 ownname=>user,
    4 tabname=>'T',
    5 cascade=>TRUE,
    6 method_opt=>'for columns id size 1, col1 size 254');

      7 END;

      8 /

    Now the estimations have changed. In fact with the histogram on COL1 the
    query optimizer knows that the predicate "t2.col1 = 666" has a much
    higher selectivity (circa 0.5). And since the number of returned rows is
    much higher now the query optimizer decides to use a hash join...

    SQL> SELECT * FROM t t1 JOIN t t2 USING (id) WHERE t2.col1 = 666;

    Execution Plan

    Plan hash value: 2135975663
    Id | Operation | Name | Rows |
    0 | SELECT STATEMENT | | 506 |
    * 1 | HASH JOIN | | 506 |
    * 2 | TABLE ACCESS FULL| T | 506 |
    3 | TABLE ACCESS FULL| T | 1000 |
    -------------------------------------------

    Predicate Information (identified by operation id):

       1 - access("T1"."ID"="T2"."ID")
       2 - filter("T2"."COL1"=666)

    So, the aim of this demo is to show that the query optimizer, thanks to
    a histogram, doesn't change the access path for a table but the join
    method.

    Best regards,
    Chris

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouporacle-l @
categoriesoracle
postedDec 28, '06 at 4:12p
activeDec 29, '06 at 8:00p
posts7
users5
websiteoracle.com

People

Translate

site design / logo © 2022 Grokbase