FAQ
Oracle 11.2.0.2 with 2 node RAC
The batch job always runs on the first node.
The job uses three staging tables(Closer to 20 million,non-partitioned) to process interface jobs.
The sequence of events as follows:

1.Three staging tables are populated (approximately 80000 rows) with and from many application tables with a status flag of "P"

2.Compare between the previous days rows which has status flag of 'C' and today's rows.Report the differences of few column values.

Select statement runs in a loop and compare each emplid(bind value) between today's and yesterday's data.Each SQL is taking approximately 0.02 seconds.
This SQL is doing "Index Range Scan".Though the leaf blocks are increased every day,the range of scan would always be same as we process the same number of emplids every day.
The height of the index was 2 at the beginning and 4 as of now.

3.Update status flag from 'C' to 'H'

UPDATE staging SET FLAG='H' WHERE HHC_AUDIT_FLAG='C' AND JOB_ID=(SELECT MAX(JOB_ID) FROM STAGING WHERE JOB_ID<> 'Today's Job Id')

4.Update status flag from 'P' to 'C'

UPDATE staging SET FLAG='C' WHERE FLAG='P' AND job_id= 'today's job id'

Flag - Skew Data

H   COUNT(*)
- ----------
C      87509
H   33007762

Initially the job ran b/n 10-12 minutes and during the course of  over 6 months the run time has been increased to 80 minutes.

Theoretically,we are always processing same amount of data every day and the expectation of response time of this batch job is less than 15 minutes.
I wonder how partitioning would help improving the response time as none of the SQLs are doing full scan.

Thanks

Search Discussions

  • Jonathan Lewis at Jan 31, 2012 at 6:20 pm
    There are probably several design inefficiencies in your data structure and
    code, and you haven't given us enough information to make good guesses
    about where they are. It sounds as if your first step should be to enable
    extended SQL trace for the duration of the batch run so that you can see
    which SQL statements are responsible for most of the time, and find out
    what access paths they are taking.

    I doubt if you need to pay the price for partitioning for such a small data
    set, but I think you may need to review your indexing strategies and some
    of the SQL statements. (A correlated subquery to find "most recent" - i.e.
    your max() subquery is often a death trap with response time that increases
    with the data size unless you have the right indexes and the right path;
    having a status or flag value where almost all the rows end up at the same
    value is also an indexing death trap.) It's possible that there are a
    couple of fairly simple, safe, strategies that could get you back to your
    desired ten or fifteen minutes with 48 hours - but it's best for you to
    start with knowing where the time is going.

    Regards

    Jonathan Lewis
    http://jonathanlewis.wordpress.com
    Oracle Core (Apress 2011)
    http://www.apress.com/9781430239543


    ----- Original Message -----
    From: "Antony Raj" <ca_raj@yahoo.com>
    To: <oracle-l@freelists.org>
    Sent: Tuesday, January 31, 2012 5:25 PM
    Subject: degrade in performance


    Oracle 11.2.0.2 with 2 node RAC
    The batch job always runs on the first node.
    The job uses three staging tables(Closer to 20 million,non-partitioned) to
    process interface jobs.
    The sequence of events as follows:

    1.Three staging tables are populated (approximately 80000 rows) with and
    from many application tables with a status flag of "P"

    2.Compare between the previous days rows which has status flag of 'C' and
    today's rows.Report the differences of few column values.

    Select statement runs in a loop and compare each emplid(bind value) between
    today's and yesterday's data.Each SQL is taking approximately 0.02 seconds.
    This SQL is doing "Index Range Scan".Though the leaf blocks are increased
    every day,the range of scan would always be same as we process the same
    number of emplids every day.
    The height of the index was 2 at the beginning and 4 as of now.

    3.Update status flag from 'C' to 'H'

    UPDATE staging SET FLAG='H' WHERE HHC_AUDIT_FLAG='C' AND JOB_ID=(SELECT
    MAX(JOB_ID) FROM STAGING WHERE JOB_ID<> 'Today's Job Id')

    4.Update status flag from 'P' to 'C'

    UPDATE staging SET FLAG='C' WHERE FLAG='P' AND job_id= 'today's job id'

    Flag - Skew Data

    H COUNT(*)
    - ----------
    C 87509
    H 33007762

    Initially the job ran b/n 10-12 minutes and during the course of over 6
    months the run time has been increased to 80 minutes.

    Theoretically,we are always processing same amount of data every day and
    the expectation of response time of this batch job is less than 15 minutes.
    I wonder how partitioning would help improving the response time as none of
    the SQLs are doing full scan.

    Thanks
    --
    http://www.freelists.org/webpage/oracle-l




    -----
    No virus found in this message.
    Checked by AVG - www.avg.com
    Version: 2012.0.1901 / Virus Database: 2109/4777 - Release Date: 01/30/12


    --
    http://www.freelists.org/webpage/oracle-l
  • Andy Klock at Jan 31, 2012 at 6:33 pm
    Often when I hear about performance issues like this I think about Cary's
    story of the line painter:
    http://carymillsap.blogspot.com/2010/04/ramp.html
    On Tue, Jan 31, 2012 at 12:25 PM, Antony Raj wrote:

    ...

    Initially the job ran b/n 10-12 minutes and during the course of over 6
    months the run time has been increased to 80 minutes.

    --
    http://www.freelists.org/webpage/oracle-l
  • Antony Raj at Feb 2, 2012 at 8:49 pm
    Hi,

    I have a table T1 has 35 million rows.The following SQLs (Top 2) and many more runs every day.

    UPDATE T1 SET AUDIT_FLAG='H' WHERE AUDIT_FLAG='C' AND
    JOB_ID=(SELECT MAX(JOB_ID) FROM T1 WHERE JOB_ID <> :1);

    SELECT Z.EMPLID, Z.EMPL_RCD, Z.FIRST_NAME, Z.LAST_NAME, Z.MIDDLE_NAME,
    Z.COVERAGE_BEGIN_DT, DECODE(Z.EMPL_STATUS, 'U', 'A', Z.EMPL_STATUS),
    Z.UNION_CD, Z.PAYGROUP, Z.EXPECTED_RETURN_DT, Z.SEX, Z.DEPENDENT_BENEF,
    Z.RELATIONSHIP, Z.NATIONAL_ID, Z.BIRTHDATE, Z.STUDENT, Z.DISABLED,
    Z.MAR_STATUS, Z.MAR_STATUS_DT, Z.ADDRESS1, Z.ADDRESS2, Z.CITY, Z.STATE,
    Z.POSTAL, Z.COUNTRY, Z.DEDUCTION_BEGIN_DT, Z.COVRG_CD,
    Z.COVERAGE_ELECT,   Z.TERMINATION_DT, Z.BENEFIT_PLAN, Z.VENDOR_ID,
    Z.END_DT, Z.ELIG_CONFIG1,   Z.BEGIN_DT, Z.PHONE
    FROM T1 Z
    WHERE Z.JOB_ID = ( SELECT MAX(Z1.JOB_ID) FROM T1 Z1 WHERE Z1.EMPLID = Z.EMPLID AND Z1.AUDIT_FLAG ='H') And Z.EMPLID = :1
    AND NOT EXISTS ( SELECT 'X' FROM T1 Z2 WHERE Z2.EMPLID = Z.EMPLID AND Z2.DEPENDENT_BENEF = Z.DEPENDENT_BENEF AND   Z2.AUDIT_FLAG = 'C' and
    AUDIT_ACTN in ('A','C') ) ORDER BY Z.EMPLID;

    Unique Index Columns:

    JOB_ID
    EMPLID
    PLAN_TYPE
    DEPENDENT_BENEF

    I am thinking of adding another index with the following column order to satisfy the top 2 SQLs.
    JOB_ID,
    AUDIT_FLAG,
    EMPLID,
    DEPENDENT_BENEF,
    AUDIT_ACTN

    Column Name                    Null?      Type             NUM_DISTINCT
    ------------------------------ ---------- --------------- ------------
    JOB_ID                         NOT NULL   NUMBER(10,0)             394
    EMPLID                         NOT NULL   VARCHAR2(44)           31366
    PLAN_TYPE                      NOT NULL   VARCHAR2(8)                2
    DEPENDENT_BENEF                NOT NULL   VARCHAR2(8)               14
    AUDIT_ACTN                     NOT NULL   VARCHAR2(4)                3
    AUDIT_FLAG                     NOT NULL   VARCHAR2(4)                3

    My question is that should I consider the "most selective column" as a leading column?

    Thanks
  • Powell, Mark at Feb 2, 2012 at 10:00 pm
    No, you should make the "most selected" column the leading column of the multi-column index. That is if 6 queries reference one or both of the two columns and the 3 queries that reference only one of the columns reference the same column then you normally want that column to be the leading column (providing the data is not badly skewed).

    Ever case depends on the data distribution and the actual queries being ran but in general I suggest looking at what columns are referenced and which other indexes you would need to create if you choose a specific order.

    The least selective column being first could be beneficial if index compression is used. Again I find use of this feature has to be judged on a case by case basis.


    -----Original Message-----
    From: oracle-l-bounce@freelists.org On Behalf Of Antony Raj
    Sent: Thursday, February 02, 2012 3:48 PM
    To: oracle-l@freelists.org
    Subject: Concatenated Index Column Order - Does it really matters?

    Hi,

    I have a table T1 has 35 million rows.The following SQLs (Top 2) and many more runs every day.

    UPDATE T1 SET AUDIT_FLAG='H' WHERE AUDIT_FLAG='C' AND JOB_ID=(SELECT MAX(JOB_ID) FROM T1 WHERE JOB_ID <> :1);

    SELECT Z.EMPLID, Z.EMPL_RCD, Z.FIRST_NAME, Z.LAST_NAME, Z.MIDDLE_NAME, Z.COVERAGE_BEGIN_DT, DECODE(Z.EMPL_STATUS, 'U', 'A', Z.EMPL_STATUS), Z.UNION_CD, Z.PAYGROUP, Z.EXPECTED_RETURN_DT, Z.SEX, Z.DEPENDENT_BENEF,
    Z.RELATIONSHIP, Z.NATIONAL_ID, Z.BIRTHDATE, Z.STUDENT, Z.DISABLED, Z.MAR_STATUS, Z.MAR_STATUS_DT, Z.ADDRESS1, Z.ADDRESS2, Z.CITY, Z.STATE,
    Z.POSTAL, Z.COUNTRY, Z.DEDUCTION_BEGIN_DT, Z.COVRG_CD, Z.COVERAGE_ELECT,   Z.TERMINATION_DT, Z.BENEFIT_PLAN, Z.VENDOR_ID, Z.END_DT, Z.ELIG_CONFIG1,   Z.BEGIN_DT, Z.PHONE FROM T1 Z WHERE Z.JOB_ID = ( SELECT MAX(Z1.JOB_ID) FROM T1 Z1 WHERE Z1.EMPLID = Z.EMPLID AND Z1.AUDIT_FLAG ='H') And Z.EMPLID = :1 AND NOT EXISTS ( SELECT 'X' FROM T1 Z2 WHERE Z2.EMPLID = Z.EMPLID AND Z2.DEPENDENT_BENEF = Z.DEPENDENT_BENEF AND   Z2.AUDIT_FLAG = 'C' and AUDIT_ACTN in ('A','C') ) ORDER BY Z.EMPLID;

    Unique Index Columns:

    JOB_ID
    EMPLID
    PLAN_TYPE
    DEPENDENT_BENEF

    I am thinking of adding another index with the following column order to satisfy the top 2 SQLs.
    JOB_ID,
    AUDIT_FLAG,
    EMPLID,
    DEPENDENT_BENEF,
    AUDIT_ACTN

    Column Name                    Null?      Type             NUM_DISTINCT
    ------------------------------ ---------- --------------- ------------
    JOB_ID                         NOT NULL   NUMBER(10,0)             394
    EMPLID                         NOT NULL   VARCHAR2(44)           31366
    PLAN_TYPE                      NOT NULL   VARCHAR2(8)                2
    DEPENDENT_BENEF                NOT NULL   VARCHAR2(8)               14
    AUDIT_ACTN                     NOT NULL   VARCHAR2(4)                3
    AUDIT_FLAG                     NOT NULL   VARCHAR2(4)                3

    My question is that should I consider the "most selective column" as a leading column?

    Thanks
    --
    http://www.freelists.org/webpage/oracle-l


    --
    http://www.freelists.org/webpage/oracle-l
  • Antony Raj at Feb 10, 2012 at 8:22 pm
    Hi Mark,

    Consider the following columns(all of them are used as predicates in different queries) and their distinct values.Can you suggest the order of columns for creating an index?
    The table has ~ 32million rows.This table is also updated with the column DIT_FLAG.
    Can we keep a volatile column as a leading column even it's most selective?


    Column          NUM_DISTINCT
    PNO                394
    EMPLID           31366
    DEPBENEF            14
    EMPL_RCD             1
    PLANTYPE             2
    ACTN                 3
    DIT_FLAG             3


    Thanks


    ________________________________
    From: "Powell, Mark" <mark.powell2@hp.com>
    To: "oracle-l@freelists.org" <oracle-l@freelists.org>
    Sent: Thursday, February 2, 2012 4:57 PM
    Subject: RE: Concatenated Index Column Order - Does it really matters?

    No, you should make the "most selected" column the leading column of the multi-column index.  That is if 6 queries reference one or both of the two columns and the 3 queries that reference only one of the columns reference the same column then you normally want that column to be the leading column (providing the data is not badly skewed).

    Ever case depends on the data distribution and the actual queries being ran but in general I suggest looking at what columns are referenced and which other indexes you would need to create if you choose a specific order.

    The least selective column being first could be beneficial if index compression is used.  Again I find use of this feature has to be judged on a case by case basis.
  • David Fitzjarrell at Feb 10, 2012 at 9:03 pm
    I agree with Mark that the most selective column should be the leading column, however it does depend upon your data distribution and how the index will be used (as in which columns will be most often accessed).  Oracle can occasionally invoke an index skip scan although I usually find this to occur when the leading column is not the most selective.  This also depends upon which release of Oracle you're using; with 11g you could test this by creating the index as invisible then activating it at the session level to determine how the suite of queries utilize it.  Tweak the index as necessary then create the tested and approved configuration in production.

    I'll state again your situation depends upon how your data in the various columns is distributed.  There is no 'one-shot-works-in-every-situation' recommendation.
    David Fitzjarrell



    ________________________________
    From: Antony Raj <ca_raj@yahoo.com>
    To: "mark.powell2@hp.com" <mark.powell2@hp.com>; "oracle-l@freelists.org" <oracle-l@freelists.org>
    Sent: Friday, February 10, 2012 1:21 PM
    Subject: Re: Concatenated Index Column Order - Does it really matters?

    Hi Mark,

    Consider the following columns(all of them are used as predicates in different queries) and their distinct values.Can you suggest the order of columns for creating an index?
    The table has ~ 32million rows.This table is also updated with the column DIT_FLAG.
    Can we keep a volatile column as a leading column even it's most selective?


    Column          NUM_DISTINCT
    PNO                394
    EMPLID           31366
    DEPBENEF            14
    EMPL_RCD             1
    PLANTYPE             2
    ACTN                 3
    DIT_FLAG             3


    Thanks


    ________________________________
    From: "Powell, Mark" <mark.powell2@hp.com>
    To: "oracle-l@freelists.org" <oracle-l@freelists.org>
    Sent: Thursday, February 2, 2012 4:57 PM
    Subject: RE: Concatenated Index Column Order - Does it really matters?

    No, you should make the "most selected" column the leading column of the multi-column index.  That is if 6 queries reference one or both of the two columns and the 3 queries that reference only one of the columns reference the same column then you normally want that column to be the leading column (providing the data is not badly skewed).

    Ever case depends on the data distribution and the actual queries being ran but in general I suggest looking at what columns are referenced and which other indexes you would need to create if you choose a specific order.

    The least selective column being first could be beneficial if index compression is used.  Again I find use of this feature has to be judged on a case by case basis.
  • Antony Raj at Feb 10, 2012 at 9:18 pm
    Hi David,

    I agree with you.It's all depends on the data distribution.
    But in general,is it recommended to index a column which is getting updated even though it's used as a predicate in a query?
    My understanding is that when a column is updated,Oracle has to to delete an entry and make a new index entry and if we update quite a larger number of rows,then index maintenance would slow down the elapsed time of UPDATE.

    Thanks

    ________________________________
    From: David Fitzjarrell <oratune@yahoo.com>
    To: "ca_raj@yahoo.com" <ca_raj@yahoo.com>; "mark.powell2@hp.com" <mark.powell2@hp.com>; "oracle-l@freelists.org" <oracle-l@freelists.org>
    Sent: Friday, February 10, 2012 4:02 PM
    Subject: Re: Concatenated Index Column Order - Does it really matters?

    I agree with Mark that the most selective column should be the leading column, however it does depend upon your data distribution and how the index will be used (as in which columns will be most often accessed).  Oracle can occasionally invoke an index skip scan although I usually find this to occur when the leading column is not the most selective.  This also depends upon which release of Oracle you're using; with 11g you could test this by creating the index as invisible then activating it at the session level to determine how the suite of queries utilize it.  Tweak the index as necessary then create the tested and approved configuration in production.

    I'll state again your situation depends upon how your data in the various columns is distributed.  There is no 'one-shot-works-in-every-situation' recommendation.
    David Fitzjarrell



    ________________________________
    From: Antony Raj <ca_raj@yahoo.com>
    To: "mark.powell2@hp.com" <mark.powell2@hp.com>; "oracle-l@freelists.org" <oracle-l@freelists.org>
    Sent: Friday, February 10, 2012 1:21 PM
    Subject: Re: Concatenated Index Column Order - Does it really matters?

    Hi Mark,

    Consider the following columns(all of them are used as predicates in different queries) and their distinct values.Can you suggest the order of columns for creating an index?
    The table has ~ 32million rows.This table is also updated with the column DIT_FLAG.
    Can we keep a volatile column as a leading column even it's most selective?


    Column          NUM_DISTINCT
    PNO                394
    EMPLID           31366
    DEPBENEF            14
    EMPL_RCD             1
    PLANTYPE             2
    ACTN                 3
    DIT_FLAG             3


    Thanks


    ________________________________
    From: "Powell, Mark" <mark.powell2@hp.com>
    To: "oracle-l@freelists.org" <oracle-l@freelists.org>
    Sent: Thursday, February 2, 2012 4:57 PM
    Subject: RE: Concatenated Index Column Order - Does it really matters?

    No, you should make the "most selected" column the leading column of the multi-column index.  That is if 6 queries reference one or both of the two columns and the 3 queries that reference only one of the columns reference the same column then you normally want that column to be the leading column (providing the data is not badly skewed).

    Ever case depends on the data distribution and the actual queries being ran but in general I suggest looking at what columns are referenced and which other indexes you would need to create if you choose a specific order.

    The least selective column being first could be beneficial if index compression is used.  Again I find use of this feature has to be judged on a case by case basis.
  • Bobak, Mark at Feb 10, 2012 at 10:05 pm
    Hi Antony,

    (Note, this is a different Mark than replied previously. :-))

    Column order in a concatenated index certainly does matter. But, column volatility shouldn't be a consideration. Index query usage, how many queries use the index, and the most popular column in query predicates, should be considered. Secondary to that, all other things being equal, you should put your *least* selective columns first, so that you can leverage index compression.

    Hope that helps,

    -Mark

    -----Original Message-----
    From: oracle-l-bounce@freelists.org On Behalf Of Antony Raj
    Sent: Friday, February 10, 2012 3:22 PM
    To: mark.powell2@hp.com; oracle-l@freelists.org
    Subject: Re: Concatenated Index Column Order - Does it really matters?

    Hi Mark,

    Consider the following columns(all of them are used as predicates in different queries) and their distinct values.Can you suggest the order of columns for creating an index?
    The table has ~ 32million rows.This table is also updated with the column DIT_FLAG.
    Can we keep a volatile column as a leading column even it's most selective?


    Column          NUM_DISTINCT
    PNO                394
    EMPLID           31366
    DEPBENEF            14
    EMPL_RCD             1
    PLANTYPE             2
    ACTN                 3
    DIT_FLAG             3


    Thanks


    ________________________________
    From: "Powell, Mark" <mark.powell2@hp.com>
    To: "oracle-l@freelists.org" <oracle-l@freelists.org>
    Sent: Thursday, February 2, 2012 4:57 PM
    Subject: RE: Concatenated Index Column Order - Does it really matters?

    No, you should make the "most selected" column the leading column of the multi-column index.  That is if 6 queries reference one or both of the two columns and the 3 queries that reference only one of the columns reference the same column then you normally want that column to be the leading column (providing the data is not badly skewed).

    Ever case depends on the data distribution and the actual queries being ran but in general I suggest looking at what columns are referenced and which other indexes you would need to create if you choose a specific order.

    The least selective column being first could be beneficial if index compression is used.  Again I find use of this feature has to be judged on a case by case basis.
    --
    http://www.freelists.org/webpage/oracle-l




    --
    http://www.freelists.org/webpage/oracle-l
  • Wolfgang Breitling at Feb 13, 2012 at 1:28 am
    I beg to differ a bit. Column volatility should factor into the decision.
    I did encounter an extreme example a few years back. There was an update statement in a job which took several hours. When we looked into trying to tune it, the statement ran in a few seconds when we re-ran it. Digging into it I found that the faster update wasn't really a re-run of the orignal update. In the job stream the table was truncated and then re-loaded with some of the columns being initially assigned default values which would be updated to final values ( based on content from other tables ) later in the jobstream. On of the columns was initialized to 0 and later updated to huge values and one of the indexes had this column as the leading column, correct for the later use of the table. The update of the leading column had extreme index maintenance implication. Every index entry needed to be removed from the "left" edge of the index and inserted into the "right". That caused the runtime of several hours. When we "re-ran" the update the values for this column didn't
    change so there was no index maintenance. Altering the index unusable prior to the update statement ( not prior to the job since a truncate makes indexes usable ) made the update run several hours faster.
    You need to weight the benefit of an index in queries against the maintenance cost in dml and column order and volatility play an important role there.
    On 2012-02-10, at 3:02 PM, Bobak, Mark wrote:

    Column order in a concatenated index certainly does matter. But, column volatility shouldn't be a consideration.

    --
    http://www.freelists.org/webpage/oracle-l
  • Antony Raj at Feb 13, 2012 at 6:32 pm
    Thank you Wolfgang!
    I am 100% with your comment.

    ________________________________
    From: Wolfgang Breitling <breitliw@centrexcc.com>
    To: Mark.Bobak@proquest.com
    Cc: "ca_raj@yahoo.com" <ca_raj@yahoo.com>; "mark.powell2@hp.com" <mark.powell2@hp.com>; "oracle-l@freelists.org" <oracle-l@freelists.org>
    Sent: Sunday, February 12, 2012 8:27 PM
    Subject: Re: Concatenated Index Column Order - Does it really matters?

    I beg to differ a bit. Column volatility should factor into the decision.
    I did encounter an extreme example a few years back. There was an update statement in a job which took several hours. When we looked into trying to tune it, the statement ran in a few seconds when we re-ran it. Digging into it I found that the faster update wasn't really a re-run of the orignal update. In the job stream the table was truncated and then re-loaded with some of the columns being initially assigned default values which would be updated to final values ( based on content from other tables ) later in the jobstream. On of the columns was initialized to 0 and later updated to huge values and one of the indexes had this column as the leading column, correct for the later use of the table. The update of the leading column had extreme index maintenance implication. Every index entry needed to be removed from the "left" edge of the index and inserted into the "right". That caused the runtime of several hours. When we "re-ran" the update the values
    for this column didn't
    change so there was no index maintenance. Altering the index unusable prior to the update statement ( not prior to the job since a truncate makes indexes usable ) made the update run several hours faster.
    You need to weight the benefit of an index in queries against the maintenance cost in dml and column order and volatility play an important role there.
    On 2012-02-10, at 3:02 PM, Bobak, Mark wrote:

    Column order in a concatenated index certainly does matter.  But, column volatility shouldn't be a consideration.

    --
    http://www.freelists.org/webpage/oracle-l
    --
    http://www.freelists.org/webpage/oracle-l

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouporacle-l @
categoriesoracle
postedJan 31, '12 at 6:20p
activeFeb 13, '12 at 6:32p
posts11
users7
websiteoracle.com

People

Translate

site design / logo © 2022 Grokbase