FAQ
I have a large and busy OLTP table, 100GB or so, against which there is a need to capture changes. Until an asynchronous CDC solution is in place we have to rely on two columns: create_date and update_date (null until the first update), both being of DATE type of course.

These are currently unindexed, but there is a desire to index them to improve change capture performance for queries such as:

select ...
from ...

where (create_dt >= trunc(sysdate)-1 and create_dt < trunc(sysdate))
or (update_dt >= trunc(sysdate)-1 and update_dt < trunc(sysdate))

The desire is obviously to provide the maximum performance benefit while reducing the impact on the OLTP system.

I thought of four different indexing options:

i) Indexing the columns separately, leading in the best case to an unusual execution plan where the indexes are range scanned and the results merged before accessing the table.
ii) A single composite index (create_date,update_date),leading to a fast full index scan.
iii) A single composite index (update_date,create_date), rewriting the query predicate as ...

(create_dt >= trunc(sysdate)-1 and create_dt < trunc(sysdate) and update_date is null)
or (update_dt >= trunc(sysdate)-1 and update_dt < trunc(sysdate))
... and leading to two index range scans. (not sure about this)
iv) A single-column function based index on (Nvl(update_dt,create_dt)) and rewriting the predicate appropriately.

Whichever of these is chosen the growth pattern for the data naturally tends towards index contention as all the new values are equal to sysdate.

So the nub of my problem comes down to these questions:

Is it possible to mitigate the index contention problem with a suitably high initrans values? Does it seem likely that implementing a reverse index would be worthwhile in reducing contention (I could probably take the pain of the fast full scan if it was, although it would lead to a larger index due to the 50/50 block splits).

Would you regard implementing a function-based index as risky in any significant way on major tables of a busy OLTP system? How about a reverse function based index?

Basically, "what would you do"?

Maybe I'm overthinking it, but I'd like to go to the sytem developers with a fully thought out set of options (even if they do tell me to take a running jump anyway).

Search Discussions

  • Toon Koppelaars at Apr 9, 2008 at 3:37 am
    David,

    Is a row-trigger currently maintaining the CREATE_DT and UPDATE_DT columns?
    If so you could also opt for that row-trigger to dump (at create and update
    of a row) the PK-value of the row into a second new table, that only has the
    PK-column, together with some PROCESSED indicator.

    And then if you need to capture the changes use this small table as a
    nested-loop driver to your big one. First set processed flag to 'In process'
    or something, then process them, then delete all 'In process' rows (there
    could be new rows inserted during your processing, with process='n', by
    now).

    There would be *no* need to add any index, not on the big table (I'm
    assuming the PK is indexed...:-) ), and not on this new small one...

    Toon
    On 4/9/08, David Aldridge wrote:

    I have a large and busy OLTP table, 100GB or so, against which there is a
    need to capture changes. Until an asynchronous CDC solution is in place we
    have to rely on two columns: create_date and update_date (null until the
    first update), both being of DATE type of course.

    These are currently unindexed, but there is a desire to index them to
    improve change capture performance for queries such as:

    select ...
    from ...
    where (create_dt >= trunc(sysdate)-1 and create_dt < trunc(sysdate))
    or (update_dt >= trunc(sysdate)-1 and update_dt < trunc(sysdate))

    The desire is obviously to provide the maximum performance benefit while
    reducing the impact on the OLTP system.

    I thought of four different indexing options:

    i) Indexing the columns separately, leading in the best case to an unusual
    execution plan where the indexes are range scanned and the results merged
    before accessing the table.
    ii) A single composite index (create_date,update_date),leading to a fast
    full index scan.
    iii) A single composite index (update_date,create_date), rewriting the
    query predicate as ...
    (create_dt >= trunc(sysdate)-1 and create_dt < trunc(sysdate) and
    update_date is null)
    or (update_dt >= trunc(sysdate)-1 and update_dt < trunc(sysdate))
    ... and leading to two index range scans. (not sure about this)
    iv) A single-column function based index on (Nvl(update_dt,create_dt)) and
    rewriting the predicate appropriately.

    Whichever of these is chosen the growth pattern for the data naturally
    tends towards index contention as all the new values are equal to sysdate.

    So the nub of my problem comes down to these questions:

    Is it possible to mitigate the index contention problem with a suitably
    high initrans values? Does it seem likely that implementing a reverse index
    would be worthwhile in reducing contention (I could probably take the pain
    of the fast full scan if it was, although it would lead to a larger index
    due to the 50/50 block splits).

    Would you regard implementing a function-based index as risky in any
    significant way on major tables of a busy OLTP system? How about a reverse
    function based index?

    Basically, "what would you do"?

    Maybe I'm overthinking it, but I'd like to go to the sytem developers with
    a fully thought out set of options (even if they do tell me to take a
    running jump anyway).
    --
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
    Toon Koppelaars
    RuleGen BV
    +31-615907269
    toon_at_rulegen.com
    www.rulegen.com

    Author: "Applied Mathematics for Database Professionals"

    --
    http://www.freelists.org/webpage/oracle-l
  • Stephane Faroult at Apr 9, 2008 at 4:38 am
    David,

    First of all, forget about reverse indexes, because then you
    couldn't have range scans.
    There is something that is missing from your analysis, which is the
    relative ratio of inserts vs updates.
    Presumably, contention is mostly a concern for inserts, and create_dt.
    Unless updates are applied following a condition on create_dt, they
    shouldn't be much of an issue.
    I believe that you should consider your query as the union of two
    different SELECT statements (it's quite possible that this is what the
    optimizer does, anyway). For the SELECT that involves create_dt, what
    I'd like to have in such a case is a table partitioned by day, which
    would spare me an index and make contention not much worse than what it
    is without an index (moreover, depending on your data, subpartitioning
    might help). This would make update_dt the only column worth indexing,
    adding some overhead because of index management but little contention
    issues.

    If partitioning isn't an option (not least because the table is already
    partitioned), two separate columns are more costly than one, and I don't
    believe that ii) would be very efficient. I'd have the same doubts as
    you concerning iii), you have high odds of sending the optimizer on a
    wrong track with such a query. The solution I'd like best is probably
    the function-based index, it would cost a little extra CPU, and all the
    more that rows are frequently updated, but if CPU isn't the bottleneck
    it may be OK. Note also that if you have more updates than inserts, it
    should relieve contention. I wouldn't consider such a simple FBI to be
    much more of a risk than a regular index.

    HTH

    Stéphane Faroult

    David Aldridge wrote:
    I have a large and busy OLTP table, 100GB or so, against which there
    is a need to capture changes. Until an asynchronous CDC solution is in
    place we have to rely on two columns: create_date and update_date
    (null until the first update), both being of DATE type of course.

    These are currently unindexed, but there is a desire to index them to
    improve change capture performance for queries such as:

    select ...
    from ...
    where (create_dt >= trunc(sysdate)-1 and create_dt < trunc(sysdate))
    or (update_dt >= trunc(sysdate)-1 and update_dt < trunc(sysdate))

    The desire is obviously to provide the maximum performance benefit
    while reducing the impact on the OLTP system.

    I thought of four different indexing options:

    i) Indexing the columns separately, leading in the best case to an
    unusual execution plan where the indexes are range scanned and the
    results merged before accessing the table.
    ii) A single composite index (create_date,update_date),leading to a
    fast full index scan.
    iii) A single composite index (update_date,create_date), rewriting the
    query predicate as ...
    (create_dt >= trunc(sysdate)-1 and create_dt < trunc(sysdate) and
    update_date is null)
    or (update_dt >= trunc(sysdate)-1 and update_dt < trunc(sysdate))
    ... and leading to two index range scans. (not sure about this)
    iv) A single-column function based index on
    (Nvl(update_dt,create_dt)) and rewriting the predicate appropriately.
    Whichever of these is chosen the growth pattern for the data naturally
    tends towards index contention as all the new values are equal to
    sysdate.

    So the nub of my problem comes down to these questions:

    Is it possible to mitigate the index contention problem with a
    suitably high initrans values? Does it seem likely that implementing a
    reverse index would be worthwhile in reducing contention (I could
    probably take the pain of the fast full scan if it was, although it
    would lead to a larger index due to the 50/50 block splits).

    Would you regard implementing a function-based index as risky in any
    significant way on major tables of a busy OLTP system? How about a
    reverse function based index?

    Basically, "what would you do"?

    Maybe I'm overthinking it, but I'd like to go to the sytem developers
    with a fully thought out set of options (even if they do tell me to
    take a running jump anyway).
    --
    http://www.freelists.org/webpage/oracle-l
  • Oxnard Montalvo at Apr 9, 2008 at 1:46 pm
    If your Goal is to perform CDC and you are on 10g or higher I would strongly recommend you check out http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/cdc.htm#i1028295 I have found the CDC process developed by Oracle works very well and solves many of these issues by avoiding them all together.

    Ox
  • David Aldridge at Apr 10, 2008 at 2:11 am
    Sadly not -- the code that maintains the table is untouchable. Thanks for the thought though Toon.

    Still, maybe a materialized view log capturing only the PK value ... hmmm.

    Original Message ----
    From: Toon Koppelaars
    To: david_at_david-aldridge.com
    Cc: oracle-l_at_freelists.org
    Sent: Tuesday, April 8, 2008 11:37:39 PM
    Subject: Re: Indexing options to avoid contention -- 10gR2

    David,
    Â

    Is a row-trigger currently maintaining the CREATE_DT and UPDATE_DT columns?
    If so you could also opt for that row-trigger to dump (at create and update of a row) the PK-value of the row into a second new table, that only has the PK-column, together with some PROCESSED indicator.
    Â

    And then if you need to capture the changes use this small table as a nested-loop driver to your big one. First set processed flag to 'In process' or something, then process them, then delete all 'In process' rows (there could be new rows inserted during your processing, with process='n', by now).
    Â

    There would be *no* need to add any index, not on the big table (I'm assuming the PK is indexed...:-) ), and not on this new small one...
    Â

    Toon
    Â

    On 4/9/08, David Aldridge wrote:
    I have a large and busy OLTP table, 100GB or so, against which there is a need to capture changes. Until an asynchronous CDC solution is in place we have to rely on two columns: create_date and update_date (null until the first update), both being of DATE type of course.

    These are currently unindexed, but there is a desire to index them to improve change capture performance for queries such as:
    Â

    select ...
    from ...

    where (create_dt >= trunc(sysdate)-1 and create_dt < trunc(sysdate))
    or (update_dt >= trunc(sysdate)-1 and update_dt < trunc(sysdate))

    The desire is obviously to provide the maximum performance benefit while reducing the impact on the OLTP system.
    Â

    I thought of four different indexing options:
    Â

    i) Indexing the columns separately, leading in the best case to an unusual execution plan where the indexes are range scanned and the results merged before accessing the table.
    ii) A single composite index (create_date,update_date),leading to a fast full index scan.
    iii) A single composite index (update_date,create_date), rewriting the query predicate as ...

       (create_dt >= trunc(sysdate)-1 and create_dt < trunc(sysdate) and update_date is null)
    or (update_dt >= trunc(sysdate)-1 and update_dt < trunc(sysdate))
    ... and leading to two index range scans. (not sure about this)
    iv) A single-column function based index on (Nvl(update_dt,create_dt)) and rewriting the predicate appropriately.
    Â

    Whichever of these is chosen the growth pattern for the data naturally tends towards index contention as all the new values are equal to sysdate.
    Â

    So the nub of my problem comes down to these questions:
    Â

    Is it possible to mitigate the index contention problem with a suitably high initrans values? Does it seem likely that implementing a reverse index would be worthwhile in reducing contention (I could probably take the pain of the fast full scan if it was, although it would lead to a larger index due to the 50/50 block splits).
    Â

    Would you regard implementing a function-based index as risky in any significant way on major tables of a busy OLTP system? How about a reverse function based index?
    Â

    Basically, "what would you do"?
    Â

    Maybe I'm overthinking it, but I'd like to go to the sytem developers with a fully thought out set of options (even if they do tell me to take a running jump anyway).

    --
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
    Toon Koppelaars
    RuleGen BV
    +31-615907269
    toon_at_rulegen.com
    www.rulegen.com

    Author: "Applied Mathematics for Database Professionals"
    --
    http://www.freelists.org/webpage/oracle-l
  • David Aldridge at Apr 10, 2008 at 2:25 am
    It's really a matter of the lesser of two evils, I think -- losing the index range scan or being vulnerable to index contention. The fast full index scan necessitated by the reverse index would still be better than the full table scan without any index, even if it's not up to the standards of an index range scan. I could cut down on the size by using a function-based index that nulls anything older than the first day that this process will run, which is a considerable amount of data, and redefining the historical cut-off every now and then to keep it trimmed down.

    Updates are probably more frequent than inserts, given the nature of the table and the data. After all, the average number of times for a row to be updated only has to be slightly greater than one for it to be grater than the number of inserts by definition.

    I've got doubts about the UNION -- I'd want to avoid the inherent DISTINCT by using UNION ALL, and slightly more complex logic in one of the two queries to avoid projecting the same row more than once if it was both created and updated within the capture time period. Unfortunately partitioning, while technically an option installed on the database, is not an option in the more casual sense. It would be a Really Big Deal and I have to tread very lightly on this system.

    I think you're right about the FBI. It does sound like an attractive option.

    Thanks Stephane

    ----- Original Message ----
    From: Stephane Faroult
    To: david_at_david-aldridge.com
    Cc: oracle-l_at_freelists.org
    Sent: Wednesday, April 9, 2008 12:38:27 AM
    Subject: Re: Indexing options to avoid contention -- 10gR2

    David,

    First of all, forget about reverse indexes, because then you
    couldn't have range scans.
    There is something that is missing from your analysis, which is the
    relative ratio of inserts vs updates.
    Presumably, contention is mostly a concern for inserts, and create_dt.
    Unless updates are applied following a condition on create_dt, they
    shouldn't be much of an issue.
    I believe that you should consider your query as the union of two
    different SELECT statements (it's quite possible that this is what the
    optimizer does, anyway). For the SELECT that involves create_dt, what
    I'd like to have in such a case is a table partitioned by day, which
    would spare me an index and make contention not much worse than what it
    is without an index (moreover, depending on your data, subpartitioning
    might help). This would make update_dt the only column worth indexing,
    adding some overhead because of index management but little contention
    issues.

    If partitioning isn't an option (not least because the table is already
    partitioned), two separate columns are more costly than one, and I don't
    believe that ii) would be very efficient. I'd have the same doubts as
    you concerning iii), you have high odds of sending the optimizer on a
    wrong track with such a query. The solution I'd like best is probably
    the function-based index, it would cost a little extra CPU, and all the
    more that rows are frequently updated, but if CPU isn't the bottleneck
    it may be OK. Note also that if you have more updates than inserts, it
    should relieve contention. I wouldn't consider such a simple FBI to be
    much more of a risk than a regular index.

    HTH

    Stéphane Faroult

    David Aldridge wrote:
    I have a large and busy OLTP table, 100GB or so, against which there
    is a need to capture changes. Until an asynchronous CDC solution is in
    place we have to rely on two columns: create_date and update_date
    (null until the first update), both being of DATE type of course.

    These are currently unindexed, but there is a desire to index them to
    improve change capture performance for queries such as:

    select ...
    from ...
    where (create_dt >= trunc(sysdate)-1 and create_dt < trunc(sysdate))
    or (update_dt >= trunc(sysdate)-1 and update_dt < trunc(sysdate))

    The desire is obviously to provide the maximum performance benefit
    while reducing the impact on the OLTP system.

    I thought of four different indexing options:

    i) Indexing the columns separately, leading in the best case to an
    unusual execution plan where the indexes are range scanned and the
    results merged before accessing the table.
    ii) A single composite index (create_date,update_date),leading to a
    fast full index scan.
    iii) A single composite index (update_date,create_date), rewriting the
    query predicate as ...
    (create_dt >= trunc(sysdate)-1 and create_dt < trunc(sysdate) and
    update_date is null)
    or (update_dt >= trunc(sysdate)-1 and update_dt < trunc(sysdate))
    ... and leading to two index range scans. (not sure about this)
    iv) A single-column function based index on
    (Nvl(update_dt,create_dt)) and rewriting the predicate appropriately.
    Whichever of these is chosen the growth pattern for the data naturally
    tends towards index contention as all the new values are equal to
    sysdate.

    So the nub of my problem comes down to these questions:

    Is it possible to mitigate the index contention problem with a
    suitably high initrans values? Does it seem likely that implementing a
    reverse index would be worthwhile in reducing contention (I could
    probably take the pain of the fast full scan if it was, although it
    would lead to a larger index due to the 50/50 block splits).

    Would you regard implementing a function-based index as risky in any
    significant way on major tables of a busy OLTP system? How about a
    reverse function based index?

    Basically, "what would you do"?

    Maybe I'm overthinking it, but I'd like to go to the sytem developers
    with a fully thought out set of options (even if they do tell me to
    take a running jump anyway).
    --
    http://www.freelists.org/webpage/oracle-l
  • David Aldridge at Apr 10, 2008 at 2:31 am
    If you're talking about Oracle Streams, then here's the problem with that -- it's not a revenue generator for Oracle because it just comes with EE anyway. It may well be robust technology etc.,but you need very good people to work on it because from some informal discussions I've had, including with Oracle people, support for Streams is not much of a priority.

    Our asynchronous CDC solution will be via Informatica PowerExchange, which is really a connector for LogMiner. It has the advantages that we already are licensed for it, we barely need any DBA or other team support to run it, and Informatica set it up for us gratis on our performance test environment.

    Original Message ----
    From: Oxnard Montalvo
    To: david_at_david-aldridge.com
    Cc: oracle-l_at_freelists.org
    Sent: Wednesday, April 9, 2008 9:46:33 AM
    Subject: Re: Indexing options to avoid contention -- 10gR2

    If your Goal is to perform CDC and you are on 10g or higher I would strongly recommend you check out http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/cdc.htm#i1028295 I have found the CDC process developed by Oracle works very well and solves many of these issues by avoiding them all together.

    Ox

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouporacle-l @
categoriesoracle
postedApr 9, '08 at 1:29a
activeApr 10, '08 at 2:31a
posts7
users4
websiteoracle.com

People

Translate

site design / logo © 2021 Grokbase