Flashback query to ensure data consistency
All,

In a 24*7 data warehouse (accessed by users globally from different
timezones), incremental ETL jobs are executed to load data from different
source systems. Due to the excessive volume of source data, the ETL window
has grown to over 8 hours, and users from one particular timezone (the most
affected ones) have started complaining about system performance and have
become very vocal about data inconsistency, e.g. the figures in a sales
report can change while the ETL tool commits every 10K records.

In order to give the users a consistent view of the data, we are thinking
of using the Flashback Query feature of the database: when the ETL starts,
flashback query will be enabled and user queries (through a logon trigger)
will run against the flashback data; once the ETL completes, the queries
will run against the current tables.
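
Roughly, we picture something like this (the control table, flag and user
names below are placeholders, and error handling is omitted):

-- The ETL start script records the pre-load SCN in a one-row control
-- table and raises a "load in progress" flag:
--   etl_control(load_in_progress CHAR(1), start_scn NUMBER)

CREATE OR REPLACE TRIGGER trg_report_logon
AFTER LOGON ON DATABASE
DECLARE
  v_in_progress etl_control.load_in_progress%TYPE;
  v_scn         etl_control.start_scn%TYPE;
BEGIN
  -- Divert only the reporting account, never the ETL sessions
  IF SYS_CONTEXT('USERENV', 'SESSION_USER') = 'SIEBEL_ANALYTICS' THEN
    SELECT load_in_progress, start_scn
      INTO v_in_progress, v_scn
      FROM etl_control;
    IF v_in_progress = 'Y' THEN
      -- Pin the whole session to the pre-ETL SCN; note the session
      -- cannot issue DML or DDL while flashback mode is enabled
      DBMS_FLASHBACK.ENABLE_AT_SYSTEM_CHANGE_NUMBER(v_scn);
    END IF;
  END IF;
END;
/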

Although the solution seems feasible, keeping flashback working for more
than 8 hours looks like our biggest challenge. I wanted to verify whether
you all agree with this approach, or whether you would suggest a simpler
(but costly) approach like routing queries to a read-only standby database
while the ETL jobs execute on the primary database.
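
As far as we understand, keeping flashback alive that long mostly comes
down to undo sizing, roughly along these lines (the undo tablespace name
is just a placeholder for ours):

-- Keep roughly 10 hours of undo; SCOPE=BOTH assumes an spfile
ALTER SYSTEM SET UNDO_RETENTION = 36000 SCOPE=BOTH;
-- Guarantee unexpired undo is never overwritten. Caution: with the
-- guarantee in place, DML can fail if the undo tablespace fills up.
ALTER TABLESPACE undotbs1 RETENTION GUARANTEE;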

Your suggestions are welcome. The Oracle database version is 10.2.0.3, the
ETL tool is Informatica, and the reporting tool is Siebel Analytics.

Thanks
Deba.

  • Dennis Williams at Mar 26, 2008 at 8:54 pm
    Deba,

    Are you using partitioning? Tim Gorman has a seminar that discusses how
    exchange partition can be used to great advantage in just this situation.
    Basically, you prepare each day's data in a standalone table, and when it
    is complete, you exchange that table with a partition. Unfortunately these
    methods work best when designed in from the start, so this is probably not
    an immediate help to you, but it is something to keep in mind going forward.
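
    As a sketch of the idea (table, column and partition names are made up):

    -- Daily range-partitioned fact table and a matching staging table
    CREATE TABLE sales_fact (
      sale_date  DATE,
      product_id NUMBER,
      amount     NUMBER
    )
    PARTITION BY RANGE (sale_date) (
      PARTITION p_20080325 VALUES LESS THAN (DATE '2008-03-26'),
      PARTITION p_20080326 VALUES LESS THAN (DATE '2008-03-27')
    );

    CREATE TABLE sales_stage (
      sale_date  DATE,
      product_id NUMBER,
      amount     NUMBER
    );

    -- Load and index sales_stage offline, then publish it in one quick
    -- dictionary operation; readers never see a half-loaded partition.
    ALTER TABLE sales_fact
      EXCHANGE PARTITION p_20080326 WITH TABLE sales_stage
      WITHOUT VALIDATION;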

    Dennis Williams
  • Tony Aponte at Mar 26, 2008 at 10:15 pm
    The first thing I would say to you is to try the flashback/trigger
    strategy you've worked out and see how it affects the user queries. I
    have flashed back up to 10 days and still met query requirements.
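
    For example, an individual query can also be pointed at a past point
    in time without the session-level trigger (table name hypothetical):

    SELECT SUM(amount)
      FROM sales_fact AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '10' DAY);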



    Another approach is the Kimball real-time data warehouse strategy that
    Dennis Williams already suggested.



    A third approach I would pursue is to use Analytic Workspaces. With
    that you can shift the burden to the ETL workload instead and relieve
    the queries from having to undo the transactions. At the end of the ETL
    processing you would use the Workspace features to merge the changes
    that have been made to the base tables.



    Another approach could be to use Materialized Views and Query Rewrite
    against the base tables. This would give you control over when to
    publish the ETL updates by simply refreshing the MVs after the 8-hour
    jobs are complete, yet still provide a consistent view of the previous
    data.
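
    As a rough sketch (object names are hypothetical):

    -- Summary MV refreshed only on demand, so the published figures
    -- change only when we refresh after the ETL run completes
    CREATE MATERIALIZED VIEW sales_summary_mv
      REFRESH COMPLETE ON DEMAND
      ENABLE QUERY REWRITE
    AS
    SELECT sale_date, product_id, SUM(amount) AS total_amount
      FROM sales_fact
     GROUP BY sale_date, product_id;

    -- In the reporting sessions, rewrite must be allowed to keep using
    -- the MV while the base tables change underneath it during the load
    ALTER SESSION SET QUERY_REWRITE_INTEGRITY = STALE_TOLERATED;

    -- After the 8-hour ETL completes, publish the new figures
    BEGIN
      DBMS_MVIEW.REFRESH('SALES_SUMMARY_MV', method => 'C');
    END;
    /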



    Tony Aponte
  • Ken Naim at Mar 27, 2008 at 1:19 am
    If you are using type 2 dimensions, then run each query as of last night
    and you don't have to worry about the newly loaded data.
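
    Something along these lines, assuming hypothetical effective-dated
    dimension columns and a load-date audit column on the fact table:

    SELECT d.customer_name, SUM(f.amount) AS total_amount
      FROM sales_fact f
      JOIN customer_dim d ON d.customer_key = f.customer_key
     WHERE f.load_date < TRUNC(SYSDATE)   -- ignore rows loaded today
       -- pick the dimension row version that was current last night
       AND TRUNC(SYSDATE) - 1 BETWEEN d.effective_from AND d.effective_to
     GROUP BY d.customer_name;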



    Ken