I've implemented a pretty simple Materialized Views scheme. It's not
terribly complicated, and it works quite well.

This is what I do.

0) Initialize the materialized view environment. This will allow the
system to track which tables and views are part of a materialized view,
as well as the last time each one was refreshed.

CREATE TABLE matviews (
mv_name NAME NOT NULL
, mv_view NAME NOT NULL
, last_refresh TIMESTAMP
);

1) Create a view. This will be what the materialized view should keep in
sync with. One column must be unique and non-null. This will be the
primary key of the materialized view.

2) Create a table -- the materialized view -- from the view. I've wrapped
this up into a PL/pgSQL function. Pseudo-code is basically:
- Create a table, the materialized view.
- Select everything from the corresponding view into the materialized
view.
- Insert a row into matviews, last_refresh = now().
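A minimal sketch of such a function follows. The function name and
error handling are illustrative assumptions, not the exact
implementation described above:

```sql
-- Sketch only: materializes a view into a table and registers it.
-- create_matview(materialized view name, source view name)
CREATE OR REPLACE FUNCTION create_matview(NAME, NAME)
RETURNS VOID AS '
DECLARE
    matview ALIAS FOR $1;
    view_name ALIAS FOR $2;
BEGIN
    -- Materialize the view into a new table.
    EXECUTE ''CREATE TABLE '' || matview
        || '' AS SELECT * FROM '' || view_name;

    -- Record it, along with the time of this initial refresh.
    INSERT INTO matviews (mv_name, mv_view, last_refresh)
        VALUES (matview, view_name, now());
    RETURN;
END;
' LANGUAGE plpgsql;
```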

3) Create a function called "<mv name>_refresh_row(<primary key type>)".
This will:
- Delete the row from the materialized view with that primary key
- Select the row with that primary key from the view and insert it into
the materialized view.
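For a hypothetical materialized view "mv_members" built over a view
"members_view" with primary key "member_id INTEGER", such a function
might look like this (names are assumptions for illustration):

```sql
CREATE OR REPLACE FUNCTION mv_members_refresh_row(INTEGER)
RETURNS VOID AS '
BEGIN
    -- Remove the stale copy of the row, if any.
    DELETE FROM mv_members WHERE member_id = $1;

    -- Re-insert the current version from the underlying view.
    INSERT INTO mv_members
        SELECT * FROM members_view WHERE member_id = $1;
    RETURN;
END;
' LANGUAGE plpgsql;
```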

4) If there is any sort of time-dependence, create a function called "<mv
name>_refresh()". This will find all the rows that have changed due to
the time-dependence. It uses "last_refresh" from the matviews table, and
"now()" to determine the timespan.

This function needs to be called periodically.
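Continuing the hypothetical "mv_members" example, where membership
lapses at an "expiration_date" column, a periodic refresh might look
like this sketch:

```sql
CREATE OR REPLACE FUNCTION mv_members_refresh()
RETURNS VOID AS '
DECLARE
    prev_refresh TIMESTAMP;
BEGIN
    SELECT INTO prev_refresh last_refresh FROM matviews
        WHERE mv_name = ''mv_members'';

    -- Re-sync every row whose membership status could have flipped
    -- between the last refresh and now.
    PERFORM mv_members_refresh_row(member_id)
        FROM mv_members
        WHERE expiration_date > prev_refresh
          AND expiration_date <= now();

    UPDATE matviews SET last_refresh = now()
        WHERE mv_name = ''mv_members'';
    RETURN;
END;
' LANGUAGE plpgsql;
```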

5) Create triggers on all tables that contribute to the view.
- An insert trigger, which discovers the primary key(s) the inserted
row affects and refreshes those rows (using the *_refresh_row
function).
- An update trigger, which discovers all the primary key(s) the updated
row affects and refreshes those rows. Note that the primary keys may
differ before and after the update if the column that determines the
primary key is changing.
- A delete trigger, which discovers all the primary key(s) the deleted
row affects and refreshes those rows.
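As a sketch, a trigger on a hypothetical "payments" table that
contributes "member_id" rows to "mv_members" could be wired up like
this (all names here are assumptions):

```sql
CREATE OR REPLACE FUNCTION payments_mv_trigger()
RETURNS TRIGGER AS '
BEGIN
    -- On UPDATE the key may change, so refresh both old and new rows.
    IF TG_OP = ''UPDATE'' OR TG_OP = ''DELETE'' THEN
        PERFORM mv_members_refresh_row(OLD.member_id);
    END IF;
    IF TG_OP = ''INSERT'' OR TG_OP = ''UPDATE'' THEN
        PERFORM mv_members_refresh_row(NEW.member_id);
    END IF;
    RETURN NULL;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER payments_mv_sync
    AFTER INSERT OR UPDATE OR DELETE ON payments
    FOR EACH ROW EXECUTE PROCEDURE payments_mv_trigger();
```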

The system has been running in a production environment for over a week,
with only one problem: deadlock when we were inserting vast amounts of
new data.

Adding appropriate indexes to the materialized views has reduced the query
times for some of our most important queries to 1/300 of the original
time.

There were some issues with the time-sensitivity of the queries. For
instance, our list of members will decrease as time goes along due to the
expiration date of accounts. Because we were running the refresh once a
day, there were a few hours of the day where the materialized view would
say that the person is a member, but the actual data says he is not. We
rewrote our code to pull everything from the materialized view, greatly
simplifying the code, and also increasing performance.

My next step is to write a generic algorithm for handling the tasks
mentioned in steps 4 and 5. I've written these by hand so far, because I
can tell which columns of the updated/inserted/deleted row determine
which rows in the materialized view will be affected. Any help in this
area would be greatly appreciated.

After that, I would like to investigate whether or not it is possible to
register a function to be called when the transaction is committed. That
way, the materialized view update can be deferred until the transaction
is complete. This would enhance performance.

--
Jonathan Gardner
jgardner@jonathangardner.net


  • Simon Riggs at Feb 21, 2004 at 12:32 pm

    Jonathan M. Gardner wrote:
    > I've implemented a pretty simple Materialized Views scheme. It's not
    > terribly complicated, and it works quite well.
    Exciting news - excellent work. Starting simple was the right approach!
    > There were some issues with the time-sensitivity of the queries. For
    > instance, our list of members will decrease as time goes along due to the
    > expiration date of accounts. Because we were running the refresh once a
    > day, there were a few hours of the day where the materialized view would
    > say that the person is a member, but the actual data says he is not. We
    > rewrote our code to pull everything from the materialized view, greatly
    > simplifying the code, and also increasing performance.
    That's just "part of the package" of using Materialized Views. That is
    an acceptable trade-off for the performance gains realised.
    > My next step...
    Could I suggest that your next step is to sync up with the work being
    done on tuning the DBT-3 query workload? As I'm sure you're aware, that
    is very similar to TPC-H workload, where most of the commercial RDBMS
    vendors utilise Materialized Views to enhance certain queries. Focusing
    on that workload may then suggest to you what the next steps to take
    are, now that you have solved the specific problems of your own
    workloads, though using a generic approach. I think ... Mark Wong, Josh
    Berkus and Tom Lane are currently involved with DBT-3 testing on the
    OSDL test environment.

    Materialized Views and improved join-ordering are the next two best
    angles of attack on the DBT-3 workload, IMHO.

    I very much look forward to further news.

    Best Regards, Simon Riggs
  • Tom Lane at Feb 21, 2004 at 4:43 pm

    "Simon Riggs" <simon@2ndquadrant.com> writes:
    > Could I suggest that your next step is to sync up with the work being
    > done on tuning the DBT-3 query workload? As I'm sure you're aware, that
    > is very similar to TPC-H workload, where most of the commercial RDBMS
    > vendors utilise Materialized Views to enhance certain queries.
    Oh? As far as I can tell, TPC-H forbids use of materialized views.
    See sections 1.5.6 and 1.5.7 of the spec. The effect of the fine print
    seems to be that the only way you are allowed to store extra copies of
    data is as indexes over columns that are primary keys, foreign keys,
    or date columns.

    regards, tom lane
  • Simon Riggs at Feb 24, 2004 at 12:17 am

    Tom Lane wrote:
    > "Simon Riggs" <simon@2ndquadrant.com> writes:
    >> Could I suggest that your next step is to sync up with the work being
    >> done on tuning the DBT-3 query workload? As I'm sure you're aware, that
    >> is very similar to TPC-H workload, where most of the commercial RDBMS
    >> vendors utilise Materialized Views to enhance certain queries.
    > Oh? As far as I can tell, TPC-H forbids use of materialized views.
    > See sections 1.5.6 and 1.5.7 of the spec. The effect of the fine print
    > seems to be that the only way you are allowed to store extra copies of
    > data is as indexes over columns that are primary keys, foreign keys,
    > or date columns.
    Sorry, I wasn't very clear there. I'm very happy that you're looking at
    this area and are able to slap my imprecision into shape so succinctly.

    You are 100% right: MVs are unambiguously not allowed as part of the
    TPC-H spec - what 1.5.7 shows is how useful MVs are: they've had to ban
    them! My take on the reason we now have MVs in all of the major
    commercial DBMS is because they weren't directly banned in the original
    TPC-D spec.
    [ http://www.tpc.org/tpch/spec/tpch2.1.0.pdf ]

    For me, there are two issues:
    i) passing the TPC-H test
    ii) dealing with a real-world workload

    IMHO, TPC-H is very much a real-world workload, so the difference is:
    i) coping with a TPC-H style workload when you have no a priori
    knowledge of what might be performed, and when: that is the heart of the
    TPC-H test. I think it is important that we strive to succeed on the
    pure test since there always will be many queries you can't predict.
    Removing MVs from that was an important statement about the requirement
    to cope with ad-hoc queries.

    ii) coping with the same workload when you have learnt something about
    it - i.e. you are able to tune the system over time. That's where MVs
    come in. Of course, you can go too far here, so there needs to be some
    judgement about what constitutes a real-world MV scenario.

    For me, the issue isn't passing the test, but exceeding it.

    If Jonathan's MV work can make a noticeable improvement on the DBT-3
    workload, then it will be worthwhile; this is likely to be intertwined
    with optimizer improvements required in other areas.

    However, overall it was very cheeky of me to presume to bring you or
    anybody else together on these things, so I'll butt out until I have
    time to contribute personally,

    Best Regards, Simon Riggs
  • Tom Lane at Feb 24, 2004 at 1:11 am

    "Simon Riggs" <simon@2ndquadrant.com> writes:
    > You are 100% right: MVs are unambiguously not allowed as part of the
    > TPC-H spec - what 1.5.7 shows is how useful MVs are: they've had to ban
    > them!
    Ah, now I get your point. You're quite right, the TPC-H queries taken
    as a repetitive workload would be a good test case for materialized
    views. I misunderstood you as suggesting that we'd want to push use
    of MVs into the DBT-3 benchmark as such.

    regards, tom lane

Discussion Overview
group: pgsql-hackers (postgresql)
posted: Feb 20, '04 at 10:22p
active: Feb 24, '04 at 1:11a
posts: 5
users: 3
irc: #postgresql
