Attached is a patch that is still WIP but that I think is getting
pretty close to completion. It is not intended to be the be-all and
end-all for materialized views, but the minimum useful feature set --
which is all that I've had time to do for this release. In
particular, the view is only updated on demand by a complete rebuild.
For the next release, I hope to build on this base to allow more
eager and incremental updates, and perhaps a concurrent batch update.

1.  CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE
TABLE AS, with all the same clauses supported. That includes
declaring a materialized view to be temporary or unlogged.
2.  MVs don't support inheritance.
3.  MVs can't define foreign keys.
4.  MVs can't be the target of foreign keys.
5.  MVs can't have triggers.
6.  Users can't create rules which reference MVs (although MVs
[ab]use the rules mechanism internally, similar to how views do).
7.  MVs can't be converted to views, nor vice versa.
8.  Users may not directly use INSERT/UPDATE/DELETE on an MV.
9.  MVs can't directly be used in a COPY statement, but can be the
source of data using a SELECT.
10. MVs can't own sequences.
11. MVs can't be the target of LOCK statements, although other
statements get locks just like a table.
12. MVs can't use data modifying CTEs in their definitions.
13. pg_class now has a relisvalid column, which is true if an MV is
truncated or created WITH NO DATA. You can not scan a relation
flagged as invalid.
14. ALTER MATERIALIZED VIEW is supported for the options that seemed
to make sense. For example, you can change the tablespace or
schema, but you cannot add or drop column with ALTER.
15. The SELECT query used to define the MV may not contain a
data-modifying CTE.
16. To get new data into the MV, the command is LOAD MATERIALIZED
VIEW mat view_name. This seemed more descriptive to me that the
alternatives and avoids declaring any new keywords beyond
MATERIALIZED. If the MV is flagged as relisvalid == false, this
will change it to true.
17. Since the data viewed in an MV is not up-to-date with the latest
committed transaction, it didn't seem to make any sense to try to
apply SERIALIZABLE transaction semantics to queries looking at
the contents of an MV, although if LMV is run in a SERIALIZABLE
transaction the MV data is guaranteed to be free of serialization
anomalies. This does leave the transaction running the LOAD
command vulnerable to serialization failures unless it is also
READ ONLY DEFERRABLE.
18. Bound parameters are not supported for the CREATE MATERIALIZED
VIEW statement.
19. LMV doesn't show a row count. It wouldn't be hard to add, it just
seemed a little out of place to do that, when CLUSTER, etc.,
don't.

I wasn't able to wrap up a few things in time for this commitfest:

- Documentation is incomplete.
- pg_dump support needs addtional dependencies added to properly
handle MVs which are defined using other MVs.
- pg_dump binary hasn't had a lot of attention yet.
- There are no regression tests yet.
- I ran into problems getting the validity check working right, so I
have disabled it by commenting out the function body in this
patch.
- TRUNCATE should probably support a MATERIALIZED VIEW clause.

It would be good to have some discussion to try to reach a consensus
about whether we need to differentiate between *missing* datat (where
a materialized view which has been loaded WITH NO DATA or TRUNCATEd
and has not been subsequently LOADed) and potentially *stale* data.
If we don't care to distinguish between a view which generated no
rows when it ran and a one for which the query has not been run, we
can avoid adding the relisvalid flag, and we could support UNLOGGED
MVs. Perhaps someone can come up with a better solution to that
problem.

In the long term, we will probably need to separate the
implementation of CREATE TABLE AS and CREATE MATERIALIZED VIEW, but
for now there is so little that they need to do differently it seemed
less evil to have a few "if" clauses that that much duplicated code.

The paint is pretty wet still, but hopefully people can evaluate the
approach and work out any issues with the design choices in the CF so
that it can be wrapped up nicely for the next one.

92 files changed, 2377 insertions(+), 440 deletions(-)

-Kevin

Search Discussions

  • Alvaro Herrera at Nov 15, 2012 at 4:57 am
    Kevin Grittner wrote:

    Interesting stuff.
    /*
    + * SetRelationIsValid
    + * Set the value of the relation's relisvalid field in pg_class.
    + *
    + * NOTE: caller must be holding an appropriate lock on the relation.
    + * ShareUpdateExclusiveLock is sufficient.
    + *
    + * NOTE: an important side-effect of this operation is that an SI invalidation
    + * message is sent out to all backends --- including me --- causing plans
    + * referencing the relation to be rebuilt with the new list of children.
    + * This must happen even if we find that no change is needed in the pg_class
    + * row.
    + */
    + void
    + SetRelationIsValid(Oid relationId, bool relisvalid)
    + {
    It's not clear to me that it's right to do this by doing regular heap
    updates here instead of heap_inplace_update. Also, I think this might
    end up causing a lot of pg_class tuple churn (at least for matviews that
    delete rows at xact end), which would be nice to avoid.

    --
    Álvaro Herrera http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Greg Smith at Nov 15, 2012 at 10:34 am

    On 11/14/12 6:28 PM, Kevin Grittner wrote:
    - Documentation is incomplete.
    ...
    - There are no regression tests yet.
    Do you have any simple test cases you've been using you could attach?
    With epic new features like this, when things don't work it's hard to
    distinguish between "that just isn't implemented yet" and "the author
    never tested that". Having some known good samples you have tested,
    even if they're not proper regression tests, would be helpful for
    establishing the code baseline works.

    --
    Greg Smith 2ndQuadrant US greg@2ndquadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
  • Josh Berkus at Nov 15, 2012 at 6:36 pm
    Kevin,
    Attached is a patch that is still WIP but that I think is getting
    pretty close to completion. It is not intended to be the be-all and
    end-all for materialized views, but the minimum useful feature set --
    which is all that I've had time to do for this release. In
    particular, the view is only updated on demand by a complete rebuild.
    For the next release, I hope to build on this base to allow more
    eager and incremental updates, and perhaps a concurrent batch update.
    Nice to see this come in!
    1. CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE
    TABLE AS, with all the same clauses supported. That includes
    declaring a materialized view to be temporary or unlogged.
    What use would a temporary matview be?

    Unlogged is good.
    2. MVs don't support inheritance.
    In which direction? Can't inherit, or can't be inherited from?
    3. MVs can't define foreign keys.
    4. MVs can't be the target of foreign keys.
    5. MVs can't have triggers.
    Makes sense.
    9. MVs can't directly be used in a COPY statement, but can be the
    source of data using a SELECT.
    Hmmm? I don't understand the reason for this.
    13. pg_class now has a relisvalid column, which is true if an MV is
    truncated or created WITH NO DATA. You can not scan a relation
    flagged as invalid.
    What error would a user see?
    14. ALTER MATERIALIZED VIEW is supported for the options that seemed
    to make sense. For example, you can change the tablespace or
    schema, but you cannot add or drop column with ALTER.
    How would you change the definition of an MV then?
    16. To get new data into the MV, the command is LOAD MATERIALIZED
    VIEW mat view_name. This seemed more descriptive to me that the
    alternatives and avoids declaring any new keywords beyond
    MATERIALIZED. If the MV is flagged as relisvalid == false, this
    will change it to true.
    UPDATE MATERIALIZED VIEW was problematic?

    Does LOAD automatically TRUNCATE the view before reloading it? If not,
    why not?
    It would be good to have some discussion to try to reach a consensus
    about whether we need to differentiate between *missing* datat (where
    a materialized view which has been loaded WITH NO DATA or TRUNCATEd
    and has not been subsequently LOADed) and potentially *stale* data.
    If we don't care to distinguish between a view which generated no
    rows when it ran and a one for which the query has not been run, we
    can avoid adding the relisvalid flag, and we could support UNLOGGED
    MVs. Perhaps someone can come up with a better solution to that
    problem.
    Hmmm. I understand the distinction you're making here, but I'm not sure
    it actually matters to the user. MVs, by their nature, always have
    potentially stale data. Being empty (in an inaccurate way) is just one
    kind of stale data.

    It would be nice for the user to have some way to know that a matview is
    empty due to never being LOADed or recently being TRUNCATEd. However, I
    don't think that relisvalid flag -- and preventing scanning the relation
    -- is a good solution. What I'd rather have instead is a timestamp of
    when the MV was last LOADed. If the MV was never loaded (or was
    truncated) that timestamp would be NULL. Such a timestamp would allow
    users to construct all kinds of ad-hoc refresh schemes for MVs which
    would not be possible without it.

    I don't see how this relates to UNLOGGED matviews either way.

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Robert Haas at Nov 16, 2012 at 5:02 pm

    On Thu, Nov 15, 2012 at 1:36 PM, Josh Berkus wrote:
    Hmmm. I understand the distinction you're making here, but I'm not sure
    it actually matters to the user. MVs, by their nature, always have
    potentially stale data. Being empty (in an inaccurate way) is just one
    kind of stale data.
    This is my feeling also.
    I don't see how this relates to UNLOGGED matviews either way.
    Right now, Kevin has things set up so that when you do "TRUNCATE mv",
    it clears the relisvalid flag. If we allowed unlogged materialized
    views, the table would be automatically truncated on a crash, but
    there wouldn't be any way to clear relisvalid in that situation, so
    Kevin felt we should simply disable unlogged MVs. Personally, I'm not
    excited about having a relisvalid flag at all, and doubly not excited
    if it means we can't have unlogged MVs.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Jeff Davis at Nov 15, 2012 at 11:35 pm

    On Wed, 2012-11-14 at 21:28 -0500, Kevin Grittner wrote:
    Attached is a patch that is still WIP but that I think is getting
    pretty close to completion. It is not intended to be the be-all and
    end-all for materialized views, but the minimum useful feature set --
    which is all that I've had time to do for this release. In
    particular, the view is only updated on demand by a complete rebuild.
    For the next release, I hope to build on this base to allow more
    eager and incremental updates, and perhaps a concurrent batch update.
    The documentation says that a materialized view is basically a
    create-table-as-select except that it remembers the query. Would you say
    that there is a compelling use case for this alone, or is this a
    building block for more sophisticated materialized view support (e.g.
    eager updating) later?

    Regards,
    Jeff Davis
  • Dimitri Fontaine at Nov 16, 2012 at 1:13 pm

    Jeff Davis writes:
    The documentation says that a materialized view is basically a
    create-table-as-select except that it remembers the query. Would you say
    that there is a compelling use case for this alone, or is this a
    building block for more sophisticated materialized view support (e.g.
    eager updating) later?
    The implementation of the re-LOAD'ing command makes it already
    worthwile. Bonus point if locking is limited to when the new content is
    all computer and ready, but even without that, I want to have it. ;)

    I'd bikeshed and prefer the UPDATE MATERIALIZED VIEW nsp.foo; of course.

    The alternative is creating a view, a matching table and a stored
    procedure that will implement the rebuilding, for each mat view you want
    to have. So that's already a big step forward in my eyes.

    Regards,
    --
    Dimitri Fontaine
    http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
  • Merlin Moncure at Nov 16, 2012 at 3:40 pm

    On Fri, Nov 16, 2012 at 7:13 AM, Dimitri Fontaine wrote:
    Jeff Davis <pgsql@j-davis.com> writes:
    The documentation says that a materialized view is basically a
    create-table-as-select except that it remembers the query. Would you say
    that there is a compelling use case for this alone, or is this a
    building block for more sophisticated materialized view support (e.g.
    eager updating) later?
    The implementation of the re-LOAD'ing command makes it already
    worthwile. Bonus point if locking is limited to when the new content is
    all computer and ready, but even without that, I want to have it. ;)
    Seconded. Background lock free refresh of materialization table would
    be wonderful. But moving dependency between source and materialized
    table out of plpgsql function and into defined schema justifies
    feature on its own merits.

    merlin
  • Kevin Grittner at Nov 16, 2012 at 3:36 pm

    Greg Smith wrote:
    On 11/14/12 6:28 PM, Kevin Grittner wrote:
    - Documentation is incomplete.
    ...
    - There are no regression tests yet.
    Do you have any simple test cases you've been using you could attach?
    With epic new features like this, when things don't work it's hard to
    distinguish between "that just isn't implemented yet" and "the author
    never tested that". Having some known good samples you have tested,
    even if they're not proper regression tests, would be helpful for
    establishing the code baseline works.
    I can probably post something along those lines Monday. Sorry for the
    delay. Basically, though, I tried to state everything that I know of
    which is not yet done and working; so if you find something which
    doesn't behave as you would expect, please let me know. It's well
    within the realm of possibility that there are issues that I didn't
    think of. I certainly can fall  into the tendency of programmers to
    think about testing those things which I thought to cover in the code.

    -Kevin
  • Kevin Grittner at Nov 16, 2012 at 4:30 pm

    Jeff Davis wrote:
    On Wed, 2012-11-14 at 21:28 -0500, Kevin Grittner wrote:
    Attached is a patch that is still WIP but that I think is getting
    pretty close to completion. It is not intended to be the be-all and
    end-all for materialized views, but the minimum useful feature set --
    which is all that I've had time to do for this release. In
    particular, the view is only updated on demand by a complete rebuild.
    For the next release, I hope to build on this base to allow more
    eager and incremental updates, and perhaps a concurrent batch update.
    The documentation says that a materialized view is basically a
    create-table-as-select except that it remembers the query. Would you say
    that there is a compelling use case for this alone, or is this a
    building block for more sophisticated materialized view support (e.g.
    eager updating) later?
    IMV, this has some slight value as it stands, although perhaps not enough
    to justify a patch this big. The idea is that with this much in place,
    patches to implement more aggressive and incremental maintenance of the
    MV data become possible. So I think the bar it should pass for commit is
    that it seems a sane basis for that, while providing some functionality
    which people will find useful.

    -Kevin
  • Kevin Grittner at Nov 16, 2012 at 4:48 pm

    Alvaro Herrera wrote:

    It's not clear to me that it's right to do this by doing regular heap
    updates here instead of heap_inplace_update. Also, I think this might
    end up causing a lot of pg_class tuple churn (at least for matviews that
    delete rows at xact end), which would be nice to avoid.
    If we keep the flag, I will look into heap_inplace_update.

    Thanks!

    -Kevin
  • Kevin Grittner at Nov 16, 2012 at 4:59 pm

    Josh Berkus wrote:

    1. CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE
    TABLE AS, with all the same clauses supported. That includes
    declaring a materialized view to be temporary or unlogged.
    What use would a temporary matview be?
    It would be essentially like a temporary table, with all the same
    persistence options. I'm not really sure how often it will be more
    useful than a temporary table before we have incremental maintenance
    of materialized views; once we have that, though, it seems likely
    that there could be reasonable use cases.
    Unlogged is good.
    I agree that there are likely to be more use cases for this than
    temp MVs. Unfortunately, I've had a hard time figuring out how to
    flag an MV which is empty because its contents were lost after a
    crash with preventing people from using an MV which hasn't been
    populated, which has the potential to silently return incorrect
    results.
    2. MVs don't support inheritance.
    In which direction? Can't inherit, or can't be inherited from?
    The table inheritance has not been implemented in either direction
    for MVs. It didn't seem clear to me that there were reasonable use
    cases. Do you see any?
    9. MVs can't directly be used in a COPY statement, but can be the
    source of data using a SELECT.
    Hmmm? I don't understand the reason for this.
    Consistency. There are other object types which seem to enforce this
    rule for no reason that I can see beyond maybe a desire to have both
    directions of COPY work with the same set of objects. If I remember
    correctly, allowing this would eliminate one line of code from the
    patch, so if sentiment is in favor of it, it is very easily done.
    13. pg_class now has a relisvalid column, which is true if an MV is
    truncated or created WITH NO DATA. You can not scan a relation
    flagged as invalid.
    What error would a user see?
    I can more directly answer that on Monday. If you enable the body of
    the function which makes the relisvalid check you can see the messages.
    I commented it out because I have not yet figured out how to suppress
    the check for a LOAD MV command.
    14. ALTER MATERIALIZED VIEW is supported for the options that seemed
    to make sense. For example, you can change the tablespace or
    schema, but you cannot add or drop column with ALTER.
    How would you change the definition of an MV then?
    At this point you would need to drop and re-create the MV. If we
    want to add columns to an MV or change what an existing column holds,
    perhaps we could have an ALTER MV which changed the SELECT statement
    that populates the MV? I would prefer to leave that to a later patch,
    though -- it seems like a bit of a minefield compared to what is
    being implemented in this patch.
    16. To get new data into the MV, the command is LOAD MATERIALIZED
    VIEW mat view_name. This seemed more descriptive to me that the
    alternatives and avoids declaring any new keywords beyond
    MATERIALIZED. If the MV is flagged as relisvalid == false, this
    will change it to true.
    UPDATE MATERIALIZED VIEW was problematic?
    Not technically, really, but I saw two reasons that I preferred LOAD MV:

    1. It seems to me to better convey that the entire contents of the MV
    will be built from scratch, rather than incrementally adjusted.
    2. We haven't hashed out the syntax for more aggressive maintenance of
    an MV, and it seemed like UPDATE MV might be syntax we would want to
    use for something which updated selected parts of an MV when we do.
    Does LOAD automatically TRUNCATE the view before reloading it? If not,
    why not?
    It builds a new heap and moves it into place. When the transaction
    running LMV commits, the old heap is deleted. In implementation it is
    closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by
    creating a new table. This allows all permissions, etc., to stay in
    place.
    It would be good to have some discussion to try to reach a consensus
    about whether we need to differentiate between *missing* datat (where
    a materialized view which has been loaded WITH NO DATA or TRUNCATEd
    and has not been subsequently LOADed) and potentially *stale* data.
    If we don't care to distinguish between a view which generated no
    rows when it ran and a one for which the query has not been run, we
    can avoid adding the relisvalid flag, and we could support UNLOGGED
    MVs. Perhaps someone can come up with a better solution to that
    problem.
    Hmmm. I understand the distinction you're making here, but I'm not sure
    it actually matters to the user. MVs, by their nature, always have
    potentially stale data. Being empty (in an inaccurate way) is just one
    kind of stale data.
    Robert feels the same way, but I disagree. Some MVs will not be terribly
    volatile. In my view there is a big difference between having a "top ten"
    list which might be based on yesterday's base tables rather than the base
    table states as of this moment, and having a "top ten" list with no
    entries. If you want to, for example, take some action if an order comes
    in for one of your top customers, and a different path for other
    customers, suddenly treating all of your long-time top customers as not
    being so, without any squawk from the database, seems dangerous.
    It would be nice for the user to have some way to know that a matview is
    empty due to never being LOADed or recently being TRUNCATEd. However, I
    don't think that relisvalid flag -- and preventing scanning the relation
    -- is a good solution. What I'd rather have instead is a timestamp of
    when the MV was last LOADed. If the MV was never loaded (or was
    truncated) that timestamp would be NULL. Such a timestamp would allow
    users to construct all kinds of ad-hoc refresh schemes for MVs which
    would not be possible without it.
    I see your point there; I'll think about that. My take was more that MVs
    would often be refreshed by crontab, and that you would want to keep
    subsequent steps from running and generating potentially plausible but
    completely inaccurate results if the LMV failed.
    I don't see how this relates to UNLOGGED matviews either way.
    UNLOGGED tables and indexes are made empty on crash recovery by copying
    the initialization fork over the "normal" relations. Care was taken to
    avoid needing to connect to each database in turn to complete that
    recovery. This style of recovery can't really set the relisvalid flag, as
    far as I can see; which leaves us choosing between unlogged MVs and
    knowing whether they hold valid data -- unless someone has a better idea.

    -Kevin
  • Thom Brown at Nov 16, 2012 at 5:05 pm

    On 16 November 2012 16:25, Kevin Grittner wrote:

    Josh Berkus wrote:
    Unlogged is good.
    I agree that there are likely to be more use cases for this than
    temp MVs. Unfortunately, I've had a hard time figuring out how to
    flag an MV which is empty because its contents were lost after a
    crash with preventing people from using an MV which hasn't been
    populated, which has the potential to silently return incorrect
    results.
    pg_class.relisvalid = false.. getting rid of the use for truncated MVs?

    --
    Thom
  • Tom Lane at Nov 16, 2012 at 5:08 pm

    "Kevin Grittner" <kgrittn@mail.com> writes:
    Josh Berkus wrote:
    What use would a temporary matview be?
    It would be essentially like a temporary table, with all the same
    persistence options. I'm not really sure how often it will be more
    useful than a temporary table before we have incremental maintenance
    of materialized views; once we have that, though, it seems likely
    that there could be reasonable use cases.
    One of the principal attributes of a temp table is that its contents
    aren't (reliably) accessible from anywhere except the owning backend.
    Not sure where you're going to hide the incremental maintenance in
    that scenario.
    The table inheritance has not been implemented in either direction
    for MVs. It didn't seem clear to me that there were reasonable use
    cases. Do you see any?
    We don't have inheritance for views, so how would we have it for
    materialized views?

    regards, tom lane
  • Josh Berkus at Nov 16, 2012 at 9:20 pm
    Kevin,
    I agree that there are likely to be more use cases for this than
    temp MVs. Unfortunately, I've had a hard time figuring out how to
    flag an MV which is empty because its contents were lost after a
    crash with preventing people from using an MV which hasn't been
    populated, which has the potential to silently return incorrect
    results.
    See below.
    2. MVs don't support inheritance.
    In which direction? Can't inherit, or can't be inherited from?
    The table inheritance has not been implemented in either direction
    for MVs. It didn't seem clear to me that there were reasonable use
    cases. Do you see any?
    No, I just wanted clarity on this. I can see a strong case for
    eventually supporting CREATE MATERIALIZED VIEW matview_1 LIKE matview,
    in order to "copy" mativews, though.
    Consistency. There are other object types which seem to enforce this
    rule for no reason that I can see beyond maybe a desire to have both
    directions of COPY work with the same set of objects. If I remember
    correctly, allowing this would eliminate one line of code from the
    patch, so if sentiment is in favor of it, it is very easily done.
    There's going to be a pretty strong demand for COPY FROM matviews.
    Forcing the user to use COPY FROM ( SELECT ... ) will be seen as
    arbitrary and unintuitive.
    How would you change the definition of an MV then?
    At this point you would need to drop and re-create the MV. If we
    want to add columns to an MV or change what an existing column holds,
    perhaps we could have an ALTER MV which changed the SELECT statement
    that populates the MV? I would prefer to leave that to a later patch,
    though -- it seems like a bit of a minefield compared to what is
    being implemented in this patch.
    I agree that it should be a later patch.
    Not technically, really, but I saw two reasons that I preferred LOAD MV:

    1. It seems to me to better convey that the entire contents of the MV
    will be built from scratch, rather than incrementally adjusted.
    2. We haven't hashed out the syntax for more aggressive maintenance of
    an MV, and it seemed like UPDATE MV might be syntax we would want to
    use for something which updated selected parts of an MV when we do.
    Hmmm, I see your point. So "LOAD" would recreate, and (when supported)
    UPDATE would incrementally update?
    It builds a new heap and moves it into place. When the transaction
    running LMV commits, the old heap is deleted. In implementation it is
    closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by
    creating a new table. This allows all permissions, etc., to stay in
    place.
    OK, so same effect as a truncate.
    Robert feels the same way, but I disagree. Some MVs will not be terribly
    volatile. In my view there is a big difference between having a "top ten"
    list which might be based on yesterday's base tables rather than the base
    table states as of this moment, and having a "top ten" list with no
    entries. If you want to, for example, take some action if an order comes
    in for one of your top customers, and a different path for other
    customers, suddenly treating all of your long-time top customers as not
    being so, without any squawk from the database, seems dangerous.
    Right, but a relisvalid flag just tells me that the matview was updated
    at sometime in the past, and not *when* it was updated. It could have
    been 3 years ago. The fact that it was updated at some indefinite time
    is fairly valueless information.

    There's a rule in data warehousing which says that it's better to have
    no data (and know that you have no data) than to have incorrect data.
    I see your point there; I'll think about that. My take was more that MVs
    would often be refreshed by crontab, and that you would want to keep
    subsequent steps from running and generating potentially plausible but
    completely inaccurate results if the LMV failed.
    Yeah, that too. Also, a timestamp it would make it easy to double-check
    if the cron job was failing or had been disabled.
    UNLOGGED tables and indexes are made empty on crash recovery by copying
    the initialization fork over the "normal" relations. Care was taken to
    avoid needing to connect to each database in turn to complete that
    recovery. This style of recovery can't really set the relisvalid flag, as
    far as I can see; which leaves us choosing between unlogged MVs and
    knowing whether they hold valid data -- unless someone has a better idea.
    Yeah, well, whether we have relisvalid or mvlastupdate, we're going to
    have to work out some way to have that field react to changes to the
    table overall. I don't know *how*, but it's something we'll have to solve.


    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Tom Lane at Nov 16, 2012 at 10:13 pm

    Josh Berkus writes:
    There's going to be a pretty strong demand for COPY FROM matviews.
    Forcing the user to use COPY FROM ( SELECT ... ) will be seen as
    arbitrary and unintuitive.
    You could make that same claim about plain views, but in point of
    fact the demand for making them work in COPY has been minimal.
    So I'm not convinced this is an essential first-cut feature.
    We can always add it later.

    regards, tom lane
  • Josh Berkus at Nov 17, 2012 at 1:41 am

    You could make that same claim about plain views, but in point of
    fact the demand for making them work in COPY has been minimal.
    So I'm not convinced this is an essential first-cut feature.
    We can always add it later.
    Of course. I just had the impression that we could support COPY FROM by
    *deleting* a couple lines from Kevin's patch, rather than it being extra
    work.

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Tom Lane at Nov 17, 2012 at 3:48 pm

    Josh Berkus writes:
    You could make that same claim about plain views, but in point of
    fact the demand for making them work in COPY has been minimal.
    So I'm not convinced this is an essential first-cut feature.
    We can always add it later.
    Of course. I just had the impression that we could support COPY FROM by
    *deleting* a couple lines from Kevin's patch, rather than it being extra
    work.
    Even if it happens to be trivial in the current patch, it's an added
    functional requirement that we might later regret having cavalierly
    signed up for. And, as noted upthread, relations that support only
    one direction of COPY don't exist at the moment; that would be adding
    an asymmetry that we might later regret, too.

    regards, tom lane
  • Dimitri Fontaine at Nov 16, 2012 at 10:12 pm

    "Kevin Grittner" <kgrittn@mail.com> writes:
    UPDATE MATERIALIZED VIEW was problematic?
    Not technically, really, but I saw two reasons that I preferred LOAD MV:

    1. It seems to me to better convey that the entire contents of the MV
    will be built from scratch, rather than incrementally adjusted.
    2. We haven't hashed out the syntax for more aggressive maintenance of
    an MV, and it seemed like UPDATE MV might be syntax we would want to
    use for something which updated selected parts of an MV when we do.
    Good point, and while I'm in the mood for some grammar input, here's a
    try:

    ALTER MATERIALIZED VIEW foo RESET;
    ALTER MATERIALIZED VIEW foo UPDATE;

    I think such wholesale operations make more sense as ALTER statement
    than as UPDATE statements.
    It builds a new heap and moves it into place. When the transaction
    running LMV commits, the old heap is deleted. In implementation it is
    closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by
    creating a new table. This allows all permissions, etc., to stay in
    place.
    When you say closer to CLUSTER, do you include the Access Exclusive Lock
    that forbids reading the previous version's data while you prepare the
    new one? That would be very bad and I wouldn't understand the need to,
    in the scope of MATERIALIZED VIEWs which are by definition lagging
    behind…

    If as I think you don't have that limitation in your implementation,
    it's awesome and just what I was hoping to read :)

    Regards,
    --
    Dimitri Fontaine
    http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
  • Simon Riggs at Nov 18, 2012 at 11:38 am

    On 16 November 2012 11:25, Kevin Grittner wrote:

    16. To get new data into the MV, the command is LOAD MATERIALIZED
    VIEW mat view_name. This seemed more descriptive to me that the
    alternatives and avoids declaring any new keywords beyond
    MATERIALIZED. If the MV is flagged as relisvalid == false, this
    will change it to true.
    UPDATE MATERIALIZED VIEW was problematic?
    Not technically, really, but I saw two reasons that I preferred LOAD MV:

    1. It seems to me to better convey that the entire contents of the MV
    will be built from scratch, rather than incrementally adjusted.
    2. We haven't hashed out the syntax for more aggressive maintenance of
    an MV, and it seemed like UPDATE MV might be syntax we would want to
    use for something which updated selected parts of an MV when we do.
    Does LOAD automatically TRUNCATE the view before reloading it? If not,
    why not?
    It builds a new heap and moves it into place. When the transaction
    running LMV commits, the old heap is deleted. In implementation it is
    closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by
    creating a new table. This allows all permissions, etc., to stay in
    place.
    This seems very similar to the REPLACE command we discussed earlier,
    except this is restricted to Mat Views.

    If we're going to have this, I would prefer a whole command.

    e.g. REPLACE matviewname REFRESH

    that would also allow

    REPLACE tablename AS query

    Same thing under the covers, just more widely applicable and thus more useful.



    Either way, I don't much like overloading the use of LOAD, which
    already has a very different meaning.

    --
    Simon Riggs http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Albe Laurenz at Nov 19, 2012 at 8:41 am

    Josh Berkus wrote:
    It would be nice for the user to have some way to know that a matview is
    empty due to never being LOADed or recently being TRUNCATEd. However, I
    don't think that relisvalid flag -- and preventing scanning the relation
    -- is a good solution. What I'd rather have instead is a timestamp of
    when the MV was last LOADed. If the MV was never loaded (or was
    truncated) that timestamp would be NULL. Such a timestamp would allow
    users to construct all kinds of ad-hoc refresh schemes for MVs which
    would not be possible without it.
    +1

    Kevin Grittner wrote:
    I see your point there; I'll think about that. My take was more that MVs
    would often be refreshed by crontab, and that you would want to keep
    subsequent steps from running and generating potentially plausible but
    completely inaccurate results if the LMV failed.
    If one of these subsequent steps doesn't care if refresh
    failed once, it shouldn't be forced to fail. I imagine
    that for many applications yesterday's data can be good enough.

    Those that care should check the timestamp.

    Yours,
    Laurenz Albe
  • Thom Brown at Nov 16, 2012 at 5:15 pm

    On 15 November 2012 02:28, Kevin Grittner wrote:

    Attached is a patch that...
    Got this error:

    postgres=# create view v_test as select 1;
    CREATE VIEW
    postgres=# create materialized view mv_test as select * from v_test;
    ERROR: could not open file "base/12064/16425": No such file or directory

    --
    Thom
  • Antonin Houska at Nov 16, 2012 at 9:02 pm
    By chance (?) I got similar one today too, when dropping extension:

    ERROR: could not open file "base/12623/12548": No such file or directory

    I thought something might have gone wrong during Linux upgrade 2 days
    ago, but it's not likely that we both have the issue.

    I wonder if something is broken in the catalog. The last commit I have
    in my environment is

    commit 4af3dda13601d859a20425e3554533fde0549056
    Author: Peter Eisentraut <peter_e@gmx.net>
    Date: Sun Oct 28 10:35:46 2012 -0400

    Kind regards,
    Tony.
    On 11/16/2012 06:14 PM, Thom Brown wrote:
    On 15 November 2012 02:28, Kevin Grittner wrote:

    Attached is a patch that...


    Got this error:

    postgres=# create view v_test as select 1;
    CREATE VIEW
    postgres=# create materialized view mv_test as select * from v_test;
    ERROR: could not open file "base/12064/16425": No such file or directory

    --
    Thom
  • Kevin Grittner at Nov 16, 2012 at 11:16 pm

    Thom Brown wrote:

    postgres=# create view v_test as select 1;
    CREATE VIEW
    postgres=# create materialized view mv_test as select * from v_test;
    ERROR: could not open file "base/12064/16425": No such file or directory
    Thanks for the report; will investigate.

    -Kevin
  • Kevin Grittner at Nov 16, 2012 at 11:17 pm

    Robert Haas wrote:
    Josh Berkus wrote:
    Being empty (in an inaccurate way) is just one kind of stale data.
    This is my feeling also.
    If you had an MV summarizing Wisconsin courts cumulative case counts
    by case type, "empty" would not have been a valid "stale" state for
    over 150 years. That is a degree of staleness that IMV is not just a
    quantitative degree of staleness, as if a nightly recalculation had
    failed to occur, but a qualitatively different state entirely. While
    you may or may not want to use the stale data if last night's regen
    failed, and so it should be under application control, I can't
    imagine a situation where you would want to proceed if the MV didn't
    have data that had at some time been correct -- preferrably at some
    time since the invention of digital electronic computers. Could you
    provide an example where it would be a good thing to do so?

    -Kevin
  • Kevin Grittner at Nov 19, 2012 at 4:58 pm

    Simon Riggs wrote:

    This seems very similar to the REPLACE command we discussed
    earlier, except this is restricted to Mat Views.
    I don't remember that discussion -- do you have a reference?
    If we're going to have this, I would prefer a whole command.

    e.g. REPLACE matviewname REFRESH

    that would also allow

    REPLACE tablename AS query

    Same thing under the covers, just more widely applicable and thus
    more useful.
    An interesting throught. I would have thought that if we were going
    to allow changing the definition of an existing MV, we would be
    better off with CREATE OR REPLACE MATERIALIZED VIEW. Either way, if
    you allow the column types or the number of columns to be changed,
    you do tend to run into issues if there are other MVs, views,
    triggers, rules, etc., which depend on the MV, so I don't think it's
    material for an initial patch. But it is worth considering which way
    we might want to extend it.
    Either way, I don't much like overloading the use of LOAD, which
    already has a very different meaning.
    Well, it's hard to avoid creating new keywords without overloading
    the meaning of exsiting ones. Personally I didn't find

    LOAD MATERIALIZED VIEW matview_name;

    to be very easy to confuse with

    LOAD 'filename';

    But that's a subjective thing. If too many people find that
    confusing, it may be worth creating a new keyword; but I wanted to
    see whether it was really necessary first.

    -Kevin
  • Tom Lane at Nov 19, 2012 at 5:35 pm

    "Kevin Grittner" <kgrittn@mail.com> writes:
    Simon Riggs wrote:
    Either way, I don't much like overloading the use of LOAD, which
    already has a very different meaning.
    Well, it's hard to avoid creating new keywords without overloading
    the meaning of exsiting ones.
    FWIW, I'd much rather see us overload LOAD (which is seldom used)
    than REPLACE (which might in the future become a widely-used DML
    command).

    regards, tom lane
  • Kevin Grittner at Nov 19, 2012 at 5:12 pm

    Albe Laurenz wrote:
    Kevin Grittner wrote:
    My take was more that MVs would often be refreshed by crontab, and
    that you would want to keep subsequent steps from running and
    generating potentially plausible but completely inaccurate results
    if the LMV failed.
    If one of these subsequent steps doesn't care if refresh
    failed once, it shouldn't be forced to fail. I imagine
    that for many applications yesterday's data can be good enough.

    Those that care should check the timestamp.
    It sounds like you and I are in agreement on this; I just didn't
    state it very precisely. If a LMV on a MV which already has data
    fails, the relisvalid would not prevent it from being used -- it
    would be stale, but still valid data from *some* point in time. The
    point is that if an MV is created WITH NO DATA or has been TRUNCATEd
    and there has not been a subsequent LMV, what it contains may not
    represent any state which was *ever* valid, or it may represent a
    state which would only have been valid hundreds of years in the past,
    had the system been computerized at that time. To me, that does not
    seem like the same thing as a simple "stale" state.

    I'm looking at whether there is some reasonable way to detect invalid
    data as well as capture age of data. Every solution I've thought of
    so far has at least one hard-to-solve race condition, but I have
    hopes that I can either solve that for one of the ideas, or come up
    with an idea which falls more gracefully under MVCC management.

    -Kevin
  • Josh Berkus at Nov 19, 2012 at 5:57 pm
    Kevin,
    I'm looking at whether there is some reasonable way to detect invalid
    data as well as capture age of data. Every solution I've thought of
    so far has at least one hard-to-solve race condition, but I have
    hopes that I can either solve that for one of the ideas, or come up
    with an idea which falls more gracefully under MVCC management.
    What's the race condition? I'd think that LOAD would take an exclusive
    lock on the matview involved.
    LOAD MATERIALIZED VIEW matview_name;

    to be very easy to confuse with

    LOAD 'filename';

    But that's a subjective thing. If too many people find that
    confusing, it may be worth creating a new keyword; but I wanted to
    see whether it was really necessary first.
    I do not find them confusing.

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Josh Berkus at Nov 19, 2012 at 6:01 pm

    On 11/19/12 9:57 AM, Josh Berkus wrote:
    Kevin,
    I'm looking at whether there is some reasonable way to detect invalid
    data as well as capture age of data. Every solution I've thought of
    so far has at least one hard-to-solve race condition, but I have
    hopes that I can either solve that for one of the ideas, or come up
    with an idea which falls more gracefully under MVCC management.
    What's the race condition? I'd think that LOAD would take an exclusive
    lock on the matview involved.
    BTW, another thought on the timestamp: while it would be better to have
    a lastrefresh timestamp in pg_class, the other option is to have an
    extra column in the matview (pg_last_update). While that would involve
    some redundant storage, it would neatly solve the issues around unlogged
    matviews; the timestamp and the data would vanish at the same time.

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Kevin Grittner at Nov 19, 2012 at 9:17 pm

    Tom Lane wrote:
    "Kevin Grittner" <kgrittn@mail.com> writes:
    Josh Berkus wrote:
    What use would a temporary matview be?
    It would be essentially like a temporary table, with all the same
    persistence options. I'm not really sure how often it will be more
    useful than a temporary table before we have incremental
    maintenance of materialized views; once we have that, though, it
    seems likely that there could be reasonable use cases.
    One of the principal attributes of a temp table is that its
    contents aren't (reliably) accessible from anywhere except the
    owning backend. Not sure where you're going to hide the incremental
    maintenance in that scenario.
    The more I think about that, the less sensible temporary MVs seem.
    Unless I can figure out some reasonable use case, I'll diable that in
    the next version of the patch.

    -Kevin
  • Marko Tiikkaja at Nov 26, 2012 at 12:30 am
    Hi Kevin,
    On 15/11/2012 03:28, Kevin Grittner wrote:
    Attached is a patch that is still WIP but that I think is getting
    pretty close to completion.
    I've been looking at this, but I unfortunately haven't had as much time
    as I had hoped for, and have not looked at the code in detail yet. It's
    also a relatively big patch, so I wouldn't mind another pair of eyes on it.


    I have been testing the patch a bit, and I'm slightly disappointed by
    the fact that it still doesn't solve this problem (and I apologize if I
    have missed discussion about this in the docs or in this thread):

    <assume "foo" is a non-empty materialized view>

    T1: BEGIN;
    T1: LOAD MATERIALIZED VIEW foo;

    T2: SELECT * FROM foo;

    T1: COMMIT;

    <T2 sees an empty table>


    As others have pointed out, replacing the contents of a table is
    something which people have been wanting to do for a long time, and I
    think having this ability would make this patch a lot better; now it
    just feels like syntactic sugar.
    1. CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE
    TABLE AS, with all the same clauses supported. That includes
    declaring a materialized view to be temporary or unlogged.
    2. MVs don't support inheritance.
    3. MVs can't define foreign keys.
    4. MVs can't be the target of foreign keys.
    5. MVs can't have triggers.
    6. Users can't create rules which reference MVs (although MVs
    [ab]use the rules mechanism internally, similar to how views do).
    7. MVs can't be converted to views, nor vice versa.
    8. Users may not directly use INSERT/UPDATE/DELETE on an MV.
    9. MVs can't directly be used in a COPY statement, but can be the
    source of data using a SELECT.
    10. MVs can't own sequences.
    11. MVs can't be the target of LOCK statements, although other
    statements get locks just like a table.
    12. MVs can't use data modifying CTEs in their definitions.
    13. pg_class now has a relisvalid column, which is true if an MV is
    truncated or created WITH NO DATA. You can not scan a relation
    flagged as invalid.
    14. ALTER MATERIALIZED VIEW is supported for the options that seemed
    to make sense. For example, you can change the tablespace or
    schema, but you cannot add or drop column with ALTER.
    16. To get new data into the MV, the command is LOAD MATERIALIZED
    VIEW mat view_name. This seemed more descriptive to me that the
    alternatives and avoids declaring any new keywords beyond
    MATERIALIZED. If the MV is flagged as relisvalid == false, this
    will change it to true.
    17. Since the data viewed in an MV is not up-to-date with the latest
    committed transaction, it didn't seem to make any sense to try to
    apply SERIALIZABLE transaction semantics to queries looking at
    the contents of an MV, although if LMV is run in a SERIALIZABLE
    transaction the MV data is guaranteed to be free of serialization
    anomalies. This does leave the transaction running the LOAD
    command vulnerable to serialization failures unless it is also
    READ ONLY DEFERRABLE.
    18. Bound parameters are not supported for the CREATE MATERIALIZED
    VIEW statement.
    I believe all of these points have been under discussion, and I don't
    have anything to add to the ongoing discussions.
    19. LMV doesn't show a row count. It wouldn't be hard to add, it just
    seemed a little out of place to do that, when CLUSTER, etc.,
    don't.
    This sounds like a useful feature, but your point about CLUSTER and
    friends still stands.
    In the long term, we will probably need to separate the
    implementation of CREATE TABLE AS and CREATE MATERIALIZED VIEW, but
    for now there is so little that they need to do differently it seemed
    less evil to have a few "if" clauses that that much duplicated code.
    Seems sensible.

    I'll get back when I manage to get a better grasp of the code.


    Regards,
    Marko Tiikkaja
  • Robert Haas at Nov 26, 2012 at 1:07 pm

    On Sun, Nov 25, 2012 at 7:30 PM, Marko Tiikkaja wrote:
    As others have pointed out, replacing the contents of a table is something
    which people have been wanting to do for a long time, and I think having
    this ability would make this patch a lot better; now it just feels like
    syntactic sugar.
    I agree that it's mostly syntactic sugar, but I think we need to have
    realistic expectations for what is possible in an initial patch. When
    I committed the first patch for foreign data wrappers, it didn't work
    at all: it was just syntax support. Tom later committed a follow-on
    patch that made them work. Similarly, I split the event trigger patch
    into two halves, one of which added the syntax support and the other
    of which made them functional: and even with both commits in, I think
    it's fair to say that event triggers are still in a fairly primitive
    state.

    None of those patches were small patches. It's going to take multiple
    years to get materialized views up to a state where they're really
    useful to a broad audience in production applications, but I don't
    think we should sneer at anyone for writing a patch that is "just
    syntactic sugar". As it turns out, adding a whole new object type is
    a lot of work and generates a big patch even if it doesn't do much
    just yet. Rejecting such patches on the grounds that they aren't
    comprehensive enough is, IMHO, extremely unwise; we'll either end up
    landing even larger patches that are almost impossible to review
    comprehensively and therefore more likely to break something, or else
    we'll kill the projects outright and end up with nothing.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Marko Tiikkaja at Nov 26, 2012 at 1:14 pm

    On 11/26/12 2:07 PM, Robert Haas wrote:
    On Sun, Nov 25, 2012 at 7:30 PM, Marko Tiikkaja wrote:
    As others have pointed out, replacing the contents of a table is something
    which people have been wanting to do for a long time, and I think having
    this ability would make this patch a lot better; now it just feels like
    syntactic sugar.
    I agree that it's mostly syntactic sugar, but I think we need to have
    realistic expectations for what is possible in an initial patch. When
    I committed the first patch for foreign data wrappers, it didn't work
    at all: it was just syntax support. Tom later committed a follow-on
    patch that made them work. Similarly, I split the event trigger patch
    into two halves, one of which added the syntax support and the other
    of which made them functional: and even with both commits in, I think
    it's fair to say that event triggers are still in a fairly primitive
    state.

    None of those patches were small patches. It's going to take multiple
    years to get materialized views up to a state where they're really
    useful to a broad audience in production applications, but I don't
    think we should sneer at anyone for writing a patch that is "just
    syntactic sugar". As it turns out, adding a whole new object type is
    a lot of work and generates a big patch even if it doesn't do much
    just yet. Rejecting such patches on the grounds that they aren't
    comprehensive enough is, IMHO, extremely unwise; we'll either end up
    landing even larger patches that are almost impossible to review
    comprehensively and therefore more likely to break something, or else
    we'll kill the projects outright and end up with nothing.
    First of all, I have to apologize. Re-reading the email I sent out last
    night, it does indeed feel a bit harsh and I can understand your reaction.

    At no point did I mean to belittle Kevin's efforts or the patch itself.
    I was mostly looking for Kevin's input on how hard it would be to
    solve the particular problem and whether it would be possible to do so
    for 9.3.

    While I feel like the problem I pointed out is a small caveat and should
    be at least documented for 9.3, I think this patch has merits of its own
    even if that problem never gets fixed, and I will continue to review
    this patch.


    Regards,
    Marko Tiikkaja
  • Robert Haas at Nov 26, 2012 at 3:12 pm

    On Mon, Nov 26, 2012 at 8:14 AM, Marko Tiikkaja wrote:
    First of all, I have to apologize. Re-reading the email I sent out last
    night, it does indeed feel a bit harsh and I can understand your reaction.

    At no point did I mean to belittle Kevin's efforts or the patch itself. I
    was mostly looking for Kevin's input on how hard it would be to solve the
    particular problem and whether it would be possible to do so for 9.3.

    While I feel like the problem I pointed out is a small caveat and should be
    at least documented for 9.3, I think this patch has merits of its own even
    if that problem never gets fixed, and I will continue to review this patch.
    OK, no worries. I didn't really interpret your email as belittling; I
    just want to make sure this feature doesn't get feature-creeped to
    death. I think everyone, including Kevin, understands that the
    real-world applicability of v1 is going to be limited and many people
    will choose alternative techniques rather than relying on this new
    feature. But I also think that we'll never get to a really awesome,
    kick-ass feature unless we're willing to commit an initial version
    that isn't all that awesome or kick-ass. If I understand Kevin's
    goals correctly, the plan is to get this basic version committed for
    9.3, and then to try to expand the capability during the 9.4 release
    cycle (and maybe 9.5, too, there's a lot of work to do here). I think
    that's a pretty sound plan.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Peter Geoghegan at Nov 26, 2012 at 1:42 pm

    On 26 November 2012 13:07, Robert Haas wrote:
    None of those patches were small patches. It's going to take multiple
    years to get materialized views up to a state where they're really
    useful to a broad audience in production applications, but I don't
    think we should sneer at anyone for writing a patch that is "just
    syntactic sugar".
    +1. I have a sweet tooth. I don't like it when people criticise
    patches on the basis of "obviously you could achieve the same effect
    with $CONVOLUTION". Making things simpler is a desirable outcome. Now,
    that isn't to say that we should disregard everything or even anything
    else in pursuit of simplicity; just that "needing a Ph.D is
    Postgresology", as you once put it, to do something routine to many is
    really hard to defend.

    --
    Peter Geoghegan http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Peter Eisentraut at Nov 26, 2012 at 2:46 pm

    On 11/14/12 9:28 PM, Kevin Grittner wrote:
    17. Since the data viewed in an MV is not up-to-date with the latest
    committed transaction,
    So, the way I understand it, in Oracle terms, this feature is a
    "snapshot", not a materialized view. Maybe that's what it should be
    called then.
  • Andrew Dunstan at Nov 26, 2012 at 3:19 pm

    On 11/26/2012 09:46 AM, Peter Eisentraut wrote:
    On 11/14/12 9:28 PM, Kevin Grittner wrote:
    17. Since the data viewed in an MV is not up-to-date with the latest
    committed transaction,
    So, the way I understand it, in Oracle terms, this feature is a
    "snapshot", not a materialized view. Maybe that's what it should be
    called then.

    If you use Jonathan Gardner's taxonomy at
    <http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views>,
    snapshots are a subclass of materialized views.

    cheers

    andrew
  • David Fetter at Nov 26, 2012 at 3:24 pm

    On Mon, Nov 26, 2012 at 09:46:33AM -0500, Peter Eisentraut wrote:
    On 11/14/12 9:28 PM, Kevin Grittner wrote:
    17. Since the data viewed in an MV is not up-to-date with the
    latest committed transaction,
    So, the way I understand it, in Oracle terms, this feature is a
    "snapshot", not a materialized view. Maybe that's what it should be
    called then.
    "Snapshot" is one of the options for refreshing an Oracle materialized
    view. There are others, which we'll eventually add if past is any
    prologue :)

    I hate to add to the bike-shedding, but we should probably add REFRESH
    SNAPSHOT as an optional piece of the grammar, with more REFRESH
    options to come.

    Cheers,
    David.
    --
    David Fetter <david@fetter.org> http://fetter.org/
    Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
    Skype: davidfetter XMPP: david.fetter@gmail.com
    iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

    Remember to vote!
    Consider donating to Postgres: http://www.postgresql.org/about/donate
  • Peter Geoghegan at Nov 26, 2012 at 4:02 pm

    On 26 November 2012 15:24, David Fetter wrote:
    I hate to add to the bike-shedding, but we should probably add REFRESH
    SNAPSHOT as an optional piece of the grammar, with more REFRESH
    options to come.
    I don't know that they should be called materialised views, but do we
    really need to overload the word snapshot? I'd just as soon invent a
    new word as use the Oracle one, since I don't think the term snapshot
    is widely recognised as referring to anything other than snapshot
    isolation.

    --
    Peter Geoghegan http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training and Services
  • David Fetter at Nov 26, 2012 at 4:05 pm

    On Mon, Nov 26, 2012 at 04:02:17PM +0000, Peter Geoghegan wrote:
    On 26 November 2012 15:24, David Fetter wrote:
    I hate to add to the bike-shedding, but we should probably add
    REFRESH SNAPSHOT as an optional piece of the grammar, with more
    REFRESH options to come.
    I don't know that they should be called materialised views, but do
    we really need to overload the word snapshot? I'd just as soon
    invent a new word as use the Oracle one, since I don't think the
    term snapshot is widely recognised as referring to anything other
    than snapshot isolation.
    I believe that the meaning here is unambiguous, and is used in other
    descriptions than Oracle's, including the one on our wiki.

    Cheers,
    David.
    --
    David Fetter <david@fetter.org> http://fetter.org/
    Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
    Skype: davidfetter XMPP: david.fetter@gmail.com
    iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

    Remember to vote!
    Consider donating to Postgres: http://www.postgresql.org/about/donate
  • Peter Eisentraut at Nov 27, 2012 at 12:35 am

    On Mon, 2012-11-26 at 09:46 -0500, Peter Eisentraut wrote:
    On 11/14/12 9:28 PM, Kevin Grittner wrote:
    17. Since the data viewed in an MV is not up-to-date with the latest
    committed transaction,
    So, the way I understand it, in Oracle terms, this feature is a
    "snapshot", not a materialized view. Maybe that's what it should be
    called then.
    OK, I take everything back and claim the opposite.

    In current Oracle, SNAPSHOT is an obsolete alias for MATERIALIZED VIEW.
    Materialized views have the option of REFRESH ON DEMAND and REFRESH ON
    COMMIT, with the former being the default. So it seems that the syntax
    of what you are proposing is in line with Oracle.

    I'm not fond of overloading LOAD as the refresh command. Maybe you
    could go the Oracle route here as well and use a stored procedure. That
    would also allow things like SELECT pg_refresh_mv(oid) FROM ... more
    easily.
  • David Rowley at Nov 27, 2012 at 4:46 am

    -----Original Message-----
    From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
    owner@postgresql.org] On Behalf Of Peter Eisentraut
    Sent: 27 November 2012 13:35
    To: Kevin Grittner
    Cc: Pgsql Hackers
    Subject: Re: [HACKERS] Materialized views WIP patch
    On Mon, 2012-11-26 at 09:46 -0500, Peter Eisentraut wrote:
    On 11/14/12 9:28 PM, Kevin Grittner wrote:
    17. Since the data viewed in an MV is not up-to-date with the latest
    committed transaction,
    So, the way I understand it, in Oracle terms, this feature is a
    "snapshot", not a materialized view. Maybe that's what it should be
    called then.
    OK, I take everything back and claim the opposite.

    In current Oracle, SNAPSHOT is an obsolete alias for MATERIALIZED VIEW.
    Materialized views have the option of REFRESH ON DEMAND and REFRESH
    ON COMMIT, with the former being the default. So it seems that the syntax
    of what you are proposing is in line with Oracle.

    I'm not fond of overloading LOAD as the refresh command. Maybe you could
    go the Oracle route here as well and use a stored procedure. That would also
    allow things like SELECT pg_refresh_mv(oid) FROM ... more easily.
    +1 to this.
    I can see a use case where you might want to refresh all MVs that are X number of days/hours old. Rather than having to execute statements for each one. Something like pg_refresh_mv() within a query would allow this.

    Pretty exciting work Kevin, I understand what Robert said about feature creep etc and agree 100%, but I'm really looking forward to when we can *one day* have the planner make use of an eager MV to optimise a query!

    Regards

    David Rowley

    --
    Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
    changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers
  • Dimitri Fontaine at Nov 27, 2012 at 10:26 am

    Peter Eisentraut writes:
    I'm not fond of overloading LOAD as the refresh command. Maybe you
    could go the Oracle route here as well and use a stored procedure. That
    would also allow things like SELECT pg_refresh_mv(oid) FROM ... more
    easily.
    I would like that we have a way to refresh a Materialized View by
    calling a stored procedure, but I don't think it should be the main UI.

    The wholesale refreshing of a matview appears to me to be comparable to
    TRUNCATE is that it's both a DDL and a DML. The incremental refreshing
    modes we want to have later are clearly DML only, either on commit
    refresh or incrementally on demand.

    I would then propose that we use ALTER MATERIALIZED VIEW as the UI for
    the wholesale on demand refreshing operation, and UPDATE MATERIALIZED
    VIEW as the incremental command (to come later).

    So my proposal for the current feature would be:

    ALTER MATERIALIZED VIEW mv UPDATE [ CONCURRENTLY ];
    UPDATE MATERIALIZED VIEW mv;

    The choice of keywords and syntax here hopefully clearly hint the user
    about the locking behavior of the commands, too. And as we said, the
    bare minimum for this patch does *not* include the CONCURRENTLY option,
    which we still all want to have (someday). :)

    Regards,
    --
    Dimitri Fontaine
    http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
  • Pavel Stehule at Nov 27, 2012 at 10:31 am

    2012/11/27 Dimitri Fontaine <dimitri@2ndquadrant.fr>:
    Peter Eisentraut <peter_e@gmx.net> writes:
    I'm not fond of overloading LOAD as the refresh command. Maybe you
    could go the Oracle route here as well and use a stored procedure. That
    would also allow things like SELECT pg_refresh_mv(oid) FROM ... more
    easily.
    I would like that we have a way to refresh a Materialized View by
    calling a stored procedure, but I don't think it should be the main UI.

    The wholesale refreshing of a matview appears to me to be comparable to
    TRUNCATE is that it's both a DDL and a DML. The incremental refreshing
    modes we want to have later are clearly DML only, either on commit
    refresh or incrementally on demand.

    I would then propose that we use ALTER MATERIALIZED VIEW as the UI for
    the wholesale on demand refreshing operation, and UPDATE MATERIALIZED
    VIEW as the incremental command (to come later).

    So my proposal for the current feature would be:

    ALTER MATERIALIZED VIEW mv UPDATE [ CONCURRENTLY ];
    UPDATE MATERIALIZED VIEW mv;

    The choice of keywords and syntax here hopefully clearly hint the user
    about the locking behavior of the commands, too. And as we said, the
    bare minimum for this patch does *not* include the CONCURRENTLY option,
    which we still all want to have (someday). :)
    +1

    Pavel
    Regards,
    --
    Dimitri Fontaine
    http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



    --
    Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers
  • David Johnston at Nov 27, 2012 at 2:17 pm

    On Nov 27, 2012, at 5:25, Dimitri Fontaine wrote:


    So my proposal for the current feature would be:

    ALTER MATERIALIZED VIEW mv UPDATE [ CONCURRENTLY ];
    UPDATE MATERIALIZED VIEW mv;

    The choice of keywords and syntax here hopefully clearly hint the user
    about the locking behavior of the commands, too. And as we said, the
    bare minimum for this patch does *not* include the CONCURRENTLY option,
    which we still all want to have (someday). :)
    I dislike using ALTER syntax to perform a data-only action.

    The other advantage of non-functional syntax is that you could more easily supply some form of where clause should you only want to perform a partial refresh. With a function call that becomes more obtuse.

    David J.
  • Kevin Grittner at Nov 26, 2012 at 9:24 pm

    Marko Tiikkaja wrote:
    On 15/11/2012 03:28, Kevin Grittner wrote:
    I have been testing the patch a bit Thanks!
    and I'm slightly disappointed by the fact that it still doesn't
    solve this problem (and I apologize if I have missed discussion
    about this in the docs or in this thread):

    <assume "foo" is a non-empty materialized view>

    T1: BEGIN;
    T1: LOAD MATERIALIZED VIEW foo;

    T2: SELECT * FROM foo;

    T1: COMMIT;

    <T2 sees an empty table>
    As far as I know you are the first to notice this behavior. Thanks
    for pointing it out.

    I agree with Robert that we have to be careful about scope creep,
    but this one looks to me like it should be considered a bug. IMO,
    LOAD for a populated MV should move it from one state which
    reflects the captured state of a previous point in time to a
    captured state which is later, with no invalid or inconsistent
    states visible in between.

    I will take a look at the issue; I don't know whether it's
    something small I can address in this CF or whether it will need to
    be in the next CF, but I will fix it.
    19. LMV doesn't show a row count. It wouldn't be hard to add, it
    just seemed a little out of place to do that, when CLUSTER,
    etc., don't.
    This sounds like a useful feature, but your point about CLUSTER
    and friends still stands.
    Other possible arguments against providing a count are:

    (1) For a populated MV, the LOAD might be replacing the contents
    with fewer rows than were there before.

    (2) Once we have incremental updates of the MV, this count is only
    one of the ways to update the view -- and the others won't show
    counts. Showing it here might be considered inconsistent.

    I don't feel strongly about it, and I don't think it's a big change
    either way; just explaining what got me off the fence when I had to
    pick one behavior or the other to post the WIP patch.
    I'll get back when I manage to get a better grasp of the code.
    Thanks.

    Keep in mind that the current behavior of behaving like a regular
    view when the contents are invalid is not what I had in mind, that
    was an accidental effect of commenting out the body of the
    ExecCheckRelationsValid() function right before posting the patch
    because I noticed a regression. When I noticed current behavior, it
    struck me that someone might prefer it to the intended behavior of
    showing an error like this:

    ereport(ERROR,
    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
    errmsg("materialized view \"%s\" has not been populated",
    get_rel_name(rte->relid)),
    errhint("Use the LOAD MATERIALIZED VIEW command.")));

    I mention it in case someone wants to argue for silently behaving
    as a regular view when the MV is not populated.

    -Kevin
  • Marko Tiikkaja at Dec 2, 2012 at 11:24 pm
    Hi Kevin,
    On Mon, 26 Nov 2012 22:24:33 +0100, Kevin Grittner wrote:
    Marko Tiikkaja wrote:
    <T2 sees an empty table>
    As far as I know you are the first to notice this behavior. Thanks
    for pointing it out.

    I will take a look at the issue; I don't know whether it's
    something small I can address in this CF or whether it will need to
    be in the next CF, but I will fix it.
    Any news on this front?
    I'll get back when I manage to get a better grasp of the code.
    The code looks relatively straightforward and good to my eyes. It passes
    my testing and looks to be changing all the necessary parts of the code.
    Keep in mind that the current behavior of behaving like a regular
    view when the contents are invalid is not what I had in mind, that
    was an accidental effect of commenting out the body of the
    ExecCheckRelationsValid() function right before posting the patch
    because I noticed a regression. When I noticed current behavior, it
    struck me that someone might prefer it to the intended behavior of
    showing an error like this:

    ereport(ERROR,
    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
    errmsg("materialized view \"%s\" has not been populated",
    get_rel_name(rte->relid)),
    errhint("Use the LOAD MATERIALIZED VIEW command.")));

    I mention it in case someone wants to argue for silently behaving
    as a regular view when the MV is not populated.
    FWIW, I'd prefer an error in this case, but I don't feel strongly about it.


    Regards,
    Marko Tiikkaja
  • Kevin Grittner at Nov 26, 2012 at 9:34 pm

    David Fetter wrote:
    On Mon, Nov 26, 2012 at 09:46:33AM -0500, Peter Eisentraut wrote:

    So, the way I understand it, in Oracle terms, this feature is a
    "snapshot", not a materialized view. Maybe that's what it should
    be called then.
    "Snapshot" is one of the options for refreshing an Oracle
    materialized view. There are others, which we'll eventually add
    if past is any prologue :)
    That's the way I understand it, too.
    I hate to add to the bike-shedding, but we should probably add
    REFRESH SNAPSHOT as an optional piece of the grammar, with more
    REFRESH options to come.
    I would prefer to leave the syntax for refreshing MVs to a later
    patch. I'm kind of assuming that SNAPSHOT would be the default if
    no other options are specified, but I would prefer not to get into
    too much speculation about add-on patches for fear of derailing
    this initial effort.

    -Kevin
  • David Fetter at Nov 26, 2012 at 9:37 pm

    On Mon, Nov 26, 2012 at 04:34:36PM -0500, Kevin Grittner wrote:
    David Fetter wrote:
    On Mon, Nov 26, 2012 at 09:46:33AM -0500, Peter Eisentraut wrote:

    So, the way I understand it, in Oracle terms, this feature is a
    "snapshot", not a materialized view. Maybe that's what it should
    be called then.
    "Snapshot" is one of the options for refreshing an Oracle
    materialized view. There are others, which we'll eventually add
    if past is any prologue :)
    That's the way I understand it, too.
    Great :)
    I hate to add to the bike-shedding, but we should probably add
    REFRESH SNAPSHOT as an optional piece of the grammar, with more
    REFRESH options to come.
    I would prefer to leave the syntax for refreshing MVs to a later
    patch. I'm kind of assuming that SNAPSHOT would be the default if
    no other options are specified, but I would prefer not to get into
    too much speculation about add-on patches for fear of derailing
    this initial effort.
    You're right. I withdraw my suggestion until such time as this patch
    (or descendent) is committed and actual working code implementing
    other refresh strategies is written.

    Cheers,
    David.
    --
    David Fetter <david@fetter.org> http://fetter.org/
    Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
    Skype: davidfetter XMPP: david.fetter@gmail.com
    iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

    Remember to vote!
    Consider donating to Postgres: http://www.postgresql.org/about/donate
  • Kevin Grittner at Nov 27, 2012 at 2:07 pm

    Pavel Stehule wrote:
    2012/11/27 Dimitri Fontaine <dimitri@2ndquadrant.fr>:
    I would like that we have a way to refresh a Materialized View
    by calling a stored procedure, but I don't think it should be
    the main UI.
    I agree. I saw that Oracle uses a function for that without any
    statement-level support, and that would probably be easier to
    implement; but it felt wrong to do it that way. I couldn't think of
    any other cases where similar action is taken without statement
    syntax for it.
    The wholesale refreshing of a matview appears to me to be
    comparable to TRUNCATE is that it's both a DDL and a DML. The
    incremental refreshing modes we want to have later are clearly
    DML only, either on commit refresh or incrementally on demand.
    Personally, I expect the most popular update method to eventually
    become a queued update. I've looked ahead far enough to see that I
    want to structure the incremental updates to be controlled through
    an API where changes to supporting tables produce records saying
    what was done which are fed to consumers which do the updating.
    Then it becomes a matter of whether that consumer performs the
    related updates to the MV during commit processing of the producer,
    by pulling from a queue, or by reading accumulated rows when the MV
    is referenced.

    But I'm getting ahead of things with such speculation...
    I would then propose that we use ALTER MATERIALIZED VIEW as the
    UI for the wholesale on demand refreshing operation, and UPDATE
    MATERIALIZED VIEW as the incremental command (to come later).
    Honestly, I have managed to keep myself from speculating on syntax
    for incremental updates. There will be enough complexity involved
    that I expect months of bikeshedding. :-/
    So my proposal for the current feature would be:

    ALTER MATERIALIZED VIEW mv UPDATE [ CONCURRENTLY ];
    UPDATE MATERIALIZED VIEW mv;
    An ALTER MATERIALIZED VIEW option was my first thought on syntax to
    do what LOAD does in the current patch. But it bothered me that I
    couldn't think of any other cases where ALTER <some-object-type>
    only changed the data contained within the object and had no other
    impact. Are you both really comfortable with an ALTER MATERIALIZED
    VIEW which has no effect other than to update the data? It seems
    wrong to me.
    The choice of keywords and syntax here hopefully clearly hint
    the user about the locking behavior of the commands, too. And as
    we said, the bare minimum for this patch does *not* include the
    CONCURRENTLY option, which we still all want to have (someday).
    :)
    +1
    Sure -- a CONCURRENTLY option for LMV (or AMVU) seems like one of
    the next steps. I'll feel more confident about implementing that
    when it appears that we have shaken the last bugs out of
    CREATE/DROP INDEX CONCURRENTLY, since anything which affects those
    statements will probably also matter here.

    -Kevin

Related Discussions

People

Translate

site design / logo © 2021 Grokbase