Here is a new version of the patch, with most issues discussed in
previous posts fixed.

I've been struggling with two areas:

  - pg_dump sorting for MVs which depend on other MVs
  - proper handling of the relisvalid flag for unlogged MVs after recovery

I've been hacking at the code in those areas without success;
what's here is the least broken form I have, but work is still
needed for these cases. Any other problems are news to me.

In addition, the docs need another pass, and there is an open
question about what is the right thing to use for TRUNCATE syntax.

-Kevin

Search Discussions

  • Kevin Grittner at Jan 16, 2013 at 1:27 pm
    [resending with patch compressed, since original post didn't make
    it through to the list]

    Here is a new version of the patch, with most issues discussed in
    previous posts fixed.

    I've been struggling with two areas:

      - pg_dump sorting for MVs which depend on other MVs
      - proper handling of the relisvalid flag for unlogged MVs after recovery

    I've been hacking at the code in those areas without success;
    what's here is the least broken form I have, but work is still
    needed for these cases. Any other problems are news to me.

    In addition, the docs need another pass, and there is an open
    question about what is the right thing to use for TRUNCATE syntax.

    -Kevin
  • Tom Lane at Jan 16, 2013 at 4:26 pm

    "Kevin Grittner" <kgrittn@mail.com> writes:
    I've been struggling with two areas:
    - pg_dump sorting for MVs which depend on other MVs
    Surely that should fall out automatically given that the dependency is
    properly expressed in pg_depend?

    If you mean you're trying to get it to cope with circular dependencies
    between MVs, it might take some work on the pg_dump side, but plain
    ordering shouldn't require new code.

        regards, tom lane
  • Kevin Grittner at Jan 16, 2013 at 4:48 pm

    Tom Lane wrote:
    "Kevin Grittner" <kgrittn@mail.com> writes:
    I've been struggling with two areas:
    - pg_dump sorting for MVs which depend on other MVs
    Surely that should fall out automatically given that the
    dependency is properly expressed in pg_depend?

    If you mean you're trying to get it to cope with circular
    dependencies between MVs, it might take some work on the pg_dump
    side, but plain ordering shouldn't require new code.
    The *definitions* sort properly, but what I'm trying to do is
    define them WITH NO DATA and load data after all the COPY
    statements for tables. If mva is referenced by mvb, the goal is the
    REFRESH mva, build its indexes before running REFRESH for mvb and
    building its indexes. To do things in any other order does't seem
    to me to leave things after restore in the same state they were in
    at the time of the dump.

    So I should have been a little more verbose describing the problem:
    pg_dump sorting of REFRESH and CREATE INDEX steps for MVs which
    depend on other MVs.

    Last night I found why my previous attempts had been failing -- I
    was trying to build the dependencies at the wrong point in the dump
    process, after the sorts had already been done.  Now that I've
    spotted that fundamental flaw, I think I can get this out of the
    way without too much more fanfare. I kept thinking I had something
    wrong in the detail of my approach, while the problem was at a much
    higher level.

    Where I really need someone to hit me upside the head with a
    clue-stick is the code I added to the bottom of RelationBuildDesc()
    in relcache.c. The idea is that on first access to an unlogged MV,
    to detect that the heap has been replaced by the init fork, set
    relisvalid to false, and make the heap look normal again. I
    couldn't see any way to do that which wasn't a kludge, and I can't
    figure out how to deal with relcache properly in implementing that
    kludge. Either a tip about the right way to work the kludge, or a
    suggestion for a less kludgy alternative would be welcome.

    -Kevin
  • Tom Lane at Jan 16, 2013 at 6:39 pm

    "Kevin Grittner" <kgrittn@mail.com> writes:
    Tom Lane wrote:
    Surely that should fall out automatically given that the
    dependency is properly expressed in pg_depend?
    The *definitions* sort properly, but what I'm trying to do is
    define them WITH NO DATA and load data after all the COPY
    statements for tables. If mva is referenced by mvb, the goal is the
    REFRESH mva, build its indexes before running REFRESH for mvb and
    building its indexes. To do things in any other order does't seem
    to me to leave things after restore in the same state they were in
    at the time of the dump.
    Ah. Can't you treat this using the same pg_dump infrastructure as
    for the data for an ordinary table? The dependencies made for the
    TableDataInfo object might be a bit different, but after that it
    seems like the sort logic ought to be happy.
    Where I really need someone to hit me upside the head with a
    clue-stick is the code I added to the bottom of RelationBuildDesc()
    in relcache.c. The idea is that on first access to an unlogged MV,
    to detect that the heap has been replaced by the init fork, set
    relisvalid to false, and make the heap look normal again.
    Hmm. I agree that relcache.c has absolutely no business doing that,
    but not sure what else to do instead. Seems like it might be better
    done at first touch of the MV in the parser, rewriter, or planner ---
    but the fact that I can't immediately decide which of those is right
    makes me feel that it's still too squishy.

    I'm also wondering about locking issues there. Obviously you don't
    want more than one backend trying to rebuild the MV.

    Do we really need unlogged MVs in the first iteration? Seems like
    that's adding a whole bunch of new issues, when you have quite enough
    already without that.

        regards, tom lane
  • Josh Berkus at Jan 16, 2013 at 7:12 pm

    Do we really need unlogged MVs in the first iteration? Seems like
    that's adding a whole bunch of new issues, when you have quite enough
    already without that.
    While I think there is strong user demand for unlogged MVs, if we can
    get MVs without unlogged ones for 9.3, I say go for that. We'll add
    unlogged in 9.4.


    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Robert Haas at Jan 17, 2013 at 12:55 pm

    On Wed, Jan 16, 2013 at 1:38 PM, Tom Lane wrote:
    Where I really need someone to hit me upside the head with a
    clue-stick is the code I added to the bottom of RelationBuildDesc()
    in relcache.c. The idea is that on first access to an unlogged MV,
    to detect that the heap has been replaced by the init fork, set
    relisvalid to false, and make the heap look normal again.
    Hmm. I agree that relcache.c has absolutely no business doing that,
    but not sure what else to do instead. Seems like it might be better
    done at first touch of the MV in the parser, rewriter, or planner ---
    but the fact that I can't immediately decide which of those is right
    makes me feel that it's still too squishy.
    I think we shouldn't be doing that at all. The whole business of
    transferring the relation-is-invalid information from the relation to
    a pg_class flag seems like a Rube Goldberg device to me. I'm still
    not convinced that we should have a relation-is-invalid flag at all,
    but can we at least not have two?

    It seems perfectly adequate to detect that the MV is invalid when we
    actually try to execute a plan - that is, when we first access the
    heap or one of its indexes. So the bit can just live in the
    file-on-disk, and there's no need to have a second copy of it in
    pg_class.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Noah Misch at Jan 24, 2013 at 6:08 pm

    On Thu, Jan 17, 2013 at 07:54:55AM -0500, Robert Haas wrote:
    On Wed, Jan 16, 2013 at 1:38 PM, Tom Lane wrote:
    Where I really need someone to hit me upside the head with a
    clue-stick is the code I added to the bottom of RelationBuildDesc()
    in relcache.c. The idea is that on first access to an unlogged MV,
    to detect that the heap has been replaced by the init fork, set
    relisvalid to false, and make the heap look normal again.
    Hmm. I agree that relcache.c has absolutely no business doing that,
    but not sure what else to do instead. Seems like it might be better
    done at first touch of the MV in the parser, rewriter, or planner ---
    but the fact that I can't immediately decide which of those is right
    makes me feel that it's still too squishy.
    I think we shouldn't be doing that at all. The whole business of
    transferring the relation-is-invalid information from the relation to
    a pg_class flag seems like a Rube Goldberg device to me. I'm still
    not convinced that we should have a relation-is-invalid flag at all,
    but can we at least not have two?

    It seems perfectly adequate to detect that the MV is invalid when we
    actually try to execute a plan - that is, when we first access the
    heap or one of its indexes. So the bit can just live in the
    file-on-disk, and there's no need to have a second copy of it in
    pg_class.
    Like Kevin, I want a way to distinguish unpopulated MVs from MVs that
    genuinely yielded the empty set at last refresh. I agree that there's no
    particular need to store that fact in pg_class, and I would much prefer only
    storing it one way in any case. A user-visible disadvantage of the current
    implementation is that relisvalid remains stale until something opens the rel.
    That's fine for the system itself, but it can deceive user-initiated catalog
    queries. Imagine a check_postgres action that looks for invalid MVs to
    complain about. It couldn't just scan pg_class; it would need to first do
    something that opens every MV.

    I suggest the following:

    1. Let an invalid MV have a zero-length heap. Distinguish a valid, empty MV
        by giving it a page with no tuples. This entails VACUUM[1] not truncating
        MVs below one page and the refresh operation, where necessary, extending
        the relation from zero pages to one.
    2. Remove pg_class.relisvalid.
    3. Add a bool field to RelationData. The word "valid" is used in that context
        to refer to the validity of the structure itself, so perhaps call the new
        field rd_scannable. RelationIsFlaggedAsValid() can become a macro;
        consider changing its name for consistency with the field name.
    4. During relcache build, set the field to "RelationGetNumberBlocks(rel) != 0"
        for MVs, fixed "true" for everyone else. Any operation that changes the
        field must, and probably would anyway, instigate a relcache invalidation.
    5. Expose a database function, say pg_relation_scannable(), for querying the
        current state of a relation. This supports user-level monitoring.

    Does that seem reasonable? One semantic difference to keep in mind is that
    unlogged MVs will be considered invalid on the standby while valid on the
    master. That's essentially an accurate report, so I won't mind it.

    For the benefit of the archives, I note that we almost need not truncate an
    unlogged materialized view during crash recovery. MVs are refreshed in a
    VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's
    pg_class to that relfilenode. When a crash occurs with no refresh in flight,
    the latest refresh had been safely synced. When a crash cuts short a refresh,
    the pg_class update will not stick, and the durability of the old heap is not
    in doubt. However, non-btree index builds don't have the same property; we
    would need to force an immediate sync of the indexes to be safe here. It
    would remain necessary to truncate unlogged MVs when recovering a base backup,
    which may contain a partially-written refresh that did eventually commit.
    Future MV variants that modify the MV in place would also need the usual
    truncate on crash.

    I'm going to follow this with a review covering a broader range of topics.

    Thanks,
    nm

    [1] For the time being, it's unfortunate to VACUUM materialized views at all;
    they only ever bear frozen tuples.
  • Kevin Grittner at Feb 16, 2013 at 1:01 am

    Noah Misch wrote:

    Like Kevin, I want a way to distinguish unpopulated MVs from MVs that
    genuinely yielded the empty set at last refresh.  I agree that there's no
    particular need to store that fact in pg_class, and I would much prefer only
    storing it one way in any case.  A user-visible disadvantage of the current
    implementation is that relisvalid remains stale until something opens the rel.
    That's fine for the system itself, but it can deceive user-initiated catalog
    queries.  Imagine a check_postgres action that looks for invalid MVs to
    complain about.  It couldn't just scan pg_class; it would need to first do
    something that opens every MV.

    I suggest the following:

    1. Let an invalid MV have a zero-length heap.  Distinguish a valid, empty MV
    by giving it a page with no tuples.  This entails VACUUM[1] not truncating
    MVs below one page and the refresh operation, where necessary, extending
    the relation from zero pages to one.
    2. Remove pg_class.relisvalid.
    3. Add a bool field to RelationData.  The word "valid" is used in that context
    to refer to the validity of the structure itself, so perhaps call the new
    field rd_scannable.  RelationIsFlaggedAsValid() can become a macro;
    consider changing its name for consistency with the field name.
    4. During relcache build, set the field to "RelationGetNumberBlocks(rel) !=> 0"
    for MVs, fixed "true" for everyone else.  Any operation that changes the
    field must, and probably would anyway, instigate a relcache invalidation.
    5. Expose a database function, say pg_relation_scannable(), for querying the
    current state of a relation.  This supports user-level monitoring.

    Does that seem reasonable?  One semantic difference to keep in mind is that
    unlogged MVs will be considered invalid on the standby while valid on the
    master.  That's essentially an accurate report, so I won't mind it.
    Changed to work pretty much as you suggested.
    I'm going to follow this with a review covering a broader range of topics.
    Those issues addressed, too.  That includes the most egregious doc
    problems you pointed out, but there still needs to be a thorough
    review, and I expect to find a few more doc cleanup issues.

    There was one minor syntax issue not addressed by Noah, nor much
    discussed in general that I didn't want to just unilaterally
    choose; but given that nobody seems to care that much I will put
    forward a proposal and do it that way tomorrow if nobody objects.
    Before this patch tables were the only things subject to
    truncation, but now materialized views can also be truncated.  So
    far we have been treating TABLE as a noise word in the truncate
    command.  I assume we still want to allow tables to be truncated
    with or without the word.  The question is what to do about
    materialized views, and wheter both can be specified on a single
    TRUNCATE statement.  I propose that we allow TABLE or MATERIALIZED
    VIEW to be specified, or that part of the statement to be left out.
    I propose that either type of object be allowed unless one or the
    other is specified and the object to be truncated is not of that
    kind.  So you could mix both kinds on one statement, so long as you
    didn't specify either kind.

    There is one odd aspect to pg_dump, but I think the way it is
    behaving is the best way to handle it, although I invite other
    opinions.  If you load from pg_dump output, it will try to
    populated materialized views which were populated on dump, and
    leave the ones which were not scannable because the contents had
    not been generated in an empty and unscannable state on restore.
    That much seems pretty obvious.  Where it gets  a little tricky is
    if mva is generated with data, and mvb is generated based on mva.
    Then mva is truncated.  Then you dump.  mvb was populated at the
    time of the dump, but its contents can't be regenerated on restore
    because mva is not scannable.  As the patch currently stands, you
    get an error on the attempt to REFRESH mvb.  I think that's a good
    thing, but I'm open to arguments to the contrary.

    I don't have any handling in pg_dump for circular references among
    materialized views, because I couldn't see how that could happen.
    I'm not 100% sure that isn't just a failure of imagination on my
    part, though.

    The only other comment I know of that hasn't been addressed is
    Simon's suggestion that I put in syntax for features which we might
    implement in future releases.  I don't want to do that without the
    usual community design and bike-shedding process, so syntax is only
    implemented for implemented features.

    I'm still waiting for final word (and a small patch?) from KaiGai
    Kohei for the sepgsql part.

    This patch does require an initdb because of a new function.

    Unless something else comes up in review or I get feedback to the
    contrary I plan to deal with the above-mentioned issues and commit
    this within a week or two.

    Thanks to Marko Tiikkaja, Robert Haas, Thom Brown, Simon Riggs,
    KaiGai Kohei, and Noah Misch for the reviews and suggestions so
    far, thanks to Robert for the initial cut at the docs, and big
    thanks to Noah for helping me track down an elusive bug.

    --
    Kevin Grittner
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Robert Haas at Feb 16, 2013 at 1:24 am

    On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner wrote:
    There is one odd aspect to pg_dump, but I think the way it is
    behaving is the best way to handle it, although I invite other
    opinions. If you load from pg_dump output, it will try to
    populated materialized views which were populated on dump, and
    leave the ones which were not scannable because the contents had
    not been generated in an empty and unscannable state on restore.
    That much seems pretty obvious. Where it gets a little tricky is
    if mva is generated with data, and mvb is generated based on mva.
    Then mva is truncated. Then you dump. mvb was populated at the
    time of the dump, but its contents can't be regenerated on restore
    because mva is not scannable. As the patch currently stands, you
    get an error on the attempt to REFRESH mvb. I think that's a good
    thing, but I'm open to arguments to the contrary.
    Hmm, anything that means a dump-and-restore can fail seems like a bad
    thing to me. There's nothing outrageous about that scenario. It's
    arguable what state dump-and-restore should leave the new database in,
    but I don't see why it shouldn't work. I predict we'll end up with
    unhappy users if we leave it like this.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Bruce Momjian at Feb 16, 2013 at 1:57 am

    On Fri, Feb 15, 2013 at 08:24:16PM -0500, Robert Haas wrote:
    On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner wrote:
    There is one odd aspect to pg_dump, but I think the way it is
    behaving is the best way to handle it, although I invite other
    opinions. If you load from pg_dump output, it will try to
    populated materialized views which were populated on dump, and
    leave the ones which were not scannable because the contents had
    not been generated in an empty and unscannable state on restore.
    That much seems pretty obvious. Where it gets a little tricky is
    if mva is generated with data, and mvb is generated based on mva.
    Then mva is truncated. Then you dump. mvb was populated at the
    time of the dump, but its contents can't be regenerated on restore
    because mva is not scannable. As the patch currently stands, you
    get an error on the attempt to REFRESH mvb. I think that's a good
    thing, but I'm open to arguments to the contrary.
    Hmm, anything that means a dump-and-restore can fail seems like a bad
    thing to me. There's nothing outrageous about that scenario. It's
    arguable what state dump-and-restore should leave the new database in,
    but I don't see why it shouldn't work. I predict we'll end up with
    unhappy users if we leave it like this.
    pg_upgrade is going to fail on that pg_restore error. :-(

    --
       Bruce Momjian <bruce@momjian.us> http://momjian.us
       EnterpriseDB http://enterprisedb.com

       + It's impossible for everything to be true. +
  • Kevin Grittner at Feb 16, 2013 at 6:00 pm

    Bruce Momjian wrote:
    On Fri, Feb 15, 2013 at 08:24:16PM -0500, Robert Haas wrote:
    On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner wrote:
    There is one odd aspect to pg_dump, but I think the way it is
    behaving is the best way to handle it, although I invite other
    opinions.  If you load from pg_dump output, it will try to
    populated materialized views which were populated on dump, and
    leave the ones which were not scannable because the contents had
    not been generated in an empty and unscannable state on restore.
    That much seems pretty obvious.  Where it gets  a little tricky is
    if mva is generated with data, and mvb is generated based on mva.
    Then mva is truncated.  Then you dump.  mvb was populated at the
    time of the dump, but its contents can't be regenerated on restore
    because mva is not scannable.  As the patch currently stands, you
    get an error on the attempt to REFRESH mvb.  I think that's a good
    thing, but I'm open to arguments to the contrary.
    Hmm, anything that means a dump-and-restore can fail seems like a bad
    thing to me.  There's nothing outrageous about that scenario.  It's
    arguable what state dump-and-restore should leave the new database in,
    but I don't see why it shouldn't work.  I predict we'll end up with
    unhappy users if we leave it like this.
    pg_upgrade is going to fail on that pg_restore error.  :-(
    With the hard link option it should succeed, I would think.  If we
    arranged for the check option, when run without the hard link
    option, to report such cases so that the user could choose to
    either truncate mvb or refresh mva before the upgrade, would that
    satisfy this concern?

    --
    Kevin Grittner
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Kevin Grittner at Feb 16, 2013 at 5:53 pm

    Robert Haas wrote:
    On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner wrote:

    There is one odd aspect to pg_dump, but I think the way it is
    behaving is the best way to handle it, although I invite other
    opinions.  If you load from pg_dump output, it will try to
    populated materialized views which were populated on dump, and
    leave the ones which were not scannable because the contents had
    not been generated in an empty and unscannable state on restore.
    That much seems pretty obvious.  Where it gets  a little tricky is
    if mva is generated with data, and mvb is generated based on mva.
    Then mva is truncated.  Then you dump.  mvb was populated at the
    time of the dump, but its contents can't be regenerated on restore
    because mva is not scannable.  As the patch currently stands, you
    get an error on the attempt to REFRESH mvb.  I think that's a good
    thing, but I'm open to arguments to the contrary.
    Hmm, anything that means a dump-and-restore can fail seems like a bad
    thing to me.  There's nothing outrageous about that scenario.  It's
    arguable what state dump-and-restore should leave the new database in,
    but I don't see why it shouldn't work.  I predict we'll end up with
    unhappy users if we leave it like this.
    Keeping in mind that mva may take hours to refresh, and mvb may
    take only minutes once you have the data from mva, what behavior do
    you think is preferable?

    The alternatives I can think of are:

    (1)  Force mva to refresh on restore, even though it was empty and
    not scannable on dump.  This may delay completion of the restore
    for an extended time.  It would leave both mva and mvb populated.

    (2)  Populate mvb by using mva's query as a regular view.  This
    would leave things in the same state as they were on dump, and
    might possibly optimized to something faster than generating mva
    and then mvb; but probably would not be much faster in most cases.

    (3)  Change the failure to generate data for mvb in this case as a
    WARNING rather than an ERROR.

    (4)  Actually dump and restore data with COPY statements for
    materialized views, rather than having the dump create REFRESH
    statements.  The main down side of this, it seems to me, is that it
    opens up materialized views to direct tinkering of contents via SQL
    statements, which I was hoping to avoid.  Perhaps this can be
    mitigated in some way.

    --
    Kevin Grittner
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Noah Misch at Feb 16, 2013 at 7:46 pm

    On Sat, Feb 16, 2013 at 09:53:14AM -0800, Kevin Grittner wrote:
    Robert Haas wrote:
    On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner wrote:
    There is one odd aspect to pg_dump, but I think the way it is
    behaving is the best way to handle it, although I invite other
    opinions.  If you load from pg_dump output, it will try to
    populated materialized views which were populated on dump, and
    leave the ones which were not scannable because the contents had
    not been generated in an empty and unscannable state on restore.
    That much seems pretty obvious.  Where it gets  a little tricky is
    if mva is generated with data, and mvb is generated based on mva.
    Then mva is truncated.  Then you dump.  mvb was populated at the
    time of the dump, but its contents can't be regenerated on restore
    because mva is not scannable.  As the patch currently stands, you
    get an error on the attempt to REFRESH mvb.  I think that's a good
    thing, but I'm open to arguments to the contrary.
    Hmm, anything that means a dump-and-restore can fail seems like a bad
    thing to me.  There's nothing outrageous about that scenario.  It's
    arguable what state dump-and-restore should leave the new database in,
    but I don't see why it shouldn't work.  I predict we'll end up with
    unhappy users if we leave it like this.
    I agree that making the dump fail on this account is bad.
    Keeping in mind that mva may take hours to refresh, and mvb may
    take only minutes once you have the data from mva, what behavior do
    you think is preferable?

    The alternatives I can think of are:

    (1)  Force mva to refresh on restore, even though it was empty and
    not scannable on dump.  This may delay completion of the restore
    for an extended time.  It would leave both mva and mvb populated.
    This is reasonable. If the user doesn't like it, he can presumably use an
    edited dump list to remove the REFRESHes.
    (2)  Populate mvb by using mva's query as a regular view.  This
    would leave things in the same state as they were on dump, and
    might possibly optimized to something faster than generating mva
    and then mvb; but probably would not be much faster in most cases.
    Interesting idea, but I don't think adding novel server behavior is justified
    to achieve this.
    (3)  Change the failure to generate data for mvb in this case as a
    WARNING rather than an ERROR.
    This is also fair. However, I think it's better to restore more valid MVs
    (option 1) than fewer.
    (4)  Actually dump and restore data with COPY statements for
    materialized views, rather than having the dump create REFRESH
    statements.  The main down side of this, it seems to me, is that it
    opens up materialized views to direct tinkering of contents via SQL
    statements, which I was hoping to avoid.  Perhaps this can be
    mitigated in some way.
    This is a door better left closed.


    Overall, I recommend option 1. None of the options will furnish the desire of
    every database, but the DBA can always tailor the outcome by replacing the
    dumped REFRESH statements with his own. I'm not envisioning that MVs left
    invalid for the long term will be a typical thing, anyway.

    --
    Noah Misch
    EnterpriseDB http://www.enterprisedb.com
  • Kevin Grittner at Feb 17, 2013 at 3:13 pm

    Noah Misch wrote:
    On Sat, Feb 16, 2013 at 09:53:14AM -0800, Kevin Grittner wrote:
    I agree that making the dump fail on this account is bad.
    I would argue that this is an overstatement of the issue except for
    restores that use the single-transaction switch and pg_upgrade
    without the hard link option.  In all other cases, one could just
    issue REFRESH statements after the dump successfully completed all
    the other work.  But those two cases are important enough that the
    issue must be seriously considered.
    (1)  Force mva to refresh on restore, even though it was empty
    and not scannable on dump.  This may delay completion of the
    restore for an extended time.  It would leave both mva and mvb
    populated.
    This is reasonable.  If the user doesn't like it, he can
    presumably use an edited dump list to remove the REFRESHes.
    Overall, I recommend option 1.
    I'm OK with that approach, and in the absence of anyone pushing for
    another direction, will make that change to pg_dump.  I'm thinking
    I would only do this for materialized views which were not
    scannable, but which cause REFRESH failures on other materialized
    views if not refreshed first (recursively evaluated), rather than
    just automatically refreshing all MVs on restore.  The reason this
    seems important is that some MVs may take a long time to refresh,
    and a user might want a dump/restore to get to a point where they
    can use the rest of the database while building the contents of
    some MVs in the background or during off hours.
    None of the options will furnish the desire of every database, Agreed.
    but the DBA can always tailor the outcome by replacing the dumped
    REFRESH statements with his own.
    ... or by issuing TRUNCATE or REFRESH statements before the dump to
    avoid the issue.
    I'm not envisioning that MVs left invalid for the long term will
    be a typical thing, anyway.
    Agreed.  I think this will be an infrequent issue caused by unusual
    user actions; but it would be bound to come up occasionally.

    -Kevin
  • Alvaro Herrera at Feb 18, 2013 at 1:02 pm

    Kevin Grittner escribió:

    I'm OK with that approach, and in the absence of anyone pushing for
    another direction, will make that change to pg_dump.  I'm thinking
    I would only do this for materialized views which were not
    scannable, but which cause REFRESH failures on other materialized
    views if not refreshed first (recursively evaluated), rather than
    just automatically refreshing all MVs on restore.  The reason this
    seems important is that some MVs may take a long time to refresh,
    and a user might want a dump/restore to get to a point where they
    can use the rest of the database while building the contents of
    some MVs in the background or during off hours.
    Maybe it would be a good idea to try to put such commands at the very
    end of the dump, if possible.

    --
    Álvaro Herrera http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Kevin Grittner at Feb 18, 2013 at 2:49 pm

    Alvaro Herrera wrote:
    Kevin Grittner escribió:
    I'm OK with that approach, and in the absence of anyone pushing
    for another direction, will make that change to pg_dump.  I'm
    thinking I would only do this for materialized views which were
    not scannable, but which cause REFRESH failures on other
    materialized views if not refreshed first (recursively
    evaluated), rather than just automatically refreshing all MVs on
    restore.  The reason this seems important is that some MVs may
    take a long time to refresh, and a user might want a
    dump/restore to get to a point where they can use the rest of
    the database while building the contents of some MVs in the
    background or during off hours.
    Maybe it would be a good idea to try to put such commands at the
    very end of the dump, if possible.
    Here is the dump order as currently implemented in that patch.  MVs
    are created at the same priority as tables and views.  MV REFRESH
    and MV index builds obviously need to follow population of table
    data. These are at the same priority because it makes the most
    sense to populated an MV without any indexes and then build them
    before the MV is used to populate some other MV.  Dependency
    information is used to get that to sort properly within the
    priority level.

        1,                            /* DO_NAMESPACE */
        2,                            /* DO_PROCLANG */
        3,                            /* DO_COLLATION */
        4,                            /* DO_EXTENSION */
        5,                            /* DO_TYPE */
        5,                            /* DO_SHELL_TYPE */
        6,                            /* DO_FUNC */
        7,                            /* DO_AGG */
        8,                            /* DO_OPERATOR */
        9,                            /* DO_OPCLASS */
        9,                            /* DO_OPFAMILY */
        10,                            /* DO_CAST */
        11,                            /* DO_CONVERSION */
        12,                            /* DO_TSPARSER */
        13,                            /* DO_TSTEMPLATE */
        14,                            /* DO_TSDICT */
        15,                            /* DO_TSCONFIG */
        16,                            /* DO_FDW */
        17,                            /* DO_FOREIGN_SERVER */
        18,                            /* DO_TABLE */
        19,                            /* DO_DUMMY_TYPE */
        20,                            /* DO_ATTRDEF */
        21,                            /* DO_BLOB */
        22,                            /* DO_PRE_DATA_BOUNDARY */
        23,                            /* DO_TABLE_DATA */
        24,                            /* DO_BLOB_DATA */
        25,                            /* DO_POST_DATA_BOUNDARY */
        26,                            /* DO_CONSTRAINT */
        27,                            /* DO_INDEX */
        28,                            /* DO_REFRESH_MATVIEW */
        28                             /* DO_MATVIEW_INDEX */
        29,                            /* DO_RULE */
        30,                            /* DO_TRIGGER */
        31,                            /* DO_FK_CONSTRAINT */
        32,                            /* DO_DEFAULT_ACL */
        33,                            /* DO_EVENT_TRIGGER */

    I don't think that pushing MV refreshes and index creation farther
    down the list should require anything beyond adjusting the priority
    numbers.  I don't see a problem pushing them to the end.  Does
    anyone else see anything past priority 28 that MV population should
    *not* follow?

    --
    Kevin Grittner
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Noah Misch at Feb 18, 2013 at 8:28 pm

    On Mon, Feb 18, 2013 at 06:49:14AM -0800, Kevin Grittner wrote:
    Alvaro Herrera wrote:
    Maybe it would be a good idea to try to put such commands at the
    very end of the dump, if possible.
    25,                            /* DO_POST_DATA_BOUNDARY */
    26,                            /* DO_CONSTRAINT */
    27,                            /* DO_INDEX */
    28,                            /* DO_REFRESH_MATVIEW */
    28                             /* DO_MATVIEW_INDEX */
    29,                            /* DO_RULE */
    30,                            /* DO_TRIGGER */
    31,                            /* DO_FK_CONSTRAINT */
    32,                            /* DO_DEFAULT_ACL */
    33,                            /* DO_EVENT_TRIGGER */

    I don't think that pushing MV refreshes and index creation farther
    down the list should require anything beyond adjusting the priority
    numbers.  I don't see a problem pushing them to the end.  Does
    anyone else see anything past priority 28 that MV population should
    *not* follow?
    DO_EVENT_TRIGGER should remain last; it may change the behavior of nearly any
    other command.

    Moving DO_REFRESH_MATVIEW past DO_TRIGGER would affect the outcome when the MV
    calls functions that ultimately trip triggers or rules. Currently, the
    behavior will be the same as for CHECK constraints: the rules and triggers
    don't exist yet. This may also affect, for the better, MVs referencing views
    that need the CREATE TABLE ... CREATE RULE _RETURN restoration pathway. It
    looks like a positive change. On the flip side, I wonder if there's some case
    I'm not considering where it's important to delay restoring rules and/or
    triggers until after restoring objects for which restoration can entail calls
    to arbitrary user functions.

    --
    Noah Misch
    EnterpriseDB http://www.enterprisedb.com
  • Kevin Grittner at Feb 18, 2013 at 9:48 pm

    Noah Misch wrote:
    On Mon, Feb 18, 2013 at 06:49:14AM -0800, Kevin Grittner wrote:
    Alvaro Herrera wrote:
    Maybe it would be a good idea to try to put such commands at
    the very end of the dump, if possible.
    25,                            /* DO_POST_DATA_BOUNDARY */
    26,                            /* DO_CONSTRAINT */
    27,                            /* DO_INDEX */
    28,                            /* DO_REFRESH_MATVIEW */
    28                             /* DO_MATVIEW_INDEX */
    29,                            /* DO_RULE */
    30,                            /* DO_TRIGGER */
    31,                            /* DO_FK_CONSTRAINT */
    32,                            /* DO_DEFAULT_ACL */
    33,                            /* DO_EVENT_TRIGGER */

    I don't think that pushing MV refreshes and index creation
    farther down the list should require anything beyond adjusting
    the priority numbers.  I don't see a problem pushing them to the
    end.  Does anyone else see anything past priority 28 that MV
    population should *not* follow?
    DO_EVENT_TRIGGER should remain last; it may change the behavior
    of nearly any other command.

    Moving DO_REFRESH_MATVIEW past DO_TRIGGER would affect the
    outcome when the MV calls functions that ultimately trip triggers
    or rules. Currently, the behavior will be the same as for CHECK
    constraints: the rules and triggers don't exist yet.  This may
    also affect, for the better, MVs referencing views that need the
    CREATE TABLE ... CREATE RULE _RETURN restoration pathway.  It
    looks like a positive change.  On the flip side, I wonder if
    there's some case I'm not considering where it's important to
    delay restoring rules and/or triggers until after restoring
    objects for which restoration can entail calls to arbitrary user
    functions.
    I didn't quite follow all of Noah's points or their implications,
    so we chatted off-list.  He made a couple additional observations
    which allow some simplification of the patch, and allow MV REFRESH
    to be moved to the very end of the priority list without ill
    effect.

    (1)  While it might be incorrect for the CREATE INDEX on a
    materialized view to come after event triggers are set up, REFRESH
    can be expected to be a routine action in the presence of such
    triggers, and it might actually be incorrect to REFRESH when the
    triggers are not present.

    (2)  REFRESH MATERIALIZED VIEW creates and builds a new heap, and
    reindexes it after the data has been loaded, so the timing of the
    CREATE INDEX statements on MVs is not critical, as long as they are
    done after the CREATE and before the REFRESH.  We could drop them
    into the same priority as the other CREATE INDEX statements, and it
    would not be a big deal because the MVs would be empty.

    This should allow me to simplify the code a little bit and move the
    RMV step to the very end.  That may have some advantages when users
    want to start using the database while MVs are being populated.

    --
    Kevin Grittner
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Robert Haas at Feb 19, 2013 at 1:54 pm

    On Mon, Feb 18, 2013 at 4:48 PM, Kevin Grittner wrote:
    This should allow me to simplify the code a little bit and move the
    RMV step to the very end. That may have some advantages when users
    want to start using the database while MVs are being populated.
    In the department of crazy ideas, what about having pg_dump NEVER
    refresh ANY materialized views?

    It's true that the job of pg_dump and pg_restore is to put the new
    database in the same state that the old database was in, but I think
    you could make a reasonable case that materialized views ought to be
    an exception. After all, even with all of this infrastructure,
    chances are pretty good that the new MV contents won't end up being
    the same as the old MV contents on the old server - because the old
    MVs could easily have been stale. So why not just get the restore
    over with as fast as possible, and then let the user refresh the MVs
    that they think need refreshing (perhaps after getting the portions of
    their system that don't rely on MVs up and running)?

    At the very least, I think we ought to have an option for this
    behavior. But the more I think about it, the more I think maybe it
    ought to be the default.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Nicolas Barbier at Feb 19, 2013 at 2:24 pm

    2013/2/19 Robert Haas <robertmhaas@gmail.com>:

    In the department of crazy ideas, what about having pg_dump NEVER
    refresh ANY materialized views?

    It's true that the job of pg_dump and pg_restore is to put the new
    database in the same state that the old database was in, but I think
    you could make a reasonable case that materialized views ought to be
    an exception. After all, even with all of this infrastructure,
    chances are pretty good that the new MV contents won't end up being
    the same as the old MV contents on the old server - because the old
    MVs could easily have been stale. So why not just get the restore
    over with as fast as possible, and then let the user refresh the MVs
    that they think need refreshing (perhaps after getting the portions of
    their system that don't rely on MVs up and running)?

    At the very least, I think we ought to have an option for this
    behavior. But the more I think about it, the more I think maybe it
    ought to be the default.
    +1 from me from a minimalist point of view.

    I think of a matview of the manually refreshed kind as “can contain
    stale contents (or be invalid) unless someone manually makes sure it
    is up to date (or valid)”. Making any matviews invalid by default upon
    restoring (itself being a manual action) would be consistent with that
    definition. Additionally, ISTM to be the least arbitrary (and hence
    most elegant) choice, and even more so in the context of
    matviews-depending-on-matviews.

    Spamming some more craziness:

    Another (more elaborate) suggestion could be: Store for each matview
    whether it is to be rebuilt upon restore or not. Using this setting
    would intuitively mean something like “I consider this matview being
    valid a precondition for considering the database state valid.”
    Setting this to true for a matview would only be allowed when any
    other matviews on which it depends also have this setting set to true.

    Just my €0.02 of course.

    Nicolas

    --
    A. Because it breaks the logical sequence of discussion.
    Q. Why is top posting bad?
  • Peter Eisentraut at Feb 19, 2013 at 4:01 pm

    On 2/19/13 8:54 AM, Robert Haas wrote:
    In the department of crazy ideas, what about having pg_dump NEVER
    refresh ANY materialized views?
    It might be useful to have an option for this, but I don't think it
    should be the default. The default should be that the new database is
    "ready to go".

    Then again, when would you ever actually use that option?

    This might be different if there were a command to refresh all
    materialized views, because you don't want to have to go around and type
    separate commands 47 times after a restore.
  • Robert Haas at Feb 20, 2013 at 11:13 am

    On Tue, Feb 19, 2013 at 11:01 AM, Peter Eisentraut wrote:
    On 2/19/13 8:54 AM, Robert Haas wrote:
    In the department of crazy ideas, what about having pg_dump NEVER
    refresh ANY materialized views?
    It might be useful to have an option for this, but I don't think it
    should be the default. The default should be that the new database is
    "ready to go".

    Then again, when would you ever actually use that option?
    You'd use that option if you'd rather get the database mostly-up as
    soon as possible, and then worry about the materialized views
    afterwards.
    This might be different if there were a command to refresh all
    materialized views, because you don't want to have to go around and type
    separate commands 47 times after a restore.
    Well, it's pretty easy to do:

    SELECT 'LOAD MATERIALIZED VIEW ' || p.oid::regclass || ';' FROM
    pg_class WHERE relkind = 'm';

    ...but we could also add explicit syntax for it, perhaps along the
    lines of what we have for CLUSTER and VACUUM.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Feb 20, 2013 at 11:43 am

    Robert Haas writes:
    On Tue, Feb 19, 2013 at 11:01 AM, Peter Eisentraut wrote:
    This might be different if there were a command to refresh all
    materialized views, because you don't want to have to go around and type
    separate commands 47 times after a restore.
    Well, it's pretty easy to do:
    SELECT 'LOAD MATERIALIZED VIEW ' || p.oid::regclass || ';' FROM
    pg_class WHERE relkind = 'm';
    ...but we could also add explicit syntax for it, perhaps along the
    lines of what we have for CLUSTER and VACUUM.
    It's not really that easy, because of the likelihood that MVs have to be
    refreshed in a specific order. The SELECT you suggest definitely seems
    too simplistic. A dedicated command could perhaps be built to pay
    attention to dependencies ... but if we're still coding such things now,
    it seems a bit late for 9.3.

        regards, tom lane
  • Peter Eisentraut at Feb 20, 2013 at 4:57 pm

    On 2/20/13 6:13 AM, Robert Haas wrote:
    It might be useful to have an option for this, but I don't think it
    should be the default. The default should be that the new database is
    "ready to go".

    Then again, when would you ever actually use that option?
    You'd use that option if you'd rather get the database mostly-up as
    soon as possible, and then worry about the materialized views
    afterwards.
    Since the proposed materialized views are not available for implicit use
    in query optimization, the only way an application would make use of
    them is to access them directly. And if it accesses an unpopulated
    materialized view, it would fail. So I don't think in the current state
    a database is mostly-up without the materialized views filled in.

    I can see the value in having a restore mode that postpones certain
    nonessential operations, such as creating indexes or certain constraints
    or even materialized views. But I think the boundaries and expectations
    for that need to be defined more precisely. For example, a database
    without constraints might be considered "ready for read-only use",
    without secondary indexes it might be "ready for use but slow".
  • Thom Brown at Feb 17, 2013 at 8:24 pm

    On 16 February 2013 01:01, Kevin Grittner wrote:
    Unless something else comes up in review or I get feedback to the
    contrary I plan to deal with the above-mentioned issues and commit
    this within a week or two.
    At the moment it's not possible to rename a column without using ALTER
    TABLE on an MV.

    Also, shouldn't we have the ability to set the collation on a column of the MV?

    And the inconsistency between regular views and MVs is still present,
    where MVs always dump with column parameters in their definition, and
    views never do. Not a show-stopper, but curious nonetheless.

    --
    Thom
  • Kevin Grittner at Feb 18, 2013 at 3:18 pm

    Thom Brown wrote:
    On 16 February 2013 01:01, Kevin Grittner wrote:
    Unless something else comes up in review or I get feedback to
    the contrary I plan to deal with the above-mentioned issues and
    commit this within a week or two.
    At the moment it's not possible to rename a column without using
    ALTER TABLE on an MV.

    Also, shouldn't we have the ability to set the collation on a
    column of the MV? Will fix.
    And the inconsistency between regular views and MVs is still
    present, where MVs always dump with column parameters in their
    definition, and views never do.  Not a show-stopper, but curious
    nonetheless.
    I haven't worried about this because current behavior generates
    correct results -- this seems like a micro-optimization.  The
    explanation for why it wound up that way is that creating a
    materialized view is in many ways more like creating a table than
    like creating a view -- it seemed safer and less invasive to modify
    the CREATE TABLE code than the CREATE VIEW code, and specifying
    column names just fell out of that as part of the minimal change.
    In looking at the pg_dump output, though, I see that the CMV AS
    clause also is getting the names right with column-level aliases,
    so it should be pretty simple and safe to leave off the
    column-list section for MVs.  I guess it's worth it just to
    forestall further questions on the topic.

    Thanks!

    -Kevin
  • Erik Rijkers at Feb 19, 2013 at 10:09 pm

    On Sat, February 16, 2013 02:01, Kevin Grittner wrote:
    matview-v4.patch.gz
    Hi,

    I was wondering if material views should not go into information_schema. I was thinking either
    .views or .tables. Have you considered this?

    I ask because as far as I can see querying for mv's has to go like this:

    SELECT n.nspname, c.relname
    FROM pg_catalog.pg_class c
    LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind IN ('m','')
       and n.nspname = 'myschema'

    which seems rather ugly.


    Also, some documentation typos: please see attached.


    Thanks,

    Erik Rijkers
  • Josh Berkus at Feb 19, 2013 at 10:18 pm

    On 02/19/2013 02:09 PM, Erik Rijkers wrote:
    On Sat, February 16, 2013 02:01, Kevin Grittner wrote:
    matview-v4.patch.gz
    Hi,

    I was wondering if material views should not go into information_schema. I was thinking either
    .views or .tables. Have you considered this?

    I ask because as far as I can see querying for mv's has to go like this:

    SELECT n.nspname, c.relname
    FROM pg_catalog.pg_class c
    LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind IN ('m','')
    and n.nspname = 'myschema'
    Well, I'm not sure about information_schema, but we'll definitely want a
    pg_matviews system view. Also a \dM. That could wait until 9.4, though.


    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Kevin Grittner at Feb 19, 2013 at 10:38 pm

    Josh Berkus wrote:

    Well, I'm not sure about information_schema, but we'll definitely
    want a pg_matviews system view.
    That could wait until 9.4, though.
    That I could probably do.  Do you think they should have a separate
    pg_stat_user_matviews table, etc., or do you think it would be
    better to include them in with tables there?
    Also a \dM.
    I already added it as \dm in the current patch.  Does that conflict
    with something else that's pending?

    --
    Kevin Grittner
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Josh Berkus at Feb 19, 2013 at 10:45 pm

    That I could probably do. Do you think they should have a separate
    pg_stat_user_matviews table, etc., or do you think it would be
    better to include them in with tables there?
    Well, ideally pg_matviews would have matview definitions, and
    pg_stat_matviews would have stats on matview usage and rows. But see
    what you can get done; I imagine we'll overhaul it for 9.4 anyway once
    we've had a chance to use the feature.
    Also a \dM.
    I already added it as \dm in the current patch. Does that conflict
    with something else that's pending?
    Oh, no, I thought \dm was *already* in use, but apparently not.

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Kevin Grittner at Feb 19, 2013 at 11:32 pm

    Josh Berkus wrote:

    That I could probably do.  Do you think they should have a separate
    pg_stat_user_matviews table, etc., or do you think it would be
    better to include them in with tables there?
    Well, ideally pg_matviews would have matview definitions, and
    pg_stat_matviews would have stats on matview usage and rows.  But see
    what you can get done; I imagine we'll overhaul it for 9.4 anyway once
    we've had a chance to use the feature.
    I agree on pg_matviews, but after looking over the existing views
    and thinking about what I would use them for as a DBA, I'm inclined
    to fold the backing tables for MVs into the _stat_ and _statio_
    views -- especially since we already include the backing tables and
    indexes for TOAST.  There is a precident for including
    implementation details at that level.  The only difference from
    TOAST, is that I include the heap and indexes for MVs in the _user_
    views.  I'm attaching the patch for just the system_views.sql file
    for discussion before I go write docs for this part.

    --
    Kevin Grittner
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Kevin Grittner at Feb 19, 2013 at 11:41 pm

    Kevin Grittner wrote:

    I'm attaching the patch for just the system_views.sql file
    for discussion before I go write docs for this part.
    Meh.  If I'm gonna have pg_matviews I might as well include an
    isscannable column.  v2 attached.

    --
    Kevin Grittner
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Josh Berkus at Feb 20, 2013 at 12:07 am

    On 02/19/2013 03:41 PM, Kevin Grittner wrote:
    Kevin Grittner wrote:
    I'm attaching the patch for just the system_views.sql file
    for discussion before I go write docs for this part.
    Meh. If I'm gonna have pg_matviews I might as well include an
    isscannable column. v2 attached.
    pg_get_viewdef() will work on Matviews? Great.

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • David Fetter at Feb 19, 2013 at 10:22 pm

    On Tue, Feb 19, 2013 at 11:09:13PM +0100, Erik Rijkers wrote:
    On Sat, February 16, 2013 02:01, Kevin Grittner wrote:
    matview-v4.patch.gz
    Hi,

    I was wondering if material views should not go into information_schema. I was thinking either
    .views or .tables. Have you considered this?
    I'm guessing it'd be .views if anything. Haven't been able to
    decipher from section 11 of the standard (Schemata) whether the
    standard has anything to say on the matter.

    Cheers,
    David.
    --
    David Fetter <david@fetter.org> http://fetter.org/
    Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
    Skype: davidfetter XMPP: david.fetter@gmail.com
    iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

    Remember to vote!
    Consider donating to Postgres: http://www.postgresql.org/about/donate
  • Peter Eisentraut at Feb 20, 2013 at 2:29 pm

    On 2/19/13 5:22 PM, David Fetter wrote:
    On Tue, Feb 19, 2013 at 11:09:13PM +0100, Erik Rijkers wrote:
    On Sat, February 16, 2013 02:01, Kevin Grittner wrote:
    matview-v4.patch.gz
    Hi,

    I was wondering if material views should not go into information_schema. I was thinking either
    .views or .tables. Have you considered this?
    I'm guessing it'd be .views if anything. Haven't been able to
    decipher from section 11 of the standard (Schemata) whether the
    standard has anything to say on the matter.
    I suppose one should be able to expect that if one finds a view in the
    information schema, then one should be able to use DROP VIEW to remove
    it. Which in this case wouldn't work. So I don't think including a
    materialized view under views or tables is appropriate.
  • Kevin Grittner at Feb 20, 2013 at 3:28 pm

    Peter Eisentraut wrote:

    I suppose one should be able to expect that if one finds a view
    in the information schema, then one should be able to use DROP
    VIEW to remove it.  Which in this case wouldn't work.  So I don't
    think including a materialized view under views or tables is
    appropriate.
    Right.  I think adding pg_matviews covers the stated use-case
    enough to answer Erik's concern.  I'm not going to mess with adding
    non-standard stuff to the standard views.

    --
    Kevin Grittner
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Erik Rijkers at Feb 20, 2013 at 5:14 pm

    On Wed, February 20, 2013 16:28, Kevin Grittner wrote:
    Peter Eisentraut wrote:
    I suppose one should be able to expect that if one finds a view
    in the information schema, then one should be able to use DROP
    VIEW to remove it.  Which in this case wouldn't work.  So I don't
    think including a materialized view under views or tables is
    appropriate.
    Right.  I think adding pg_matviews covers the stated use-case
    enough to answer Erik's concern.
    Absolutely - I agree pg_matviews is much better than adding deviating information_schema stuff.

    Thank you,

    Erik Rijkers
  • Erik Rijkers at Feb 25, 2013 at 10:07 pm
    2013-02-19 Kevin Grittner wrote:
    [matview-system_views-v2.diff]

    I assumed the patches matview-v4.patch and matview-system_views-v2.diff
    were to be applied together.

    They do apply correctly but during tests, the "test rules ... FAILED".

    Perhaps it is solved already but I thought I'd mention it in case it is overlooked.

    Thanks,

    Erik Rijkers
  • Kevin Grittner at Feb 19, 2013 at 10:31 pm

    Erik Rijkers wrote:

    I was wondering if material views should not go into
    information_schema.  I was thinking either .views or .tables.
    Have you considered this?
    I had not considered this to be a good idea because
    information_schema is defined by the standard, and materialized
    views are an extension to the standard.  Someone using these views
    to identify either tables or views might make a bad choice based on
    this.  I'm open to arguments for inclusion, if you think it would
    not violate the standard.  Which would be safe?
    Also, some documentation typos: please see attached.
    Will apply.  Thanks.

    --
    Kevin Grittner
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Kevin Grittner at Feb 19, 2013 at 10:47 pm
    Kevin Grittner <kgrittn@ymail.com>
    There was one minor syntax issue not addressed by Noah, nor much
    discussed in general that I didn't want to just unilaterally
    choose; but given that nobody seems to care that much I will put
    forward a proposal and do it that way tomorrow if nobody objects.
    Before this patch tables were the only things subject to
    truncation, but now materialized views can also be truncated.  So
    far we have been treating TABLE as a noise word in the truncate
    command.  I assume we still want to allow tables to be truncated
    with or without the word.  The question is what to do about
    materialized views, and wheter both can be specified on a single
    TRUNCATE statement.  I propose that we allow TABLE or MATERIALIZED
    VIEW to be specified, or that part of the statement to be left out.
    I propose that either type of object be allowed unless one or the
    other is specified and the object to be truncated is not of that
    kind.  So you could mix both kinds on one statement, so long as you
    didn't specify either kind.
    When I went to do this, I hit a shift/reduce conflict, because with
    TABLE being optional it couldn't tell whether:

    TRUNCATE MATERIALIZED VIEW x, y, z;

    ... was looking for five relations or three.  That goes away with
    MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD.  Is that OK?

    --
    Kevin Grittner
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Feb 20, 2013 at 8:11 am

    Kevin Grittner writes:
    When I went to do this, I hit a shift/reduce conflict, because with
    TABLE being optional it couldn't tell whether:
    TRUNCATE MATERIALIZED VIEW x, y, z;
    ... was looking for five relations or three.  That goes away with
    MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD.  Is that OK?
    Not really. I would much rather see us not bother with this pedantic
    syntax than introduce an even-partially-reserved word.

    Having said that, I don't think I believe your analysis of why this
    doesn't work. The presence or absence of commas ought to make the
    syntax non-ambiguous, I would think. Maybe you just factored the
    grammar wrong.

        regards, tom lane
  • Kevin Grittner at Feb 20, 2013 at 3:24 pm

    Tom Lane wrote:
    Kevin Grittner <kgrittn@ymail.com> writes:
    When I went to do this, I hit a shift/reduce conflict, because
    with TABLE being optional it couldn't tell whether:
    TRUNCATE MATERIALIZED VIEW x, y, z;
    ... was looking for five relations or three.  That goes away
    with MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD.  Is that
    OK?
    Not really.  I would much rather see us not bother with this
    pedantic syntax than introduce an even-partially-reserved word.
    I'm not sure it's worth it either; but two people requested it and
    I didn't forsee this shift/reduce conflict, so I took a shot at it.
     If we can't eliminate the conflict, I'm fine with leaving things
    as they are in the latest posted patch.
    Having said that, I don't think I believe your analysis of why
    this doesn't work.  The presence or absence of commas ought to
    make the syntax non-ambiguous, I would think.  Maybe you just
    factored the grammar wrong.
    Well, it wouldn't be the first time you've seen a better way to do
    something in flex than I was able to see.  Taking just the gram.y
    part of the change which implemented this, and omitting the change
    in reservedness of MATERIALIZED, I have:

    diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
    index 820cb41..1d393c5 100644
    --- a/src/backend/parser/gram.y
    +++ b/src/backend/parser/gram.y
    @@ -394,6 +394,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);

     %type <ival>   opt_column event cursor_options opt_hold opt_set_data
     %type <objtype>    reindex_type drop_type comment_type security_label_type
    +               trunc_type

     %type <node>   fetch_args limit_clause select_limit_value
                    offset_clause select_offset_value
    @@ -5172,9 +5173,10 @@ attrs:       '.' attr_name
      *****************************************************************************/

     TruncateStmt:
    -           TRUNCATE opt_table relation_expr_list opt_restart_seqs opt_drop_behavior
    +           TRUNCATE trunc_type relation_expr_list opt_restart_seqs opt_drop_behavior
                    {
                        TruncateStmt *n = makeNode(TruncateStmt);
    +                   n->objtype = $2;
                        n->relations = $3;
                        n->restart_seqs = $4;
                        n->behavior = $5;
    @@ -5182,6 +5184,12 @@ TruncateStmt:
                    }
            ;

    +trunc_type:
    +           TABLE                       { $$ = OBJECT_TABLE; }
    +           | MATERIALIZED VIEW         { $$ = OBJECT_MATVIEW; }
    +           | /*EMPTY*/                 { $$ = OBJECT_UNSPECIFIED; }
    +       ;
    +
     opt_restart_seqs:
                CONTINUE_P IDENTITY_P       { $$ = false; }
    RESTART IDENTITY_P        { $$ = true; }
    I'm open to suggestions on a better way.

    --
    Kevin Grittner
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Feb 20, 2013 at 4:20 pm

    Kevin Grittner writes:
    Tom Lane wrote:
    Having said that, I don't think I believe your analysis of why
    this doesn't work.
    Well, it wouldn't be the first time you've seen a better way to do
    something in flex than I was able to see. Taking just the gram.y
    part of the change which implemented this, and omitting the change
    in reservedness of MATERIALIZED, I have:
    - TRUNCATE opt_table relation_expr_list opt_restart_seqs opt_drop_behavior
    + TRUNCATE trunc_type relation_expr_list opt_restart_seqs opt_drop_behavior
    +trunc_type:
    + TABLE { $$ = OBJECT_TABLE; }
    + | MATERIALIZED VIEW { $$ = OBJECT_MATVIEW; }
    + | /*EMPTY*/ { $$ = OBJECT_UNSPECIFIED; }
    + ;
    Yeah, this is a standard gotcha when working with unreserved keywords.
    You can't factor it like that because then the parser is required to
    make a shift-reduce decision (on whether to reduce trunc_type to empty)
    before it can "see past" the first word. So for instance given

      TRUNCATE MATERIALIZED ...
              ^

    the parser has to make that decision when it can't see past the word
    "MATERIALIZED" and so doesn't know what comes after it.

    The way to fix it is to not try to use the sub-production but spell it
    all out:

        TRUNCATE TABLE relation_expr_list ...
    TRUNCATE MATERIALIZED VIEW relation_expr_list ...
    TRUNCATE relation_expr_list ...
    Now the parser doesn't have to make any shift-reduce decision until
    after it can "see past" the first identifier. It's a bit tedious
    but beats making a word more reserved than it has to be.

        regards, tom lane
  • Kevin Grittner at Feb 20, 2013 at 4:26 pm

    Tom Lane wrote:


    The way to fix it is to not try to use the sub-production but spell it
    all out:

    TRUNCATE TABLE relation_expr_list ...
    TRUNCATE MATERIALIZED VIEW relation_expr_list ...
    TRUNCATE relation_expr_list ...
    Now the parser doesn't have to make any shift-reduce decision until
    after it can "see past" the first identifier.  It's a bit tedious
    but beats making a word more reserved than it has to be.
    Thanks!  Will do.


    --
    Kevin Grittner
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Greg Stark at Feb 20, 2013 at 6:00 pm

    On Wed, Feb 20, 2013 at 4:26 PM, Kevin Grittner wrote:
    Tom Lane wrote:

    The way to fix it is to not try to use the sub-production but spell it
    all out:

    TRUNCATE TABLE relation_expr_list ...
    TRUNCATE MATERIALIZED VIEW relation_expr_list ...
    TRUNCATE relation_expr_list ...
    Now the parser doesn't have to make any shift-reduce decision until
    after it can "see past" the first identifier. It's a bit tedious
    but beats making a word more reserved than it has to be.
    Thanks! Will do.
    Fwiw I think worrying about stuff like this at this point is probably
    a waste of time. There'll be a period of bike-shedding where people
    debate what the command should be called so worrying about parser
    conflicts before there's a consensus is kind pointless.

    I would like to know what operations you plan to support independently
    of the command names. I may have missed much earlier in the discussion
    but then I suspect things have evolved since then.

    It sounds like you want to support:

    1) Selecting from materialized viws
    2) Manually refreshing materialized views
    3) Manually truncating materialized views

    And explicitly not support

    1) Automatically rewriting queries to select from matching views
    2) Incrementally refreshing materialized views
    3) Manual DML against data in materialized views (except truncate
    which is kind of DDL)
    4) Keeping track of whether the data in the materialized view is up to date

    I have to say I find this model a bit odd. It seems the UI you're
    presenting is that they're basically read-only tables that the
    database will fill in the data for automatically. My mental model of
    materialized views is that they're basically views that the database
    guarantees a different performance characteristic for.

    I would expect a materialized view to be up to date all the time. If
    we don't support incremental updates (which seems like a fine thing
    not to support in a first cut) then I would expect any DML against the
    table to mark the view invalid and any queries against it to produce
    an error (or possibly go to the source tables using the view
    definition but that's probably a bad idea for most use cases). Ie.
    they should behave like a view at all times and have up to date
    information or fail entirely.

    I would expect a command like TRUNCATE MATERIALIZED VIEW to exist but
    I would expect it to be called something like INVALIDATE rather than
    TRUNCATE and dropping the storage is a side effect of simply telling
    the database that it doesn't need to maintain this materialized view.
    Though I could be convinced "truncate" is a good name as long as it's
    documented well.

    --
    greg
  • Kevin Grittner at Feb 20, 2013 at 6:28 pm

    Greg Stark wrote:
    Kevin Grittner wrote:
    Tom Lane wrote:
    The way to fix it is to not try to use the sub-production but
    spell it all out:

    TRUNCATE TABLE relation_expr_list ...
    TRUNCATE MATERIALIZED VIEW relation_expr_list ...
    TRUNCATE relation_expr_list ...
    Now the parser doesn't have to make any shift-reduce decision
    until after it can "see past" the first identifier.  It's a bit
    tedious but beats making a word more reserved than it has to
    be.
    Thanks!  Will do.
    Fwiw I think worrying about stuff like this at this point is
    probably a waste of time. There'll be a period of bike-shedding
    where people debate what the command should be called so worrying
    about parser conflicts before there's a consensus is kind
    pointless.
    That sort of bikeshedding already happened three months ago.  Too
    late now.
    I would like to know what operations you plan to support
    independently of the command names. I may have missed much
    earlier in the discussion but then I suspect things have evolved
    since then.

    It sounds like you want to support:

    1) Selecting from materialized viws
    2) Manually refreshing materialized views
    3) Manually truncating materialized views

    And explicitly not support

    1) Automatically rewriting queries to select from matching views
    2) Incrementally refreshing materialized views
    Those are material for later releases, building on the base of what
    goes into this release.
    3) Manual DML against data in materialized views (except truncate
    which is kind of DDL)
    There is quite a lot of DML allowed -- changing tablespace,
    changing schema, changing name of the MV or of individual columns
    in it, changing statistics targets, creating indexes, and other
    operations are supported.
    4) Keeping track of whether the data in the materialized view is
    up to date
    Only keeping track of whether data has been populated or not, for
    now.  There has been agreement that one or more timestamps relating
    to freshness will make sense, but these are not in the initial
    patch.
    I have to say I find this model a bit odd.
    It's not a model, it's a starting point.  Several people have
    already said that even this much is useful and they expect to take
    advantage of it.  I'm doing what I can to not paint us into a
    corner where it's hard to extend to all the features everyone
    dreams of, but if we waited for that to commit something, it will
    never happen.
    I would expect a materialized view to be up to date all the time.
    I expect that this will eventually be an option, but I expect that
    is will be a seldom-used one.  Most cases that I've seen, people
    want summary data that is reasonably up-to-date without unduly
    affecting the performance of incremental changes to the underlying
    data.  I've sketched out the roadmap from this patch to all of
    these options in a vauge, handwavy fashion, and don't have a lot of
    interest in taking it farther until we're past 9.3 beta.
    If we don't support incremental updates (which seems like a fine
    thing not to support in a first cut) then I would expect any DML
    against the table to mark the view invalid and any queries
    against it to produce an error (or possibly go to the source
    tables using the view definition but that's probably a bad idea
    for most use cases). Ie. they should behave like a view at all
    times and have up to date information or fail entirely.
    That would render them completely useless for the use-cases I've
    seen.  If you want to offer a patch to do that as an option, feel
    free, but I will strongly argue against that as unconditional
    behavior.
    I would expect a command like TRUNCATE MATERIALIZED VIEW to exist
    but I would expect it to be called something like INVALIDATE
    rather than TRUNCATE and dropping the storage is a side effect of
    simply telling the database that it doesn't need to maintain this
    materialized view. Though I could be convinced "truncate" is a
    good name as long as it's documented well.
    I'm trying to minimize the number of new keywords.  The initial
    patch only added MATERIALIZED.  I added REFRESH due to
    near-universal demand for something other than the LOAD
    MATERIALIZED VIEW I initially used.  Have you seen the statistics
    Tom gave out on how much the size of the executable bloats with
    every new keyword?  Until now nobody has expressed concern about
    TRUNCATE MATERIALIZED VIEW, so it would take quite a groundswell of
    concern at this point to even consider a new keyword for this
    functionality this late in the game.

    --
    Kevin Grittner
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Josh Berkus at Feb 20, 2013 at 6:33 pm

    And explicitly not support

    1) Automatically rewriting queries to select from matching views
    2) Incrementally refreshing materialized views
    3) Manual DML against data in materialized views (except truncate
    which is kind of DDL)
    4) Keeping track of whether the data in the materialized view is up to date
    The idea is to add the above features over the next few versions of
    Postgres.
    I have to say I find this model a bit odd. It seems the UI you're
    presenting is that they're basically read-only tables that the
    database will fill in the data for automatically.
    This is what matviews are in other DBMSes.
    My mental model of
    materialized views is that they're basically views that the database
    guarantees a different performance characteristic for.
    How would we do that, exactly? That would be lovely, but unless you
    have a way to accomplish it ...
    I would expect a materialized view to be up to date all the time.
    Actually, there's a huge use case for asynchronously updated matviews,
    so we would not want an implementation which ruled them out. Also
    there's the argument that synchronously updated matviews have little
    actual performance advantage over regular dynamic views.

    Or to put it another way: I could use this feature, as it is, in about 8
    different projects I'm currently supporting. I personally can't think
    of a single project where I need synchronously updated matviews,
    currently. I have in the past, but it's a LOT less frequent that the
    desire for async, just as the desire for async replication is more
    common than the desire for syncrep.

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Tatsuo Ishii at Feb 21, 2013 at 12:45 am

    I would like to know what operations you plan to support independently
    of the command names. I may have missed much earlier in the discussion
    but then I suspect things have evolved since then.

    It sounds like you want to support:

    1) Selecting from materialized viws
    2) Manually refreshing materialized views
    3) Manually truncating materialized views
    Maybe plus?

    4) Automatically dropping materialized views if underlying table(s)
    are dropped/altered

    Or this has to be done manually?
    And explicitly not support

    1) Automatically rewriting queries to select from matching views
    2) Incrementally refreshing materialized views
    3) Manual DML against data in materialized views (except truncate
    which is kind of DDL)
    4) Keeping track of whether the data in the materialized view is up to date

    I have to say I find this model a bit odd. It seems the UI you're
    presenting is that they're basically read-only tables that the
    database will fill in the data for automatically. My mental model of
    materialized views is that they're basically views that the database
    guarantees a different performance characteristic for.

    I would expect a materialized view to be up to date all the time. If
    we don't support incremental updates (which seems like a fine thing
    not to support in a first cut) then I would expect any DML against the
    table to mark the view invalid and any queries against it to produce
    an error (or possibly go to the source tables using the view
    definition but that's probably a bad idea for most use cases). Ie.
    they should behave like a view at all times and have up to date
    information or fail entirely.

    I would expect a command like TRUNCATE MATERIALIZED VIEW to exist but
    I would expect it to be called something like INVALIDATE rather than
    TRUNCATE and dropping the storage is a side effect of simply telling
    the database that it doesn't need to maintain this materialized view.
    Though I could be convinced "truncate" is a good name as long as it's
    documented well.

    --
    greg


    --
    Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers
  • Peter Eisentraut at Feb 20, 2013 at 6:57 pm

    On 2/19/13 5:47 PM, Kevin Grittner wrote:
    When I went to do this, I hit a shift/reduce conflict, because with
    TABLE being optional it couldn't tell whether:

    TRUNCATE MATERIALIZED VIEW x, y, z;

    ... was looking for five relations or three. That goes away with
    MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD. Is that OK?
    Is TRUNCATE even the right command here? For regular tables TRUNCATE is
    a fast DELETE, which logically empties the table. For materialized
    views, there is no deleting, so this command (I suppose?) just
    invalidates the materalized view. That's not the same thing.

    Are there TRUNCATE triggers on materialized views?
  • Kevin Grittner at Feb 20, 2013 at 7:30 pm

    Peter Eisentraut wrote:
    On 2/19/13 5:47 PM, Kevin Grittner wrote:
    When I went to do this, I hit a shift/reduce conflict, because
    with TABLE being optional it couldn't tell whether:

    TRUNCATE MATERIALIZED VIEW x, y, z;

    ... was looking for five relations or three.  That goes away
    with MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD.  Is that
    OK?
    Is TRUNCATE even the right command here?  For regular tables
    TRUNCATE is a fast DELETE, which logically empties the table.
    For materialized views, there is no deleting, so this command (I
    suppose?) just invalidates the materalized view.  That's not the
    same thing.
    Hmm.  That's what Greg Stark just said, and I replied that nobody
    else had raised the issue in over three months.  With Greg, that's
    two now.

    TRUNCATE MATERIALIZED VIEW discards any data which has been loaded
    into the MV, rendering it unavailable for scanning.  Internally, it
    does do a truncate, exactly as truncate table.  The resulting
    zero-length heap file is what is used to determine whether a
    materialized view is "scannable".  When a CREATE WITH DATA or a
    REFRESH generates zero rows, an empty single page is created to
    indicate that it is scannable (valid to use in queries) but
    contains no rows.

    I agree that INVALIDATE is probably more descriptive, although it
    seems that there might be some even better word if we bikeshed
    enough.  The question is, is it worth creating a new keyword to
    call the internal truncate function for materialized views, versus
    documenting that truncating a materialized view renders it invalid?
    Again, given the numbers that Tom presented a while back about the
    space requirements of every new keyword, I don't think this is
    enough of a gain to justify that.  I still squirm a little about
    having used REFRESH, even though demand for that was overwhelming.
    Are there TRUNCATE triggers on materialized views?
    No.  Nor SELECT, INSERT, UPDATE, or DELETE triggers.  You can't
    create a trigger of any type on a materialized view.  I don't think
    that would interfere with event triggers, though.

    --
    Kevin Grittner
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company

Related Discussions

People

Translate

site design / logo © 2021 Grokbase