pg_dump restore times can be high when they include many ALTER TABLE ADD
FOREIGN KEY statements, since each statement checks the data to see if
it is fully valid in all cases.

I've been asked "why do we run that at all?", since if we dumped the tables
together, we already know they match.

If we had a way for pg_dump to pass on the information that the test
already passes, we would be able to skip the checks.

Proposal:

* Introduce a new mode for ALTER TABLE ADD FOREIGN KEY [WITHOUT CHECK];
When we run WITHOUT CHECK, iff both the source and target table are
newly created in this transaction, then we skip the check. If the check
is skipped we mark the constraint as being unchecked, so we can tell
later if this has been used.

* Have pg_dump write the new syntax into its dumps, when both the source
and target table are dumped in the same run
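
For illustration, a restore script using the proposed option might look
like this (a sketch only; WITHOUT CHECK is the proposed syntax and does
not exist):

    BEGIN;
    CREATE TABLE customers (id integer PRIMARY KEY);
    CREATE TABLE orders (id integer, customer_id integer);
    COPY customers (id) FROM stdin;
    ...
    \.
    COPY orders (id, customer_id) FROM stdin;
    ...
    \.
    -- both tables are new in this transaction, so the scan is skipped
    -- and the constraint is marked unchecked
    ALTER TABLE orders ADD FOREIGN KEY (customer_id)
        REFERENCES customers (id) WITHOUT CHECK;
    COMMIT;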

I'm guessing that the WITHOUT CHECK option would not be acceptable as an
unprotected trap for our lazy and wicked users. :-)

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

  • Heikki Linnakangas at Jun 5, 2008 at 7:19 am

    Simon Riggs wrote:
    I'm guessing that the WITHOUT CHECK option would not be acceptable as an
    unprotected trap for our lazy and wicked users. :-)
    Yes, that sounds scary.

    Instead, I'd suggest finding ways to speed up the ALTER TABLE ADD
    FOREIGN KEY. Or speeding up COPY into a table with foreign keys already
    defined. For example, you might want to build an in-memory hash table of
    the keys in the target table, instead of issuing a query on each INSERT,
    if the target table isn't huge.

    Nothing beats the speed of simply not checking the constraint, of
    course, but I'd hate to lose the protection it gives.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Simon Riggs at Jun 5, 2008 at 7:42 am

    On Thu, 2008-06-05 at 10:19 +0300, Heikki Linnakangas wrote:
    Simon Riggs wrote:
    I'm guessing that the WITHOUT CHECK option would not be acceptable as an
    unprotected trap for our lazy and wicked users. :-)
    Yes, that sounds scary.

    Instead, I'd suggest finding ways to speed up the ALTER TABLE ADD
    FOREIGN KEY.
    I made a suggestion for improving it for integers only, but if
    anybody has any other ideas, I'm all ears.
    Or speeding up COPY into a table with foreign keys already
    defined. For example, you might want to build an in-memory hash table of
    the keys in the target table, instead of issuing a query on each INSERT,
    if the target table isn't huge.
    No, that's not the problem, but I agree that is a problem also.
    Nothing beats the speed of simply not checking the constraint, of
    course, but I'd hate to lose the protection it gives.
    Are you saying you don't like the rest of the proposal, or just don't
    like the idea of having that added as an unprotected option, but find
    the proposal acceptable?

    --
    Simon Riggs www.2ndQuadrant.com
    PostgreSQL Training, Services and Support
  • Heikki Linnakangas at Jun 5, 2008 at 12:23 pm

    Simon Riggs wrote:
    Are you saying you don't like the rest of the proposal, or just don't
    like the idea of having that added as an unprotected option, but find
    the proposal acceptable?
    I don't like the idea of having an unprotected option. If we were going
    to have one, I wouldn't bother with the extra checks you proposed; it's
    going to be unsafe anyway.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Heikki Linnakangas at Jun 5, 2008 at 1:01 pm

    Simon Riggs wrote:
    On Thu, 2008-06-05 at 10:19 +0300, Heikki Linnakangas wrote:
    Simon Riggs wrote:
    I'm guessing that the WITHOUT CHECK option would not be acceptable as an
    unprotected trap for our lazy and wicked users. :-)
    Yes, that sounds scary.

    Instead, I'd suggest finding ways to speed up the ALTER TABLE ADD
    FOREIGN KEY.
    I made a suggestion for improving it for integers only, but if
    anybody has any other ideas, I'm all ears.
    Well, one idea would be to allow adding multiple foreign keys in one
    command, and checking them all at once with one SQL query instead of one
    per foreign key. Right now we need one seq scan over the table per
    foreign key; by checking all references at once we would need only one
    seq scan to check them all.
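
    As a sketch, the combined check could be a single query of this shape
    (the table names are made up for illustration):

        -- one scan of orders validates both references at once
        SELECT o.*
        FROM ONLY orders o
        LEFT JOIN ONLY customers c ON c.id = o.customer_id
        LEFT JOIN ONLY products p ON p.id = o.product_id
        WHERE (o.customer_id IS NOT NULL AND c.id IS NULL)
           OR (o.product_id IS NOT NULL AND p.id IS NULL);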
    Or speeding up COPY into a table with foreign keys already
    defined. For example, you might want to build an in-memory hash table of
    the keys in the target table, instead of issuing a query on each INSERT,
    if the target table isn't huge.
    No, that's not the problem, but I agree that is a problem also.
    It is related, because if we can make COPY into a table with foreign
    keys fast enough, we could rearrange dumps so that foreign keys are
    created before loading data. That would save the seqscan over the table
    altogether.

    Thinking about this idea a bit more, instead of loading the whole target
    table into memory, it would probably make more sense to keep a hash
    table as just a cache of the most recent keys that have been referenced.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Simon Riggs at Jun 5, 2008 at 1:25 pm

    On Thu, 2008-06-05 at 16:01 +0300, Heikki Linnakangas wrote:
    Simon Riggs wrote:
    On Thu, 2008-06-05 at 10:19 +0300, Heikki Linnakangas wrote:
    Simon Riggs wrote:
    I'm guessing that the WITHOUT CHECK option would not be acceptable as an
    unprotected trap for our lazy and wicked users. :-)
    Yes, that sounds scary.

    Instead, I'd suggest finding ways to speed up the ALTER TABLE ADD
    FOREIGN KEY.
    I managed a suggestion for improving it for integers only, but if
    anybody has any other ideas, I'm all ears.
    Well, one idea would be to allow adding multiple foreign keys in one
    command, and checking them all at once with one SQL query instead of one
    per foreign key. Right now we need one seq scan over the table per
    foreign key, by checking all references at once we would only need one
    seq scan to check them all.
    No need. Just parallelise the restore with concurrent psql. Which would
    speed up the index creation also. Does Greg have plans for further work?
    Or speeding up COPY into a table with foreign keys already
    defined. For example, you might want to build an in-memory hash table of
    the keys in the target table, instead of issuing a query on each INSERT,
    if the target table isn't huge.
    No, that's not the problem, but I agree that is a problem also.
    It is related, because if we can make COPY into a table with foreign
    keys fast enough, we could rearrange dumps so that foreign keys are
    created before loading data. That would save the seqscan over the table
    altogether.
    True.
    Thinking about this idea a bit more, instead of loading the whole target
    table into memory, it would probably make more sense to keep a hash
    table as just a cache of the most recent keys that have been referenced.
    If you can think of a way of improving hash joins generally, then it
    will work for this specific case also.

    --
    Simon Riggs www.2ndQuadrant.com
    PostgreSQL Training, Services and Support
  • Heikki Linnakangas at Jun 5, 2008 at 1:36 pm

    Simon Riggs wrote:
    On Thu, 2008-06-05 at 16:01 +0300, Heikki Linnakangas wrote:
    Well, one idea would be to allow adding multiple foreign keys in one
    command, and checking them all at once with one SQL query instead of one
    per foreign key. Right now we need one seq scan over the table per
    foreign key; by checking all references at once we would need only one
    seq scan to check them all.
    No need. Just parallelise the restore with concurrent psql. Which would
    speed up the index creation also.
    True, you could do that.
    Does Greg have plans for further work?
    I believe he's busy with other stuff at the moment.
    Thinking about this idea a bit more, instead of loading the whole target
    table into memory, it would probably make more sense to keep a hash
    table as just a cache of the most recent keys that have been referenced.
    If you can think of a way of improving hash joins generally, then it
    will work for this specific case also.
    Individual RI checks performed on inserts/COPY don't do a hash join. The
    bulk check done by ALTER TABLE ADD FOREIGN KEY does, but that's a
    different issue.

    This hash table would be a specific trick to speed up RI checks. If
    you're I/O bound anyway, it wouldn't help, and you'd already be better
    off creating the foreign key first and loading the data after that.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Richard Huxton at Jun 5, 2008 at 8:56 am

    Simon Riggs wrote:

    If we had a way of pg_dump passing on the information that the test
    already passes, we would be able to skip the checks.

    Proposal:

    * Introduce a new mode for ALTER TABLE ADD FOREIGN KEY [WITHOUT CHECK];
    * Have pg_dump write the new syntax into its dumps, when both the source
    and target table are dumped in the same run
    I've been known to manually tweak dumps before now. I can see me
    forgetting this.

    What about pg_dump writing out a row-count and MD5 of the rows in the
    COPY (just a textual calculation)? Iff the restore checksum matches the
    dump checksum for both tables, then the foreign-key checks can be skipped.

    If the restore checksum doesn't match the dump, it can issue a
    warning, but continue and run the full fkey check.
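
    For example, the dump might carry an annotation like this (entirely
    hypothetical; no such syntax exists today):

        COPY orders (id, customer_id) FROM stdin;
        ...
        \.
        -- hypothetical pg_dump annotation: row count and MD5 of the
        -- COPY text above, re-verified by pg_restore on reload
        -- ROWCOUNT 1000000 CHECKSUM 'a3f1c2d4...'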

    --
    Richard Huxton
    Archonet Ltd
  • Andrew Dunstan at Jun 5, 2008 at 11:58 am

    Simon Riggs wrote:
    pg_dump restore times can be high when they include many ALTER TABLE ADD
    FOREIGN KEY statements, since each statement checks the data to see if
    it is fully valid in all cases.

    I've been asked "why do we run that at all?", since if we dumped the tables
    together, we already know they match.

    If we had a way for pg_dump to pass on the information that the test
    already passes, we would be able to skip the checks.

    Proposal:

    * Introduce a new mode for ALTER TABLE ADD FOREIGN KEY [WITHOUT CHECK];
    When we run WITHOUT CHECK, iff both the source and target table are
    newly created in this transaction, then we skip the check. If the check
    is skipped we mark the constraint as being unchecked, so we can tell
    later if this has been used.

    * Have pg_dump write the new syntax into its dumps, when both the source
    and target table are dumped in the same run

    I'm guessing that the WITHOUT CHECK option would not be acceptable as an
    unprotected trap for our lazy and wicked users. :-)
    This whole proposal would be a major footgun which would definitely be
    abused, IMNSHO.

    I think Heikki's idea of speeding up the check using a hash table of the
    foreign keys possibly has merit.

    cheers

    andrew
  • Simon Riggs at Jun 5, 2008 at 12:53 pm

    On Thu, 2008-06-05 at 07:57 -0400, Andrew Dunstan wrote:

    Simon Riggs wrote:
    pg_dump restore times can be high when they include many ALTER TABLE ADD
    FOREIGN KEY statements, since each statement checks the data to see if
    it is fully valid in all cases.

    I've been asked "why do we run that at all?", since if we dumped the tables
    together, we already know they match.

    If we had a way for pg_dump to pass on the information that the test
    already passes, we would be able to skip the checks.

    Proposal:

    * Introduce a new mode for ALTER TABLE ADD FOREIGN KEY [WITHOUT CHECK];
    When we run WITHOUT CHECK, iff both the source and target table are
    newly created in this transaction, then we skip the check. If the check
    is skipped we mark the constraint as being unchecked, so we can tell
    later if this has been used.

    * Have pg_dump write the new syntax into its dumps, when both the source
    and target table are dumped in the same run

    I'm guessing that the WITHOUT CHECK option would not be acceptable as an
    unprotected trap for our lazy and wicked users. :-)
    This whole proposal would be a major footgun which would definitely be
    abused, IMNSHO.
    OK, understood. Two negatives is enough to sink it.
    I think Heikki's idea of speeding up the check using a hash table of the
    foreign keys possibly has merit.
    The query is sent through SPI, so if there were a way to speed this up,
    we would already be using it implicitly. If we find a way to speed up
    joins it will improve the FK check also.

    The typical join plan for the check query is already a hash join,
    assuming the target table is small enough. If not, it's a huge sort/merge
    join. So in a way, we already follow the suggestion.
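
    For reference, that check query has roughly this shape (paraphrased
    from ri_triggers.c, with placeholder table and column names):

        -- any row returned is a violation of the new constraint
        SELECT fk.customer_id
        FROM ONLY orders fk
        LEFT OUTER JOIN ONLY customers pk ON pk.id = fk.customer_id
        WHERE pk.id IS NULL AND fk.customer_id IS NOT NULL;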

    --
    Simon Riggs www.2ndQuadrant.com
    PostgreSQL Training, Services and Support
  • Robert Treat at Jun 7, 2008 at 5:08 pm

    On Thursday 05 June 2008 08:56:35 Simon Riggs wrote:
    On Thu, 2008-06-05 at 07:57 -0400, Andrew Dunstan wrote:
    Simon Riggs wrote:
    pg_dump restore times can be high when they include many ALTER TABLE
    ADD FOREIGN KEY statements, since each statement checks the data to see
    if it is fully valid in all cases.

    I've been asked "why do we run that at all?", since if we dumped the
    tables together, we already know they match.

    If we had a way for pg_dump to pass on the information that the test
    already passes, we would be able to skip the checks.

    Proposal:

    * Introduce a new mode for ALTER TABLE ADD FOREIGN KEY [WITHOUT CHECK];
    When we run WITHOUT CHECK, iff both the source and target table are
    newly created in this transaction, then we skip the check. If the check
    is skipped we mark the constraint as being unchecked, so we can tell
    later if this has been used.

    * Have pg_dump write the new syntax into its dumps, when both the
    source and target table are dumped in the same run

    I'm guessing that the WITHOUT CHECK option would not be acceptable as
    an unprotected trap for our lazy and wicked users. :-)
    This whole proposal would be a major footgun which would definitely be
    abused, IMNSHO.
    OK, understood. Two negatives is enough to sink it.
    Heh, I would have argued that the idea should go the other way and just make
    this part of the normal syntax. Oracle DBAs have been doing this for years
    (MS SQL supports it too, actually) and it really helps working around having
    to hold locks on large relations for lengthy periods of time. Heck, I'd like
    to see a no-check option for all constraints, really.

    --
    Robert Treat
    Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
  • Simon Riggs at Jun 7, 2008 at 5:38 pm

    On Sat, 2008-06-07 at 13:08 -0400, Robert Treat wrote:
    On Thursday 05 June 2008 08:56:35 Simon Riggs wrote:
    On Thu, 2008-06-05 at 07:57 -0400, Andrew Dunstan wrote:
    Heh, I would have argued that the idea should go the other way and
    just make this part of the normal syntax. Oracle DBAs have been
    doing this for years (MS SQL supports it too, actually) and it really
    helps working around having to hold locks on large relations for
    lengthy periods of time. Heck, I'd like to see a no-check option for
    all constraints, really.
    Interesting that SQL Server does it also.

    Holding the lock for a long period is just one more problem. :-)

    I'm always torn between the I-know-what-I'm-doing-so-give-me-the-option
    viewpoint and the some-dumbass-will-abuse-it viewpoint. I see the
    results of both viewpoints daily.

    Perhaps we need a GUC that says expert_mode = on. In expert_mode we are
    allowed to do a range of things that are normally avoided - there would
    be an explicit list. Managers can then take a single considered decision
    as to whether the situation warrants extreme action and their DBA is
    good enough to handle it. That might resolve our continued angst about
    whether our users are smart enough to avoid the gotchas, or just smart
    enough to win a DBA's Darwin Award.

    The UNIX philosophy has always been to allow the power to exist, yet
    seek to minimise the number of people who exercise it. Another idea
    might be to make such command options superuser only, to ensure the
    power is available, yet only in the hands of, by-definition, the trusted
    few.

    --
    Simon Riggs www.2ndQuadrant.com
    PostgreSQL Training, Services and Support
  • Andrew Dunstan at Jun 7, 2008 at 7:00 pm

    Simon Riggs wrote:
    On Sat, 2008-06-07 at 13:08 -0400, Robert Treat wrote:
    On Thursday 05 June 2008 08:56:35 Simon Riggs wrote:

    On Thu, 2008-06-05 at 07:57 -0400, Andrew Dunstan wrote:
    Heh, I would have argued that the idea should go the other way and
    just make this part of the normal syntax. Oracle DBAs have been
    doing this for years (MS SQL supports it too, actually) and it really
    helps working around having to hold locks on large relations for
    lengthy periods of time. Heck, I'd like to see a no-check option for
    all constraints, really.
    Interesting that SQL Server does it also.

    Holding the lock for a long period is just one more problem. :-)

    I'm always torn between the I-know-what-I'm-doing-so-give-me-the-option
    viewpoint and the some-dumbass-will-abuse-it viewpoint. I see the
    results of both viewpoints daily.

    Perhaps we need a GUC that says expert_mode = on. In expert_mode we are
    allowed to do a range of things that are normally avoided - there would
    be an explicit list. Managers can then take a single considered decision
    as to whether the situation warrants extreme action and their DBA is
    good enough to handle it. That might resolve our continued angst about
    whether our users are smart enough to avoid the gotchas, or just smart
    enough to win a DBA's Darwin Award.

    The UNIX philosophy has always been to allow the power to exist, yet
    seek to minimise the number of people who exercise it. Another idea
    might be to make such command options superuser only, to ensure the
    power is available, yet only in the hands of, by-definition, the trusted
    few.
    If we go down this road then I would far rather we tried to devise some
    safe (or semi-safe) way of doing it instead of simply providing expert
    (a.k.a. footgun) mode.

    For instance, I'm wondering if we could do something with checksums of
    the input lines or something else that would make this difficult to do
    in circumstances other than pg_restore.

    cheers

    andrew
  • Decibel! at Jun 9, 2008 at 12:43 pm

    On Jun 7, 2008, at 2:00 PM, Andrew Dunstan wrote:
    If we go down this road then I would far rather we tried to devise
    some safe (or semi-safe) way of doing it instead of simply
    providing expert (a.k.a. footgun) mode.

    For instance, I'm wondering if we could do something with checksums
    of the input lines or something else that would make this difficult
    to do in circumstances other than pg_restore.

    Yes, but that provides no help at all outside of pg_dump. Being able
    to add an FK with NO CHECK would be tremendously useful outside of
    pg_dump. Actually, in the interest of stating the problem and not the
    solution, what we need is a way to add FKs that doesn't lock
    everything up to perform the key checks. Perhaps there is some
    semi-safe way that the constraint could be added and the checks done in
    the background...

    As for the footgun aspect, are we the enterprise-class OSS database
    or the one that caters to noobs who will go out of their way
    to make life hard on themselves? I'm all in favor of not adding
    footguns that don't have value, but this one holds a lot of value for
    anyone trying to maintain a large database in a 24/7 environment. To
    put this in perspective, the amount of revenue we would lose from
    adding just one FK to one of our larger tables would more than cover
    paying someone to develop this feature.
    --
    Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
    Give your computer some brain candy! www.distributed.net Team #1828
  • Andrew Dunstan at Jun 9, 2008 at 2:17 pm

    Decibel! wrote:

    Yes, but that provides no help at all outside of pg_dump. Being able
    to add an FK with NO CHECK would be tremendously useful outside of
    pg_dump. Actually, in the interest of stating the problem and not the
    solution, what we need is a way to add FKs that doesn't lock
    everything up to perform the key checks. Perhaps there is some
    semi-safe way that the constraint could be added and the checks done
    in the background...
    I had some thoughts along the same lines.

    But how do you propose to recover when the check fails? What should
    pg_restore do if the dump is corrupt, causing an FK check to fail?

    I suppose we could have some sort of marking for FK constraints along
    the lines of {checked, unchecked, invalid}.
    As for the footgun aspect, are we the enterprise-class OSS database or
    the one that caters to noobs who will go out of their way to
    make life hard on themselves?
    We are the database that tries very hard to keep its promises. If you
    want to change or relax those promises then the implications need to be
    very very clear.
    I'm all in favor of not adding footguns that don't have value, but
    this one holds a lot of value for anyone trying to maintain a large
    database in a 24/7 environment. To put this in perspective, the amount
    of revenue we would lose from adding just one FK to one of our larger
    tables would more than cover paying someone to develop this feature.

    Come up with a good proposal and I'm your man :-) I haven't seen one yet.

    cheers

    andrew
  • Tom Lane at Jun 9, 2008 at 2:58 pm

    Decibel! <decibel@decibel.org> writes:
    Actually, in the interest of stating the problem and not the
    solution, what we need is a way to add FKs that doesn't lock
    everything up to perform the key checks.
    Ah, finally a useful comment. I think it might be possible to do an
    "add FK concurrently" type of command that would take exclusive lock
    for just long enough to add the triggers, then scan the tables with just
    AccessShareLock to see if the existing rows meet the constraint, and
    if so finally mark the constraint "valid". Meanwhile the constraint
    would be enforced against newly-added rows by the triggers, so nothing
    gets missed. You'd still get a small hiccup in system performance
    from the transient exclusive lock, but nothing like as bad as it is
    now. Would that solve your problem?

    regards, tom lane
  • Simon Riggs at Jun 9, 2008 at 3:10 pm

    On Mon, 2008-06-09 at 10:57 -0400, Tom Lane wrote:
    Decibel! <decibel@decibel.org> writes:
    Actually, in the interest of stating the problem and not the
    solution, what we need is a way to add FKs that doesn't lock
    everything up to perform the key checks.
    Ah, finally a useful comment. I think it might be possible to do an
    "add FK concurrently" type of command that would take exclusive lock
    for just long enough to add the triggers, then scan the tables with just
    AccessShareLock to see if the existing rows meet the constraint, and
    if so finally mark the constraint "valid". Meanwhile the constraint
    would be enforced against newly-added rows by the triggers, so nothing
    gets missed. You'd still get a small hiccup in system performance
    from the transient exclusive lock, but nothing like as bad as it is
    now. Would that solve your problem?
    That's good, but it doesn't solve the original user complaint about
    needing to re-run many, many large queries to which we already know the
    answer.

    --
    Simon Riggs www.2ndQuadrant.com
    PostgreSQL Training, Services and Support
  • Andrew Dunstan at Jun 9, 2008 at 3:24 pm

    Simon Riggs wrote:
    On Mon, 2008-06-09 at 10:57 -0400, Tom Lane wrote:

    Decibel! <decibel@decibel.org> writes:
    Actually, in the interest of stating the problem and not the
    solution, what we need is a way to add FKs that doesn't lock
    everything up to perform the key checks.
    Ah, finally a useful comment. I think it might be possible to do an
    "add FK concurrently" type of command that would take exclusive lock
    for just long enough to add the triggers, then scan the tables with just
    AccessShareLock to see if the existing rows meet the constraint, and
    if so finally mark the constraint "valid". Meanwhile the constraint
    would be enforced against newly-added rows by the triggers, so nothing
    gets missed. You'd still get a small hiccup in system performance
    from the transient exclusive lock, but nothing like as bad as it is
    now. Would that solve your problem?
    That's good, but it doesn't solve the original user complaint about
    needing to re-run many, many large queries to which we already know the
    answer.
    But we don't know it for dead sure, we only think we do. What if the
    data for one or other of the tables is corrupted? We'll end up with data
    we believe is consistent but in fact is not, ISTM. If you can somehow
    guarantee the integrity of data in both tables then we might be
    justified in assuming that the FK constraint will be consistent - that's
    why I suggested some sort of checksum mechanism might serve the purpose.

    cheers

    andrew
  • Simon Riggs at Jun 9, 2008 at 3:43 pm

    On Mon, 2008-06-09 at 11:23 -0400, Andrew Dunstan wrote:

    Simon Riggs wrote:
    On Mon, 2008-06-09 at 10:57 -0400, Tom Lane wrote:

    Decibel! <decibel@decibel.org> writes:
    Actually, in the interest of stating the problem and not the
    solution, what we need is a way to add FKs that doesn't lock
    everything up to perform the key checks.
    Ah, finally a useful comment. I think it might be possible to do an
    "add FK concurrently" type of command that would take exclusive lock
    for just long enough to add the triggers, then scan the tables with just
    AccessShareLock to see if the existing rows meet the constraint, and
    if so finally mark the constraint "valid". Meanwhile the constraint
    would be enforced against newly-added rows by the triggers, so nothing
    gets missed. You'd still get a small hiccup in system performance
    from the transient exclusive lock, but nothing like as bad as it is
    now. Would that solve your problem?
    That's good, but it doesn't solve the original user complaint about
    needing to re-run many, many large queries to which we already know the
    answer.
    But we don't know it for dead sure, we only think we do. What if the
    data for one or other of the tables is corrupted? We'll end up with data
    we believe is consistent but in fact is not, ISTM. If you can somehow
    guarantee the integrity of data in both tables then we might be
    justified in assuming that the FK constraint will be consistent - that's
    why I suggested some sort of checksum mechanism might serve the purpose.
    Agreed.

    Can we get COPY to output the checksum of its output as part of the
    command tag? How else can we return the checksum? In $file.cksum for any
    given output file?

    We can then use an explicit CHECKSUM option in the COPY when we reload.
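
    Something like this, perhaps (hypothetical syntax on both sides):

        COPY orders TO '/backup/orders.dat';
        -- command tag might become: COPY 1000000 CHECKSUM a3f1c2d4...
        COPY orders FROM '/backup/orders.dat' WITH CHECKSUM 'a3f1c2d4...';
        -- the reload recomputes the checksum and complains on mismatch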

    --
    Simon Riggs www.2ndQuadrant.com
    PostgreSQL Training, Services and Support
  • Andrew Dunstan at Jun 9, 2008 at 4:10 pm

    Simon Riggs wrote:
    But we don't know it for dead sure, we only think we do. What if the
    data for one or other of the tables is corrupted? We'll end up with data
    we believe is consistent but in fact is not, ISTM. If you can somehow
    guarantee the integrity of data in both tables then we might be
    justified in assuming that the FK constraint will be consistent - that's
    why I suggested some sort of checksum mechanism might serve the purpose.
    Agreed.

    Can we get COPY to output the checksum of its output as part of the
    command tag? How else can we return the checksum? In $file.cksum for any
    given output file?
    It seems a reasonable idea to use the command tag, unless that's going
    to break lots of stuff. I think the only thing we can usefully checksum
    is the output lines in the client encoding.
    We can then use an explicit CHECKSUM option in the COPY when we reload.
    We need rather more than this to make sure your facility isn't abused.
    That's the part that I haven't been able to think of a good answer for
    (yet).

    cheers

    andrew
  • Tom Lane at Jun 9, 2008 at 3:33 pm

    Simon Riggs writes:
    On Mon, 2008-06-09 at 10:57 -0400, Tom Lane wrote:
    Ah, finally a useful comment. I think it might be possible to do an
    "add FK concurrently" type of command that would take exclusive lock
    That's good, but it doesn't solve the original user complaint about
    needing to re-run many, many large queries to which we already know the
    answer.
    No, we are running a large query to which the user *thinks* he knows the
    answer. There are any number of reasons why he might be wrong.

    regards, tom lane
  • Simon Riggs at Jun 9, 2008 at 3:44 pm

    On Mon, 2008-06-09 at 11:33 -0400, Tom Lane wrote:
    Simon Riggs <simon@2ndquadrant.com> writes:
    On Mon, 2008-06-09 at 10:57 -0400, Tom Lane wrote:
    Ah, finally a useful comment. I think it might be possible to do an
    "add FK concurrently" type of command that would take exclusive lock
    That's good, but it doesn't solve the original user complaint about
    needing to re-run many, many large queries to which we already know the
    answer.
    No, we are running a large query to which the user *thinks* he knows the
    answer. There are any number of reasons why he might be wrong.
    Of course. I should have said "to which we already know the answer" to
    indicate I'm passing on others' criticisms of us.

    --
    Simon Riggs www.2ndQuadrant.com
    PostgreSQL Training, Services and Support
  • Tom Lane at Jun 9, 2008 at 3:59 pm

    Simon Riggs writes:
    On Mon, 2008-06-09 at 11:33 -0400, Tom Lane wrote:
    No, we are running a large query to which the user *thinks* he knows the
    answer. There are any number of reasons why he might be wrong.
    Of course. I should have said "to which we already know the answer" to
    indicate I'm passing on others' criticisms of us.
    [ shrug... ] We don't know the answer either, and anyone who says
    we do is merely betraying his ignorance of the number of ways to load
    a foot-gun.

    I don't have any confidence in the "checksum" proposal either, as it's
    still naively assuming that changes in the data are the only possible
    problem. Consider that you are loading the data into a new database,
    which might be running under a different locale setting, might contain
    a new implementation of a datatype with subtly (or not so subtly)
    different semantics, or might just already contain data in the target
    tables. pg_dump scripts are not nearly bright enough to defend against
    these types of threats.

    regards, tom lane
  • Robert Treat at Jun 9, 2008 at 4:43 pm

    On Monday 09 June 2008 11:59:27 Tom Lane wrote:
    Simon Riggs <simon@2ndquadrant.com> writes:
    On Mon, 2008-06-09 at 11:33 -0400, Tom Lane wrote:
    No, we are running a large query to which the user *thinks* he knows the
    answer. There are any number of reasons why he might be wrong.
    Of course. I should have said "to which we already know the answer" to
    indicate I'm passing on others' criticisms of us.
    [ shrug... ] We don't know the answer either, and anyone who says
    we do is merely betraying his ignorance of the number of ways to load
    a foot-gun.
    I think the more realistic scenario (based on the FK idea) is that you want to
    prevent any future rows from coming in without validating the FK, and you're
    willing to clean up any violators after the fact, since you can make that
    an "out of the critical path" operation.

    if you extend this to a more general "create constraint concurrently" (to
    handle normal constraints, not-null constraints, etc.), it would certainly
    be a big win, and i think most would see it as a reasonable compromise.

    --
    Robert Treat
    Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
  • Simon Riggs at Jun 9, 2008 at 5:52 pm

    On Mon, 2008-06-09 at 12:37 -0400, Robert Treat wrote:
    On Monday 09 June 2008 11:59:27 Tom Lane wrote:
    Simon Riggs <simon@2ndquadrant.com> writes:
    On Mon, 2008-06-09 at 11:33 -0400, Tom Lane wrote:
    No, we are running a large query to which the user *thinks* he knows the
    answer. There are any number of reasons why he might be wrong.
    Of course. I should have said "to which we already know the answer" to
    indicate I'm passing on others' criticisms of us.
    [ shrug... ] We don't know the answer either, and anyone who says
    we do is merely betraying his ignorance of the number of ways to load
    a foot-gun.
    I think the more realistic scenario (based on the FK idea) is that you want to
    prevent any future rows from coming in without validating the FK, and you're
    willing to clean up any violators after the fact, since you can make that
    an "out of the critical path" operation.

    if you extend this to a more general "create constraint concurrently" (to
    handle normal constraints, not-null constraints, etc.), it would certainly
    be a big win, and i think most would see it as a reasonable compromise.
    Agreed. I think the "out of the critical path" action is more likely to
    be the intended path rather than the "never check at all" route.

    If we break down the action into two parts:

    ALTER TABLE ... ADD CONSTRAINT foo FOREIGN KEY ... NOVALIDATE;
    which holds an exclusive lock, but only momentarily
    After this runs, any new data is validated at the moment of data change,
    but the older data has yet to be validated.

    ALTER TABLE ... VALIDATE CONSTRAINT foo
    which runs the lengthy check, though it only grabs the lock as the last
    part of the action

    This way we have the ability to add them concurrently if we choose, by
    running one after the other, or we can run the first part only for now
    and run the other at a more convenient moment.
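
    Concretely, it might look like this (a sketch of the proposed syntax;
    the table and constraint names are made up):

        ALTER TABLE orders
            ADD CONSTRAINT orders_customer_fk
            FOREIGN KEY (customer_id) REFERENCES customers (id) NOVALIDATE;
        -- brief exclusive lock; new writes are checked from here on

        ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_fk;
        -- lengthy scan of existing rows, run whenever convenient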

    A full set of checks on a large, complex database can easily take
    hours or even days.

    We should allow this. It's not a footgun, it's an honest attempt by
    people to add RI checks to their database. The only other alternative
    for some people is not to add FKs at all, which is also a footgun, but
    we don't seem bothered that they might take that option.

    --
    Simon Riggs www.2ndQuadrant.com
    PostgreSQL Training, Services and Support
  • Alvaro Herrera at Jun 9, 2008 at 6:07 pm

    Simon Riggs wrote:

    If we break down the action into two parts:

    ALTER TABLE ... ADD CONSTRAINT foo FOREIGN KEY ... NOVALIDATE;
    which holds an exclusive lock, but only momentarily
    After this runs, any new data is validated at the moment of data change,
    but the older data has yet to be validated.

    ALTER TABLE ... VALIDATE CONSTRAINT foo
    which runs the lengthy check, though it only grabs the lock as the last
    part of the action
    The problem I see with this approach in general (two-phase FK creation)
    is that you have to keep the same transaction for the first and second
    command, but you really want concurrent backends to see the tuple for
    the not-yet-validated constraint.

    Another benefit that could arise from this is that the hypothetical
    VALIDATE CONSTRAINT step could validate more than one constraint at a
    time, possibly processing all the constraints with a single table scan.

    Perhaps VALIDATE CONSTRAINT could be handled as an automatic commit-time
    action.

    --
    Alvaro Herrera http://www.CommandPrompt.com/
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Gregory Stark at Jun 9, 2008 at 8:52 pm

    "Alvaro Herrera" <alvherre@commandprompt.com> writes:

    The problem I see with this approach in general (two-phase FK creation)
    is that you have to keep the same transaction for the first and second
    command, but you really want concurrent backends to see the tuple for
    the not-yet-validated constraint.
    Do you? It seems like having a constraint which is enforced on any new
    operations, but which doesn't guarantee that existing records satisfy it,
    is a useful feature in itself -- separating the two concepts "this property
    is true for all records" and "any action taken must leave the record with
    this property".

    ISTM you can validate an "invalid" constraint using any snapshot taken at any
    time >= the original snapshot. As long as the constraint is being enforced for
    all transactions which start after the validating snapshot's xmin, then when
    it's done it can know the constraint is valid.

    Taking a lock on the table to create the constraint certainly leaves that
    property fulfilled. Actually it seems we could skip the lock entirely and
    just check, when it comes time to do the validation, that the snapshot's
    xmin is >= the xmin on the constraint. I'm starting to get leery of all
    these tightly argued bits of logic though. Each one on its own is safe but
    the resulting system is getting to be quite complex.
    Another benefit that could arise from this is that the hypothetical
    VALIDATE CONSTRAINT step could validate more than one constraint at a
    time, possibly processing all the constraints with a single table scan.
    Interesting.
    Perhaps VALIDATE CONSTRAINT could be handled as an automatic commit-time
    action.
    I don't really like this, at least not as the only option, because as I said
    above and Robert Treat also said, it could be useful to have the constraint in
    place for new operations but check it for the existing data at some later
    date. (Or even never.)

    --
    Gregory Stark
    EnterpriseDB http://www.enterprisedb.com
    Ask me about EnterpriseDB's Slony Replication support!
  • Simon Riggs at Jun 10, 2008 at 12:26 am

    On Mon, 2008-06-09 at 14:07 -0400, Alvaro Herrera wrote:
    Simon Riggs wrote:
    If we break down the action into two parts:

    ALTER TABLE ... ADD CONSTRAINT foo FOREIGN KEY ... NOVALIDATE;
    which holds an exclusive lock, but only momentarily
    After this runs, any new data is validated at the moment of data change,
    but the older data has yet to be validated.

    ALTER TABLE ... VALIDATE CONSTRAINT foo
    which runs the lengthy check, though it only grabs the lock as the last
    part of the action
    The problem I see with this approach in general (two-phase FK creation)
    is that you have to keep the same transaction for the first and second
    command, but you really want concurrent backends to see the tuple for
    the not-yet-validated constraint.
    Well, they *must* be in separate transactions if we are to avoid holding
    an AccessExclusiveLock while we perform the check. Plus the whole idea
    is to perform the second part at some other non-critical time, though we
    all agree that never performing the check at all is foolhardy.

    Maybe we say that you can defer the check, but after a while autovacuum
    runs it for you if you haven't done so. It would certainly be useful to
    run the VALIDATE part as a background task with vacuum wait enabled.
    Another benefit that could arise from this is that the hypothetical
    VALIDATE CONSTRAINT step could validate more than one constraint at a
    time, possibly processing all the constraints with a single table scan.
    Good thought, though not as useful for FK checks.

    --
    Simon Riggs www.2ndQuadrant.com
    PostgreSQL Training, Services and Support
  • Alvaro Herrera at Jun 10, 2008 at 12:55 am

    Simon Riggs wrote:

    Maybe we say that you can defer the check, but after a while autovacuum
    runs it for you if you haven't done so. It would certainly be useful to
    run the VALIDATE part as a background task with vacuum wait enabled.
    It would be useful if there was anywhere to report the error to, or an
    action that could be taken automatically.

    --
    Alvaro Herrera http://www.CommandPrompt.com/
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Filip Rembiałkowski at Jun 9, 2008 at 4:09 pm

    2008/6/9 Simon Riggs <simon@2ndquadrant.com>:
    On Mon, 2008-06-09 at 10:57 -0400, Tom Lane wrote:
    Decibel! <decibel@decibel.org> writes:
    Actually, in the interest of stating the problem and not the
    solution, what we need is a way to add FKs that doesn't lock
    everything up to perform the key checks.
    Ah, finally a useful comment. I think it might be possible to do an
    "add FK concurrently" type of command that would take exclusive lock
    for just long enough to add the triggers, then scan the tables with just
    AccessShareLock to see if the existing rows meet the constraint, and
    if so finally mark the constraint "valid". Meanwhile the constraint
    would be enforced against newly-added rows by the triggers, so nothing
    gets missed. You'd still get a small hiccup in system performance
    from the transient exclusive lock, but nothing like as bad as it is
    now. Would that solve your problem?
    That's good, but it doesn't solve the original user complaint about
    needing to re-run many, many large queries to which we already know the
    answer.

    just a guess, but maybe the "create FK concurrently" feature combined with
    the "synchronized scan" feature _does_ resolve the original problem.

    if you run many "create FK concurrently" commands one after another, wouldn't
    the seq scan be reused?

    --
    Filip Rembiałkowski
  • Tom Lane at Jun 7, 2008 at 8:23 pm

    Simon Riggs writes:
    Perhaps we need a GUC that says expert_mode = on. ... Another idea
    might be to make such command options superuser only, to ensure the
    power is available, yet only in the hands of, by-definition, the trusted
    few.
    This all seems pretty useless, as the sort of user most likely to shoot
    himself in the foot will also always be running as superuser.

    I'd much rather see us expend more effort on speeding up the checks
    than open holes in the system.

    regards, tom lane
  • Robert Treat at Jun 8, 2008 at 11:46 pm

    On Saturday 07 June 2008 16:22:56 Tom Lane wrote:
    Simon Riggs <simon@2ndquadrant.com> writes:
    Perhaps we need a GUC that says expert_mode = on. ... Another idea
    might be to make such command options superuser only, to ensure the
    power is available, yet only in the hands of, by-definition, the trusted
    few.
    This all seems pretty useless, as the sort of user most likely to shoot
    himself in the foot will also always be running as superuser.
    yeah, i'm not a big fan of "set enable_footgun=true" since the people likely
    to get tripped up are going to blindly enable these modes.

    otoh, if we do such a thing, i would be a big fan of calling
    it "enable_footgun" :-)
    I'd much rather see us expend more effort on speeding up the checks
    than open holes in the system.
    and i'm sure no one is against that idea, but you're never going to be able to
    match the performance of just avoiding the check.

    --
    Robert Treat
    Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
  • Tom Lane at Jun 9, 2008 at 12:15 am

    Robert Treat writes:
    and i'm sure no one is against that idea, but you're never going to be able to
    match the performance of just avoiding the check.
    We'll never be able to match the performance of not having transactions,
    either, but the community has never for a moment considered having a
    "no transactions" mode.

    regards, tom lane
  • Robert Treat at Jun 9, 2008 at 1:03 am

    On Sunday 08 June 2008 20:12:15 Tom Lane wrote:
    Robert Treat <xzilla@users.sourceforge.net> writes:
    and i'm sure no one is against that idea, but you're never going to be
    able to match the performance of just avoiding the check.
    We'll never be able to match the performance of not having transactions,
    either, but the community has never for a moment considered having a
    "no transactions" mode.
    it's unclear what a "no transaction" mode would mean, but I'd be willing to
    guess some people have considered aspects of it (we've just never had
    agreement).

    --
    Robert Treat
    Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
