On 1 June 2013 21:27, Noah Misch wrote:
On Sat, Jun 01, 2013 at 09:41:13AM +0100, Simon Riggs wrote:
FK checks can be expensive, especially when loading large volumes of
data into an existing table or partition. A couple of ideas for
improving performance are discussed here:

1. Use Case: Bulk loading
COPY pgbench_accounts; --> references pgbench_branches with many
repeated values

Proposal: Transactions that need multiple checks can be optimised by
simply LOCKing the whole referenced table, once. We can then hold the
referenced table as a Hash, like we do with a Hash Join (its almost
exactly the same thing). This works in two ways: it speeds up checks
and it also reduces the locking overhead.
2. Use Case: Transactional repetition
INSERT INTO order VALUES (ordid, ....)
INSERT INTO order_line VALUES (ordid, 1, .....)
INSERT INTO order_line VALUES (ordid, 2, .....)
INSERT INTO order_line VALUES (ordid, 3, .....)
INSERT INTO order_line VALUES (ordid, 4, .....)
Proposal: De-duplicate multiple checks against same value. This would
be implemented by keeping a hash of rows that we had already either
inserted and/or locked as the transaction progresses, so we can use
the hash to avoid queuing up after triggers.
I find (2) more promising than (1). It helps automatically, and it helps in
more cases. The main advantage of (1) is avoiding the writes of tuple locks
onto the PK table. Therefore, I expect (1) to distinguish itself over (2)
when the referenced values are _not_ repeated too much. If referenced values
are repeated, tuple locking costs would tend to disappear into the noise after
the deduplication of (2).
In both cases we are building up a local hash table with values and
then using those values to avoid queuing constraint triggers. So code
is similar for both.

Thanks for your detailed reply.

Will this add too much cost where it doesn't help? I don't know what to
predict there. There's the obvious case of trivial transactions with no more
than one referential integrity check per FK, but there's also the case of a
transaction with many FK checks all searching different keys. If the hash hit
rate (key duplication rate) is low, the hash can consume considerably more
memory than the trigger queue without preventing many RI queries. What sort
of heuristic could we use to avoid pessimizing such cases?
I've struggled with that for a while now. Probably all we can say is
that there might be one, and if there is not, then manual decoration
of the transaction will be the way to go.
Same-transaction UPDATE or DELETE of the PK table, as well as subtransaction
abort, potentially invalidates hash entries. I recommend thinking relatively
early about how best to handle that.
Thanks, good point.
Before deciding what to think overall, I needed to see a benchmark. I ran
this simple one based on your scenarios:

CREATE TABLE "order" (orderid int PRIMARY KEY);
CREATE TABLE order_line (
orderid int,
lineno int,
PRIMARY KEY (orderid, lineno),
FOREIGN KEY (orderid) REFERENCES "order"
INSERT INTO order_line SELECT 1, n FROM generate_series(1,1000000) t(n);

See attached output from "perf report -s parent -g graph,5,caller"; I suggest
browsing under "less -S". It confirms that the expensive part is something
your proposals would address.
Thanks for posting this. It shows nicely that there is a problem with
repeated execution of SQL in constraint triggers. However, that isn't
the only problem since we must also consider memory usage and memory
scrolling. Currently, the after trigger queue builds up in memory and
doesn't scroll to disk, so overall memory usage is a problem in many
cases. Memory scrolling is also a problem since the after trigger
queue records the tids against which we we need to execute triggers;
this causes us to re-access blocks that had scrolled out of memory,
thus defeating the bulk access strategy, as well as spoiling cache.

For clarity the 4 problems are
1. SQL execution overhead
2. Memory usage
3. Memory scrolling
4. Locking overhead, specifically FPWs and WAL records from FK checks
probably in that order or thereabouts.

The above is why I went for a technique that avoided SQL execution
entirely, as well as conserving memory by de-duplicating the values in
a hash table as we go, which avoids all three problems. The fourth was
solved by the more extreme approach to locking.

A different way to help the bulk loading case would be to lock more keys with
a single query. Currently, we issue a query like this for every INSERTed row:


We could instead consider queued tuples in batches:

SELECT 1 FROM ONLY pktable WHERE pkcol = ANY (ARRAY[$1,$2,...,$100]) FOR KEY SHARE OF x

This would have the advantage of not adding a long-lived, potentially-large
data structure and not depending on the rate of referenced key duplication.
But it would not help non-bulk scenarios. (However, the user could make your
example for (2) become a bulk scenario by deferring the FK constraint.)
At the moment we queue the after triggers first, then execute later.
So we can run out of memory before we execute any SQL. Which means we
need to do something to prevent the after trigger queue growing,
rather than simply optimise the way we execute things on the queue.

In terms of formal trigger behaviour, I am suggesting we redefine FK
triggers as if they were executing a WHEN clause that looks like this
   WHEN check_not_already_queued()
and where the function would have the side effect of maintaining an
in-memory hash that grows up to a certain size. I suggest work_mem for
most statements, but maintenance_work_mem for COPY.

Thus the WHEN clause gets executed before we queue anything and so we
are able to minimise problems 2 and 3.

Having that formal definition also puts us neatly into an
implementation route in that we are simply adding a WHEN clause to FK

The overall execution plan then looks a lot like a hash join, with
skew avoidance. We build up a hash table in memory, up to a certain
size, while letting non-matched rows spill to the after trigger queue
for later execution. Since we skip many of the checks via the WHEN
clause, the after trigger queue should be more manageable in size.

I think it might be worth considering joining the after trigger queue
directly to the referenced table(s), something like this...


WHERE orderid IN
(SELECT orderid FROM ONLY order_line WHERE ctid IN (SELECT
after_trigger_queue FROM after_trigger_queue() ))
                                                 FOR KEY SHARE;

But we could optimise that even further if we had a "BlockScan", which
would be a block-oriented version of the tid scan where we simply
provide a bitmap of blocks needing to be scanned, just like the output
of an BitmapIndexScan. The reason for mentioning that here is that
parallel query will eventually need the ability to do a scan of a
subset of blocks, as does tablesample. So I can see 3 callers of such
a Scan type.

  Simon Riggs http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Search Discussions

Discussion Posts


Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 3 of 21 | next ›
Discussion Overview
grouppgsql-hackers @
postedJun 1, '13 at 8:41a
activeJun 11, '13 at 4:09a



site design / logo © 2021 Grokbase