FAQ

On 4 June 2013 01:54, Noah Misch wrote:
On Sun, Jun 02, 2013 at 10:45:21AM +0100, Simon Riggs wrote:
For clarity the 4 problems are
1. SQL execution overhead
2. Memory usage
3. Memory scrolling
4. Locking overhead, specifically FPWs and WAL records from FK checks
probably in that order or thereabouts.

The above is why I went for a technique that avoided SQL execution
entirely, as well as conserving memory by de-duplicating the values in
a hash table as we go, which avoids all three problems. The fourth was
solved by the more extreme approach to locking.
That nicely frames the benefits of your proposals. Makes sense.
I think it might be worth considering joining the after trigger queue
directly to the referenced table(s), something like this...

CREATE OR REPLACE FUNCTION after_trigger_queue() RETURNS SETOF tid AS $$
...
$$ LANGUAGE SQL;

EXPLAIN
SELECT 1 FROM ONLY "order"
WHERE orderid IN
(SELECT orderid FROM ONLY order_line WHERE ctid IN (SELECT
after_trigger_queue FROM after_trigger_queue() ))
FOR KEY SHARE;
Agreed.
Hmm, although the above is cute, it still falls down by requiring lots
of memory (problem 2) and churning memory again (problem 3).

Lets rethink things to put a few options on the table and see what we get...

1. Store all the after trigger events in a big queue, then execute
later. That requires us to scroll it to disk to solve problem2, but it
still falls foul of problem3, which becomes severe on big data.

Implementation: Implement scrolling to disk for the after trigger
queue. Then implement event batching in the RI handler code, similar
to what is suggested above or directly as suggested by Noah on
previous post.

2. Don't store FK events in the after trigger queue at all, but apply
them as we go. That solves problems2 and 3. That could be considered
to be in violation of the SQL standard, which requires us to apply the
checks at the end of the statement. We already violate the standard
with regard to uniqueness checks, so doing it here doesn't seem
unacceptable.

Implementation: Given we know that COPY uses a ring buffer, and given
your earlier thoughts on use of a batched SQL, I have a new
suggestion. Every time the ring buffer fills, we go through the last
buffers accessed, pulling out all the PKs and then issue them as a
single SQL statement (as above). We can do that manually, or using the
block scan mentioned previously. This uses batched SQL to solve
problem1. It doesn't build up a large data structure in memory,
problem2, and it also solves problem3 by accessing data blocks before
they fall out of the ring buffer. If there are no duplicates in the
referenced table, then this behavious will do as much as possible to
make accesses to the referenced table also use a small working set.
(We may even wish to consider making the batched SQL use a separate
ring buffer for RI accesses). That approach doesn't make any
assumptions about duplicates.

3.
Strip the PK values from the rows at access time and store them in a
hash table, de-duplicating as we go. If that gets too big, write
seldom accessed values to a temporary table which will automatically
scroll to disk when it exceeds temp_buffers. We don't create the temp
table until we hit work_mem, so we only create that for larger
statements. Then at the end of the statement, copy the hash table into
the temp table and then join to the referenced table directly. We
don't guarantee the list of PK values is completely unique, but we
will have significantly reduced the number of duplicates to check.

Comparison:
Approach 1 suffers from memory scrolling, problem3. But it follows the
SQL standard.
Approach 2 solves all performance problems but doesn't follow the
letter of the standard (AIUI - anyone see differently?)
Approach 3 solves all performance problems and follows the SQL standard

Perhaps another way would be to avoid very large COPY statements
altogether, breaking down loads into smaller pieces. For example
pg_dump could issue COPY in 10000 row chunks rather than big copy.
That would mostly avoid problem2 and problem3 and is more realistic
with "wild" data.

Based on that thought, implementation option 1 looks most reasonable.
Which in short summary is a) scroll after trigger queue to disk and b)
batch SQL values together as Noah originally suggested, or in the SQL
above.

Let me know what you think of that analysis.

--
  Simon Riggs http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Search Discussions

Discussion Posts

Previous

Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 7 of 21 | next ›
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedJun 1, '13 at 8:41a
activeJun 11, '13 at 4:09a
posts21
users7
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2021 Grokbase