On Mon, 2005-06-20 at 14:50 -0400, Tom Lane wrote:
Simon Riggs <firstname.lastname@example.org> writes:
I enclose a complete patch for avoiding WAL usage for CREATE TABLE AS
SELECT, when not in archive mode (PITR). The main use case for this is
large BI environments that create summary tables or prejoined tables,
though there are many general applications.
after heavy corrections --- there were a number of things wrong
with this "simple" patch, starting with having gotten the tests
Sorry, I thought I had corrected that error before submission. I was
aware that I had made that error earlier.
and extending to not having actually flushed the data
before commit (smgrimmedsync isn't enough, you have to
I followed the logic as seen in nbtsort.c as you suggested. That code
doesn't perform a FlushRelationBuffers and it looks like I fooled myself
into thinking the CTAS/SELECT INTO case was also in local.
Perhaps we should be building CTAS/SELECT INTO in local buffers anyway?
It looks like we could save time by avoiding shared_buffers completely
and build up a whole page before writing it anywhere. (But thats a story
for another day).
Perhaps this is also related to metapage errors, since the metapage is
always the last page to be written?
A consideration we had all missed in the original discussions is that
if the transaction doesn't emit any WAL records at all,
RecordTransactionCommit will think that it need not WAL-log the
transaction commit, leading to the possibility that the commit is lost
even though all the data is preserved :-(
This is not a hazard for CREATE TABLE AS, since it will certainly have
emitted WAL records while creating the table's catalog entries. It will
be a very real hazard for COPY however.
OK, but I haven't written that patch yet!
The cleanest solution I can
think of is that the COPY code should emit a WAL record for the first
tuple copied in, but not for later ones. To this end, I separated the
"use_wal" and "use_fsm" aspects of what the patch was doing.
Not very clean, but will do as you suggest.
I didn't apply the freespace.c changes either; that struck me as a
serious kluge with no real benefit. We can just omit updating the FSM's
running average, if it even has one. (ISTM there's a reasonable
argument to be made that the tuple sizes during CREATE/COPY might not be
representative of later requests anyway.)
I was striving for completeness only. I was doubtful about that part of
the patch, but thought I'd add that rather than have you say I hadn't
thought about the FSM avg_request_size.
I put those changes in mainly for COPY. If you don't make any request at
all to FSM then a relation never gets to the MRU relation FSM list. I
agree that it is not strictly necessary, but leaving it off would be a
change in behaviour, since COPY did previously cause the relation to get
to the MRU. That could be a problem, since a relation might not then be
allocated any FSM pages following a vacuum.
Best Regards, Simon Riggs