On Tue, 2004-06-01 at 01:24, Tom Lane wrote:
I was just about to commit a patch that revises the btree index build
procedure as discussed here:
specifically, not using shared buffers during index build and bypassing
WAL-logging in favor of just fsyncing the index file before commit.

I was actually writing the commit message when it occurred to me that
this would seriously break PITR. If the WAL datastream doesn't contain
enough info to rebuild the index then rolling forward from a past backup
isn't gonna work.

I thought for a little bit about a magic "reconstruct the index" WAL
entry that would invoke the index build procedure in toto, but that
doesn't look like it will fly either. (Two problems: during crash
recovery, you couldn't be sure that what's on disk for the underlying
table exactly matches the index you need to build --- it could be a
later state of the table; and besides, the environment of the WAL replay
process isn't capable of running user-defined functions, so it couldn't
work for functional indexes.)

So AFAICS, we've got to dump the index contents into WAL to support
PITR. This is a tad annoying.

What I'm thinking about right now is tweaking the index-build code to
write to WAL only if it sees that PITR is actually in use. It would
have to look at the GUC variables to determine whether WAL archiving
is enabled. If archiving isn't turned on, then we could assume that
rollforward from a past backup isn't needed in this installation, and
use the WAL-less index build method.

The mechanism you suggest would also break crash recovery, not just PITR
- though the avoidance of shared buffers seems like a gain either way.

..You raise the whole subject of UNRECOVERABLE data objects. Also known
as NOT LOGGED etc.

There are many significant performance gains to be had by turning off
recoverability for certain large operations. Oracle and Teradata make
extensive use of such features, i.e. especially in Data Warehousing.

Examples of such operations might be:
- index builds
- INSERT SELECTs into previously empty tables

This is an important area for performance...not just index builds.

A suggestion would be:
- add the "dont send to xlog" functionality as a user option on each
statement, default=LOGGING - this could be Oracle compatible, or not,
but concept is similar. Put the hooks in now and we can add this to all
appropriate statement syntax later.


- if conf file says dont use fsync, then dont write to log - clearly
they dont mind losing data in the event of a crash...
i.e. default=NOLOGGING on all statements

Best regards, Simon Riggs

Search Discussions

Discussion Posts


Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 10 of 20 | next ›
Discussion Overview
grouppgsql-hackers @
postedJun 1, '04 at 12:28a
activeJun 1, '04 at 8:31p



site design / logo © 2021 Grokbase