On Fri, 2004-04-30 at 04:02, Bruce Momjian wrote:
> Simon Riggs wrote:
> > Agreed we want to allow the superuser control over writing of the
> > archive logs. The question is how do they get access to that. Is it
> > by running a client program continuously or calling an interface
> > script from the backend?
>
> My point was that having the backend call the program has improved
> reliability and control over when to write, and easier administration.

Agreed. We've both suggested ways that can occur, though I suggest this
is much less of a priority, for now. Not "no", just not "now".
> Another case is server start/stop. You want to start/stop the archive
> logger to match the database server, particularly if you reboot the
> server. I know Informix used a client program for logging, and it was
> a pain to administer.

pg_arch is just icing on top of the API. The API is the real deal here.
I'm not bothered if pg_arch is not accepted, as long as we can adopt the
API. As noted previously, my original intention was to split the API
away from the pg_arch application to make it clearer what was what. Once
that has been done, I encourage others to improve pg_arch - but also to
use the API to interface with other BAR products.

If you're using PostgreSQL for serious business then you will be using a
serious BAR product as well. There are many FOSS alternatives...

The API's purpose is to allow larger, pre-existing BAR products to know
when and how to retrieve data from PostgreSQL. Those products don't and
won't run underneath postmaster, so although I agree with Peter's
original train of thought, I also agree with Tom's suggestion that we
need an API more than we need an archiver process.

> I would be happy with an external program if it was started/stopped by
> the postmaster (or via GUC change) and received a signal when a WAL
> file was completed.

That is exactly what has been written.

The PostgreSQL side of the API is written directly into the backend, in
xlog.c, and is therefore activated by postmaster-controlled code. That
code then sends "a signal" to the process that will do the archiving;
the archiver side of the XLogArchive API is an in-process library.
(The "signal" is, in fact, a zero-length file written to disk, because
there are many reasons why an external archiver may not be ready to
archive, or even up and running to receive a signal.)
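To illustrate the idea, here is a minimal shell sketch of that
handshake. The directory layout and the ".full" marker suffix are my
own invention for the example, not the actual XLogArchive API names:

```shell
# Sketch of the zero-length "signal file" handshake described above.
# The paths and the ".full" marker suffix are illustrative only.
XLOG_DIR=/tmp/pitr_demo/xlog
mkdir -p "$XLOG_DIR" && rm -f "$XLOG_DIR"/*.full

# Backend side: after filling a WAL segment, drop an empty marker file.
# It persists on disk, so the archiver need not be running right now.
notify_archiver() {
    : > "$XLOG_DIR/$1.full"
}

# Archiver side: whenever it does run, list the segments still waiting.
pending_segments() {
    for marker in "$XLOG_DIR"/*.full; do
        [ -e "$marker" ] || continue
        basename "$marker" .full
    done
}

notify_archiver 000000010000000000000001
pending_segments    # prints 000000010000000000000001
```

The point of the file rather than a live signal is exactly the one
above: the marker waits on disk until some archiver comes along to
consume it.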

The only difference is that there is some confusion as to the role and
importance of pg_arch.
> OK, I have finalized my thinking on this.
>
> We both agree that a pg_arch client-side program certainly works for
> PITR logging. The big question in my mind is whether a client-side
> program is what we want to use long-term, and whether we want to
> release a 7.5 that uses it and then change it in 7.6 to something more
> integrated into the backend.
>
> Let me add this is a little different from pg_autovacuum. With that,
> you could put it in cron and be done with it. With pg_arch, there is a
> routine that has to be used to do PITR, and if we change the process
> in 7.6, I am afraid there will be confusion.

> Let me also add that I am not terribly worried about having the
> feature to restore to an arbitrary point in time for 7.5. I would much
> rather have a good PITR solution that works cleanly in 7.5 and add it
> to 7.6, than to have restore to an arbitrary point but have a strained
> implementation that we have to revisit for 7.6.

> Here are my ideas. (I talked to Tom about this and am including his
> ideas too.) Basically, the archiver that scans the xlog directory to
> identify files to be archived should be a subprocess of the
> postmaster. You already have that code and it can be moved into the
> backend.

> Here is my implementation idea. First, your pg_arch code runs in the
> backend and is started just like the statistics process. It has to be
> started whether PITR is being used or not, but will be inactive if
> PITR isn't enabled. This must be done because we can't have a backend
> start this process later in case they turn on PITR after server start.

> The process id of the archive process is stored in shared memory. When
> PITR is turned on, each backend that completes a WAL file sends a
> signal to the archiver process. The archiver wakes up on the signal
> and scans the directory, finds files that need archiving, and either
> does a 'cp' or runs a user-defined program (like scp) to transfer the
> file to the archive location.
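A rough shell sketch of one such wakeup, reusing the illustrative
layout from earlier (the marker suffix, paths, and defaulting to plain
cp are my assumptions, not a committed design):

```shell
# One archiver pass: scan for completed segments and hand each one to
# a transfer command (plain cp by default; could be scp etc.).
# All names and paths here are illustrative.
XLOG_DIR=/tmp/pitr_demo/xlog
ARCHIVE_DIR=/tmp/pitr_demo/archive
TRANSFER=${TRANSFER:-cp}
mkdir -p "$XLOG_DIR" "$ARCHIVE_DIR" && rm -f "$XLOG_DIR"/*.full

archiver_pass() {
    for marker in "$XLOG_DIR"/*.full; do
        [ -e "$marker" ] || continue
        seg=$(basename "$marker" .full)
        if "$TRANSFER" "$XLOG_DIR/$seg" "$ARCHIVE_DIR/"; then
            rm -f "$marker"    # transfer succeeded: clear the signal file
        fi                     # on failure the marker stays for a retry
    done
}

# Simulate a backend completing a segment, then run one pass.
: > "$XLOG_DIR/000000010000000000000002"
: > "$XLOG_DIR/000000010000000000000002.full"
archiver_pass
```

Keeping the marker until the transfer succeeds means a crashed or
unreachable archive destination simply leaves work queued for the next
pass.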

> In GUC we add:
>
> pitr = true/false
> pitr_location = 'directory, user@host:/dir, etc'
> pitr_transfer = 'cp, scp, etc'

> The archiver program updates its config values when someone changes
> these values via postgresql.conf (and uses pg_ctl reload). These can
> only be modified from postgresql.conf. Changing them via SET has to be
> disabled because they are cluster-level settings, not per-session
> ones, like port number or checkpoint_segments.

> Basically, I think that we need to push user-level control of this
> process down beyond the directory scanning code (that is pretty
> standard), and allow them to call an arbitrary program to transfer the
> logs. My idea is that the pitr_transfer program will get $1=WAL file
> name and $2=pitr_location and the program can use those arguments to
> do the transfer. We can even put a pitr_transfer.sample program in
> share and document $1 and $2.
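For example, a pitr_transfer.sample along those lines might look like
this; only the $1/$2 contract comes from the proposal above, while the
function name and the throwaway paths are mine:

```shell
# Hypothetical pitr_transfer.sample:
#   $1 = completed WAL file
#   $2 = pitr_location (a local directory here; user@host:/dir for scp)
# The exit status tells the caller whether the transfer succeeded.
pitr_transfer() {
    cp "$1" "$2"/
}

# Usage with throwaway paths:
mkdir -p /tmp/pitr_demo/xlog /tmp/pitr_demo/archive
: > /tmp/pitr_demo/xlog/000000010000000000000003
pitr_transfer /tmp/pitr_demo/xlog/000000010000000000000003 /tmp/pitr_demo/archive
```

Swapping cp for scp (or anything else) then only changes the one
command line, which is the whole point of pushing the transfer step
out to a user-supplied program.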
...Bruce and I have just discussed this in some detail and reached a
good understanding of the design proposals as a whole. It looks like all
of this can happen in the next few weeks, with a worst case time
estimate of mid-June. TGFT!

I'll write this up and post it shortly, with a rough roadmap for
further development of recovery-related features.

Best Regards,

Simon Riggs
2nd Quadrant

Discussion Overview
group: pgsql-hackers
posted: Apr 26, '04 at 3:38p
active: May 11, '04 at 9:59p


