It's been discussed before that it would be cool if you could stream a
new base backup from the master server, via libpq. That way you would
not need low-level filesystem access to initialize a new standby.

Magnus mentioned today that he started hacking on that, and
coincidentally I just started experimenting with it yesterday as well
:-). So let's get this out on the mailing list.

Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the
replication command set. Upon receiving that command, the master starts
a COPY, and streams a tarred copy of the data directory to the client.
The patch includes a simple command-line tool, pg_streambackup, to
connect to a server and request a backup that you can then redirect to a
.tar file or pipe to "tar x".
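
To give an idea of the client side, the core of such a tool is little more
than a libpq COPY consumer. A rough sketch (the connection string must point
to a replication connection on the master; TAKE_BACKUP is the command
proposed above; error handling trimmed):

    #include <stdio.h>
    #include "libpq-fe.h"

    int
    main(int argc, char **argv)
    {
        /* argv[1]: connection string for a replication connection */
        PGconn     *conn = PQconnectdb(argv[1]);
        PGresult   *res;
        char       *buf;
        int         len;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* the proposed command; the master answers with a CopyOut stream */
        res = PQexec(conn, "TAKE_BACKUP");
        if (PQresultStatus(res) != PGRES_COPY_OUT)
        {
            fprintf(stderr, "could not start backup: %s", PQerrorMessage(conn));
            return 1;
        }
        PQclear(res);

        /* write the tar stream to stdout: redirect to a file or pipe to tar */
        while ((len = PQgetCopyData(conn, &buf, 0)) > 0)
        {
            fwrite(buf, 1, len, stdout);
            PQfreemem(buf);
        }
        if (len == -2)
        {
            fprintf(stderr, "copy failed: %s", PQerrorMessage(conn));
            return 1;
        }

        PQfinish(conn);
        return 0;
    }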

TODO:

* We need a smarter way to do pg_start/stop_backup() with this. At the
moment, you can only have one backup running at a time, but we shouldn't
have that limitation with this built-in mechanism.

* The streamed backup archive should contain all the necessary WAL files
too, so that you don't need to set up archiving to use this. You could
just point the tiny client tool to the server, and get a backup archive
containing everything that's necessary to restore correctly.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


  • Thom Brown at Sep 3, 2010 at 11:26 am

    On 3 September 2010 12:19, Heikki Linnakangas wrote:
    TODO:

    * We need a smarter way to do pg_start/stop_backup() with this. At the
    moment, you can only have one backup running at a time, but we shouldn't
    have that limitation with this built-in mechanism.
    Would it be possible to not require pg_start/stop_backup() for this
    new feature? (yes, I'm probably missing something obvious here)

    --
    Thom Brown
    Twitter: @darkixion
    IRC (freenode): dark_ixion
    Registered Linux user: #516935
  • Magnus Hagander at Sep 3, 2010 at 11:30 am

    On Fri, Sep 3, 2010 at 13:25, Thom Brown wrote:
    On 3 September 2010 12:19, Heikki Linnakangas wrote:
    TODO:

    * We need a smarter way to do pg_start/stop_backup() with this. At the
    moment, you can only have one backup running at a time, but we shouldn't
    have that limitation with this built-in mechanism.
    Would it be possible to not require pg_start/stop_backup() for this
    new feature? (yes, I'm probably missing something obvious here)
    You don't need to run it *manually*, but the process needs to run it
    automatically in the background for you. Which it does already in the
    suggested patch.
  • Thom Brown at Sep 3, 2010 at 11:33 am

    On 3 September 2010 12:30, Magnus Hagander wrote:
    On Fri, Sep 3, 2010 at 13:25, Thom Brown wrote:
    On 3 September 2010 12:19, Heikki Linnakangas wrote:
    TODO:

    * We need a smarter way to do pg_start/stop_backup() with this. At the
    moment, you can only have one backup running at a time, but we shouldn't
    have that limitation with this built-in mechanism.
    Would it be possible to not require pg_start/stop_backup() for this
    new feature? (yes, I'm probably missing something obvious here)
    You don't need to run it *manually*, but the process needs to run it
    automatically in the background for you. Which it does already in the
    suggested patch.
    Ah, clearly I didn't read the patch in any detail. Thanks :)

    --
    Thom Brown
    Twitter: @darkixion
    IRC (freenode): dark_ixion
    Registered Linux user: #516935
  • Heikki Linnakangas at Sep 3, 2010 at 11:44 am

    On 03/09/10 14:25, Thom Brown wrote:
    On 3 September 2010 12:19, Heikki Linnakangas wrote:
    TODO:

    * We need a smarter way to do pg_start/stop_backup() with this. At the
    moment, you can only have one backup running at a time, but we shouldn't
    have that limitation with this built-in mechanism.
    Would it be possible to not require pg_start/stop_backup() for this
    new feature? (yes, I'm probably missing something obvious here)
    Well, pg_start_backup() does several things:

    1. It sets the forceFullPageWrites flag, so that we don't get partial
    pages in the restored database.
    2. It performs a checkpoint
    3. It creates a backup label file

    We certainly need 1 and 2. We don't necessarily need to write the backup
    label file to the data directory when we're streaming the backup
    directly to the client; we can just include it in the streamed archive.

    pg_stop_backup() also does several things:
    1. It clears the forceFullPageWrites flag.
    2. It writes an end-of-backup WAL record
    3. It switches to new WAL segment, to get the final WAL segment archived.
    4. It writes a backup history file
    5. It removes the backup label file.
    6. It waits for all the required WAL files to be archived.

    We need 1, but the rest we could do in a smarter way. When we have more
    control of the backup process, I don't think we need the end-of-backup
    WAL record or the backup label anymore. We can add the pg_control file
    as the last file in the archive, and set minRecoveryPoint in it to the
    last WAL record needed to recover.

    So no, we don't really need pg_start/stop_backup() per se, but we'll
    need something similar...
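
    To sketch it out, the sequence on the server side could look something
    like this. The helper names below are placeholders, not existing backend
    functions; only RequestCheckpoint() and XLogRecPtr are real:

    /* Hypothetical sketch of a streamed-backup sequence that replaces
     * pg_start/stop_backup(). All stream_*/set_* helpers are placeholders. */
    static void
    perform_streamed_backup(void)
    {
        XLogRecPtr      startptr;
        XLogRecPtr      stopptr;

        set_force_full_page_writes(true);       /* as in pg_start_backup() */
        RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT);
        startptr = get_checkpoint_redo_location();

        /* no backup_label on disk; emit it as the first archive member */
        stream_backup_label(startptr);

        stream_data_directory();                /* everything but pg_control */

        stopptr = get_current_wal_insert_location();
        set_force_full_page_writes(false);

        /* append pg_control last, with minRecoveryPoint = stopptr, so the
         * restored cluster knows how much WAL it must replay */
        stream_pg_control_with_min_recovery_point(stopptr);
    }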

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Dave Page at Sep 3, 2010 at 11:28 am

    On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas wrote:
    Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication
    command set. Upon receiving that command, the master starts a COPY, and
    streams a tarred copy of the data directory to the client. The patch
    includes a simple command-line tool, pg_streambackup, to connect to a server
    and request a backup that you can then redirect to a .tar file or pipe to
    "tar x".
    Cool. Can you add a TODO to build in code to un-tar the archive? tar
    is not usually found on Windows systems, and as we already have tar
    extraction code in pg_restore it could presumably be added relatively
    painlessly.
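
    For what it's worth, extracting a plain tar stream is not much code; a
    minimal sketch that handles regular files only (no long names, assumes
    target directories exist, barely any error handling):

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    #define TARBLOCK 512

    /* Extract regular files from a tar stream on stdin; skip the rest. */
    int
    main(void)
    {
        char        header[TARBLOCK];
        char        block[TARBLOCK];

        while (fread(header, 1, TARBLOCK, stdin) == TARBLOCK)
        {
            char        name[101];
            long        size;
            long        done;
            FILE       *out = NULL;

            if (header[0] == '\0')      /* end-of-archive marker */
                break;

            memcpy(name, header, 100);  /* member name: offset 0, 100 bytes */
            name[100] = '\0';
            size = strtol(header + 124, NULL, 8);   /* size: offset 124, octal */

            if (header[156] == '0' || header[156] == '\0')  /* regular file */
                out = fopen(name, "wb");

            /* member data is padded up to a multiple of 512 bytes */
            for (done = 0; done < size; done += TARBLOCK)
            {
                long    chunk = (size - done < TARBLOCK) ? size - done : TARBLOCK;

                if (fread(block, 1, TARBLOCK, stdin) != TARBLOCK)
                    return 1;
                if (out)
                    fwrite(block, 1, chunk, out);
            }
            if (out)
                fclose(out);
        }
        return 0;
    }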


    --
    Dave Page
    Blog: http://pgsnake.blogspot.com
    Twitter: @pgsnake

    EnterpriseDB UK: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Heikki Linnakangas at Sep 3, 2010 at 11:49 am

    On 03/09/10 14:28, Dave Page wrote:
    On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas wrote:
    Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication
    command set. Upon receiving that command, the master starts a COPY, and
    streams a tarred copy of the data directory to the client. The patch
    includes a simple command-line tool, pg_streambackup, to connect to a server
    and request a backup that you can then redirect to a .tar file or pipe to
    "tar x".
    Cool. Can you add a TODO to build in code to un-tar the archive? tar
    is not usually found on Windows systems, and as we already have tar
    extraction code in pg_restore it could presumably be added relatively
    painlessly.
    Ok. Another obvious thing that people will want is to gzip the tar file
    while sending it, to reduce network traffic.
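
    Since the server links with zlib already, that is fairly little code too:
    wrap the outgoing tar stream in a deflate stream. A sketch, where
    send_copy_data() is a hypothetical stand-in for however the backend
    pushes CopyData messages to the client:

    #include <string.h>
    #include <zlib.h>

    /* hypothetical: sends one chunk of CopyData to the client */
    extern void send_copy_data(const char *buf, size_t len);

    /* windowBits 15 + 16 asks zlib for a gzip wrapper, so the client can
     * feed the result straight to gunzip and tar */
    static void
    backup_compress_init(z_stream *zs)
    {
        memset(zs, 0, sizeof(*zs));
        deflateInit2(zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                     15 + 16, 8, Z_DEFAULT_STRATEGY);
    }

    /* Compress one chunk of the tar stream and send whatever zlib produces.
     * Call once more with len = 0 and flush = Z_FINISH at end of archive. */
    static void
    backup_compress_send(z_stream *zs, const char *buf, size_t len, int flush)
    {
        char        out[8192];

        zs->next_in = (Bytef *) buf;
        zs->avail_in = len;
        do
        {
            zs->next_out = (Bytef *) out;
            zs->avail_out = sizeof(out);
            deflate(zs, flush);
            send_copy_data(out, sizeof(out) - zs->avail_out);
        } while (zs->avail_out == 0);
    }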

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Magnus Hagander at Sep 3, 2010 at 11:50 am

    On Fri, Sep 3, 2010 at 13:48, Heikki Linnakangas wrote:
    On 03/09/10 14:28, Dave Page wrote:

    On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas wrote:
    Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the
    replication
    command set. Upon receiving that command, the master starts a COPY, and
    streams a tarred copy of the data directory to the client. The patch
    includes a simple command-line tool, pg_streambackup, to connect to a
    server
    and request a backup that you can then redirect to a .tar file or pipe to
    "tar x".
    Cool. Can you add a TODO to build in code to un-tar the archive? tar
    is not usually found on Windows systems, and as we already have tar
    extraction code in pg_restore it could presumably be added relatively
    painlessly.
    Ok. Another obvious thing that people will want is to gzip the tar file
    while sending it, to reduce network traffic.
    Not necessarily obvious, needs to be configurable. There are a lot of
    cases where you might not want it.
  • Robert Haas at Sep 3, 2010 at 1:24 pm

    On Fri, Sep 3, 2010 at 7:28 AM, Dave Page wrote:
    On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas wrote:
    Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication
    command set. Upon receiving that command, the master starts a COPY, and
    streams a tarred copy of the data directory to the client. The patch
    includes a simple command-line tool, pg_streambackup, to connect to a server
    and request a backup that you can then redirect to a .tar file or pipe to
    "tar x".
    Cool. Can you add a TODO to build in code to un-tar the archive? tar
    is not usually found on Windows systems, and as we already have tar
    extraction code in pg_restore it could presumably be added relatively
    painlessly.
    It seems like the elephant in the room here is updating an existing
    backup without recopying the entire data directory. Perhaps that's
    phase two, but worth keeping in mind...

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Magnus Hagander at Sep 3, 2010 at 1:26 pm

    On Fri, Sep 3, 2010 at 15:24, Robert Haas wrote:
    On Fri, Sep 3, 2010 at 7:28 AM, Dave Page wrote:
    On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas wrote:
    Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication
    command set. Upon receiving that command, the master starts a COPY, and
    streams a tarred copy of the data directory to the client. The patch
    includes a simple command-line tool, pg_streambackup, to connect to a server
    and request a backup that you can then redirect to a .tar file or pipe to
    "tar x".
    Cool. Can you add a TODO to build in code to un-tar the archive? tar
    is not usually found on Windows systems, and as we already have tar
    extraction code in pg_restore it could presumably be added relatively
    painlessly.
    It seems like the elephant in the room here is updating an existing
    backup without recopying the entire data directory.  Perhaps that's
    phase two, but worth keeping in mind...
    I'd say that's a very different use-case, but still a very useful one
    of course. It's probably going to be a lot more complex (it would
    require bi-directional traffic, I think)...
  • Dave Page at Sep 3, 2010 at 1:26 pm

    On Fri, Sep 3, 2010 at 2:24 PM, Robert Haas wrote:
    On Fri, Sep 3, 2010 at 7:28 AM, Dave Page wrote:
    On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas wrote:
    Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication
    command set. Upon receiving that command, the master starts a COPY, and
    streams a tarred copy of the data directory to the client. The patch
    includes a simple command-line tool, pg_streambackup, to connect to a server
    and request a backup that you can then redirect to a .tar file or pipe to
    "tar x".
    Cool. Can you add a TODO to build in code to un-tar the archive? tar
    is not usually found on Windows systems, and as we already have tar
    extraction code in pg_restore it could presumably be added relatively
    painlessly.
    It seems like the elephant in the room here is updating an existing
    backup without recopying the entire data directory.  Perhaps that's
    phase two, but worth keeping in mind...
    rsync? Might be easier to use that from day 1 (well, day 2) than to
    retrofit later.


    --
    Dave Page
    Blog: http://pgsnake.blogspot.com
    Twitter: @pgsnake

    EnterpriseDB UK: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Robert Haas at Sep 3, 2010 at 1:29 pm

    On Fri, Sep 3, 2010 at 9:26 AM, Dave Page wrote:
    rsync? Might be easier to use that from day 1 (well, day 2) than to
    retrofit later.
    I'm not sure we want to depend on an external utility like that,
    particularly one that users may not have installed. And I'm not sure
    if that can be made to work over a libpq channel, either. But
    certainly something with that functionality would be nice to have,
    whether it ends up sharing code or not.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Dave Page at Sep 3, 2010 at 1:32 pm

    On Fri, Sep 3, 2010 at 2:29 PM, Robert Haas wrote:
    On Fri, Sep 3, 2010 at 9:26 AM, Dave Page wrote:
    rsync? Might be easier to use that from day 1 (well, day 2) than to
    retrofit later.
    I'm not sure we want to depend on an external utility like that,
    particularly one that users may not have installed.  And I'm not sure
    if that can be made to work over a libpq channel, either.  But
    certainly something with that functionality would be nice to have,
    whether it ends up sharing code or not.
    No, I agree we don't want an external dependency (I was just bleating
    about needing tar on Windows). I was assuming/hoping there's a
    librsync somewhere...


    --
    Dave Page
    Blog: http://pgsnake.blogspot.com
    Twitter: @pgsnake

    EnterpriseDB UK: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Robert Haas at Sep 3, 2010 at 1:43 pm

    On Fri, Sep 3, 2010 at 9:32 AM, Dave Page wrote:
    No, I agree we don't want an external dependency (I was just bleating
    about needing tar on Windows). I was assuming/hoping there's a
    librsync somewhere...
    The rsync code itself is not modular, I believe. I think the author
    thereof kind of took the approach of placing efficiency before all.
    See:

    http://www.samba.org/rsync/how-rsync-works.html ... especially the
    section on "The Rsync Protocol"

    I Googled librsync and got a hit, but that code is a rewrite of the
    source base and seems to have little or no activity since 2004.

    http://librsync.sourceforge.net/

    That page says: "librsync is not wire-compatible with rsync 2.x, and
    is not likely to be in the future." The current version of rsync is
    3.0.7.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Stephen Frost at Sep 3, 2010 at 1:56 pm

    * Robert Haas (robertmhaas@gmail.com) wrote:
    The rsync code itself is not modular, I believe. I think the author
    thereof kind of took the approach of placing efficiency before all.
    Yeah, I looked into this when discussing this same concept at PGCon with
    folks. There doesn't appear to be a good librsync and, even if there
    was, there's a heck of a lot of complexity there that we *don't* need.
    rsync is a great tool, don't get me wrong, but let's not try to go over
    our heads here.

    We don't need permissions handling, as an example. I also don't think
    we need the binary diff/partial file transfer capability -- we already
    break relations into 1G chunks (when/if they reach that size), so you
    won't necessarily be copying the entire relation if you're just doing
    mtime based or per-file-checksum based detection. We don't need device
    node handling, we don't need auto-ignoring files, or pattern
    exclusion/inclusion, we don't really need a progress bar (though it'd be
    nice.. :), etc, etc, etc.
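
    To illustrate the file-level detection: since relations are already split
    into 1GB segments, deciding whether a segment needs to be re-sent can be
    as simple as the mtime check below (a per-file checksum would be the more
    paranoid variant, since mtime alone can be fooled):

    #include <stdio.h>
    #include <sys/stat.h>
    #include <time.h>

    /* Return true if the file has (possibly) changed since the previous
     * backup and therefore has to be sent again. Untouched 1GB segments
     * of a large relation are skipped entirely. */
    static int
    segment_needs_copy(const char *path, time_t last_backup_time)
    {
        struct stat st;

        if (stat(path, &st) != 0)
            return 1;               /* can't tell -- play it safe and copy */

        return st.st_mtime >= last_backup_time;
    }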

    Thanks,

    Stephen
  • Kevin Grittner at Sep 3, 2010 at 3:02 pm

    Stephen Frost wrote:

    there's a heck of alot of complexity there that we *don't* need.
    rsync is a great tool, don't get me wrong, but let's not try to go
    over our heads here.
    Right -- among other things, it checks for portions of a new file
    which match the old file at a different location. For example, if
    you have a very large text file, and insert a line or two at the
    start, it will wind up only sending the new lines. (Well, that and
    all the checksums which help it determine that the rest of the file
    matches at a shifted location.) I would think that PostgreSQL could
    just check whether *corresponding* portions of a file matched, which
    is much simpler.
    we already break relations into 1G chunks (when/if they reach that
    size), so you won't necessarily be copying the entire relation if
    you're just doing mtime based or per-file-checksum based
    detection.
    While 1GB granularity would be OK, I doubt it's optimal; I think CRC
    checks for smaller chunks might be worthwhile. My gut feel is that
    somewhere in the 64kB to 1MB range would probably be optimal for us,
    although the "sweet spot" will depend on how the database is used.
    A configurable or self-adjusting size would be cool.

    -Kevin
  • Stephen Frost at Sep 3, 2010 at 3:20 pm
    Kevin,

    * Kevin Grittner (Kevin.Grittner@wicourts.gov) wrote:
    While 1GB granularity would be OK, I doubt it's optimal; I think CRC
    checks for smaller chunks might be worthwhile. My gut feel is that
    somewhere in the 64kB to 1MB range would probably be optimal for us,
    although the "sweet spot" will depend on how the database is used.
    A configurable or self-adjusting size would be cool.
    We have something much better, called WAL. If people want to keep their
    backup current, they should use that after getting the base backup up
    and working. We don't need to support this for the base backup, imv.

    In any case, it's certainly not something required for an initial
    implementation..

    Thanks,

    Stephen
  • Kevin Grittner at Sep 3, 2010 at 3:32 pm

    Stephen Frost wrote:
    We have something much better, called WAL. If people want to keep
    their backup current, they should use that after getting the base
    backup up and working.
    Unless you want to provide support for Point In Time Recovery
    without excessive recovery times.
    We don't need to support this for the base backup, imv.
    We found that making a hard-link copy of the previous base backup
    and using rsync to bring it up to date used 1% of the WAN bandwidth of
    sending a complete, compressed base backup. Just sending modified
    files in their entirety would have bought the first order of
    magnitude; recognizing the unchanged portions buys the second order
    of magnitude.
    In any case, it's certainly not something required for an initial
    implementation..
    No disagreement there; but sometimes it pays to know where you might
    want to go, so you don't do something to make further development in
    that direction unnecessarily difficult.

    -Kevin
  • Tom Lane at Sep 3, 2010 at 3:47 pm

    "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
    Stephen Frost wrote:
    In any case, it's certainly not something required for an initial
    implementation..
    No disagreement there; but sometimes it pays to know where you might
    want to go, so you don't do something to make further development in
    that direction unnecessarily difficult.
    I think that setting out to reimplement rsync, or to go down a design
    path where we're likely to do a lot of that eventually, is the height
    of folly. We should be standing on the shoulders of other projects,
    not rolling our own because of misguided ideas about people not having
    those projects installed.

    IOW, what I'd like to see is protocol extensions that allow an external
    copy of rsync to be invoked; not build in rsync, or tar, or anything
    else that we could get off-the-shelf.

    regards, tom lane
  • Robert Haas at Sep 3, 2010 at 3:53 pm

    On Fri, Sep 3, 2010 at 11:47 AM, Tom Lane wrote:
    "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
    Stephen Frost wrote:
    In any case, it's certainly not something required for an initial
    implementation..
    No disagreement there; but sometimes it pays to know where you might
    want to go, so you don't do something to make further development in
    that direction unnecessarily difficult.
    I think that setting out to reimplement rsync, or to go down a design
    path where we're likely to do a lot of that eventually, is the height
    of folly.  We should be standing on the shoulders of other projects,
    not rolling our own because of misguided ideas about people not having
    those projects installed.

    IOW, what I'd like to see is protocol extensions that allow an external
    copy of rsync to be invoked; not build in rsync, or tar, or anything
    else that we could get off-the-shelf.
    We used to use "cp" to create databases. Should we go back to that system?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • David Blewett at Sep 3, 2010 at 4:01 pm

    On Fri, Sep 3, 2010 at 11:47 AM, Tom Lane wrote:
    IOW, what I'd like to see is protocol extensions that allow an external
    copy of rsync to be invoked; not build in rsync, or tar, or anything
    else that we could get off-the-shelf.
    Personally, I would love to see protocol-level compression added.
    (Yes, going over a compressed SSH tunnel works well, but in general
    isn't user-friendly.)

    Josh: we talked on IRC awhile back and you mentioned that CMD had
    added this in Mammoth? Would you be interested in having someone get
    that integrated back into the community?

    David Blewett
  • Heikki Linnakangas at Sep 3, 2010 at 4:26 pm

    On 03/09/10 18:53, David Blewett wrote:
    On Fri, Sep 3, 2010 at 11:47 AM, Tom Lane wrote:
    IOW, what I'd like to see is protocol extensions that allow an external
    copy of rsync to be invoked; not build in rsync, or tar, or anything
    else that we could get off-the-shelf.
    Personally, I would love to see protocol-level compression added.
    (Yes, going over a compressed SSH tunnel works well, but in general
    isn't user-friendly.)

    Josh: we talked on IRC awhile back and you mentioned that CMD had
    added this in Mammoth? Would you be interested in having someone get
    that integrated back into the community?
    There's a recent thread on pgsql-general about just that:
    http://archives.postgresql.org/pgsql-general/2010-08/msg00003.php

    I agree with Tom's comments there, I'd like to have something to
    enable/disable SSL compression rather than implement our own. There was
    some discussion that it might not be available on JDBC SSL
    implementations, but if it's done in our protocol, you'll need changes
    to the client to make it work anyway.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • David Blewett at Sep 3, 2010 at 8:56 pm

    On Fri, Sep 3, 2010 at 12:23 PM, Heikki Linnakangas wrote:
    On 03/09/10 18:53, David Blewett wrote:

    On Fri, Sep 3, 2010 at 11:47 AM, Tom Lane wrote:
    IOW, what I'd like to see is protocol extensions that allow an external
    copy of rsync to be invoked; not build in rsync, or tar, or anything
    else that we could get off-the-shelf.
    Personally, I would love to see protocol-level compression added.
    (Yes, going over a compressed SSH tunnel works well, but in general
    isn't user-friendly.)

    Josh: we talked on IRC awhile back and you mentioned that CMD had
    added this in Mammoth? Would you be interested in having someone get
    that integrated back into the community?
    There's a recent thread on pgsql-general about just that:
    http://archives.postgresql.org/pgsql-general/2010-08/msg00003.php

    I agree with Tom's comments there, I'd like to have something to
    enable/disable SSL compression rather than implement our own. There was some
    discussion that it might not be available on JDBC SSL implementations, but
    if it's done in our protocol, you'll need changes to the client to make it
    work anyway.
    While I agree that combining SSL with compression is a great win, I'm
    not sold on Tom's argument that compression is only needed in WAN
    situations. I've seen great benefit to using an SSH tunnel with
    compression over LAN connections (100 and 1000 mbps). At work, we do
    have a private WAN that it would be nice to be able to use compression
    with no encryption on. I think it's a general-use thing. While I know
    it's not the best argument, MySQL does provide compression at the
    connection level.

    David Blewett
  • Stephen Frost at Sep 3, 2010 at 4:09 pm

    * Tom Lane (tgl@sss.pgh.pa.us) wrote:
    IOW, what I'd like to see is protocol extensions that allow an external
    copy of rsync to be invoked; not build in rsync, or tar, or anything
    else that we could get off-the-shelf.
    I'd much rather use an existing library to implement it than call out to
    some external utility. That said, I'm about as thrilled with libtar as
    librsync after a bit of googling around. :/

    Thanks,

    Stephen
  • Heikki Linnakangas at Sep 3, 2010 at 4:22 pm

    On 03/09/10 19:09, Stephen Frost wrote:
    * Tom Lane (tgl@sss.pgh.pa.us) wrote:
    IOW, what I'd like to see is protocol extensions that allow an external
    copy of rsync to be invoked; not build in rsync, or tar, or anything
    else that we could get off-the-shelf.
    I'd much rather use an existing library to implement it than call out to
    some external utility. That said, I'm about as thrilled with libtar as
    librsync after a bit of googling around. :/
    The code to build a tar archive is about 200 lines of code. The amount
    of code for untar is about the same. That's about the amount of effort
    this is worth, I think. We could add zlib compression since we already
    link with that, but
    that's about it. I'm not interested in adding more infrastructure for
    more tools. For more complicated scenarios, you can still use
    pg_start/stop_backup() as usual, there's nothing wrong with that.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Kevin Grittner at Sep 3, 2010 at 4:11 pm

    Tom Lane wrote:

    what I'd like to see is protocol extensions that allow an external
    copy of rsync to be invoked; not build in rsync, or tar, or
    anything else that we could get off-the-shelf.
    The complexities of dealing with properly invoking rsync externally
    could well require more code and be considerably more fragile than
    passing the data through the existing SR connection; particularly
    since to get the full benefits of rsync you need to be dealing with
    a daemon which has the appropriate modules configured -- the
    location of which you wouldn't easily know.

    If we were talking about re-implementing rsync, or doing more than a
    rough approximation, kinda, of 5% of what rsync does, I'd be with
    you.

    -Kevin
  • Robert Haas at Sep 3, 2010 at 3:41 pm

    On Fri, Sep 3, 2010 at 11:20 AM, Stephen Frost wrote:
    Kevin,

    * Kevin Grittner (Kevin.Grittner@wicourts.gov) wrote:
    While 1GB granularity would be OK, I doubt it's optimal; I think CRC
    checks for smaller chunks might be worthwhile.  My gut feel is that
    somewhere in the 64kB to 1MB range would probably be optimal for us,
    although the "sweet spot" will depend on how the database is used.
    A configurable or self-adjusting size would be cool.
    We have something much better, called WAL.  If people want to keep their
    backup current, they should use that after getting the base backup up
    and working.  We don't need to support this for the base backup, imv.

    In any case, it's certainly not something required for an initial
    implementation..
    While I'm certainly not knocking WAL, it's not difficult to think of
    cases where being able to incrementally update a backup saves you an
    awful lot of bandwidth.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Martijn van Oosterhout at Sep 3, 2010 at 7:31 pm

    On Fri, Sep 03, 2010 at 09:56:12AM -0400, Stephen Frost wrote:
    * Robert Haas (robertmhaas@gmail.com) wrote:
    The rsync code itself is not modular, I believe. I think the author
    thereof kind of took the approach of placing efficiency before all.
    Yeah, I looked into this when discussing this same concept at PGCon with
    folks. There doesn't appear to be a good librsync and, even if there
    was, there's a heck of a lot of complexity there that we *don't* need.
    rsync is a great tool, don't get me wrong, but let's not try to go over
    our heads here.
    rsync is not rocket science. All you need is for the receiving end to
    send a checksum for each block it has. The server side does the same
    checksum and for each block sends back "same" or "new data".

    The client and the server don't need to synchronise at all. If the
    client sends nothing, the server sends everything.

    The tricky part of rsync (finding blocks that have moved) is not needed
    here.
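
    In code, the receiving side is just a loop over its existing copy of each
    file, one checksum per block; a sketch using zlib's crc32() (the block
    size and the send_to_server() call are made up for illustration):

    #include <stdio.h>
    #include <stdint.h>
    #include <zlib.h>

    #define CHECK_BLOCK_SIZE 65536      /* arbitrary granularity */

    /* hypothetical: ships (block number, checksum) to the sending side,
     * which checksums the same offsets and replies with either "same"
     * or the new contents of that block */
    extern void send_to_server(uint32_t blockno, uint32_t crc);

    static void
    send_block_checksums(FILE *fp)
    {
        unsigned char   buf[CHECK_BLOCK_SIZE];
        uint32_t        blockno = 0;
        size_t          n;

        while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        {
            uint32_t    crc = crc32(0L, Z_NULL, 0);

            crc = crc32(crc, buf, n);
            send_to_server(blockno++, crc);
        }
        /* if the client sends nothing at all, the server simply sends
         * the whole file -- no synchronisation needed */
    }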

    Have a nice day,
    --
    Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
    Patriotism is when love of your own people comes first; nationalism,
    when hate for people other than your own comes first.
    - Charles de Gaulle
  • Greg Stark at Sep 4, 2010 at 1:43 pm

    On Fri, Sep 3, 2010 at 8:30 PM, Martijn van Oosterhout wrote:

    rsync is not rocket science. All you need is for the receiving end to
    send a checksum for each block it has. The server side does the same
    checksum and for each block sends back "same" or "new data".
    Well rsync is closer to rocket science than that. It does rolling
    checksums and can handle data being moved around, which vacuum does do,
    so it's probably worthwhile.


    *However* I think you're all headed in the wrong direction here. I
    don't think rsync is what anyone should be doing with their backups at
    all. It still requires scanning through *all* your data even if you've
    only changed a small percentage (which it seems is the use case you're
    concerned about) and it results in corrupting your backup while the
    rsync is in progress and having a window with no usable backup. You
    could address that with rsync --compare-dest but then you're back to
    needing space and i/o for whole backups every time even if you're only
    changing small parts of the database.

    The industry standard solution that we're missing that we *should* be
    figuring out how to implement is incremental backups.

    I've actually been thinking about this recently and I think we could
    do it fairly easily with our existing infrastructure. I was planning
    on doing it as an external utility but it would be tempting to be able
    to request an external backup via the streaming protocol so maybe it
    would be better a bit more integrated.

    The way I see it there are two alternatives. You need to start by
    figuring out which blocks have been modified since the last backup (or
    selected reference point). You can do this either by scanning every
    data file and picking every block with an LSN > the reference LSN. Or
    you can do it by scanning the WAL since that point and accumulating a
    list of block numbers.

    Either way you then need to archive all those blocks into a special
    file format which includes meta-information to dictate which file and
    what block number each block represents. Also it would be useful to
    include the reference LSN and the beginning and ending LSN of the
    backup so that we can verify when restoring it that we're starting
    with a recent enough database and that we've replayed the right range
    of WAL to bring it to a consistent state.

    It's tempting to make the incremental backup file format just a
    regular WAL file with a series of special WAL records which just
    contain a backup block. That might be a bit confusing since it would
    be a second unrelated LSN series but I like the idea of being able to
    use the same bits of code to handle the "holes" and maybe other code.
    On the whole I think it would be just a little too weird though.
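
    As a sketch of the first approach: pd_lsn sits at the very start of every
    page header (stored as two 32-bit halves), so the scan itself needs no
    backend infrastructure at all. The output record layout below is invented
    purely for illustration:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    #define BLCKSZ 8192

    /* Append to "out" every block of "path" whose page LSN is newer than
     * ref_lsn, prefixed with (path length, path, block number) so the
     * block can be put back in the right place at restore time. */
    static void
    incremental_backup_file(const char *path, uint64_t ref_lsn, FILE *out)
    {
        FILE       *in = fopen(path, "rb");
        char        page[BLCKSZ];
        uint32_t    blkno = 0;

        if (in == NULL)
            return;

        while (fread(page, 1, BLCKSZ, in) == BLCKSZ)
        {
            uint32_t    xlogid;
            uint32_t    xrecoff;
            uint64_t    page_lsn;

            /* pd_lsn: the first eight bytes of the page header */
            memcpy(&xlogid, page, 4);
            memcpy(&xrecoff, page + 4, 4);
            page_lsn = ((uint64_t) xlogid << 32) | xrecoff;

            if (page_lsn > ref_lsn)
            {
                uint32_t    pathlen = (uint32_t) strlen(path);

                fwrite(&pathlen, sizeof(pathlen), 1, out);
                fwrite(path, 1, pathlen, out);
                fwrite(&blkno, sizeof(blkno), 1, out);
                fwrite(page, 1, BLCKSZ, out);
            }
            blkno++;
        }
        fclose(in);
    }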

    --
    greg
  • Thom Brown at Sep 4, 2010 at 1:47 pm

    On 4 September 2010 14:42, Greg Stark wrote:
    The industry standard solution that we're missing that we *should* be
    figuring out how to implement is incremental backups.
    I'll buy you a crate of beer if this gets implemented... although
    you're in Dublin so would be like buying Willy Wonka a Mars bar.

    --
    Thom Brown
    Twitter: @darkixion
    IRC (freenode): dark_ixion
    Registered Linux user: #516935
  • Robert Haas at Sep 5, 2010 at 2:58 am

    On Sat, Sep 4, 2010 at 9:42 AM, Greg Stark wrote:
    *However* I think you're all headed in the wrong direction here. I
    don't think rsync is what anyone should be doing with their backups at
    all. It still requires scanning through *all* your data even if you've
    only changed a small percentage (which it seems is the use case you're
    concerned about) and it results in corrupting your backup while the
    rsync is in progress and having a window with no usable backup. You
    could address that with rsync --compare-dest but then you're back to
    needing space and i/o for whole backups every time even if you're only
    changing small parts of the database.
    It depends. If the use case is "I accidentally (or purposefully but
    temporarily) started up my slave as a master, and now I want it to go
    back to having it be the master" or "I lost the WAL files I need to
    roll this base backup forward (perhaps because wal_keep_segments
    wasn't set high enough)", rsync is what you need.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Martijn van Oosterhout at Sep 5, 2010 at 3:52 pm

    On Sat, Sep 04, 2010 at 02:42:40PM +0100, Greg Stark wrote:
    On Fri, Sep 3, 2010 at 8:30 PM, Martijn van Oosterhout wrote:
    rsync is not rocket science. All you need is for the receiving end to
    send a checksum for each block it has. The server side does the same
    checksum and for each block sends back "same" or "new data".
    Well rsync is closer to rocket science than that. It does rolling
    checksums and can handle data being moved around, which vacuum does do,
    so it's probably worthwhile.
    Not sure. When vacuum moves rows around the chance that it will move
    rows as a block and that the line pointers will be the same is
    practically nil. I don't think rsync will pick up on blocks the size of
    a typical row. Vacuum changes the headers so you never have a copied
    block.
    *However* I think you're all headed in the wrong direction here. I
    don't think rsync is what anyone should be doing with their backups at
    all. It still requires scanning through *all* your data even if you've
    only changed a small percentage (which it seems is the use case you're
    concerned about) and it results in corrupting your backup while the
    rsync is in progress and having a window with no usable backup. You
    could address that with rsync --compare-dest but then you're back to
    needing space and i/o for whole backups every time even if you're only
    changing small parts of the database.
    If you're working from a known good version of the database at some
    point, yes you are right you have more interesting options. If you
    don't you want something that will fix it.

    Have a nice day,
    --
    Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
    Patriotism is when love of your own people comes first; nationalism,
    when hate for people other than your own comes first.
    - Charles de Gaulle
  • Greg Stark at Sep 6, 2010 at 2:18 pm

    On Sun, Sep 5, 2010 at 4:51 PM, Martijn van Oosterhout wrote:

    If you're working from a known good version of the database at some
    point, yes you are right you have more interesting options. If you
    don't you want something that will fix it.
    Sure, in that case you want to restore from backup. Whatever you use
    to do that is the same net result. I'm not sure rsync is actually
    going to be much faster though since it still has to read all of the
    existing database which a normal restore doesn't have to. If the
    database has changed significantly that's a lot of extra I/O and
    you're probably on a local network with a lot of bandwidth available.

    What I'm talking about is how you *take* backups. Currently you have
    to take a full backup which if you have a large data warehouse could
    be a big job. If only a small percentage of the database is changing
    then you could use rsync to reduce the network bandwidth to transfer
    your backup but you still have to read the entire database and write
    out the entire backup.

    Incremental backups mean being able to read just the data blocks that
    have been modified and write out a backup file with just those blocks.
    When it comes time to restore then you restore the last full backup,
    then any incremental backups since then, then replay any logs needed
    to bring it to a consistent state.

    I think that description pretty much settles the question in my mind.
    The implementation choice of scanning the WAL to find all the changed
    blocks is more relevant to the use cases where incremental backups are
    useful. If you still have to read the entire database then there's not
    all that much to be gained except storage space. If you scan the WAL
    then you can avoid reading most of your large data warehouse to
    generate the incremental and only read the busy portion.

    In the use case where the database is extremely busy but writing and
    rewriting the same small number of blocks over and over even scanning
    the WAL might not be ideal. For that use case it might be more useful
    to generate a kind of wal-summary which lists all the blocks touched
    since the last checkpoint every checkpoint. But that could be a later
    optimization.
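
    Restore is then just the inverse: write each saved block back over the
    corresponding block of the restored full backup before replaying WAL. A
    sketch matching the invented record layout from the earlier example:

    #include <stdio.h>
    #include <stdint.h>

    #define BLCKSZ 8192

    /* Apply one incremental backup file (records of: path length, path,
     * block number, 8kB block) on top of a restored full backup. */
    static int
    apply_incremental(FILE *inc)
    {
        uint32_t    pathlen;

        while (fread(&pathlen, sizeof(pathlen), 1, inc) == 1)
        {
            char        path[1024];
            uint32_t    blkno;
            char        page[BLCKSZ];
            FILE       *target;

            if (pathlen >= sizeof(path) ||
                fread(path, 1, pathlen, inc) != pathlen)
                return 1;
            path[pathlen] = '\0';
            if (fread(&blkno, sizeof(blkno), 1, inc) != 1 ||
                fread(page, 1, BLCKSZ, inc) != BLCKSZ)
                return 1;

            /* overwrite the block in place */
            target = fopen(path, "r+b");
            if (target == NULL)
                return 1;
            fseek(target, (long) blkno * BLCKSZ, SEEK_SET);
            fwrite(page, 1, BLCKSZ, target);
            fclose(target);
        }
        /* afterwards: replay WAL up to the incremental backup's end LSN to
         * reach a consistent state, as described above */
        return 0;
    }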


    --
    greg
  • Robert Haas at Sep 6, 2010 at 10:10 pm

    On Mon, Sep 6, 2010 at 10:07 AM, Greg Stark wrote:
    I think that description pretty much settles the question in my mind.
    The implementation choice of scanning the WAL to find all the changed
    blocks is more relevant to the use cases where incremental backups are
    useful. If you still have to read the entire database then there's not
    all that much to be gained except storage space. If you scan the WAL
    then you can avoid reading most of your large data warehouse to
    generate the incremental and only read the busy portion.
    If you can scan the WAL, why wouldn't you just replay it?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Bruce Momjian at Sep 8, 2010 at 1:29 am

    Greg Stark wrote:
    The industry standard solution that we're missing that we *should* be
    figuring out how to implement is incremental backups.

    I've actually been thinking about this recently and I think we could
    do it fairly easily with our existing infrastructure. I was planning
    on doing it as an external utility but it would be tempting to be able
    to request an external backup via the streaming protocol so maybe it
    would be better a bit more integrated.

    The way I see it there are two alternatives. You need to start by
    figuring out which blocks have been modified since the last backup (or
    selected reference point). You can do this either by scanning every
    data file and picking every block with an LSN > the reference LSN. Or
    you can do it by scanning the WAL since that point and accumulating a
    list of block numbers.
    That's what pgrman does already:

    http://code.google.com/p/pg-rman/

    Are you saying you want to do that over the libpq connection?

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Magnus Hagander at Sep 3, 2010 at 11:29 am

    On Fri, Sep 3, 2010 at 13:19, Heikki Linnakangas wrote:
    It's been discussed before that it would be cool if you could stream a new
    base backup from the master server, via libpq. That way you would not need
    low-level filesystem access to initialize a new standby.

    Magnus mentioned today that he started hacking on that, and coincidentally I
    just started experimenting with it yesterday as well :-). So let's get this
    out on the mailing list.

    Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication
    command set. Upon receiving that command, the master starts a COPY, and
    streams a tarred copy of the data directory to the client. The patch
    includes a simple command-line tool, pg_streambackup, to connect to a server
    and request a backup that you can then redirect to a .tar file or pipe to
    "tar x".

    TODO:

    * We need a smarter way to do pg_start/stop_backup() with this. At the
    moment, you can only have one backup running at a time, but we shouldn't
    have that limitation with this built-in mechanism.

    * The streamed backup archive should contain all the necessary WAL files
    too, so that you don't need to set up archiving to use this. You could just
    point the tiny client tool to the server, and get a backup archive
    containing everything that's necessary to restore correctly.
    For this last point, this should of course be *optional*, but it would
    be very good to have that option (and probably on by default).


    A couple of quick comments on points where this differs from the
    code I have :-) We chatted some about it already, but it should be
    included for others...

    * It should be possible to pass the backup label through, not just
    hardcode it to basebackup

    * Needs support for tablespaces. We should either follow the symlinks
    and pick up the files, or throw an error if any are present. Silently
    delivering an incomplete backup is not a good thing :-)

    * Is there a point in adapting the chunk size to the size of the libpq buffers?

    FWIW, my implementation was as a user-defined function, which has the
    advantage it can run on 9.0. But most likely this code can be ripped
    out and provided as a separate backport project for 9.0 if necessary -
    no need to have separate codebases.

    Other than that, our code is remarkably similar.
  • Greg Stark at Sep 3, 2010 at 12:16 pm

    On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas wrote:
    * We need a smarter way to do pg_start/stop_backup() with this. At the
    moment, you can only have one backup running at a time, but we shouldn't
    have that limitation with this built-in mechanism.
    Well there's no particular reason we couldn't support having multiple
    pg_start_backup() pending either. It's just not something people have
    needed so far.


    --
    greg
  • Heikki Linnakangas at Sep 3, 2010 at 12:29 pm

    On 03/09/10 15:16, Greg Stark wrote:
    On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas wrote:
    * We need a smarter way to do pg_start/stop_backup() with this. At the
    moment, you can only have one backup running at a time, but we shouldn't
    have that limitation with this built-in mechanism.
    Well there's no particular reason we couldn't support having multiple
    pg_start_backup() pending either. It's just not something people have
    needed so far.
    The backup label file makes that hard. There can be only one at a time.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Tom Lane at Sep 3, 2010 at 3:01 pm

    Heikki Linnakangas writes:
    On 03/09/10 15:16, Greg Stark wrote:
    On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas wrote:
    * We need a smarter way to do pg_start/stop_backup() with this. At the
    moment, you can only have one backup running at a time, but we shouldn't
    have that limitation with this built-in mechanism.
    Well there's no particular reason we couldn't support having multiple
    pg_start_backup() pending either. It's just not something people have
    needed so far.
    The backup label file makes that hard. There can be only one at a time.
    I don't actually see a use-case for streaming multiple concurrent
    backups. How many people are going to be able to afford that kind of
    load on the master's I/O bandwidth?

    Certainly for version 1, it would be sufficient to throw an error if
    someone tries to start a backup while another one is in progress.
    *Maybe*, down the road, we'd want to relax it.

    regards, tom lane
  • Thom Brown at Sep 3, 2010 at 3:24 pm

    On 3 September 2010 16:01, Tom Lane wrote:
    Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
    On 03/09/10 15:16, Greg Stark wrote:
    On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas wrote:
    * We need a smarter way to do pg_start/stop_backup() with this. At the
    moment, you can only have one backup running at a time, but we shouldn't
    have that limitation with this built-in mechanism.
    Well there's no particular reason we couldn't support having multiple
    pg_start_backup() pending either. It's just not something people have
    needed so far.
    The backup label file makes that hard. There can be only one at a time.
    I don't actually see a use-case for streaming multiple concurrent
    backups.  How many people are going to be able to afford that kind of
    load on the master's I/O bandwidth?
    To make it affordable, could functionality be added to allow slaves to
    become chainable? (i.e. master streams to standby 1, which streams to
    standby 2 etc) This would help reduce bandwidth for normal streaming
    replication too, which would be useful on particularly busy databases.
    Obviously in synchronous replication this would be horribly slow so
    not feasible for that.
    --
    Thom Brown
    Twitter: @darkixion
    IRC (freenode): dark_ixion
    Registered Linux user: #516935
  • Heikki Linnakangas at Sep 3, 2010 at 3:38 pm

    On 03/09/10 18:01, Tom Lane wrote:
    Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
    On 03/09/10 15:16, Greg Stark wrote:
    On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas wrote:
    * We need a smarter way to do pg_start/stop_backup() with this. At the
    moment, you can only have one backup running at a time, but we shouldn't
    have that limitation with this built-in mechanism.
    Well there's no particular reason we couldn't support having multiple
    pg_start_backup() pending either. It's just not something people have
    needed so far.
    The backup label file makes that hard. There can be only one at a time.
    I don't actually see a use-case for streaming multiple concurrent
    backups. How many people are going to be able to afford that kind of
    load on the master's I/O bandwidth?
    It's more a matter of convenience when you're setting up test
    environments with small databases or something like that. I don't see
    many people regularly using the streaming backup for anything larger
    than a few hundred gigabytes anyway. At that point you'll most likely
    want to use something more efficient.
    Certainly for version 1, it would be sufficient to throw an error if
    someone tries to start a backup while another one is in progress.
    *Maybe*, down the road, we'd want to relax it.
    Yeah, it's OK for 1st version.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
