Hello folks,

I have an idea, to break pg_dump into functions within PostgreSQL.

We currently have some functions like pg_get_viewdef, but it would
be great to have a function or functions to dump the SQL definition
of any object.
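
For example, the existing function can already reproduce a view's body
(view name purely illustrative):

    SELECT pg_get_viewdef('my_view');

but there is nothing comparable that emits, say, a complete CREATE
TABLE statement for an arbitrary table.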

This would allow easy dumping of any object from admin interfaces,
and allow totally customized consistent dumps from within a
transaction.
In combination with dblink, it would also allow easy replication of
objects between databases.
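
For instance, a rough sketch of the replication idea (this assumes the
dblink contrib module is installed; the connection string and view name
are illustrative):

    SELECT dblink_exec('dbname=otherdb',
                       'CREATE VIEW my_view AS '
                       || pg_get_viewdef('my_view'));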

From there we could add diff functions that take two similar
objects and produce the SQL commands to transform one into the
other.
This would help greatly when it comes to development, schema
migration, etc.
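
A purely hypothetical sketch; pg_diff_tabledef below does not exist,
and its name and behaviour are invented for illustration:

    -- might return e.g.: ALTER TABLE live.users ADD COLUMN email text;
    SELECT pg_diff_tabledef('dev.users'::regclass, 'live.users'::regclass);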

Would this be practical? Would it be possible to break the pg_dump
code out into PostgreSQL functions?

Anyone have any ideas/thoughts/suggestions on this?

Cheers
--
Mark Gibson <gibsonm |AT| cromwell |DOT| co |DOT| uk>
Web Developer & Database Admin
Cromwell Tools Ltd.
Leicester, England.

  • Christopher Kings-Lynne at Sep 14, 2004 at 8:43 am

    > Would this be practical? Would it be possible to break the pg_dump
    > code out into PostgreSQL functions?
    >
    > Anyone have any ideas/thoughts/suggestions on this?

    It's been my plan for some time; however, I'm time-starved at the moment.

    Chris
  • Darko Prenosil at Sep 14, 2004 at 2:45 pm
    Yes, IMHO this looks like a good idea. I planned to do something like
    that, but to return some kind of description language (XML, for
    example), so restore could decide at runtime what exactly to do with
    that metadata (depending on what the user wants: create the database
    or not, create users or not, reset sequences or not, replace existing
    objects or just create objects that don't already exist in the
    database, even do a "pgpatch" on existing objects), but this needs
    more time than I currently have :-(
    I still hope I'll do it some day...

    Regards !

  • Philip Warner at Sep 14, 2004 at 1:12 pm

    At 06:00 PM 14/09/2004, Mark Gibson wrote:
    > I have an idea, to break pg_dump into functions within PostgreSQL.

    This has been suggested before, and I think it has been generally
    accepted as the right broad approach (the basic idea being that the pg
    backend should know how to describe itself).

    Recent versions of pg_dump have started using backend functions to dump
    database structures (eg. type definitions). As time goes by, more functions
    will be written, but I don't think it's the highest priority on anybody's
    list. There are also the information schemas, which are the ISO way of
    getting database definitions; these can/should be used where possible.

    However, there are some complications because pg_dump is also the upgrade
    tool; the backend can only know how to describe itself for the current
    dialect of SQL accepted by PG. As we upgrade and improve the SQL, and add
    features, pg_dump needs to talk to old backends and dump prior versions in
    a format compatible with current (new) versions. This means that for some
    purposes it will not be able to use backend functions, or at least will
    have to have its own mutant version of them.

    There are other differences; for reasons of performance and atomicity, we
    try to keep the items dumped as simple as possible. eg. in 8.0, a table
    definition will ultimately be dumped as:

    1. set default_tablespace=xxx
    2. set search_path=xxx
    3. create table (no constraints, tablespace or namespace clauses)
    4. load table data
    5. alter table add constraint...
    6. set table acls
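
    In raw SQL that sequence looks roughly like this (a sketch; the table,
    tablespace, and schema names are illustrative):

        SET default_tablespace = ts1;
        SET search_path = myschema;
        CREATE TABLE foo (id integer, val text);
        COPY foo (id, val) FROM stdin;
        ALTER TABLE foo ADD CONSTRAINT foo_pkey PRIMARY KEY (id);
        GRANT SELECT ON foo TO PUBLIC;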

    A 'friendly' definition would at least contain the namespace & constraints,
    but pg_dump does not want that.

    So it's not as simple as it sounds.

    <random_idea>
    Perhaps it would be nice if, in each new version we created a library that
    could be built against old versions to provide the functions needed by
    pg_dump to upgrade, and a similar library would form part of the new
    version as well. Kind of a 'pg_dump translation plugin'. This may be way
    too expensive an option, when a few 'if' statements inside pg_dump will
    achieve almost the same result. It would remove/reduce bloat in pg_dump and
    make the functions available more generally, at the expense of duplicating
    lots of code for each supported version.
    </random_idea>



    ----------------------------------------------------------------
    Philip Warner
    Albatross Consulting Pty. Ltd. (A.B.N. 75 008 659 498)
    Tel: (+61) 0500 83 82 81 | Fax: (+61) 03 5330 3172
    http://www.rhyme.com.au
    PGP key available upon request, and from pgp.mit.edu:11371
  • Tom Lane at Sep 14, 2004 at 2:19 pm

    Mark Gibson writes:
    > I have an idea, to break pg_dump into functions within PostgreSQL.

    This strikes me as largely useless, because the problems that are
    actually hard for pg_dump are not at the level of individual objects;
    they have to do with problems like determining a safe dump order and
    altering definitions to break circularities in the dump requirements.
    I don't think that extending the pg_get_xxx family of functions would
    make pg_dump's life easier by any measurable amount.

    There is also a fundamental problem with the current pg_get_xxx
    functions, which is that they rely on backend-internal catalog
    operations that generally use SnapshotNow semantics. This is not what
    we want: to guarantee that pg_dump dumps a consistent snapshot, we
    need to look at catalog rows that existed as of pg_dump's serializable
    snapshot instead. We have gotten away with it so far because pg_dump
    starts by taking read locks on every table in sight, and that is
    sufficient to block schema changes on the tables. But extending the
    pg_get_xxx approach to non-table-related objects would be seriously
    dangerous. (I think pg_get_viewdef is already broken, actually,
    since you can't lock a view.)

    regards, tom lane
  • Mark Gibson at Sep 16, 2004 at 1:54 pm

    Mark Gibson writes:
    > I have an idea, to break pg_dump into functions within PostgreSQL.

    Philip Warner wrote:
    > However, there are some complications because pg_dump is also the
    > upgrade tool; the backend can only know how to describe itself for the
    > current dialect of SQL accepted by PG. As we upgrade and improve the
    > SQL, and add features, pg_dump needs to talk to old backends and dump
    > prior versions in a format compatible with current (new) versions. This
    > means that for some purposes it will not be able to use backend
    > functions, or at least will have to have its own mutant version of them.

    Tom Lane wrote:
    > This strikes me as largely useless, because the problems that are
    > actually hard for pg_dump are not at the level of individual objects;
    > they have to do with problems like determining a safe dump order and
    > altering definitions to break circularities in the dump requirements.
    > I don't think that extending the pg_get_xxx family of functions would
    > make pg_dump's life easier by any measurable amount.

    I was thinking really of very low-level dumping functions
    that just dump simple single statements.
    It's mainly for development purposes; we generally develop db apps in
    three or four stages:
    1. In a developer-only db, each developer has a private schema to
    experiment in (prototypes, proof-of-concepts, alpha stage, etc.).
    2. Then it's moved into its own schema on the same developer server for
    further stabilization (alpha/beta stage).
    3. Once it reaches beta, it moves onto a mirror of the production
    server, for further testing by users (beta/pre-release stage).
    4. For release, it's moved onto a production server for general
    consumption.

    So there is lots of dumping and reloading of schemas; the only way
    to do it is by using pg_dump and then hand-editing the dump (or
    munging it with sed/awk scripts, etc.), which is a bit of a pain.

    I was thinking that if all the dumping functionality was available
    as PostgreSQL functions, then a simple PL/pgSQL function could be
    written to dump exactly what is required and, using dblink, fed
    directly from one db to another. Nice :)
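
    A minimal sketch of what I mean, assuming dblink is installed (it only
    handles views, since pg_get_viewdef is about all we have today;
    copy_view is my own invented name):

        CREATE FUNCTION copy_view(conn text, viewname text)
        RETURNS void AS $$
        BEGIN
            -- rebuild the view on the remote side from its local definition
            PERFORM dblink_exec(conn,
                                'CREATE VIEW ' || viewname || ' AS '
                                || pg_get_viewdef(viewname));
        END;
        $$ LANGUAGE plpgsql;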

    Dependency functions could be developed to determine the ideal
    dump order for higher-level dumping, and would also be useful
    for displaying dependency graphs in admin interfaces
    (phpPgAdmin, pgAdmin III, etc.).
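
    The raw material for that is already in pg_depend; for example, a
    quick sketch listing everything recorded as depending on a given table
    (the table name is illustrative):

        SELECT classid::regclass AS dependent_catalog,
               objid,
               deptype
        FROM pg_depend
        WHERE refclassid = 'pg_class'::regclass
          AND refobjid = 'my_table'::regclass;
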
    Tom Lane wrote:
    > There is also a fundamental problem with the current pg_get_xxx
    > functions, which is that they rely on backend-internal catalog
    > operations that generally use SnapshotNow semantics. This is not what
    > we want: to guarantee that pg_dump dumps a consistent snapshot, we
    > need to look at catalog rows that existed as of pg_dump's serializable
    > snapshot instead. We have gotten away with it so far because pg_dump
    > starts by taking read locks on every table in sight, and that is
    > sufficient to block schema changes on the tables. But extending the
    > pg_get_xxx approach to non-table-related objects would be seriously
    > dangerous. (I think pg_get_viewdef is already broken, actually,
    > since you can't lock a view.)

    So, are pg_catalog tables subject to MVCC in the same way as user
    tables? i.e. if I BEGIN a transaction, will the pg_catalog data
    remain consistent throughout the transaction regardless of any DDL
    commands in a parallel session?
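
    A concrete test of what Tom describes (view name illustrative; plain
    SELECTs on pg_catalog do follow the transaction snapshot, but the
    pg_get_xxx functions read the catalogs with SnapshotNow):

        -- session 1
        BEGIN;
        SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
        SELECT pg_get_viewdef('my_view');

        -- session 2, meanwhile
        CREATE OR REPLACE VIEW my_view AS SELECT 2 AS x;

        -- session 1 again: despite the serializable snapshot, this now
        -- returns the new definition
        SELECT pg_get_viewdef('my_view');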

    Philip Warner wrote:
    > Perhaps it would be nice if, in each new version we created a library
    > that could be built against old versions to provide the functions needed
    > by pg_dump to upgrade, and a similar library would form part of the new
    > version as well. Kind of a 'pg_dump translation plugin'. This may be way
    > too expensive an option, when a few 'if' statements inside pg_dump will
    > achieve almost the same result. It would remove/reduce bloat in pg_dump
    > and make the functions available more generally, at the expense of
    > duplicating lots of code for each supported version.

    I was thinking of sharing a library (or just a code base) between
    pg_dump and the built-in functions, so that pg_dump doesn't have to
    rely on the old built-in functions of a db you are trying to upgrade.

    --
    Mark Gibson <gibsonm |AT| cromwell |DOT| co |DOT| uk>
    Web Developer & Database Admin
    Cromwell Tools Ltd.
    Leicester, England.
