[Explanation on why an RPM cannot dump a database during upgrade
follows. This is a lengthy explanation. If you don't want to read it,
please hit 'Delete' now. -- Also, I have blind copied Hackers, and cc:'d
PORTS, as that is where this discussion belongs, per Bruce's wishes.]

Peter Eisentraut wrote:
> Lamar Owen writes:
> > In the environment of the general purpose OS upgrade, the RPM's
> > installation scripts cannot fire up a backend, nor can it assume one
>
> I don't understand why this is so. It seems perfectly possible that some
> %preremovebeforeupdate starts a postmaster, runs pg_dumpall, saves the
> file somewhere, then the %postinstallafterupdate runs the inverse
> operation. Disk space is not a valid objection, you'll never get away
> without 2x storage. Security is not a problem either. Are you not
> upgrading in proper dependency order or what? Everybody does dump,
> remove, install, undump; so can the RPMs.

The RedHat installer (anaconda) is running in a terribly picky
environment. There are very few tools in this environment -- after all,
this is an installer we're talking about here. Starting a postmaster is
likely to fail, and fail big. Further, the anaconda install environment
is a chroot -- or, at least the environment the RPM scriptlets run in is
a chroot -- a chroot that is the active filesystem that is being
upgraded. This filesystem likely contains old libraries, old
executables, and other programs that may have a hard time running under
the limited installation kernel and the limited libraries available to
the installer.

And since packages are actively discouraged from probing whether they're
running in the anaconda chroot or not, a scriptlet has no way of knowing
whether it is safe to start a postmaster, so it has to assume it is not.
Mandrake allows packages to probe this -- which I personally think is a
bad idea -- packages that need to know this sort of information are
usually packages that would be better off finding a least common
denominator upgrade path that will work the best. A single upgrade path
is much easier to maintain than two upgrade paths.

Sure, during a command line upgrade, I can probe for a postmaster, and
even start one -- but I dare say the majority of PostgreSQL RPM upgrades
don't happen from the command line. Even if I _can_ probe whether I'm
in the anaconda chroot or not, I _still_ have to have an upgrade path in
case this _is_ an OS upgrade.

Think about it: suppose I had a postmaster start up, and a pg_dumpall
runs during OS upgrade. Calculating free space is not possible -- you
are in the middle of an OS upgrade, and more packages may be selected
for installation than are already installed -- or, an upgrade to an
existing package may take more space than the previous version (XFree86
3.3.6 to XFree86 4.0.1 is a good example) -- you have no way of knowing
from the RPM installation scripts in the package how much free space
there will or won't be when the upgrade is complete. And anaconda
doesn't help you out with an ESTIMATED_SPACE_AFTER_INSTALL environment
variable.

And you really can't assume 2x space -- the user may have decided that
this machine that didn't have TeX installed needs TeX installed, and
Emacs, and, while it didn't have GNOME before, it needs it now.....
Sure, the user just got himself in a pickle -- but I'm not about to be
the scapegoat for _his_ pickle.

And I can't assume that the /var partition (where the dataset resides)
is separate, or that it even has enough space -- the user might be
dumping to another filesystem, or maybe onto tape. And, in the confines
of an RPM %pre scriptlet, I have no way of finding out.

Furthermore, I can't accurately predict how much space even a compressed
ASCII dump will take. Calculating the size of the dataset in PGDATA
does not accurately predict the size of the dumpfile.

As to using split or the like to split huge dumpfiles, that is a
necessity -- but the space calculation problem defeats the whole concept
of dump-during-upgrade. I cannot determine how much space I have, and I
cannot determine how much space I need -- and, if I overflow the
filesystem during an OS upgrade that is halfway complete (PostgreSQL
usually is upgraded about two thirds of the way through or so), then I
leave the user with a royally hosed system. I don't want that on my
shoulders, do you? :-)
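
As an illustration of the space problem described above, here is a
minimal sketch of the naive check a scriptlet might attempt before
dumping. It is written in Python purely for readability (a real RPM
scriptlet would be shell), and the function names and the 2x safety
factor are assumptions made for this example, not anything taken from
the PostgreSQL RPMs.

```python
import os
import shutil


def directory_size(path):
    """Total bytes of regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def naive_dump_fits(pgdata, dump_dir, safety_factor=2.0):
    """Guess whether a dump of pgdata would fit in dump_dir right now.

    Both inputs are unreliable during an OS upgrade: the size of PGDATA
    does not predict the size of the dump, and the free space reported
    here is only a snapshot that later packages in the same upgrade
    will eat into.
    """
    needed = directory_size(pgdata) * safety_factor  # assumed factor
    free = shutil.disk_usage(dump_dir).free
    return needed <= free
```

Even when a check like this returns True, it only describes the
filesystem at that instant; it says nothing about the space left once
the remaining packages in the upgrade have been installed.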

Therefore, the pg_dumpall _has_ to occur _after_ the new version has
overwritten the old version, and _after_ the OS upgrade is completed --
unless the user has done what they should have done to begin with --
but, the fact of the matter is that many users simply won't do it Right.

You can't assume the user is going to be reasonable by your standard --
in fact, you have to do the opposite -- your standard of reasonable, and
the user's standard of reasonable, might be totally different things.

Incidentally, I originally attempted doing the dump inside the
preinstall, and found it to be an almost impossible task. The above
reasons might be solvable, but then there's this little problem: what if
you _are_ able to predict the space needed and the space available --
and there's not enough space available?

The PostgreSQL RPMs are not a single package, and anaconda has no way
of rolling back another part of an RPMset's installation if one part
fails. So, you can't just abort because you failed to dump -- the
package that needs the dump is the server subpackage -- and the main
package has already finished installation by that time. And you can't
roll it back.

And the user has a hosed PostgreSQL installation as a result.

As to why the package is split, well, it is highly useful to many people
to have a PostgreSQL _client_ installation that accesses a central
database server -- there is no need to have a postmaster and a full
backend when all you need is psql and the libraries and documentation
that go along with psql.

RPMs have to deal with both a very difficult environment, and users who
might not be as technically savvy as those who install from source.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


  • Karl DeBisschop at Oct 31, 2000 at 4:28 pm

    Lamar Owen wrote:

    > As to why the package is split, well, it is highly useful to many people
    > to have a PostgreSQL _client_ installation that accesses a central
    > database server -- there is no need to have a postmaster and a full
    > backend when all you need is psql and the libraries and documentation
    > that go along with psql.

    My personal experience is that the way the PostgreSQL RPMs are split is very good. It meshes nicely with other dependencies so that I don't need to install extra RPMs on our servers. I for one would not like to see that change.

    --
    Karl DeBisschop kdebisschop@alert.infoplease.com
    Learning Network Reference http://www.infoplease.com
    Netsaint Plugin Developer kdebisschop@users.sourceforge.net
  • Lamar Owen at Oct 31, 2000 at 4:38 pm

    Karl DeBisschop wrote:

    > Lamar Owen wrote:
    > > As to why the package is split, well, it is highly useful to many people
    > > to have a PostgreSQL _client_ installation that accesses a central
    > > database server -- there is no need to have a postmaster and a full
    > > backend when all you need is psql and the libraries and documentation
    > > that go along with psql.
    >
    > My personal experience is that the way the PostgreSQL RPMs are split is very good. It meshes nicely with other dependencies so that I don't need to install extra RPMs on our servers. I for one would not like to see that change.

    And I agree -- and have no plans to change. If anything, the RPMset will
    increase in number, not decrease.
    --
    Lamar Owen
    WGCR Internet Radio
    1 Peter 4:11

Group: pgsql-ports
Posted: Oct 31, '00 at 3:51p
Last active: Oct 31, '00 at 4:38p