I have spent several days now puzzling over the corrupted WAL logfile
that Scott Parish was kind enough to send me from a 7.1beta4 crash.
It looks a lot like two different series of transactions were getting
written into the same logfile. I'd been digging like mad in the WAL
code to try to explain this as a buffer-management logic error, but
after a fresh exchange of info it turns out that I was barking up the
wrong tree. There *were* two different series of transactions.
Specifically, here's what happened:

1. Scott (or actually his associate) shut down and restarted the
postmaster using the /etc/rc.d/init.d/pgsql script that ships with
our RPMs. That script shuts down the old postmaster with
killproc postmaster
It turns out that at least on Scott's machine (RedHat 6.1), the default
kill level for the killproc function is kill -9. (This is clearly a bad
bug in the init script, but I digress.)

2. So, the old postmaster was killed with kill -9, but its child
backends were still running. The new postmaster will start up
successfully because it'll think the old postmaster crashed, and
so it will go through the usual recovery procedure.

3. Now we have two sets of backends running in different shmem blocks
(7.0 might have choked on that part, but 7.1 doesn't care) and running
different sets of transactions. But they're writing to the same WAL
log. Result: guaranteed corruption of the log.

It actually took two iterations of this to expose the bug: the third
attempted postmaster start went looking for the checkpoint record last
written by the second one, which meanwhile had got overwritten by
activity of the first backend set.


Now, killing the postmaster -9 and not cleaning up the backends has
always been a good way to shoot yourself in the foot, but up to now the
worst thing that was likely to happen to you was isolated corruption in
specific tables. In the brave new world of WAL the stakes are higher,
because the system will refuse to start up if it finds a corrupted
checkpoint record. Clueless admins who resort to kill -9 as a routine
admin tool *will* lose their databases. Moreover, the init scripts
that are running around now are dangerous weapons if used with 7.1.

I think we need a stronger interlock to prevent this scenario, but I'm
unsure what it should be. Ideas?

regards, tom lane

  • Thomas Swan at Mar 5, 2001 at 11:19 pm

    At 3/5/2001 04:30 PM, you wrote:
    Now, killing the postmaster -9 and not cleaning up the backends has
    always been a good way to shoot yourself in the foot, but up to now the
    worst thing that was likely to happen to you was isolated corruption in
    specific tables. In the brave new world of WAL the stakes are higher,
    because the system will refuse to start up if it finds a corrupted
    checkpoint record. Clueless admins who resort to kill -9 as a routine
    admin tool *will* lose their databases. Moreover, the init scripts
    that are running around now are dangerous weapons if used with 7.1.

    I think we need a stronger interlock to prevent this scenario, but I'm
    unsure what it should be. Ideas?
    Is there any way to see if the other (child) processes have a lock on the
    log file?

    On a lot of systems, when a daemon starts, it records its PID in a file so
    that it (or the admin) can run a shutdown script against the PID listed there.
    Could child processes list themselves as child.PID files in a configurable
    directory, and the starting process look for all of these and shut the
    "orphaned" child processes down?

    Just thoughts...

    Thomas
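
    (A rough C sketch of the per-backend pidfile idea above; the directory,
    the child.<pid> naming scheme, and the helper function are hypothetical,
    not anything PostgreSQL actually does:)

    #include <sys/types.h>
    #include <dirent.h>
    #include <signal.h>
    #include <stdio.h>

    /* Hypothetical helper: count pidfiles in <dir> whose backend still runs. */
    static int
    count_leftover_backends(const char *dir)
    {
        DIR    *d = opendir(dir);
        struct dirent *de;
        int     found = 0;

        if (d == NULL)
            return 0;
        while ((de = readdir(d)) != NULL)
        {
            long    pid;

            if (sscanf(de->d_name, "child.%ld", &pid) != 1)
                continue;
            if (kill((pid_t) pid, 0) == 0)  /* some signalable process with that PID exists */
                found++;                    /* (though the PID may have been reused) */
            /* else: stale file from a crashed backend; could unlink() it here */
        }
        closedir(d);
        return found;
    }

    The obvious weak spots are the ones raised later in the thread: PID reuse,
    and telling this postmaster's children apart from another postmaster's.
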
  • Alfred Perlstein at Mar 5, 2001 at 11:48 pm

    * Tom Lane [010305 14:51] wrote:

    I think we need a stronger interlock to prevent this scenario, but I'm
    unsure what it should be. Ideas?
    Re having multiple postmasters active by accident.

    The sysV IPC stuff has some hooks in it that may help you.

    One idea is to check the 'struct shmid_ds' field 'shm_nattch':
    basically, at startup, if it's not 1 (or 0) then you have more than
    one postgresql instance messing with the segment and it should not proceed.

    I'd also suggest looking into using sysV semaphores and the semundo
    stuff; afaik it can be used to track the number of consumers of
    a resource.

    --
    -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
  • Lamar Owen at Mar 6, 2001 at 1:46 am

    Tom Lane wrote:
    checkpoint record. Clueless admins who resort to kill -9 as a routine
    admin tool *will* lose their databases. Moreover, the init scripts
    that are running around now are dangerous weapons if used with 7.1.
    Thanks for the heads-up, Tom. Time to nix killproc and do something
    cleaner -- compatible, but cleaner. I'll have to research what the
    defaults are for later RH's -- but, as 6.1 is one of my target platforms
    at this time, I have to fix that issue for sure.
    --
    Lamar Owen
    WGCR Internet Radio
    1 Peter 4:11
  • Tom Lane at Mar 6, 2001 at 1:49 am

    Lamar Owen writes:
    Thanks for the heads-up, Tom. Time to nix killproc and do something
    cleaner -- compatible, but cleaner.
    As far as I could tell from the 6.1 scripts, it would work to do

    killproc postmaster -TERM

    The problem is just that killproc has an overenthusiastic default...

    regards, tom lane
  • Bruce Momjian at Mar 6, 2001 at 1:52 am

    Lamar Owen writes:
    Thanks for the heads-up, Tom. Time to nix killproc and do something
    cleaner -- compatible, but cleaner.
    As far as I could tell from the 6.1 scripts, it would work to do

    killproc postmaster -TERM
    Yes, amazing it has a -9 default.

    --
    Bruce Momjian | http://candle.pha.pa.us
    pgman@candle.pha.pa.us | (610) 853-3000
    + If your life is a hard drive, | 830 Blythe Avenue
    + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
  • Bruce Momjian at Mar 6, 2001 at 1:52 am
    killproc should send a kill -15 to the process, wait a few seconds for
    it to exit. If it does not, try kill -1, and if that doesn't kill it,
    then kill -9.
    Tom Lane wrote:
    checkpoint record. Clueless admins who resort to kill -9 as a routine
    admin tool *will* lose their databases. Moreover, the init scripts
    that are running around now are dangerous weapons if used with 7.1.
    Thanks for the heads-up, Tom. Time to nix killproc and do something
    cleaner -- compatible, but cleaner. I'll have to research what the
    defaults are for later RH's -- but, as 6.1 is one of my target platforms
    at this time, I have to fix that issue for sure.
    --
    Lamar Owen
    WGCR Internet Radio
    1 Peter 4:11

    --
    Bruce Momjian | http://candle.pha.pa.us
    pgman@candle.pha.pa.us | (610) 853-3000
    + If your life is a hard drive, | 830 Blythe Avenue
    + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
  • Tom Lane at Mar 6, 2001 at 1:55 am

    Bruce Momjian writes:
    killproc should send a kill -15 to the process, wait a few seconds for
    it to exit. If it does not, try kill -1, and if that doesn't kill it,
    then kill -9.
    Tell it to the Linux people ... this is their boot-script code we're
    talking about.

    regards, tom lane
  • Lamar Owen at Mar 6, 2001 at 2:11 am

    Tom Lane wrote:

    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    killproc should send a kill -15 to the process, wait a few seconds for
    it to exit. If it does not, try kill -1, and if that doesn't kill it,
    then kill -9.
    Tell it to the Linux people ... this is their boot-script code we're
    talking about.
    RedHat, in particular. I can't vouch for any others.

    On my RH 6.2 box, with initscripts-5.00-1 loaded, here's what killproc
    does if no killlevel is set (even though a default $killlevel is set to
    -9, it's not used in this code):
    ($pid is the pid of the proc to kill, $base is the name of the proc,
    etc)

    if [ "$notset" = "1" ] ; then
    if ps h $pid>/dev/null 2>&1; then
    # TERM first, then KILL if not dead
    kill -TERM $pid
    usleep 100000
    if ps h $pid >/dev/null 2>&1 ; then
    sleep 1
    if ps h $pid >/dev/null 2>&1 ; then
    sleep 3
    if ps h $pid >/dev/null 2>&1 ; then
    kill -KILL $pid
    fi
    fi
    fi
    fi
    ps h $pid >/dev/null 2>&1
    RC=$?
    [ $RC -eq 0 ] && failure "$base shutdown" || success "$base
    shutdown"
    RC=$((! $RC))
    # use specified level only
    else
    if ps h $pid >/dev/null 2>&1; then
    kill $killlevel $pid
    RC=$?
    [ $RC -eq 0 ] && success "$base $killlevel" || failure "$base
    $killlevel"
    fi
    fi


    Is 6.1 this different from 6.2? This code on the surface seems
    reasonable to me -- am I missing something? The 6.2 code (found in
    /etc/rc.d/init.d/functions, for those who might not know where to find
    killproc) sets a default killlevel but never uses it -- ignorant but not
    stupid.
    --
    Lamar Owen
    WGCR Internet Radio
    1 Peter 4:11
  • Bruce Momjian at Mar 6, 2001 at 2:14 am

    if [ "$notset" = "1" ] ; then
    if ps h $pid>/dev/null 2>&1; then
    # TERM first, then KILL if not dead
    kill -TERM $pid
    usleep 100000
    if ps h $pid >/dev/null 2>&1 ; then
    sleep 1
    if ps h $pid >/dev/null 2>&1 ; then
    sleep 3
    if ps h $pid >/dev/null 2>&1 ; then
    kill -KILL $pid
    fi
    fi
    fi
    fi
    Yes, this seems like the proper way to do it.

    --
    Bruce Momjian | http://candle.pha.pa.us
    pgman@candle.pha.pa.us | (610) 853-3000
    + If your life is a hard drive, | 830 Blythe Avenue
    + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
  • Lamar Owen at Mar 6, 2001 at 2:24 am

    Bruce Momjian wrote:
    # TERM first, then KILL if not dead
    Yes, this seems like the proper way to do it.
    Now to verify that 6.1 is the same....or different.... Hmmmm.... The
    mirrors of ftp.redhat.com (and, in fact, RedHat.com itself) no longer
    have the updates or the original for 6.1's initscripts-4.70 package.
    Can a RedHat 6.1 user (using as close as possible to 6.1's release
    initscripts package) send me a copy of /etc/rc.d/init.d/functions, or
    verify how that initscripts package defines killproc? I cannot at this
    moment locate my RH 6.1 SRPMS CD. Found my RH _4_.1 CD, but that's just
    a _little_ old :-).
    --
    Lamar Owen
    WGCR Internet Radio
    1 Peter 4:11
  • Tom Lane at Mar 6, 2001 at 2:36 am

    Lamar Owen writes:
    Is 6.1 this different from 6.2?
    Scott sent me a copy of /etc/init.d/functions from his box, and it has
    largely the same behavior (I hadn't read the whole code to notice that
    it doesn't use the default killlevel...). What's actually happening
    here is that the init script sends SIGTERM, and then SIGKILL four
    seconds later if the postmaster hasn't shut down yet. Unfortunately,
    unless your clients are very short-lived four seconds isn't going to
    be enough for a "polite" shutdown. (It's pretty marginal even for
    an impolite one, since a checkpoint will take at least a couple of
    seconds.)

    However, with an explicit kill level that doesn't happen: you get one
    signal of the specified value, no more. Possibly it would be better for
    the init script to send SIGINT (forcibly disconnect clients) instead of
    SIGTERM, however. So I'm now leaning to "killproc postmaster -INT".

    regards, tom lane
  • Lamar Owen at Mar 6, 2001 at 2:45 am

    Tom Lane wrote:
    However, with an explicit kill level that doesn't happen: you get one
    signal of the specified value, no more. Possibly it would be better for
    the init script to send SIGINT (forcibly disconnect clients) instead of
    SIGTERM, however. So I'm now leaning to "killproc postmaster -INT".
    Ok, since I can't seem to count on killproc's exact behavior, it seems to me that
    I can:
    killproc postmaster -INT
    wait some number of seconds
    if postmaster still up
    killproc postmaster -TERM
    wait some number of seconds
    if postmaster STILL up
    killproc postmaster #and let the grim reaper do its dirty work.

    After all, the system shutdown is relying on this script to properly and
    thoroughly shut things down, or it WILL do the 'kill -9
    pid-of-postmaster' for you.

    Now, what's a good delay here? Or is there a better metric than a
    simple delay? After all, I want to avoid the kill -9 unless we have an
    emergency hard lock situation -- what's a good indicator of the backend
    fleet of processes actually _doing_ something? Or should I key on an
    indicator of processor speed (Linux does provide a nice bogus metric
    known as BogoMIPS for such a purpose)? The last thing I want to do is
    wait too long on some platforms and not long enough on others.
    --
    Lamar Owen
    WGCR Internet Radio
    1 Peter 4:11
  • Tom Lane at Mar 6, 2001 at 2:53 am

    Lamar Owen writes:
    The last thing I want to do is
    wait too long on some platforms and not long enough on others.
    The difficulty is to know how long the final checkpoint will take.
    This depends on (at least) your hard disk speed and the number of
    dirty buffers, so I think you're going to have some difficulty
    estimating it with any reliability. BogoMIPS won't help, for sure.

    However, if you do SIGINT and then wait a few seconds, you can be fairly
    sure that all the extant backends are dead (if not frozen up...) and
    that the checkpoint is in progress. That may be about the best you can
    do.

    I do not agree that this script should take it on itself to kill -9 the
    postmaster. Please note that the reason we're having this discussion at
    all is that the init script may be used for purposes other than system
    shutdown. So the argument that "it's going to happen anyway" is wrong.

    regards, tom lane
  • Lamar Owen at Mar 6, 2001 at 3:03 am

    Tom Lane wrote:
    Please note that the reason we're having this discussion at
    all is that the init script may be used for purposes other than system
    shutdown. So the argument that "it's going to happen anyway" is wrong.
    Believe it or not, you just disproved your own statement that the
    initscript should not take it upon itself to issue the kill -9. So,
    what if I issue '/etc/rc.d/init.d/postgresql restart' -- and backends
    don't go away during the 'stop' phase, while postmaster may actually
    have died? Or is it even possible for postmaster to drop out with a
    running backend out there?

    No, more is needed. But I think a careful reap through the running
    backends to kill those that need killing if postmaster won't go down
    might be prudent. Currently it is not possible to run multiple
    postmasters with the RPM install (I am working on that little problem,
    but it won't be for 7.1's RPMset yet), so all backends that are running
    on the RPM PGDATA location (which I am looking at making configurable as
    well) will belong to the one postmaster. Of course, that would be an
    absolute last resort.

    Oh well -- the real solution is elsewhere, anyway. I just have to make
    sure it is not data-corruption broken. And, if leaving the -9 out
    completely is the only solution, then, well, it's the only solution.
    --
    Lamar Owen
    WGCR Internet Radio
    1 Peter 4:11
  • Tom Lane at Mar 6, 2001 at 3:10 am

    Lamar Owen writes:
    Tom Lane wrote:
    Please note that the reason we're having this discussion at
    all is that the init script may be used for purposes other than system
    shutdown. So the argument that "it's going to happen anyway" is wrong.
    Believe it or not, you just disproved your own statement that the
    initscript should not take it upon itself to issue the kill -9. How?
    So, what if I issue '/etc/rc.d/init.d/postgresql restart' -- and
    backends don't go away during the 'stop' phase, while postmaster may
    actually have died? Or is it even possible for postmaster to drop out
    with a running backend out there?
    The postmaster will certainly not do so voluntarily. If you kill -9 it,
    of course, that's the situation you're left with ... but your reasoning
    seems circular to me. "I should kill -9 the postmaster to prevent the
    situation where I've kill -9'd the postmaster."

    regards, tom lane
  • Lamar Owen at Mar 6, 2001 at 3:24 am

    Tom Lane wrote:
    of course, that's the situation you're left with ... but your reasoning
    seems circular to me. "I should kill -9 the postmaster to prevent the
    situation where I've kill -9'd the postmaster."
    Ok, while the script can certainly be used from the command line, its
    primary purpose is system shutdown.

    And, I am thinking kind of circuitously at this point -- I only now
    realize just how circuitously. If I keep slapping my forehead like
    this, I'm going to be bald in a few years....

    I don't want to reap the postmaster off -- I want to reap off the
    backends associated with that particular postmaster, allowing that
    postmaster to die on its own. Duh. Doing this in a safe manner is not
    going to be easy, given that the PGDATA is not on the command line to
    the backend as echoed by ps. Although I could key on PPID for the
    backends.... I'll have to experiment. But not tonight -- last week was
    more taxing than I thought. :-(.
    --
    Lamar Owen
    WGCR Internet Radio
    1 Peter 4:11
  • Tom Lane at Mar 6, 2001 at 3:31 am

    Lamar Owen writes:
    I don't want to reap the postmaster off -- I want to reap off the
    backends associated with that particular postmaster, allowing that
    postmaster to die on its own. Duh. Doing this in a safe manner is not
    going to be easy, given that the PGDATA is not on the command line to
    the backend as echoed by ps. Although I could key on PPID for the
    backends.... I'll have to experiment.
    PPID should work fine, actually. Keep in mind though that SIGINT'ing
    the postmaster will already have sent a terminate signal to its children
    (barring postmaster breakage), and that if you wait around for awhile
    and then kill off remaining children, you may well accomplish nothing
    except to kill off the checkpoint process :-(

    regards, tom lane
  • Bruce Momjian at Mar 6, 2001 at 2:58 am

    Ok, since I can't seem to count on killproc's exact behavior, it seems to me that
    I can:
    killproc postmaster -INT
    wait some number of seconds
    if postmaster still up
    killproc postmaster -TERM
    wait some number of seconds
    if postmaster STILL up
    killproc postmaster #and let the grim reaper do its dirty work.

    After all, the system shutdown is relying on this script to properly and
    thoroughly shut things down, or it WILL do the 'kill -9
    pid-of-postmaster' for you.

    Now, what's a good delay here? Or is there a better metric than a
    simple delay? After all, I want to avoid the kill -9 unless we have an
    emergency hard lock situation -- what's a good indicator of the backend
    fleet of processes actually _doing_ something? Or should I key on an
    indicator of processor speed (Linux does provide a nice bogus metric
    known as BogoMIPS for such a purpose)? The last thing I want to do is
    wait too long on some platforms and not long enough on others.
    In remembering how other databases handle it, I think you should use
    pg_ctl to shut it down. You need to enable wait mode, not sure if that
    is the default or not. That will wait for it to shut down before
    continuing. I realize a hung shutdown would stop the kernel from
    shutting down. You could put a sleep 100 in there and call a trap on a
    timeout.

    Here is some shell code:

    TIME=60
    pg_ctl -w stop &
    BG="$!"; export BG

    (sleep "$TIME"; kill "$BG" ) &
    BG2="$!"; export BG2

    wait "$BG"
    if ! kill -0 "$BG2" 2>/dev/null; then
        :                 # watchdog already exited: the timeout fired and killed pg_ctl
    else
        kill "$BG2"       # pg_ctl finished in time, so cancel the watchdog
    fi


    This will try a pg_ctl shutdown for 60 seconds, then kill pg_ctl. You
    would then need a kill of your own.

    --
    Bruce Momjian | http://candle.pha.pa.us
    pgman@candle.pha.pa.us | (610) 853-3000
    + If your life is a hard drive, | 830 Blythe Avenue
    + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
  • Lamar Owen at Mar 6, 2001 at 3:06 am

    Bruce Momjian wrote:
    This will try a pg_ctl shutdown for 60 seconds, then kill pg_ctl. You
    would then need a kill of your own.
    I missed something somewhere: wasn't the consensus a few weeks ago that
    pg_ctl shouldn't be used for a system initscript? Or did I black out
    that day? :-) I certainly have no problem using pg_ctl for this purpose
    -- as I have been using pg_ctl to start postmaster all along (then why
    am I not using it to stop -- don't answer that :-))......
    --
    Lamar Owen
    WGCR Internet Radio
    1 Peter 4:11
  • Bruce Momjian at Mar 6, 2001 at 3:08 am

    Bruce Momjian wrote:
    This will try a pg_ctl shutdown for 60 seconds, then kill pg_ctl. You
    would then need a kill of your own.
    I missed something somewhere: wasn't the consensus a few weeks ago that
    pg_ctl shouldn't be used for a system initscript? Or did I black out
    that day? :-) I certainly have no problem using pg_ctl for this purpose
    -- as I have been using pg_ctl to start postmaster all along (then why
    am I not using it to stop -- don't answer that :-))......
    I don't remember that discussion. My guess was that you didn't want
    pg_ctl to hang forever. My script handles that, I think.

    --
    Bruce Momjian | http://candle.pha.pa.us
    pgman@candle.pha.pa.us | (610) 853-3000
    + If your life is a hard drive, | 830 Blythe Avenue
    + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
  • Tom Lane at Mar 6, 2001 at 3:12 am

    Lamar Owen writes:
    I missed something somewhere: wasn't the consensus a few weeks ago that
    pg_ctl shouldn't be used for a system initscript?
    I thought there was some concern about whether pg_ctl is really "ready
    for prime time". But I don't recall the details either.

    regards, tom lane
  • Peter Eisentraut at Mar 6, 2001 at 4:56 pm

    Lamar Owen writes:

    I missed something somewhere: wasn't the consensus a few weeks ago that
    pg_ctl shouldn't be used for a system initscript?
    The consensus(?) was that there was some work to do in pg_ctl before it
    was robust enough to be used (for anything). That work has been done.
    An example Linux init.d script is at contrib/start-scripts/linux. The
    only fault in that script that I can see is that it has no recipe for the
    case when the postmaster does not come down after 60 seconds. But this is
    really no problem for the issue at hand because if you do a normal
    runlevel switch then the postmaster will simply keep running, and during a
    system shutdown all the backends are going to die anyway.

    --
    Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
  • Lamar Owen at Mar 6, 2001 at 5:56 pm

    Peter Eisentraut wrote:

    Lamar Owen writes:
    I missed something somewhere: wasn't the consensus a few weeks ago that
    pg_ctl shouldn't be used for a system initscript?
    The consensus(?) was that there was some work to do in pg_ctl before it
    was robust enough to be used (for anything). That work has been done.
    That was the detail I missed.
    case when the postmaster does not come down after 60 seconds. But this is
    really no problem for the issue at hand because if you do a normal
    runlevel switch then the postmaster will simply keep running, and during a
    system shutdown all the backends are going to die anyway.
    Only if each and every shutdown script succeeds in its task. And I have
    to make sure that the RPM's shipping script successfully pulls down the
    system in an orderly fashion -- of course, I don't have to worry about
    the case where a postmaster is going to be started back up if we are in
    system shutdown -- but, as Tom also stated, I can't assume I'm in the
    system's death throes when called with the stop parameter.

    And it _is_ possible for an admin to set up the runlevels such that a
    level is set aside where even networking isn't running (actually, that
    level already exists, and is called 'single user mode') -- or a run
    level for website maintenance where networking is still up, but the
    webserver and postgresql (and other associated) processes are to be shut
    down. I personally use this -- I have set up runlevel 4 as a 'remote
    single user mode' of sorts where I still have sshd running (and the
    networking stack, obviously), but AOLserver, postgresql, and RealServer
    are shut down. I then switch runlevels back to 3 to return to normal.
    Much easier than manually stopping and restarting (in the correct order,
    as AOLserver is not a happy camper if postmaster drops out from
    underneath it) all the necessary pieces.

    So I can't assume anything. The default RPM installation used to
    automatically configure runlevels 3, 4, and 5 (not any more), but my
    script can't assume that the system is actually in that state by any
    means.
    --
    Lamar Owen
    WGCR Internet Radio
    1 Peter 4:11
  • Peter Eisentraut at Mar 6, 2001 at 6:11 pm

    Lamar Owen writes:

    case when the postmaster does not come down after 60 seconds. But this is
    really no problem for the issue at hand because if you do a normal
    runlevel switch then the postmaster will simply keep running, and during a
    system shutdown all the backends are going to die anyway.
    Only if each and every shutdown script succeeds in its task. And I have
    to make sure that the RPM's shipping script successfully pulls down the
    system in an orderly fashion -- of course, I don't have to worry about
    the case where a postmaster is going to be started back up if we are in
    system shutdown -- but, as Tom also stated, I can't assume I'm in the
    system's death throes when called with the stop parameter.
    Well, if you have something clever you want to do if the postmaster
    doesn't come down after an orderly shutdown then please share it. The
    current alternatives are 'leave running' or 'kill -9'. I know I'd prefer
    the former.

    --
    Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
  • Lamar Owen at Mar 6, 2001 at 7:07 pm

    Peter Eisentraut wrote:
    Well, if you have something clever you want to do if the postmaster
    doesn't come down after an orderly shutdown then please share it. The
    current alternatives are 'leave running' or 'kill -9'. I know I'd prefer
    the former.
    Well, my preferences aren't really relevant here. I have a job to do as
    an initscript in the RPMish environment -- and I really have to meet my
    obligations (using the first-person pronoun there to anthropomorphize the
    initscript, allowing us to have a little sympathy for the
    poor shell script's plight :-)).

    My preference is to let it float in limbo -- if it's in limbo and won't
    come out, then we have bigger issues.

    However, I could do something really sneaky in the RedHat environment
    and let init do the dirty work for me -- but, again, I am not at all
    guaranteed that things will come down orderly -- if it is at all
    possible for me to bring about an orderly (if slow) shutdown that does
    terminate as the rest of the system needs it to do, then I'll attempt to
    do so.

    But, the immediate issue is preventing chaotic stops within the
    initscript, so I'm going to experiment with things and see if I can make
    the initscript hang -- if I can't, then I'll likely put in the 'killproc
    postmaster -INT' with escalation to -TERM if it doesn't come down within
    sixty seconds (and, no, I am not going to sleep 60 then check things --
    I am going to sleep 1 and loop sixty times) -- no need to unnecessarily
    delay system shutdown (and potential restart). And I won't put in the
    -KILL unless I can find a safe and thorough way to do so.

    Or I may go ahead and pg_ctl-ize things and let pg_ctl do the dirty
    work, as that IS what pg_ctl is supposed to accomplish.
    --
    Lamar Owen
    WGCR Internet Radio
    1 Peter 4:11
  • Nathan Myers at Mar 6, 2001 at 2:19 am

    On Mon, Mar 05, 2001 at 08:55:41PM -0500, Tom Lane wrote:
    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    killproc should send a kill -15 to the process, wait a few seconds for
    it to exit. If it does not, try kill -1, and if that doesn't kill it,
    then kill -9.
    Tell it to the Linux people ... this is their boot-script code we're
    talking about.
    Not to be a zealot, but this isn't _Linux_ boot-script code, it's
    _Red Hat_ boot-script code. Red Hat would like for us all to confuse
    the two, but they jes' ain't the same. (As a rule of thumb, where it
    works right, credit Linux; where it doesn't, blame Red Hat. :-)

    Nathan Myers
    ncm@zembu.com
  • Lamar Owen at Mar 6, 2001 at 2:33 am

    Nathan Myers wrote:
    Not to be a zealot, but this isn't _Linux_ boot-script code, it's
    _Red Hat_ boot-script code. Red Hat would like for us all to confuse
    the two, but they jes' ain't the same. (As a rule of thumb, where it
    works right, credit Linux; where it doesn't, blame Red Hat. :-)
    So we're going to credit Linux for PostgreSQL being shipped as part of
    the RedHat distribution since RH 5.0, then? :-0
    --
    Lamar Owen
    WGCR Internet Radio
    1 Peter 4:11
  • Hiroshi Inoue at Mar 6, 2001 at 2:18 am

    Tom Lane wrote:

    Now, killing the postmaster -9 and not cleaning up the backends has
    always been a good way to shoot yourself in the foot, but up to now the
    worst thing that was likely to happen to you was isolated corruption in
    specific tables. In the brave new world of WAL the stakes are higher,
    because the system will refuse to start up if it finds a corrupted
    checkpoint record. Clueless admins who resort to kill -9 as a routine
    admin tool *will* lose their databases. Moreover, the init scripts
    that are running around now are dangerous weapons if used with 7.1.

    I think we need a stronger interlock to prevent this scenario, but I'm
    unsure what it should be. Ideas?
    The simplest way seems to be to inhibit starting the postmaster
    if the pid file exists.
    Another way is to use flock(), if flock() is available:
    we could flock() the pid file so that another postmaster
    could detect the lock on the file.

    Regards,
    Hiroshi Inoue
  • Tom Lane at Mar 6, 2001 at 2:28 am

    Hiroshi Inoue writes:
    Tom Lane wrote:
    I think we need a stronger interlock to prevent this scenario, but I'm
    unsure what it should be. Ideas?
    Seems the simplest way is to inhibit starting postmaster
    if the pid file exists.
    Then we're unable to recover from a crash without manual intervention.

    The tricky part of this is not to give up the ability to restart when
    there *has* been a crash.
    Another way is to use flock() if flock() is available.
    We could flock() the pid file so that another postmaster
    could detect the lock of the file.
    This would only work if every backend is holding flock on the file,
    which would mean they'd all have to keep it open all the time. Kind
    of annoying to use up that many file descriptors on it. Might be the
    best answer though; I haven't thought of anything I like better...

    regards, tom lane
  • Lamar Owen at Mar 6, 2001 at 2:37 am

    Tom Lane wrote:
    The tricky part of this is not to give up the ability to restart when
    there *has* been a crash.
    But kill -9 effectively _is_ an admin-initiated crash.
    --
    Lamar Owen
    WGCR Internet Radio
    1 Peter 4:11
  • Tom Lane at Mar 6, 2001 at 2:40 am

    Lamar Owen writes:
    Tom Lane wrote:
    The tricky part of this is not to give up the ability to restart when
    there *has* been a crash.
    But kill -9 effectively _is_ an admin-initiated crash.
    Yeah, but only a partial crash. If the admin finishes the job by
    killing the backends too, we're fine. Postmaster down, backends alive
    is not a scenario we're currently prepared for. We need a way to plug
    that gap.

    regards, tom lane
  • Lamar Owen at Mar 6, 2001 at 2:55 am

    Tom Lane wrote:
    Yeah, but only a partial crash. If the admin finishes the job by
    killing the backends too, we're fine. Postmaster down, backends alive
    is not a scenario we're currently prepared for. We need a way to plug
    that gap.
    Postmaster can easily enough find out if zombie backends are 'out there'
    during startup, right? What can postmaster _do_ about it, though? It
    won't necessarily be able to kill them -- but it also can't control
    them. If it _can_ kill them, should it try?

    After all, if those zombies are out there on this PGDATA there's going
    to be big trouble if we even try to start. If we can't kill the zombies
    (that might still be doing something useful with their clients) from our
    starting postmaster, how can we possibly start up underneath running
    backends?

    Should the backend look for the presence of its parent postmaster
    periodically and gracefully come down if postmaster goes away without
    the proper handshake? A watchdog semaphore (or shared memory flag) that
    the backend resets and then checks periodically for it being set by its
    parent postmaster?

    Should a set of backends detect a new postmaster coming up and try to
    'sync up' with that postmaster, like the baroque GEMM handshake dance
    performed by 386 memory managers when Windows needs to start its own
    VMM?

    Or should we spend that much time protecting Barney Fifes from their
    own single bullet? :-)

    Just a nor-easter of a brainstorm....
    --
    Lamar Owen
    WGCR Internet Radio
    1 Peter 4:11
  • Tom Lane at Mar 6, 2001 at 3:04 am

    Lamar Owen writes:
    Tom Lane wrote:
    Postmaster down, backends alive is not a scenario we're currently
    prepared for. We need a way to plug that gap.
    Postmaster can easily enough find out if zombie backends are 'out there'
    during startup, right?
    If you think it's easy enough, enlighten the rest of us ;-). Be sure
    your solution only finds leftover backends from the previous instance of
    the same postmaster, else it will prevent running multiple postmasters
    on one system.
    What can postmaster _do_ about it, though? It
    won't necessarily be able to kill them -- but it also can't control
    them. If it _can_ kill them, should it try?
    I think refusal to start is sufficient. They should go away by
    themselves as their clients disconnect, and forcing the issue doesn't
    seem like it will improve matters. The admin can kill them (hopefully
    with just a SIGTERM ;-)) if he wants to move things along ... but I'd
    not like to see a newly-starting postmaster do that automatically.
    Should the backend look for the presence of its parent postmaster
    periodically and gracefully come down if postmaster goes away without
    the proper handshake?
    Unless we checked just before every disk write, this wouldn't represent
    a safe failure mode. The onus has to be on the newly-starting
    postmaster, I think, not on the old backends.
    Should a set of backends detect a new postmaster coming up and try to
    'sync up' with that postmaster,
    Nice try ;-). How will you persuade the kernel that these processes are
    now children of the new postmaster?

    regards, tom lane
  • Lamar Owen at Mar 6, 2001 at 3:12 am

    Tom Lane wrote:
    Lamar Owen wrote:
    Postmaster can easily enough find out if zombie backends are 'out there'
    during startup, right?
    If you think it's easy enough, enlighten the rest of us ;-).
    If postgres reported PGDATA on the command line it would be easy enough.
    What can postmaster _do_ about it, though? It
    won't necessarily be able to kill them -- but it also can't control
    them. If it _can_ kill them, should it try?
    I think refusal to start is sufficient. They should go away by
    themselves as their clients disconnect, and forcing the issue doesn't
    ???? I have misunderstood your previous statement about not wanting to
    force a manual crash recovery, then.
    Should a set of backends detect a new postmaster coming up and try to
    'sync up' with that postmaster,
    Nice try ;-). How will you persuade the kernel that these processes are
    now children of the new postmaster?
    Yeah, that's the kicker.
    --
    Lamar Owen
    WGCR Internet Radio
    1 Peter 4:11
  • Tom Lane at Mar 6, 2001 at 3:17 am

    Lamar Owen writes:
    Tom Lane wrote:
    If you think it's easy enough, enlighten the rest of us ;-).
    If postgres reported PGDATA on the command line it would be easy enough.
    In ps status you mean? I don't think we are prepared to require ps
    status functionality to let the system start up... we'd lose a number
    of supported platforms that way.

    I think refusal to start is sufficient. They should go away by
    themselves as their clients disconnect, and forcing the issue doesn't
    ???? I have misunderstood your previous statement about not wanting to
    force a manual crash recovery, then.
    In the case of an actual crash and restart, postgres should come back up
    without help. However, the situation here is not a crash, it is
    incomplete admin intervention. I don't think that expecting the admin
    to complete his intervention is the same thing as manual crash recovery.
    I especially don't think that we should second-guess what the admin
    wants us to do by auto-killing backends that are still serving clients.

    regards, tom lane
  • Lamar Owen at Mar 6, 2001 at 3:27 am

    Tom Lane wrote:
    Lamar Owen <lamar.owen@wgcr.org> writes:
    Tom Lane wrote:
    If you think it's easy enough, enlighten the rest of us ;-).
    If postgres reported PGDATA on the command line it would be easy enough.
    In ps status you mean? I don't think we are prepared to require ps
    status functionality to let the system start up... we'd lose a number
    of supported platforms that way.
    That is one downside. A major downside. Again, a lot of work to protect
    the Barney Fifes out there.
    In the case of an actual crash and restart, postgres should come back up
    without help. However, the situation here is not a crash, it is
    incomplete admin intervention. I don't think that expecting the admin
    Is it a correct assumption that this is the only time postmaster might
    drop out?

    But, thanks for the clarification, as I had misunderstood what you
    meant.
    --
    Lamar Owen
    WGCR Internet Radio
    1 Peter 4:11
  • Tom Lane at Mar 6, 2001 at 3:33 am

    Lamar Owen writes:
    Is it a correct assumption that this is the only time postmaster might
    drop out?
    Well, there's always the possibility of a bug leading to postmaster
    coredump. Historically those have been pretty rare though.

    In any case, I'm not sure that the init script is the place to be
    solving these problems. We do need some internal mechanism to protect
    against a crashed or kill -9'd postmaster.

    regards, tom lane
  • Lamar Owen at Mar 6, 2001 at 3:44 am

    Tom Lane wrote:
    Well, there's always the possibility of a bug leading to postmaster
    coredump. Historically those have been pretty rare though.
    I have never personally seen one, since 6.1.1.
    In any case, I'm not sure that the init script is the place to be
    solving these problems.
    Well, I do kind of have the responsibility to allow the system to shut
    down..... I'll have to double check -- there may be a timeout mechanism
    in the RedHat init to reap off shutdown scripts -- but I haven't yet
    found it. Better to gracefully yank the plugs than have the grim reaper
    yank them in the wrong order for you, in any case.
    --
    Lamar Owen
    WGCR Internet Radio
    1 Peter 4:11
  • Dom at Mar 6, 2001 at 12:38 pm

    I especially don't think that we should second-guess what the admin
    wants us to do by auto-killing backends that are still serving
    clients.
    Sure. But it would be nice anyway if pg_ctl could do this with a
    specific command line switch.

    --
    << Tout n'y est pas parfait, mais on y honore certainement les jardiniers >>

    Dominique Quatravaux <dom@kilimandjaro.dyndns.org>
  • Alfred Perlstein at Mar 6, 2001 at 5:43 am

    * Tom Lane [010305 19:13] wrote:
    Lamar Owen <lamar.owen@wgcr.org> writes:
    Tom Lane wrote:
    Postmaster down, backends alive is not a scenario we're currently
    prepared for. We need a way to plug that gap.
    Postmaster can easily enough find out if zombie backends are 'out there'
    during startup, right?
    If you think it's easy enough, enlighten the rest of us ;-). Be sure
    your solution only finds leftover backends from the previous instance of
    the same postmaster, else it will prevent running multiple postmasters
    on one system.
    I'm sure some sort of encoding of the PGDATA directory along with
    the pids stored in the shm segment...
    What can postmaster _do_ about it, though? It
    won't necessarily be able to kill them -- but it also can't control
    them. If it _can_ kill them, should it try?
    I think refusal to start is sufficient. They should go away by
    themselves as their clients disconnect, and forcing the issue doesn't
    seem like it will improve matters. The admin can kill them (hopefully
    with just a SIGTERM ;-)) if he wants to move things along ... but I'd
    not like to see a newly-starting postmaster do that automatically.
    I agree, shooting down processes incorrectly should be left up to
    vendors' braindead scripts. :)
    Should the backend look for the presence of its parent postmaster
    periodically and gracefully come down if postmaster goes away without
    the proper handshake?
    Unless we checked just before every disk write, this wouldn't represent
    a safe failure mode. The onus has to be on the newly-starting
    postmaster, I think, not on the old backends.
    Should a set of backends detect a new postmaster coming up and try to
    'sync up' with that postmaster,
    Nice try ;-). How will you persuade the kernel that these processes are
    now children of the new postmaster?
    Oh, easy, use ptrace. :)

    --
    -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
  • Tom Lane at Mar 6, 2001 at 6:11 pm

    Alfred Perlstein writes:
    I'm sure some sort of encoding of the PGDATA directory along with
    the pids stored in the shm segment...
    I thought about this too, but it strikes me as not very trustworthy.
    The problem is that there's no guarantee that the new postmaster will
    even notice the old shmem segment: it might select a different shmem
    key. (The 7.1 coding of shmem key selection makes this more likely
    than it used to be, but even under 7.0, it will certainly fail to work
    if I choose to start the new postmaster using a different port number
    than the old one had. The shmem key is driven primarily by port number
    not data directory ...)

    The interlock has to be tightly tied to the PGDATA directory, because
    what we're trying to protect is the files in and under that directory.
    It seems that something based on file(s) in that directory is the way
    to go.

    The best idea I've seen so far is Hiroshi's idea of having all the
    backends hold fcntl locks on the same file (probably postmaster.pid
    would do fine). Then the new postmaster can test whether any backends
    are still alive by trying to lock the old postmaster.pid file.
    Unfortunately, I read in the fcntl man page:

    Locks are not inherited by a child process in a fork(2) system call.

    This makes the idea much less attractive than I originally thought:
    a new backend would not automatically inherit a lock on the
    postmaster.pid file from the postmaster, but would have to open/lock it
    for itself. That means there's a window where the new backend exists
    but would be invisible to a hypothetical new postmaster.

    We could work around this with the following, very ugly protocol:

    1. Postmaster normally maintains fcntl read lock on its postmaster.pid
    file. Each spawned backend immediately opens and read-locks
    postmaster.pid, too, and holds that file open until it dies. (Thus
    wasting a kernel FD per backend, which is one of the less attractive
    things about this.) If the backend is unable to obtain read lock on
    postmaster.pid, then it complains and dies. We must use read locks
    here so that all these processes can hold them separately.

    2. If a newly started postmaster sees a pre-existing postmaster.pid
    file, it tries to obtain a *write* lock on that file. If it fails,
    conclude that an old postmaster or backend is still alive; complain
    and quit. If it succeeds, sit for say 1 second before deleting the file
    and creating a new one. (The delay here is to allow any just-started
    old backends to fail to acquire read lock and quit. A possible
    objection is that we have no way to guarantee 1 second is enough, though
    it ought to be plenty if the lock acquisition is just after the fork.)
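
    A minimal C sketch of the two locking steps above, assuming the lock is
    taken on $PGDATA/postmaster.pid; the delete-and-recreate dance and most
    error handling are left out, and the function names are illustrative,
    not actual source:

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* Step 1 (postmaster and each backend): take and keep a read lock. */
    static int
    hold_read_lock(const char *pidfile)
    {
        struct flock fl;
        int     fd = open(pidfile, O_RDWR);

        if (fd < 0)
            return -1;
        memset(&fl, 0, sizeof(fl));
        fl.l_type = F_RDLCK;        /* shared lock: many holders allowed */
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 0;               /* zero length = the whole file */
        if (fcntl(fd, F_SETLK, &fl) < 0)
        {
            close(fd);              /* write-locked: a new postmaster owns the file */
            return -1;              /* caller should complain and die */
        }
        return fd;                  /* keep this fd open for the process's lifetime */
    }

    /* Step 2 (newly started postmaster): probe with a write lock; success
     * means no old postmaster or backend still holds a read lock. */
    static int
    old_processes_gone(const char *pidfile)
    {
        struct flock fl;
        int     ok, fd = open(pidfile, O_RDWR);

        if (fd < 0)
            return 1;               /* no old lockfile at all */
        memset(&fl, 0, sizeof(fl));
        fl.l_type = F_WRLCK;        /* exclusive lock */
        fl.l_whence = SEEK_SET;
        ok = (fcntl(fd, F_SETLK, &fl) == 0);
        close(fd);                  /* closing also releases the probe lock */
        return ok;
    }

    Closing any descriptor on the file drops the process's fcntl locks on it,
    which is why each backend has to keep its descriptor open for its whole
    lifetime -- the FD-per-backend cost noted in step 1.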

    One thing that worries me a little bit is that this means an fcntl
    read-lock request will exist inside the kernel for each active backend.
    Does anyone know of any performance problems or hard kernel limits we
    might run into with large numbers of backends (lots and lots of fcntl
    locks)? At least the locks are on a file that we don't actually touch
    in the normal course of business.

    A small savings is that the backends don't actually need to open new FDs
    for the postmaster.pid file; they can use the one they inherit from the
    postmaster, even though they do need to lock it again. I'm not sure how
    much that saves inside the kernel, but at least something.

    There are also the usual set of concerns about portability of flock,
    though this time we're locking a plain file and not a socket, so it
    shouldn't be as much trouble as it was before.

    Comments? Does anyone see a better way to do it?

    regards, tom lane
  • Alfred Perlstein at Mar 6, 2001 at 6:22 pm

    * Tom Lane [010306 10:10] wrote:
    Alfred Perlstein <bright@wintelcom.net> writes:
    I'm sure some sort of encoding of the PGDATA directory along with
    the pids stored in the shm segment...
    I thought about this too, but it strikes me as not very trustworthy.
    The problem is that there's no guarantee that the new postmaster will
    even notice the old shmem segment: it might select a different shmem
    key. (The 7.1 coding of shmem key selection makes this more likely
    than it used to be, but even under 7.0, it will certainly fail to work
    if I choose to start the new postmaster using a different port number
    than the old one had. The shmem key is driven primarily by port number
    not data directory ...)
    This seems like a mistake.

    I'm surprised you guys aren't just using some form of the FreeBSD
    ftok() algorithm for this:

    FTOK(3) FreeBSD Library Functions Manual FTOK(3)

    ...

    The ftok() function attempts to create a unique key suitable for use
    with the msgget(3), semget(2) and shmget(2) functions given the path
    of an existing file and a user-selectable id.

    The specified path must specify an existing file that is accessible to
    the calling process or the call will fail. Also, note that links to
    files will return the same key, given the same id.

    BUGS
    The returned key is computed based on the device minor number and inode
    of the specified path in combination with the lower 8 bits of the given
    id. Thus it is quite possible for the routine to return duplicate keys.

    The "BUGS" seems to be exactly what you guys are looking for, a somewhat
    reliable method of obtaining a system id. If that sounds evil, read
    below for an alternate suggestion.
    Comments? Does anyone see a better way to do it?
    Possibly...

    What about encoding the shm id in the pidfile? Then one can just ask
    how many processes are attached to that segment? (if it doesn't
    exist, one can assume all backends have exited)

    you want the field 'shm_nattch'

    The shmid_ds struct is defined as follows:

    struct shmid_ds {
        struct ipc_perm shm_perm;     /* operation permission structure */
        int             shm_segsz;    /* size of segment in bytes */
        pid_t           shm_lpid;     /* process ID of last shared memory op */
        pid_t           shm_cpid;     /* process ID of creator */
        short           shm_nattch;   /* number of current attaches */
        time_t          shm_atime;    /* time of last shmat() */
        time_t          shm_dtime;    /* time of last shmdt() */
        time_t          shm_ctime;    /* time of last change by shmctl() */
        void           *shm_internal; /* sysv stupidity */
    };
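
    For what it's worth, a minimal C sketch of that check, assuming the shmid
    has been recovered from the old postmaster's pidfile as suggested above;
    shmctl()/IPC_STAT and shm_nattch are the real interfaces here, the wrapper
    is made up:

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    /* Hypothetical helper: is anything still attached to the old segment? */
    static int
    old_segment_still_attached(int shmid)
    {
        struct shmid_ds buf;

        if (shmctl(shmid, IPC_STAT, &buf) < 0)
            return 0;           /* simplest reading: segment is gone, nobody attached
                                 * (error cases are discussed further down-thread) */
        return buf.shm_nattch > 0;
    }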


    --
    -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
  • Tom Lane at Mar 6, 2001 at 6:35 pm

    Alfred Perlstein writes:
    * Tom Lane [010306 10:10] wrote:
    The shmem key is driven primarily by port number
    not data directory ...)
    This seems like a mistake.
    I'm surprised you guys aren't just using some form of the FreeBSD
    ftok() algorithm for this:
    This has been discussed before --- see the archives. The conclusion was
    that since ftok doesn't guarantee uniqueness, it adds nothing except
    lack of predictability to the shmem key selection process. We'd still
    need logic to cope with key collisions, and given that, we might as well
    select keys that have some obvious relationship to user-visible
    parameters, viz the port number. As is, you can fairly easily tell
    which shmem segment belongs to which postmaster from the shmem key;
    with ftok-derived keys, you couldn't tell a thing.
    Comments? Does anyone see a better way to do it?
    What about encoding the shm id in the pidfile? Then one can just ask
    how many processes are attached to that segment? (if it doesn't
    exist, one can assume all backends have exited)
    Hmm ... that might actually be a pretty good idea. A small problem is
    that the shm key isn't yet selected at the time we initially create the
    lockfile, but I can't think of any reason that we could not go back and
    append the key to the lockfile afterwards.
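
    (As a hypothetical sketch of that, writing the key and id as an extra
    line of the already-created lockfile; path handling and error reporting
    are omitted:)

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <stdio.h>

    /* Hypothetical helper: once the shmem segment exists, append its key
     * and id to the existing postmaster.pid lockfile. */
    static void
    record_shmem_in_lockfile(const char *pidfile, key_t key, int shmid)
    {
        FILE   *fp = fopen(pidfile, "a");

        if (fp == NULL)
            return;             /* real code would log a complaint */
        fprintf(fp, "%ld %d\n", (long) key, shmid);
        fclose(fp);
    }
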
    you want the field 'shm_nattch'
    Are there any portability problems with relying on shm_nattch to be
    available? If not, I like this a lot...

    regards, tom lane
  • Alfred Perlstein at Mar 6, 2001 at 6:44 pm

    * Tom Lane [010306 10:35] wrote:
    Alfred Perlstein <bright@wintelcom.net> writes:
    What about encoding the shm id in the pidfile? Then one can just ask
    how many processes are attached to that segment? (if it doesn't
    exist, one can assume all backends have exited)
    Hmm ... that might actually be a pretty good idea. A small problem is
    that the shm key isn't yet selected at the time we initially create the
    lockfile, but I can't think of any reason that we could not go back and
    append the key to the lockfile afterwards.
    you want the field 'shm_nattch'
    Are there any portability problems with relying on shm_nattch to be
    available? If not, I like this a lot...
    Well it's available on FreeBSD and Solaris, I'm sure Redhat has
    some daemon that resets the value to 0 periodically just for kicks
    so it might not be viable... :)

    Seriously, there's some dispute on the type that 'shm_nattch' is,
    under Solaris it's "shmatt_t" (unsigned long afaik), under FreeBSD
    it's 'short' (i should fix this. :)).

    But since you're really only testing for 0'ness then it shouldn't
    really be a problem.

    --
    -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
  • Tom Lane at Mar 6, 2001 at 6:57 pm

    Alfred Perlstein writes:
    Are there any portability problems with relying on shm_nattch to be
    available? If not, I like this a lot...
    Well it's available on FreeBSD and Solaris, I'm sure Redhat has
    some daemon that resets the value to 0 periodically just for kicks
    so it might not be viable... :)
    I notice that our BeOS and QNX emulations of shmctl() don't support
    IPC_STAT, but that could be dealt with, at least to the extent of
    stubbing it out.

    This does raise the question of what to do if shmctl(IPC_STAT) fails
    for a reason other than EINVAL. I think the conservative thing to do
    is refuse to start up. On EPERM, for example, it's possible that there
    is a postmaster running in your PGDATA but with a different userid.
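
    A sketch of that conservative policy, refining the shm_nattch test
    sketched earlier (again, the wrapper and its return convention are
    made up):

    #include <errno.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    /* Hypothetical helper: may we start up, given the old segment's id? */
    static int
    shmem_interlock_allows_startup(int old_shmid)
    {
        struct shmid_ds buf;

        if (shmctl(old_shmid, IPC_STAT, &buf) == 0)
            return buf.shm_nattch == 0;     /* safe only if nothing is still attached */
        if (errno == EINVAL)
            return 1;                       /* segment no longer exists: old set is gone */
        /* EPERM or anything else: possibly someone else's postmaster; refuse */
        return 0;
    }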

    Seriously, there's some dispute on the type that 'shm_nattch' is,
    under Solaris it's "shmatt_t" (unsigned long afaik), under FreeBSD
    it's 'short' (i should fix this. :)).
    But since you're really only testing for 0'ness then it shouldn't
    really be a problem.
    We need not copy the value anywhere, so as long as the struct is
    correctly declared in the system header files I don't think it matters
    what the field type is ...

    regards, tom lane
  • Alfred Perlstein at Mar 6, 2001 at 7:12 pm

    * Tom Lane [010306 11:03] wrote:
    Alfred Perlstein <bright@wintelcom.net> writes:
    Are there any portability problems with relying on shm_nattch to be
    available? If not, I like this a lot...
    Well it's available on FreeBSD and Solaris, I'm sure Redhat has
    some daemon that resets the value to 0 periodically just for kicks
    so it might not be viable... :)
    I notice that our BeOS and QNX emulations of shmctl() don't support
    IPC_STAT, but that could be dealt with, at least to the extent of
    stubbing it out.
    Well since we already have spinlocks, I can't see why we can't
    keep the refcount and spinlock in a special place in the shm
    for all cases?
    This does raise the question of what to do if shmctl(IPC_STAT) fails
    for a reason other than EINVAL. I think the conservative thing to do
    is refuse to start up. On EPERM, for example, it's possible that there
    is a postmaster running in your PGDATA but with a different userid.
    Yes, if possible a more meaningful error message and a pointer to
    some docco would be nice, or even a nice "i don't care, i killed
    all the backends, just start darnit" flag; it's really no fun at
    all to have to attempt to decipher some cryptic error message at
    3am when the database/system is acting up. :)
    Seriously, there's some dispute on the type that 'shm_nattch' is,
    under Solaris it's "shmatt_t" (unsigned long afaik), under FreeBSD
    it's 'short' (i should fix this. :)).
    But since you're really only testing for 0'ness then it shouldn't
    really be a problem.
    We need not copy the value anywhere, so as long as the struct is
    correctly declared in the system header files I don't think it matters
    what the field type is ...
    Yup, my point exactly.

    --
    -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
  • Tom Lane at Mar 6, 2001 at 7:25 pm

    Alfred Perlstein writes:
    * Tom Lane [010306 11:03] wrote:
    I notice that our BeOS and QNX emulations of shmctl() don't support
    IPC_STAT, but that could be dealt with, at least to the extent of
    stubbing it out.
    Well since we already have spinlocks, I can't see why we can't
    keep the refcount and spinlock in a special place in the shm
    for all cases?
    No, we mustn't go there. If the kernel isn't keeping the refcount
    then it's worse than useless: as soon as some process crashes without
    decrementing its refcount, you have a condition that you can't recover
    from without reboot.

    What I'm currently imagining is that the stub implementations will just
    return a failure code for IPC_STAT, and the outer code will in turn fail
    with a message along the lines of "It looks like there's a pre-existing
    shmem block (id XXX) still in use. If you're sure there are no old
    backends still running, remove the shmem block with ipcrm(1), or just
    delete $PGDATA/postmaster.pid." I dunno what shmem management tools
    exist on BeOS/QNX, but deleting the lockfile will definitely suppress
    the startup interlock ;-).
    Yes, if possible a more meaningful error message and a pointer to
    some docco would be nice
    Is the above good enough?

    regards, tom lane
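    For a platform without IPC_STAT, the stub might look roughly like this (an
    assumption, not the actual BeOS/QNX port code; the name is changed so the
    sketch stands alone). Returning failure is what pushes the caller into the
    "pre-existing shmem block" advice quoted above.

    #include <errno.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    /*
     * On the real port this would be the emulated shmctl() itself, with the
     * IPC declarations coming from the port's own header.
     */
    int
    emulated_shmctl(int shmid, int cmd, struct shmid_ds *buf)
    {
        (void) shmid;
        (void) buf;

        if (cmd == IPC_STAT)
        {
            /* No way to count attachments here: report failure and let the
             * caller print the "remove it with ipcrm(1), or delete
             * $PGDATA/postmaster.pid" advice. */
            errno = EPERM;
            return -1;
        }
        /* Other commands (IPC_RMID etc.) are handled by the real emulation. */
        errno = EINVAL;
        return -1;
    }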
  • Alfred Perlstein at Mar 6, 2001 at 7:34 pm

    * Tom Lane [010306 11:30] wrote:
    Alfred Perlstein <bright@wintelcom.net> writes:
    * Tom Lane [010306 11:03] wrote:
    I notice that our BeOS and QNX emulations of shmctl() don't support
    IPC_STAT, but that could be dealt with, at least to the extent of
    stubbing it out.
    Well since we already have spinlocks, I can't see why we can't
    keep the refcount and spinlock in a special place in the shm
    for all cases?
    No, we mustn't go there. If the kernel isn't keeping the refcount
    then it's worse than useless: as soon as some process crashes without
    decrementing its refcount, you have a condition that you can't recover
    from without reboot.
    Not if the postmaster outputs the following:
    What I'm currently imagining is that the stub implementations will just
    return a failure code for IPC_STAT, and the outer code will in turn fail
    with a message along the lines of "It looks like there's a pre-existing
    shmem block (id XXX) still in use. If you're sure there are no old
    backends still running, remove the shmem block with ipcrm(1), or just
    delete $PGDATA/postmaster.pid." I dunno what shmem management tools
    exist on BeOS/QNX, but deleting the lockfile will definitely suppress
    the startup interlock ;-).
    Yes, if possible a more meaningful error message and a pointer to
    some docco would be nice
    Is the above good enough?
    Sure. :)

    --
    -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
  • Cyril VELTER at Mar 7, 2001 at 12:14 am
    BeOS doesn't have this stat (I have a bunch of the others, but not this one).

    If I understand correctly, you want to check whether there is some backend
    still attached to the shared memory segment of a given key? In that case, I have
    an easy way to fake the stat: every segment has an encoded name containing
    this key, so I can count them (see the sketch after this message).


    cyril
    Alfred Perlstein <bright@wintelcom.net> writes:
    Are there any portability problems with relying on shm_nattch to be
    available? If not, I like this a lot...
    Well it's available on FreeBSD and Solaris, I'm sure Redhat has
    some daemon that resets the value to 0 periodically just for kicks
    so it might not be viable... :)
    I notice that our BeOS and QNX emulations of shmctl() don't support
    IPC_STAT, but that could be dealt with, at least to the extent of
    stubbing it out.

    This does raise the question of what to do if shmctl(IPC_STAT) fails
    for a reason other than EINVAL. I think the conservative thing to do
    is refuse to start up. On EPERM, for example, it's possible that there
    is a postmaster running in your PGDATA but with a different userid.

    Seriously, there's some dispute on the type that 'shm_nattch' is,
    under Solaris it's "shmatt_t" (unsigned long afaik), under FreeBSD
    it's 'short' (i should fix this. :)).
    But since you're really only testing for 0'ness then it shouldn't
    really be a problem.
    We need not copy the value anywhere, so as long as the struct is
    correctly declared in the system header files I don't think it matters
    what the field type is ...

    regards, tom lane

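    A rough sketch of Cyril's counting idea, using hypothetical helpers rather
    than real BeOS calls; the enumeration function and the "pgsql_shm.%d"
    naming scheme are assumptions made only for illustration.

    #include <stdio.h>
    #include <string.h>

    /*
     * Placeholder so the sketch is self-contained; a real version would
     * walk the platform's native list of named segments/areas.
     */
    static int
    list_all_segment_names(char names[][64], int max)
    {
        (void) names;
        (void) max;
        return 0;
    }

    /*
     * Count how many existing segments carry the name that encodes this
     * IPC key. Each attached process holds one such segment, so a count
     * of zero means no backend is still attached.
     */
    static int
    count_attachments_for_key(int key)
    {
        char names[256][64];
        char wanted[64];
        int n = list_all_segment_names(names, 256);
        int count = 0;

        snprintf(wanted, sizeof(wanted), "pgsql_shm.%d", key);
        for (int i = 0; i < n; i++)
            if (strcmp(names[i], wanted) == 0)
                count++;
        return count;
    }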
  • Alfred Perlstein at Mar 7, 2001 at 12:22 am

    Alfred Perlstein writes:
    Are there any portability problems with relying on shm_nattch to be
    available? If not, I like this a lot...
    Well it's available on FreeBSD and Solaris, I'm sure Redhat has
    some daemon that resets the value to 0 periodically just for kicks
    so it might not be viable... :)
    I notice that our BeOS and QNX emulations of shmctl() don't support
    IPC_STAT, but that could be dealt with, at least to the extent of
    stubbing it out.
    * Cyril VELTER [010306 16:15] wrote:
    BeOS doesn't have this stat (I have a bunch of the others, but not this one).

    If I understand correctly, you want to check whether there is some backend
    still attached to the shared memory segment of a given key? In that case, I have
    an easy way to fake the stat: every segment has an encoded name containing
    this key, so I can count them.
    We need to be able to take a single shared memory segment and
    determine if any other process is using it.

    --
    -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
