Version 4.10 of the buildfarm client has been released.

Following GitHub's abandonment of their download feature, releases will
now be published on the buildfarm server. The latest release will always
be available at <http://www.pgbuildfarm.org/downloads/latest-client.tgz>.
This particular release is available at
<http://www.pgbuildfarm.org/downloads/releases/build-farm-4_10.tgz>.

The main feature of this release is that it does better logging of
pg_upgrade failures (which is why I hope Heikki applies it to chipmunk
right away ;-) )

The rest is minor bug fixes and very small enhancements.

cheers

andrew


  • Heikki Linnakangas at Jan 11, 2013 at 4:56 pm

    On 11.01.2013 18:38, Andrew Dunstan wrote:
    The main feature of this release is that it does better logging of
    pg_upgrade failures (which is why I hope Heikki applies it to chipmunk
    right away ;-) )
    Heh, ok :-)

    I've upgraded it, and launched a new buildfarm run, so we'll know more
    in a moment. This box has a very small disk (a 4GB SD card), so it's
    quite possible it simply ran out of disk space.

    There was a stray postgres instance running on the box, which I killed:

    pgbfarm@raspberrypi ~ $ ps ax | grep pg_upg
       5993 pts/0 S+ 0:00 grep --color=auto pg_upg
    20200 ? S 0:00
    /home/pgbfarm/buildroot/HEAD/pgsql.8210/contrib/pg_upgrade/tmp_check/install/home/pgbfarm/buildroot/HEAD/inst/bin/postgres
    -F -c listen_addresses=

    The directory /home/pgbfarm/buildroot/HEAD/pgsql.8210 did not exist
    anymore when I looked. Apparently the server was running within an
    already-deleted directory.
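
    [Editorial aside: on Linux, such orphans can be spotted through /proc,
    because the cwd symlink of a process whose working directory has been
    removed resolves with a " (deleted)" suffix. A Linux-only sketch, not
    something the buildfarm client does; `deleted_cwd` is a name invented
    for this example:]

```shell
# sketch (Linux only): print PIDs of processes with the given command
# name whose current working directory has been deleted;
# "deleted_cwd" is invented for this example, not buildfarm code
deleted_cwd() {
    for pid in $(pgrep -x "$1"); do
        case $(readlink "/proc/$pid/cwd" 2>/dev/null) in
            *" (deleted)") echo "$pid" ;;
        esac
    done
}
```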

    - Heikki
  • Tom Lane at Jan 11, 2013 at 6:39 pm

    Heikki Linnakangas writes:
    There was a stray postgres instance running on the box, which I killed:
    FWIW, we've seen an awful lot of persistent buildfarm failures that
    seemed to be due to port conflicts with leftover postmasters. I think
    the buildfarm script needs to try harder to ensure that it's killed
    everything after a run. No good ideas how to go about that exactly.
    You could look through "ps" output for postmasters, but what if there's
    a regular Postgres installation on the same box? Can we just document
    that the buildfarm had better not be run as "postgres"? (If so, its
    attempt to kill an unowned postmaster would fail anyway; else we need
    a reliable way to tell which ones to kill.)

        regards, tom lane
  • Andrew Dunstan at Jan 11, 2013 at 8:05 pm

    On 01/11/2013 01:39 PM, Tom Lane wrote:
    Heikki Linnakangas <hlinnaka@iki.fi> writes:
    There was a stray postgres instance running on the box, which I killed:
    FWIW, we've seen an awful lot of persistent buildfarm failures that
    seemed to be due to port conflicts with leftover postmasters. I think
    the buildfarm script needs to try harder to ensure that it's killed
    everything after a run. No good ideas how to go about that exactly.
    You could look through "ps" output for postmasters, but what if there's
    a regular Postgres installation on the same box? Can we just document
    that the buildfarm had better not be run as "postgres"? (If so, its
    attempt to kill an unowned postmaster would fail anyway; else we need
    a reliable way to tell which ones to kill.)

    The buildfarm never builds with the standard port unless someone is
    quite perverse indeed. The logic that governs it is:

         $buildport = $PGBuild::conf{base_port};
         if ($branch =~ /REL(\d+)_(\d+)/)
         {
              $buildport += (10 * ($1 - 7)) + $2;
         }
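
    [Editorial aside: for illustration, the same arithmetic restated in
    shell; base_port=5678 and the branch name are example values only,
    not what any particular animal configures:]

```shell
# illustrative restatement of the branch-to-port arithmetic above;
# base_port and branch are example values, not buildfarm defaults
base_port=5678
branch=REL9_2_STABLE
major=$(echo "$branch" | sed -n 's/^REL\([0-9][0-9]*\)_\([0-9][0-9]*\).*/\1/p')
minor=$(echo "$branch" | sed -n 's/^REL\([0-9][0-9]*\)_\([0-9][0-9]*\).*/\2/p')
if [ -n "$major" ]; then
    buildport=$((base_port + 10 * (major - 7) + minor))
else
    buildport=$base_port    # HEAD and non-matching branches use base_port
fi
echo "$buildport"
```

    So a REL9_2 branch, for example, runs 22 ports above base_port,
    keeping every branch well away from the stock 5432.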

    Certainly the script should not be run as the standard postgres user.

    Part of the trouble with detecting rogue postmasters it might have left
    lying around is that various things like to decide what port to run on,
    so it's not always easy for the buildfarm to know what it should be
    looking for.

    For branches >= 9.2 this is somewhat ameliorated by the existence of
    EXTRA_REGRESS_OPTS, although we might need a slight adjustment to
    pg_upgrade's test.sh to stop it from trampling on that willy-nilly.

    I'm certainly reluctant to be trying to kill anything we aren't dead
    certain is ours. We could possibly detect very early that there is a
    suspected rogue postmaster.

    One major source of these rogue processes has almost certainly been this
    piece of logic in pg_ctl:

         * The postmaster should create postmaster.pid very soon after being
         * started. If it's not there after we've waited 5 or more seconds,
         * assume startup failed and give up waiting.

    When that happens, pg_ctl fails, and thus so does the buildfarm client,
    but if it has in fact started a postmaster that was just very slow in
    writing its pid file, it has left a postmaster lying around.
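
    [Editorial aside: the race can be sketched as below. This is a toy
    rendition of the wait logic, not pg_ctl's actual code, and the
    function name is invented. If the postmaster writes the file only
    after the timeout expires, the caller reports failure while the
    server lives on:]

```shell
# toy version of the pid-file wait: poll for postmaster.pid up to a
# timeout in seconds, then give up and return failure
wait_for_pidfile() {
    pidfile=$1
    timeout=${2:-5}
    i=0
    while [ ! -f "$pidfile" ]; do
        [ "$i" -ge "$timeout" ] && return 1
        sleep 1
        i=$((i + 1))
    done
    return 0
}
```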

    ISTR we discussed this phenomenon relatively recently, but I can't find
    a reference to it readily. In any case, nothing has changed on that front.

    cheers

    andrew
  • Kevin Grittner at Jan 13, 2013 at 3:58 pm

    Andrew Dunstan wrote:

    Part of the trouble with detecting rogue postmasters it might have left
    lying around is that various things like to decide what port to run on,
    so it's not always easy for the buildfarm to know what it should be
    looking for.
    For Linux, perhaps some form of lsof with the +D option?  Maybe?:

    lsof +D "$PGDATA" -Fp | grep -E '^p[0-9]{1,5}$' | cut -c2- | xargs kill -9

    -Kevin
  • Andrew Dunstan at Jan 13, 2013 at 5:08 pm

    On 01/13/2013 10:58 AM, Kevin Grittner wrote:
    Andrew Dunstan wrote:
    Part of the trouble with detecting rogue postmasters it might have left
    lying around is that various things like to decide what port to run on,
    so it's not always easy for the buildfarm to know what it should be
    looking for.
    For Linux, perhaps some form of lsof with the +D option? Maybe?:

    lsof +D "$PGDATA" -Fp | grep -E '^p[0-9]{1,5}$' | cut -c2- | xargs kill -9

    This actually won't help. In most cases the relevant data directory has
    long disappeared out from under the rogue postmaster as part of
    buildfarm cleanup. Also, lsof is not universally available. We try to
    avoid creating new dependencies if possible.

    Yesterday I committed a change that will let the buildfarm client ensure
    that all the tests it runs are run on the configured build port.

    Given that, we should be able to reliably detect a rogue postmaster
    by testing for the existence of a socket at /tmp/.s.PGSQL.$buildport.
    Certainly, having something there will cause a failure. I currently have
    this test running both before a run starts and after it finishes on the
    buildfarm development instance (crake), using perl's -S operator. If it
    fails there will be a buildfarm failure on stage Pre-run-port-check or
    Post-run-port-check.
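
    [Editorial aside: a minimal shell rendition of the same check — the
    client itself does this in Perl with -S, and `port_check` is a name
    invented here:]

```shell
# succeed only if no unix-domain socket exists for the given port;
# mirrors perl's -S file test on /tmp/.s.PGSQL.<port>
port_check() {
    [ ! -S "/tmp/.s.PGSQL.$1" ]
}
```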

    For the pre-run check I'm not inclined to do anything. If there's a
    pre-existing listener on the required port it's an error and we'll just
    abort, before we even try a checkout let alone anything else.

    For the post-run check, we could possibly do something like

          fuser -k /tmp/.s.PGSQL.$buildport


    although that's not portable either ;-( .

    None of this helps for msvc or mingw builds where there's no unix
    socket, and I'll have to come up with another idea. But it's a start.

    cheers

    andrew
  • Kevin Grittner at Jan 14, 2013 at 8:56 pm

    Andrew Dunstan wrote:

    For Linux, perhaps some form of lsof with the +D option?
    This actually won't help. In most cases the relevant data directory has
    long disappeared out from under the rogue postmaster as part of
    buildfarm cleanup. Also, lsof is not universally available. We try to
    avoid creating new dependencies if possible.
    Well, I did say "for Linux" and the reason I suggested lsof is that
    it does show deleted files which are being held open (or in the
    suggested command, the pids of processes holding open files in or
    under the requested directory). However, if you want a solution
    which works for all OSs, lsof obviously doesn't do the job; and if
    the directory itself is deleted, +D doesn't help -- you would need
    to grep the full output.

    Anyway, I guess we don't really need to do anything, so the
    point is moot.

    -Kevin

Discussion Overview
group: pgsql-hackers
category: postgresql
posted: Jan 11, '13 at 4:38p
active: Jan 14, '13 at 8:56p
posts: 7
users: 4
website: postgresql.org...
irc: #postgresql
