Over the last few days, I ran the regression tests for 7.1 Beta 3 much more
than I have in the past for 7.0.2 and 7.0.3. Unfortunately, I experienced
the following problems:

1. Until I did a cvs update last night (1/14/2001), the regression tests
were failing on 1/12 and 1/13. Did anyone make a cvs commit that would
stop backend children from stackdump-ing on Cygwin? I hope so.

Here are some interesting snippets:

--- pg_regress output ---
..
parallel group (7 tests): create_aggregate create_operator inherit triggers constraints create_misc create_index
constraints ... FAILED
triggers ... FAILED
create_misc ... FAILED
create_aggregate ... ok
..
--- pg_regress output ---

--- postmaster output ---
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
..
ERROR: Relation 'temptest' does not exist
0 [main] postmaster 2640 handle_exceptions: Exception: STATUS_ACCESS_VIOLATION
479 [main] postmaster 2640 stackdump: Dumping stack trace to postmaster.exe.stackdump
Server process (pid 2640) exited with status 139 at Sat Jan 13 21:28:36 2001
Terminating any active server processes...
Server processes were terminated at Sat Jan 13 21:28:36 2001
Reinitializing shared memory and semaphores
IpcMemoryDetach: shmdt(0x120b0000) failed: Invalid argument
..
--- postmaster output ---

2. I am unable to successfully run the regression tests on an NT 4.0 SP5
machine with only 64 MB of physical memory and about 175 MB of swap space.
Other than lacking RAM and swap space, this machine is the "same" as other
NT/2000 machines which can successfully run the regression tests.

The tests usually hang during the "parallel group (18 tests)" test
right after numerology. By "hang," I mean that the original postmaster
is still running, but there are no postmaster children, and there are
some number of psql processes hanging around. Using NT's Task Manager,
I can see that the machine is running out of memory. I have even seen
the "Windows is running low on virtual memory" dialog a few times.
Should I expect this behavior from such a lame machine?

3. Once (or twice), I noticed that the plpgsql test failed.
Unfortunately, I didn't capture the precise output but I think that
postmaster was complaining about being unable to

mv <somepath>/pg_internal.init.<somepid> <somepath>/pg_internal.init

due to a permissions problem. Sorry for being vague...

Thanks,
Jason

--
Jason Tishler
Director, Software Engineering Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp. Fax: +1 (732) 264-8798
82 Bethany Road, Suite 7 Email: Jason.Tishler@dothill.com
Hazlet, NJ 07730 USA WWW: http://www.dothill.com

  • Tom Lane at Jan 16, 2001 at 6:45 am

    Jason Tishler writes:
    > parallel group (7 tests): create_aggregate create_operator inherit triggers constraints create_misc create_index
    > constraints ... FAILED
    > triggers ... FAILED
    > create_misc ... FAILED
    > create_aggregate ... ok
    Can't tell much from this. What are the detail diffs (regression.diffs file?)
    > 2. I am unable to successfully run the regression tests on a NT 4.0 SP5
    > machine with only 64 MB of physical memory and about 175 MB of swap space.
    > Other than lacking RAM and swap space, this machine is the "same" as other
    > NT/2000 machines which can successfully run the regression tests.
    > The tests usually hang during the "parallel group (18 tests)" test
    > right after numerology. By "hang," I mean that the original postmaster
    > is still running, but there are no postmaster children, and there are
    > some number of psql processes hanging around.
    Hm. You will have 18 backends firing up there, plus 18 psqls to drive
    'em, and probably 18 shell subprocesses parenting the psqls. I wouldn't
    be too surprised at running out of memory --- but one would like to
    expect a more graceful failure than just hanging. What if anything
    shows up in the postmaster log?
    > 3. Once (or twice), I noticed that the plpgsql test failed.
    > Unfortunately, I didn't capture the precise output but I think that
    > postmaster was complaining about being unable to
    > mv <somepath>/pg_internal.init.<somepid> <somepath>/pg_internal.init
    > due to a permissions problem. Sorry, for being vague...
    Hm. The first backend to fire up after a vacuum will try to rebuild
    pg_internal.init, and then move it into place with

    /*
     * And rename the temp file to its final name, deleting any
     * previously-existing init file.
     */
    if (rename(tempfilename, finalfilename) < 0)
    {
        elog(NOTICE, "Cannot rename init file %s to %s: %m\n\tContinuing anyway, but there's something wrong.", tempfilename, finalfilename);
    }

    In a parallel test it's possible that several backends would try to do
    this at about the same time, but that should be OK; we should end up
    with just one file from the last-to-finish backend. I think you have
    found another Cygwin bug :-(
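
As an illustration of the write-then-rename pattern described above, here is a minimal C sketch. It is not PostgreSQL source: `install_init_file` and its return codes are hypothetical, and it assumes POSIX `rename()` semantics, where replacing an existing file is atomic. If `rename()` fails (for example because the platform refuses to replace a file that is open elsewhere), the function reports it and leaves the temp file behind.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): write `contents` to a
 * per-process temp file, then rename it over `final_path`.
 *
 * Returns  0 on success,
 *         -1 if the temp file could not be written,
 *          1 if only the rename failed (the temp file is left behind).
 */
int install_init_file(const char *final_path, const char *contents)
{
    char temp_path[1024];
    FILE *fp;
    int n;

    assert(final_path != NULL && contents != NULL);

    /* Same naming idea as <somepath>/pg_internal.init.<somepid>. */
    n = snprintf(temp_path, sizeof(temp_path), "%s.%ld",
                 final_path, (long) getpid());
    if (n < 0 || (size_t) n >= sizeof(temp_path))
        return -1;              /* path would not fit in the buffer */

    fp = fopen(temp_path, "w");
    if (fp == NULL)
        return -1;
    if (fputs(contents, fp) == EOF)
    {
        fclose(fp);
        unlink(temp_path);
        return -1;
    }
    if (fclose(fp) != 0)
    {
        unlink(temp_path);
        return -1;
    }

    /* Several processes may race here; the last rename to finish wins. */
    if (rename(temp_path, final_path) < 0)
    {
        perror("rename");
        return 1;               /* continue anyway; temp file remains */
    }
    return 0;
}
```

Because each process writes to its own temp name, concurrent callers never see a half-written final file; they only compete over whose complete copy ends up in place.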

    regards, tom lane
  • Jason Tishler at Jan 18, 2001 at 1:43 pm
    Tom,

    I'm finally back in front of the machine where I ran these tests...
    On Tue, Jan 16, 2001 at 01:45:21AM -0500, Tom Lane wrote:
    > Jason Tishler <Jason.Tishler@dothill.com> writes:
    > > parallel group (7 tests): create_aggregate create_operator inherit triggers constraints create_misc create_index
    > > constraints ... FAILED
    > > triggers ... FAILED
    > > create_misc ... FAILED
    > > create_aggregate ... ok
    > Can't tell much from this. What are the detail diffs (regression.diffs file?)
    Unfortunately, I ran more (successful) tests after these failures, so the
    detail diffs are no longer available.
    > > 2. I am unable to successfully run the regression tests on a NT 4.0 SP5
    > > machine with only 64 MB of physical memory and about 175 MB of swap space.
    > > Other than lacking RAM and swap space, this machine is the "same" as other
    > > NT/2000 machines which can successfully run the regression tests.
    > What if anything shows up in the postmaster log?
    Sorry, the postmaster log is gone too.
    > > 3. Once (or twice), I noticed that the plpgsql test failed.
    > > Unfortunately, I didn't capture the precise output but I think that
    > > postmaster was complaining about being unable to
    > > mv <somepath>/pg_internal.init.<somepid> <somepath>/pg_internal.init
    > > due to a permissions problem. Sorry, for being vague...
    > Hm. The first backend to fire up after a vacuum will try to rebuild
    > pg_internal.init, and then move it into place with
    >
    > /*
    >  * And rename the temp file to its final name, deleting any
    >  * previously-existing init file.
    >  */
    > if (rename(tempfilename, finalfilename) < 0)
    > {
    >     elog(NOTICE, "Cannot rename init file %s to %s: %m\n\tContinuing anyway, but there's something wrong.", tempfilename, finalfilename);
    > }
    >
    > In a parallel test it's possible that several backends would try to do
    > this at about the same time, but that should be OK; we should end up
    > with just one file from the last-to-finish backend. I think you have
    > found another Cygwin bug :-(
    Windows has issues with open files. So, if a backend is trying to
    rename a file when it is open (by another), then the rename will fail.
    Will this cause database integrity problems? Or, will there just be
    some spurious warning?

    Thanks,
    Jason

  • Tom Lane at Jan 18, 2001 at 5:40 pm

    Jason Tishler writes:
    > > In a parallel test it's possible that several backends would try to do
    > > this at about the same time, but that should be OK; we should end up
    > > with just one file from the last-to-finish backend. I think you have
    > > found another Cygwin bug :-(
    > Windows has issues with open files. So, if a backend is trying to
    > rename a file when it is open (by another), then the rename will fail.
    > Will this cause database integrity problems? Or, will there just be
    > some spurious warning?
    In this context the only bad side-effect is that a useless temporary
    file gets left around. It's small, so I wouldn't worry too much.

    However --- I suppose Windows can't cope with deleting a file someone
    else is holding open, either? That would cause significantly bigger
    problems :-(
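
For contrast, POSIX systems allow `unlink()` on a file that a descriptor still holds open: the name disappears immediately and the storage is reclaimed once the last descriptor is closed. The sketch below checks that behavior. `readable_after_unlink` is a hypothetical helper written for this illustration, not code from PostgreSQL or Cygwin; on native Windows the delete would typically be refused with a sharing violation instead.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical demonstration helper.
 *
 * Creates `path`, writes a short message, unlinks the file while the
 * descriptor is still open, then tries to read the message back.
 *
 * Returns  1 if the data is still readable after unlink(),
 *          0 if it is not,
 *         -1 if the file could not be created, written, or unlinked.
 */
int readable_after_unlink(const char *path)
{
    const char msg[] = "still here";
    char buf[sizeof(msg)] = {0};
    int fd;
    int ok;

    assert(path != NULL);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    if (write(fd, msg, sizeof(msg)) != (ssize_t) sizeof(msg))
    {
        close(fd);
        unlink(path);
        return -1;
    }

    /* Remove the directory entry while fd is still open. */
    if (unlink(path) < 0)
    {
        close(fd);
        return -1;
    }

    /* The open descriptor should still reach the file's contents. */
    ok = lseek(fd, 0, SEEK_SET) == 0
        && read(fd, buf, sizeof(buf)) == (ssize_t) sizeof(msg)
        && memcmp(buf, msg, sizeof(msg)) == 0;

    close(fd);                  /* last reference gone: space is freed */
    return ok ? 1 : 0;
}
```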

    regards, tom lane
  • Jason Tishler at Jan 18, 2001 at 5:48 pm
    Tom,
    On Thu, Jan 18, 2001 at 12:39:59PM -0500, Tom Lane wrote:
    > However --- I suppose Windows can't cope with deleting a file someone
    > else is holding open, either?

    Yes.

    > That would cause significantly bigger problems :-(

    That sounds ominous, please elaborate.

    Thanks,
    Jason


  • Tom Lane at Jan 18, 2001 at 5:59 pm

    Jason Tishler writes:
    > On Thu, Jan 18, 2001 at 12:39:59PM -0500, Tom Lane wrote:
    > > However --- I suppose Windows can't cope with deleting a file someone
    > > else is holding open, either?
    > Yes.
    > > That would cause significantly bigger problems :-(
    > That sounds ominous, please elaborate.
    If you drop a table that someone else has recently used, the someone
    else's backend is probably still holding the file open. We generally
    don't close open file descriptors until we have to.

    In current sources I think that you'd get a "cannot unlink" NOTICE,
    but the table would get logically dropped anyway, and the sole
    side-effect would be failure to recover the disk space. But in this
    case we could be talking about large amounts of disk space.

    regards, tom lane
  • Jason Tishler at Jan 18, 2001 at 6:20 pm
    Tom,
    On Thu, Jan 18, 2001 at 12:59:00PM -0500, Tom Lane wrote:
    > In current sources I think that you'd get a "cannot unlink" NOTICE,
    > but the table would get logically dropped anyway, and the sole
    > side-effect would be failure to recover the disk space. But in this
    > case we could be talking about large amounts of disk space.
    Cygwin does attempt to overcome the Windows open file issue. If a sharing
    violation is detected (i.e., the file is open) during an unlink operation
    (really DeleteFile), Cygwin will queue it for deletion later. However,
    reading the Cygwin code, I found the following:

    /* FIXME: this delqueue module is very flawed and should be rewritten.
    First, having an array of a fixed size for keeping track of the
    unlinked but not yet deleted files is bad. Second, some programs
    will unlink files and then create a new one in the same location
    and this behavior is not supported in the current code. Probably
    we should find a move/rename function that will work on open files,
    and move delqueue files to some special location or some such
    hack... */

    With the above caveats, is the current functionality sufficient for
    PostgreSQL's needs?

    Thanks
    Jason

  • Tom Lane at Jan 18, 2001 at 6:53 pm

    Jason Tishler writes:
    > /* FIXME: this delqueue module is very flawed and should be rewritten.
    > First, having an array of a fixed size for keeping track of the
    > unlinked but not yet deleted files is bad. Second, some programs
    > will unlink files and then create a new one in the same location
    > and this behavior is not supported in the current code. Probably
    > we should find a move/rename function that will work on open files,
    > and move delqueue files to some special location or some such
    > hack... */
    > With the above caveats, is the current functionality sufficient for
    > PostgreSQL's needs?
    The fixed-size-array thing sounds like a gotcha waiting to bite someone.
    How big is the array, anyway?

    The unlink/recreate issue is not a problem for us anymore, since we use
    OIDs as filenames --- we won't try to reuse the same filename.

    regards, tom lane
  • Jason Tishler at Jan 18, 2001 at 7:58 pm
    Tom,
    On Thu, Jan 18, 2001 at 01:53:36PM -0500, Tom Lane wrote:
    > Jason Tishler <Jason.Tishler@dothill.com> writes:
    > > With the above caveats, is the current functionality sufficient for
    > > PostgreSQL's needs?
    > The fixed-size-array thing sounds like a gotcha waiting to bite someone.

    Agreed.

    > How big is the array, anyway?

    The current size is 100 deep. Is that sufficient for PostgreSQL or is
    this dependent on usage?

    Jason

  • Tom Lane at Jan 18, 2001 at 8:53 pm

    Jason Tishler writes:
    > > The fixed-size-array thing sounds like a gotcha waiting to bite someone.
    > Agreed.
    > > How big is the array, anyway?
    > The current size is 100 deep. Is that sufficient for PostgreSQL or is
    > this dependent on usage?
    Mumble. I'm sure you could gin up a scenario where it fails, but
    deleting 100 recently-used tables in one transaction doesn't seem like a
    very likely situation.

    Probably a more interesting question to ask is how graceful is the
    behavior when that array fills up?

    regards, tom lane
  • Jason Tishler at Jan 18, 2001 at 9:04 pm
    Tom,
    On Thu, Jan 18, 2001 at 03:53:44PM -0500, Tom Lane wrote:
    > Probably a more interesting question to ask is how graceful is the
    > behavior when that array fills up?
    If no slots are available, then the file is never queued. Hence, it is
    never deleted.
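
The failure mode described above can be sketched with a fixed-size table of pending deletions. This is a hypothetical model, not Cygwin's actual delqueue code: the names `DelQueue`, `delqueue_add`, and `delqueue_release`, and the 260-byte path limit, are invented here. Only the slot count of 100 comes from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DELQUEUE_SLOTS    100   /* size reported in the thread */
#define DELQUEUE_PATH_MAX 260   /* assumed limit, for illustration only */

/* Hypothetical structure modelling a fixed-size deletion queue. */
typedef struct
{
    char paths[DELQUEUE_SLOTS][DELQUEUE_PATH_MAX];
    int  used[DELQUEUE_SLOTS];
} DelQueue;

/* Mark every slot as free. */
void delqueue_init(DelQueue *q)
{
    assert(q != NULL);
    memset(q, 0, sizeof(*q));
}

/*
 * Remember `path` for deletion later.
 * Returns the slot index, or -1 if every slot is taken (or the path is
 * too long). In the -1 case nothing records the file, so a model like
 * this would never get around to deleting it.
 */
int delqueue_add(DelQueue *q, const char *path)
{
    assert(q != NULL && path != NULL);

    if (strlen(path) >= DELQUEUE_PATH_MAX)
        return -1;

    for (int i = 0; i < DELQUEUE_SLOTS; i++)
    {
        if (!q->used[i])
        {
            strcpy(q->paths[i], path);
            q->used[i] = 1;
            return i;
        }
    }
    return -1;                  /* queue full: the request is dropped */
}

/* Free a slot once its file has actually been deleted. */
void delqueue_release(DelQueue *q, int slot)
{
    assert(q != NULL);
    if (slot >= 0 && slot < DELQUEUE_SLOTS)
        q->used[slot] = 0;
}
```

In this model the caller at least sees a -1 when the table is full; whether the real code surfaces the condition to anyone is not stated in the thread.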

    Jason

  • Bruce Momjian at Jan 18, 2001 at 6:58 pm

    > Tom,
    > On Thu, Jan 18, 2001 at 12:59:00PM -0500, Tom Lane wrote:
    > > In current sources I think that you'd get a "cannot unlink" NOTICE,
    > > but the table would get logically dropped anyway, and the sole
    > > side-effect would be failure to recover the disk space. But in this
    > > case we could be talking about large amounts of disk space.
    > Cygwin does attempt to overcome the Windows open file issue. If a sharing
    > violation is detected (i.e., the file is open) during an unlink operation
    > (really DeleteFile), Cygwin will queue it for deletion later. However,
    > reading the Cygwin code, I found the following:
    >
    > /* FIXME: this delqueue module is very flawed and should be rewritten.
    > First, having an array of a fixed size for keeping track of the
    > unlinked but not yet deleted files is bad. Second, some programs
    > will unlink files and then create a new one in the same location
    > and this behavior is not supported in the current code. Probably
    > we should find a move/rename function that will work on open files,
    > and move delqueue files to some special location or some such
    > hack... */
    >
    > With the above caveats, is the current functionality sufficient for
    > PostgreSQL's needs?
    No, it doesn't seem sufficient, though 7.1 will be a little better
    because of OID file names.

    --
    Bruce Momjian | http://candle.pha.pa.us
    pgman@candle.pha.pa.us | (610) 853-3000
    + If your life is a hard drive, | 830 Blythe Avenue
    + Christ can be your backup. | Drexel Hill, Pennsylvania 19026

Discussion Overview
group: pgsql-ports
categories: postgresql
posted: Jan 16, '01 at 4:34a
active: Jan 18, '01 at 9:04p
posts: 12
users: 3
website: postgresql.org
irc: #postgresql
