PostgreSQL 8.2.0 is randomly failing "make check" on our SUSE Linux
10.1 box. Sometimes it completes and reports that all 103 tests passed;
more often, though, it fails with an error, some examples of which are
included below. A PostgreSQL 8.1.5 database on this same machine has
been running in production for over a month, flawlessly AFAICT, although
running "make check" in the 8.1.5 source directory gives similar random
errors.
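
To put a number on "randomly failing", one option is to repeat the same command many times and count the non-zero exit statuses. The following is only a minimal sketch: the function name repeat_and_count is made up for illustration and is not part of PostgreSQL or its build system.

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL): run a command N times
# and report how many of the runs exited with a non-zero status.
# Usage: repeat_and_count N command [args...]
repeat_and_count() {
    runs=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Output is discarded; only the exit status is counted.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures of $runs runs failed"
}
```

From the top of the source tree this could be invoked as `repeat_and_count 20 make check`. Because the output is thrown away, the logs under src/test/regress/log would still need to be inspected after a failing run.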

$ uname -a
Linux hulk4 2.6.16.21-0.25-default #2 SMP Wed Oct 18 15:27:44 MDT 2006 x86_64 x86_64 x86_64 GNU/Linux

$ make --version
GNU Make 3.80

$ gcc --version
gcc (GCC) 4.1.0 (SUSE Linux)

If anyone has any advice as to what might be causing these errors or
what I might do to ascertain the problem, I would greatly appreciate
it. Please let me know if I can provide any further information that
would be helpful.

Thanks,

Brian Wipf
<brian@clickspace.com>

$ ./configure
$ make
$ make check
...
...
============== initializing database system ==============

pg_regress: initdb failed
Examine ./log/initdb.log for the reason.
Command was: "/usr/local/src/postgresql-8.2.0/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/initdb" -D "/usr/local/src/postgresql-8.2.0/src/test/regress/./tmp_check/data" -L "/usr/local/src/postgresql-8.2.0/src/test/regress/./tmp_check/install//usr/local/pgsql/share" --noclean > "./log/initdb.log" 2>&1
make[2]: *** [check] Error 2
make[2]: Leaving directory `/usr/local/src/postgresql-8.2.0/src/test/regress'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/usr/local/src/postgresql-8.2.0/src/test'
make: *** [check] Error 2

$ cat ./src/test/regress/log/initdb.log
Running in noclean mode. Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "admin".
This user must also own the server process.

The database cluster will be initialized with locale C.

creating directory /usr/local/src/postgresql-8.2.0/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers/max_fsm_pages ... 32MB/204800
creating configuration files ... ok
creating template1 database in /usr/local/src/postgresql-8.2.0/src/test/regress/./tmp_check/data/base/1 ... ok
initializing pg_authid ...
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@FATAL: 1 trigger record(s) not found for relation "pg_authid"
FATAL: xlog flush request 0/6418 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 0 of relation 1663/1/1259
child process exited with exit code 1
initdb: data directory "/usr/local/src/postgresql-8.2.0/src/test/regress/./tmp_check/data" not removed at user's request


Other errors encountered in the initdb.logs:

...
loading system objects' descriptions ... ok
creating conversions ... ok
setting privileges on built-in objects ... ok
creating information schema ... FATAL: cache lookup failed for type 10445
STATEMENT: UPDATE information_schema.sql_implementation_info SET character_value = '08.02.0000' WHERE implementation_info_name = 'DBMS VERSION';
child process exited with exit code 1

...
loading system objects' descriptions ... ok
creating conversions ... ok
setting privileges on built-in objects ... FATAL: duplicate key violates unique constraint "pg_class_oid_index"
STATEMENT: UPDATE pg_class SET relacl = E'{"=r/\\"admin\\""}' WHERE relkind IN ('r', 'v', 'S') AND relacl IS NULL;
child process exited with exit code 1

...
loading system objects' descriptions ... ok
creating conversions ... ok
setting privileges on built-in objects ... ok
creating information schema ... FATAL: cache lookup failed for relation 1247
child process exited with exit code 1

...
loading system objects' descriptions ... ok
creating conversions ... PANIC: heap_insert_redo: invalid max offset number
CONTEXT: xlog redo insert: rel 1663/1/1247; tid 3/47
sh: line 1: 25834 Aborted "/usr/local/src/postgresql-8.2.0/src/test/regress/tmp_check/install/usr/local/pgsql/bin/postgres" --single -F -O -c search_path=pg_catalog -c exit_on_error=true template1 >/dev/null
child process exited with exit code 134

...
loading system objects' descriptions ... ok
creating conversions ... ok
setting privileges on built-in objects ...
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@FA
TAL: cache lookup failed for relation 1247
child process exited with exit code 1


I also got an error once when the postmaster was started as part of
make check:

============== starting postmaster ==============

pg_regress: postmaster did not respond within 60 seconds
Examine ./log/postmaster.log for the reason
make[2]: *** [check] Error 2

$ cat ./src/test/regress/log/postmaster.log
LOG: database system was shut down at 2006-12-13 07:03:38 GMT
LOG: record with zero length at 0/48CC30
LOG: invalid primary checkpoint record
LOG: record with zero length at 0/48CBB0
LOG: invalid secondary checkpoint record
PANIC: could not locate a valid checkpoint record
LOG: startup process (PID 22499) was terminated by signal 6
LOG: aborting startup due to startup process failure
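
Since the failure message differs from run to run, it can help to pull only the FATAL and PANIC lines out of each saved log and compare them side by side. This is a rough sketch under some assumptions: list_errors is a made-up helper name, each message is assumed to sit on one line in the real log file (unlike the mail-wrapped excerpts above), and the NUL bytes shown as ^@ are stripped first so the tools treat the file as text.

```shell
#!/bin/sh
# Hypothetical helper: print the FATAL/PANIC messages found in a log file.
# Usage: list_errors path/to/logfile
list_errors() {
    # tr removes NUL bytes; sed prints only lines containing FATAL: or
    # PANIC:, dropping whatever precedes the severity keyword.
    tr -d '\000' < "$1" | sed -n -E 's/^.*((FATAL|PANIC):.*)$/\1/p'
}
```

For example, `list_errors ./src/test/regress/log/initdb.log` after each failing run, with the output appended to a file, would give one compact list of the distinct failures seen.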


  • Brian Wipf at Dec 13, 2006 at 11:46 pm
    I did the install despite the errors. After the install, I ran initdb
    and it failed the first time with:

    creating configuration files ... ok
    creating template1 database in /usr/local/pgsql/data/base/1 ... ok
    initializing pg_authid ... FATAL: function flatfile_update_trigger() does not exist
    STATEMENT: CREATE TRIGGER pg_sync_pg_database AFTER INSERT OR UPDATE OR DELETE ON pg_database FOR EACH STATEMENT EXECUTE PROCEDURE flatfile_update_trigger();

    I ran it a second time and it went through without errors. To tempt
    fate, I removed the data directory and tried it a third time. Then it
    hung on "creating system views ..."

    A backtrace of all stack frames of initdb at the time showed:
    (gdb) bt
    #0 0x00002b3442a11836 in _IO_proc_close@@GLIBC_2.2.5 () from /lib64/libc.so.6
    #1 0x00002b3442a1b672 in _IO_new_file_close_it () from /lib64/libc.so.6
    #2 0x00002b3442a0f9d8 in fclose@@GLIBC_2.2.5 () from /lib64/libc.so.6
    #3 0x0000000000405b66 in pclose_check ()
    #4 0x000000000040313a in setup_sysviews ()
    #5 0x0000000000404f9a in main ()

    Any ideas what might be wrong? Hardware? OS? Maybe I should try
    something other than SUSE 10.1?

    Brian Wipf
    <brian@clickspace.com>
  • Tom Lane at Dec 13, 2006 at 11:58 pm

    Brian Wipf writes:
    > Any ideas what might be wrong? Hardware? OS? Maybe I should try
    > something other than SUSE 10.1?

    It certainly sounds like you've got a seriously flaky platform there :-(
    I'd assume hardware issues myself --- bad RAM seems like the first
    gut-instinct theory to check.

    regards, tom lane
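
A dedicated memory tester such as memtest86+ is the usual way to check the bad-RAM theory. As a crude first look from a running system, one can also repeat a deterministic computation and confirm the result never changes. The sketch below is an assumption-laden illustration, not a real diagnostic: check_repeatable is a made-up name, and a clean result proves nothing, while a mismatch on an unchanging file would be a strong hint of a hardware or kernel problem.

```shell
#!/bin/sh
# Hypothetical helper: checksum the same file repeatedly and compare.
# On healthy hardware every pass over an unchanged file must agree.
# Usage: check_repeatable FILE PASSES
# Returns 0 if all passes agree, 1 on the first mismatch.
check_repeatable() {
    file=$1
    passes=$2
    first=$(cksum < "$file")
    i=1
    while [ "$i" -lt "$passes" ]; do
        if [ "$(cksum < "$file")" != "$first" ]; then
            echo "mismatch on pass $((i + 1))"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $passes passes agree"
}
```

A large file that nothing else is writing to makes the most sense as input, since after the first pass its contents are likely to be served from the page cache and so exercise RAM rather than disk.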

group: pgsql-ports
categories: postgresql
posted: Dec 13, '06 at 4:51p
active: Dec 13, '06 at 11:58p