Hello.

Previously, I had some problems that appear to have been caused by the SMP/PPC spinlock issue in the early 7.2 series.

The machine is a dual 800 MHz G4 running Mac OS X 10.1.5, with a libpq-based client connecting through local Unix domain sockets.

The last few days, I've been dealing with a client who has drastically upped their usage of the database and in doing so is causing deadlocks. I was running 7.2 or 7.2.1 and have since upgraded to a locally compiled 7.2.4. I've run a vacuum full on the databases.

Sometimes the clients have a ps ax status of async_notify; sometimes there's just a stack of selects and updates that get hung. (I'd estimate 6 deadlocks since Saturday.) It seems to coincide with times of extra activity, such as when the databases are being backed up with pg_dump.
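
A minimal sketch, not from the thread, of one way to record what each backend is doing when a hang occurs: it tallies the activity field of backend process titles in captured `ps ax` output. The assumed title layout (`postgres: user database host activity`), the function name `tally_backend_activity`, and the sample lines are all illustrative assumptions; check them against the real output on the machine.

```python
import re
from collections import Counter

# Assumed shape of a backend's process title in `ps ax` output:
#   postgres: <user> <database> <host> <activity>
# Lines with fewer fields after "postgres: " are ignored.
BACKEND_RE = re.compile(r"postgres: (\S+) (\S+) (\S+) (.+)$")

# Invented sample capture, for illustration only.
SAMPLE_PS_OUTPUT = """\
  412  ??  Ss     0:01.20 postgres: eric clientdb [local] async_notify
  415  ??  S      0:00.45 postgres: eric clientdb [local] SELECT
  417  ??  S      0:00.12 postgres: eric clientdb [local] UPDATE waiting
  420  ??  S      0:00.02 postgres: eric clientdb [local] idle
"""


def tally_backend_activity(ps_output):
    """Count backends per activity string found in captured `ps ax` text."""
    counts = Counter()
    for line in ps_output.splitlines():
        match = BACKEND_RE.search(line)
        if match:
            # Group 4 is everything after the host field, e.g. "UPDATE waiting".
            counts[match.group(4).strip()] += 1
    return counts


if __name__ == "__main__":
    for activity, n in tally_backend_activity(SAMPLE_PS_OUTPUT).most_common():
        print(f"{n:3d}  {activity}")
```

Saving `ps ax` output at intervals and running each capture through the function would show whether the number of backends sitting in async_notify climbs before a hang.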

I've also noticed the following in cron logs from nightly vacuums

NOTICE: Rel pg_attribute: Uninitialized page 59 - fixing
NOTICE: Rel pg_attribute: Uninitialized page 60 - fixing
....

Is there anything I can do to debug this? I'm willing to give it a shot, but I'm also rapidly preparing a single proc linux/intel machine to take over db duties.

eric

  • Tom Lane at Mar 11, 2003 at 9:46 pm

    eric soroos writes:
    > The last few days, I've been dealing with a client who has drastically upped their usage of the database and in doing so is causing deadlocks. I was running 7.2 or 7.2.1 and have since upgraded to a locally compiled 7.2.4. I've run a vacuum full on the databases.
    > Sometimes the clients have a ps ax status of async_notify; sometimes there's just a stack of selects and updates that get hung. (I'd estimate 6 deadlocks since Saturday.) It seems to coincide with times of extra activity, such as when the databases are being backed up with pg_dump.

    Hm. Do they use query-cancels at all? The reference to async_notify
    makes me wonder if this is related to the recently-discovered
    async_notify bug that could prevent fast-mode shutdowns. I'm not
    certain how that might lead to an apparent deadlock, but a query cancel
    arriving during async_notify would surely improve the odds of trouble.

    If you don't mind running a slightly customized version, you might try
    back-patching this fix:
    http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/commands/async.c.diff?r1=1.91&r2=1.91.2.1
    into 7.2.4 and see if that improves matters.

    If it doesn't, I'd be interested to look into the matter, but I'd
    probably need access to the machine to see what is going on.

    > I've also noticed the following in cron logs from nightly vacuums
    > NOTICE: Rel pg_attribute: Uninitialized page 59 - fixing
    > NOTICE: Rel pg_attribute: Uninitialized page 60 - fixing

    These are harmless.

    > Is there anything I can do to debug this? I'm willing to give it a
    > shot, but I'm also rapidly preparing a single proc linux/intel machine
    > to take over db duties.

    I think you're mistaken to be blaming the hardware...

    regards, tom lane
  • Eric soroos at Mar 11, 2003 at 10:10 pm
    Tom,
    > Hm. Do they use query-cancels at all? The reference to async_notify
    > makes me wonder if this is related to the recently-discovered
    > async_notify bug that could prevent fast-mode shutdowns. I'm not
    > certain how that might lead to an apparent deadlock, but a query cancel
    > arriving during async_notify would surely improve the odds of trouble.

    Not that I know of, unless it's for cleanup of queries when quitting the app or other such abort-type states.

    > If you don't mind running a slightly customized version, you might try
    > back-patching this fix:
    > http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/commands/async.c.diff?r1=1.91&r2=1.91.2.1
    > into 7.2.4 and see if that improves matters.

    I'll give that a shot.

    > If it doesn't, I'd be interested to look into the matter, but I'd
    > probably need access to the machine to see what is going on.

    That's probably possible, but there are some client confidentiality issues.

    > > Is there anything I can do to debug this? I'm willing to give it a
    > > shot, but I'm also rapidly preparing a single proc linux/intel machine
    > > to take over db duties.
    > I think you're mistaken to be blaming the hardware...

    The Linux box is a migration that's being accelerated because of this issue. It has more drive space, more memory, no app servers, and control of the kernel shared memory parameters.

    eric
  • Eric soroos at Mar 13, 2003 at 4:47 pm
    Tom,
    > > If you don't mind running a slightly customized version, you might try
    > > back-patching this fix:
    > > http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/commands/async.c.diff?r1=1.91&r2=1.91.2.1
    > > into 7.2.4 and see if that improves matters.
    > I'll give that a shot.

    It patched cleanly except for the version header. I've been running it for about 36 hours now with no problems. I'd say that I'm about 85% convinced that it made the difference, as I've also done some optimizations since then that reduce the database load by caching.

    I'd say that this patch is a candidate for 7.2.5 if there's ever another 7.2 release.

    thanks for your help.

    eric
  • Tom Lane at Mar 13, 2003 at 5:07 pm

    eric soroos writes:
    > I'd say that this patch is a candidate for 7.2.5 if there's ever
    > another 7.2 release.

    Yeah. I'm not sure that there will be another 7.2 release, but I'll pop
    the patch into the 7.2 CVS branch while I'm thinking about it...

    regards, tom lane

Discussion Overview
group: pgsql-general
categories: postgresql
posted: Mar 11, '03 at 5:12p
active: Mar 13, '03 at 5:07p
posts: 5
users: 2
website: postgresql.org
irc: #postgresql