I've been getting a strange error from the vacuum command. When I try
'vacuum verbose analyze' the vacuum goes through the tables fine until
just after finishing one particular table. Then I get the error:
NOTICE: AbortTransaction and not in in-progress state
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
We have lost the connection to the backend, so further processing is
impossible. Terminating.
When I try to vacuum the tables individually, I get no problems with
aborted backends.
Does anyone know what is going on here?
-Tony
PostgreSQL 6.5.1 on RH Linux 6.0, PII/400 MHz, 512 Meg RAM, 18 Gig HD
[PostgreSQL-Hackers] Aborted Transaction During Vacuum
| Tweet |
|
Search Discussions
-
Tom Lane at Aug 13, 1999 at 1:29 am ⇧
Interesting. Is there any additional message appearing in the"G. Anthony Reina" <reina@nsi.edu> writes:
I've been getting a strange error from the vacuum command. When I try
'vacuum verbose analyze' the vacuum goes through the tables fine until
just after finishing one particular table. Then I get the error:
NOTICE: AbortTransaction and not in in-progress state
pqReadData() -- backend closed the channel unexpectedly.
postmaster log? Is a core file being generated? (look in the
data/base/ subdirectory for the database in question) If there
is a corefile, a debugger backtrace from it would be helpful.
regards, tom lane
-
G. Anthony Reina at Aug 13, 1999 at 5:20 pm ⇧
Tom,Tom Lane wrote:
Interesting. Is there any additional message appearing in the
postmaster log? Is a core file being generated? (look in the
data/base/ subdirectory for the database in question) If there
is a corefile, a debugger backtrace from it would be helpful.
regards, tom lane
I tried the 'vacuum verbose analyze' again today. I get the same error
with the AbortTransaction. There is a core file generated but no pg_vlock
file. The core is over 1 Gig in size (38 Megs gzipped) so I'm not so sure
you'd want to get that (you can have it if you want it though!). Plus,
there seems to be nothing written to the postmaster.log file (I re-started
the postmaster before the vacuum using 'nohup postmaster -i -B 15000 -o -Fpostmaster.log&').Oliver Elphick (Oliver.Elphick@lfix.co.uk) told me that this sounds
like like a bug you patched for him a few days ago:It does sound very like the bug that was found a few days back where theWe're probably talking about the same bug. When I try just 'vacuum verbose'
pg_vlock file gets deleted by a mistaken call to vc_abort(). This only
gets called in the analyze code.
without the analyze, the vacuum completes just fine. So it must be within
the analyze code.
-Tony
-
Tom Lane at Aug 13, 1999 at 5:38 pm ⇧
It wouldn't do me any good anyway without a copy of the executable and a"G. Anthony Reina" <reina@nsi.edu> writes:
I tried the 'vacuum verbose analyze' again today. I get the same error
with the AbortTransaction. There is a core file generated but no pg_vlock
file. The core is over 1 Gig in size (38 Megs gzipped) so I'm not so sure
you'd want to get that (you can have it if you want it though!).
copy of gdb built for whatever platform you are on. I was hoping you
could run gdb on the corefile there and just send the backtrace output
("gdb postgres-executable core-file" and then say "bt").Oliver Elphick (Oliver.Elphick@lfix.co.uk) told me that this soundsI doubt it's the same bug --- the error message generated by the other
like like a bug you patched for him a few days ago:
bug was different...
regards, tom lane
-
Oliver Elphick at Aug 13, 1999 at 6:05 pm ⇧
>Tom Lane wrote:
Oliver Elphick (Oliver.Elphick@lfix.co.uk) told me that this sounds
like like a bug you patched for him a few days ago:I doubt it's the same bug --- the error message generated by the otherThe error message I posted was the log output at level 3, which he doesn't
bug was different...
seem to be running; if you switch to logging at that level, look for
messages that mention pg_vlock.
As far as I recall, the only error message he has listed so far is from
psql, which merely records the unlink failure at the end of the vacuum.
Everything else looks remarkably like the vc_abort bug.
--
Vote against SPAM: http://www.politik-digital.de/spam/
========================================
Oliver Elphick Oliver.Elphick@lfix.co.uk
Isle of Wight http://www.lfix.co.uk/oliver
PGP key from public servers; key ID 32B8FAA1
========================================
"Watch ye therefore, and pray always, that ye may be
accounted worthy to escape all these things that shall
come to pass, and to stand before the Son of man."
Luke 21:36 -
Tom Lane at Aug 13, 1999 at 6:10 pm ⇧
But the vc_abort problem didn't cause a backend coredump --- it just"Oliver Elphick" <olly@lfix.co.uk> writes:
As far as I recall, the only error message he has listed so far is from
psql, which merely records the unlink failure at the end of the vacuum.
Everything else looks remarkably like the vc_abort bug.
reported an error and failed to complete the vacuum, no?
regards, tom lane
-
Oliver Elphick at Aug 13, 1999 at 6:28 pm ⇧
>Tom Lane wrote:
"Oliver Elphick" <olly@lfix.co.uk> writes:As far as I recall, the only error message he has listed so far is from
psql, which merely records the unlink failure at the end of the vacuum.
Everything else looks remarkably like the vc_abort bug.But the vc_abort problem didn't cause a backend coredump --- it just>
reported an error and failed to complete the vacuum, no?
I got a coredump too; I never mentioned it, because I found the proximate
cause and curing that made it go away. When unlink failed in vc_shutdown,
it called ELOG and a segfault occurred a little later.
I forgot about that when your patch fixed the original problem.
--
Vote against SPAM: http://www.politik-digital.de/spam/
========================================
Oliver Elphick Oliver.Elphick@lfix.co.uk
Isle of Wight http://www.lfix.co.uk/oliver
PGP key from public servers; key ID 32B8FAA1
========================================
"Watch ye therefore, and pray always, that ye may be
accounted worthy to escape all these things that shall
come to pass, and to stand before the Son of man."
Luke 21:36 -
Tom Lane at Aug 13, 1999 at 9:33 pm ⇧
Ah, I wish I'd known that. So what Tony is seeing is exactly the same"Oliver Elphick" <olly@lfix.co.uk> writes:I got a coredump too; I never mentioned it, because I found the proximate
But the vc_abort problem didn't cause a backend coredump --- it just
reported an error and failed to complete the vacuum, no?
cause and curing that made it go away. When unlink failed in vc_shutdown,
it called ELOG and a segfault occurred a little later.
behavior you observed. OK, I feel better now --- I thought the coredump
was probably some platform-specific misbehavior that only Tony was seeing.
We still need to figure out what is causing it, because I can see no
reason for a coredump after vc_shutdown elog()s. Something is being
clobbered that should not be. But it sounds like installing the
vc_abort patch will get Tony on his feet, and then we can look for the
secondary bug at our leisure.
regards, tom lane
-
G. Anthony Reina at Aug 13, 1999 at 6:34 pm ⇧
Here's the error message again:Tom Lane wrote:
But the vc_abort problem didn't cause a backend coredump --- it just
reported an error and failed to complete the vacuum, no?
regards, tom lane
NOTICE: --Relation ex_ellipse_proc--
NOTICE: --Relation ex_ellipse_proc--
NOTICE: Pages 2545: Changed 0, Reapped 0, Empty 0, New 0; Tup 30535: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0,
MinLen 660, MaxLen 660; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail.
Pages 0/0. Elapsed 1/0 sec.
NOTICE: Pages 2545: Changed 0, Reapped 0, Empty 0, New 0; Tup 30535: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0,
MinLen 660, MaxLen 660; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail.
Pages 0/0. Elapsed 1/0 sec.
NOTICE: Index pkex_ellipse_proc: Pages 138; Tuples 30535. Elapsed 0/0 sec.
NOTICE: Index pkex_ellipse_proc: Pages 138; Tuples 30535. Elapsed 0/0 sec.
NOTICE: --Relation ex_ellipse_cell--
NOTICE: --Relation ex_ellipse_cell--
NOTICE: Pages 370: Changed 0, Reapped 0, Empty 0, New 0; Tup 6109: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0,
MinLen 80, MaxLen 2736; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail.
Pages 0/0. Elapsed 0/0 sec.
NOTICE: Pages 370: Changed 0, Reapped 0, Empty 0, New 0; Tup 6109: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0,
MinLen 80, MaxLen 2736; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail.
Pages 0/0. Elapsed 0/0 sec.
NOTICE: Index pkex_ellipse_cell: Pages 42; Tuples 6109. Elapsed 0/0 sec.
NOTICE: Index pkex_ellipse_cell: Pages 42; Tuples 6109. Elapsed 0/0 sec.
NOTICE: --Relation ex_ellipse_opto--
NOTICE: --Relation ex_ellipse_opto--
NOTICE: Pages 26356: Changed 0, Reapped 0, Empty 0, New 0; Tup 71475: Vac
0, Keep/VTL 0/0, Crash 0, UnUsed 0,
MinLen 1760, MaxLen 6108; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail.
Pages 0/0. Elapsed 3/2 sec.
NOTICE: Pages 26356: Changed 0, Reapped 0, Empty 0, New 0; Tup 71475: Vac
0, Keep/VTL 0/0, Crash 0, UnUsed 0,
MinLen 1760, MaxLen 6108; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail.
Pages 0/0. Elapsed 3/2 sec.
NOTICE: Index pkex_ellipse_opto: Pages 357; Tuples 71475. Elapsed 0/0 sec.
NOTICE: Index pkex_ellipse_opto: Pages 357; Tuples 71475. Elapsed 0/0 sec.
NOTICE: --Relation ex_ellipse_opto_proc--
NOTICE: --Relation ex_ellipse_opto_proc--
NOTICE: Pages 14742: Changed 0, Reapped 0, Empty 0, New 0; Tup 30535: Vac
0, Keep/VTL 0/0, Crash 0, UnUsed 0,
MinLen 1816, MaxLen 5900; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail.
Pages 0/0. Elapsed 1/1 sec.
NOTICE: Pages 14742: Changed 0, Reapped 0, Empty 0, New 0; Tup 30535: Vac
0, Keep/VTL 0/0, Crash 0, UnUsed 0,
MinLen 1816, MaxLen 5900; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail.
Pages 0/0. Elapsed 1/1 sec.
NOTICE: Index pkex_ellipse_opto_proc: Pages 138; Tuples 30535. Elapsed 0/0
sec.
NOTICE: Index pkex_ellipse_opto_proc: Pages 138; Tuples 30535. Elapsed 0/0
sec.
ERROR: vacuum: can't destroy lock file!
NOTICE: AbortTransaction and not in in-progress state
NOTICE: AbortTransaction and not in in-progress state
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
We have lost the connection to the backend, so further processing is
impossible. Terminating.
[postgres@bigred ~]$
-Tony
-
G. Anthony Reina at Aug 13, 1999 at 8:38 pm ⇧
Yes, I have an ERROR message (either I didn't notice it before or it isTom Lane wrote:
However, there should have been an "ERROR" message if something reported
an error. You said you only saw "NOTICE: AbortTransaction and not in
in-progress state" and not any "ERROR" before or after it? The NOTICE
presumably came out of AbortTransaction itself, after whatever went
wrong went wrong...
new):
NOTICE: Index pkex_ellipse_opto_proc: Pages 138; Tuples 30535. Elapsed 0/0
sec.
NOTICE: Index pkex_ellipse_opto_proc: Pages 138; Tuples 30535. Elapsed 0/0
sec.
ERROR: vacuum: can't destroy lock file!
NOTICE: AbortTransaction and not in in-progress state
NOTICE: AbortTransaction and not in in-progress state
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
We have lost the connection to the backend, so further processing is
impossible. Terminating.
It looks like the error is either occuring on table ex_ellipse_opto_proc or
the next table in the list ex_ellipse_proc. However, I think the error is
more general than that. It appears to occur just before the last table in
the database gets vacuumed. Here's the list of my tables:
Database = db01
+------------------+----------------------------------+----------+Owner | Relation | Type |+------------------+----------------------------------+----------+
+------------------+----------------------------------+----------+
postgres | center_out | table |
postgres | center_out_analog | table |
postgres | center_out_analog_proc | table |
postgres | center_out_cell | table |
postgres | center_out_cell_proc | table |
postgres | center_out_opto | table |
postgres | center_out_opto_proc | table |
postgres | center_out_pref_direction | table |
postgres | center_out_proc | table |
postgres | electrode | table |
postgres | ellipse | table |
postgres | ellipse_analog | table |
dan | ellipse_analog_proc | table |
postgres | ellipse_cell | table |
dan | ellipse_cell_proc | table |
postgres | ellipse_opto | table |
dan | ellipse_opto_proc | table |
dan | ellipse_proc | table |
dan | ex_ellipse | table |
dan | ex_ellipse_analog_proc | table |
dan | ex_ellipse_cell | table |
dan | ex_ellipse_cell_proc | table |
dan | ex_ellipse_opto | table |
dan | ex_ellipse_opto_proc | table
<---- ERROR occurs somewhere after here
dan | ex_ellipse_proc | table |
Yesterday, I was adding tables in one by one from a previous pg_dump. The
error didn't pop up until after I had about 9 or 10 tables restored. I
didn't think about it then, but it may have always occured after the second
to last table in the list. But don't hold me to that.
In any case, I'll try to re-build everything like you've asked to get a
better error message. Maybe if I go through step-by-step again. You'll be
able to help me find where the error is taking place.
Thanks Tom and Oliver. I'll get back to you when I finish the rebuild.
-Tony
-
Tom Lane at Aug 13, 1999 at 9:23 pm ⇧
Ah-hah! Oliver is right, then, at least in part --- that error message"G. Anthony Reina" <reina@nsi.edu> writes:
ERROR: vacuum: can't destroy lock file!
NOTICE: AbortTransaction and not in in-progress state
NOTICE: AbortTransaction and not in in-progress state
pqReadData() -- backend closed the channel unexpectedly.
from vacuum suggests that the vc_abort bug *is* biting you. However,
there may be more going on, because what Oliver and others observed did
not include a coredump (at least not that I heard about).
You can probably suppress the problem by installing the patch I posted
to pgsql-patches a few days ago. However, I'd appreciate it if you'd
first try to reproduce the problem again with debug/assert turned on,
so that we can get some idea whether there is an additional bug that's
only biting you and not the previous reporters.
BTW, if vc_abort is involved then the occurrence of the problem probably
depends on whether any other backends are using the database and what
they are doing. (The vc_abort bug escaped notice up till last week
because it doesn't trigger when vacuum is the only thing running.)
Do you have other clients running when you do the vacuum? What are
they doing?
regards, tom lane
-
G. Anthony Reina at Aug 14, 1999 at 4:05 am ⇧
Tom and Oliver,
I added the patch and re-built everything. No more problems with the
vacuum analyze (knock on wood since this is still Friday the 13th).
Tom, does this patch address the problem you found in your
EndPortalAllocMode comments? I'm just wondering whether another patch
will be needed or if this one should cover the problem.
Thanks again for the help guys.
-Tony
Tom wrote:(a) vacuum.c does a CommitTransaction to commit its final table's
worth of fixes.
(b) CommitTransaction calls EndPortalAllocMode.
(c) vacuum calls vc_shutdown, which tries to remove pg_vlock,
and reports an error when the unlink() call fails.
(d) during error cleanup, AbortTransaction is called.
(e) AbortTransaction calls EndPortalAllocMode. There has been no
intervening StartPortalAllocMode, so the portal's context stack
is empty. EndPortalAllocMode tries to free a nonexistent memory
context (or, if you have asserts turned on, dies with an assert
failure). Ka-boom. -
Tom Lane at Aug 14, 1999 at 2:29 pm ⇧
I think you need not worry. EndPortalAllocMode has a bug that ought to"G. Anthony Reina" <reina@nsi.edu> writes:
Tom, does this patch address the problem you found in your
EndPortalAllocMode comments? I'm just wondering whether another patch
will be needed or if this one should cover the problem.
be fixed IMHO, but it's only triggered if an error is reported outside
of any transaction context, and there is darn little backend processing
that happens outside of any transaction context. This vacuum shutdown
check might be the only such error, in fact. In short, it's something
to clean up for 6.6, but I doubt it's worth issuing a 6.5 patch for.
regards, tom lane
Related Discussions
Discussion Navigation
| view | thread | post |
Discussion Overview
| group | pgsql-hackers
|
| categories | postgresql |
| posted | Aug 12, '99 at 11:26p |
| active | Aug 14, '99 at 2:29p |
| posts | 13 |
| users | 3 |
| website | postgresql.org... |
| irc | #postgresql |
