On Feb 28, 2011, at 3:28 PM, daveg wrote:
On Wed, Jan 12, 2011 at 10:46:14AM +0200, Heikki Linnakangas wrote:
On 12.01.2011 06:21, Fujii Masao wrote:
On Sat, Dec 25, 2010 at 2:09 PM, Maxim Boguk wrote:
While I was trying to create a reproducible test case for BUG #5798 I
encountered a very strange effect on two of my servers (both servers
have the same hardware platform/OS (FreeBSD 7.2) and PostgreSQL 8.4.4).

Very simple test table created as:
CREATE TABLE test (id integer);
INSERT INTO test select generate_series(0,10000);

And I was running repeated vacuums of that table with this script:
perl -e "foreach (1..100000) {system \"psql -d test -h -c 'vacuum
test'\";}"

And roughly once a minute (really at random intervals: there can be 5 minutes
without problems, or 3 vacuums in a row showing the same error) I get the
following errors:
WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "test" page 1
...
WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "test" page 30
(and so on, for all pages of the relation)
Oh, interesting. This is the first time anyone can reliably reproduce
that. I can't reproduce that on my laptop with that script, though, so
I'm going to need your help to debug this.

Can you compile PostgreSQL with the attached patch, and rerun the test?
It will dump the pages with incorrectly set flags to files in /tmp/, and
adds a bit more detail in the WARNING. Please run the test until you
get those warnings, and tar up the created "/tmp/pageimage*" files,
and post them along with the warning generated.

We'll likely need to go back and forth a few times with various
debugging patches until we get to the heart of this..
Anything new on this? I'm seeing it on one of my clients' production boxes.
Also, what is the significance, ie what is the risk or damage potential if
this flag is set incorrectly?

Was this cluster upgraded to 8.4.4 from 8.4.0? It sounds to me like a known bug in 8.4.0 which was fixed by this commit:

commit 7fc7a7c4d082bfbd579f49e92b046dd51f1faf5f
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon Aug 24 02:18:32 2009 +0000

Fix a violation of WAL coding rules in the recent patch to include an
"all tuples visible" flag in heap page headers. The flag update *must*
be applied before calling XLogInsert, but heap_update and the tuple
moving routines in VACUUM FULL were ignoring this rule. A crash and
replay could therefore leave the flag incorrectly set, causing rows
to appear visible in seqscans when they should not be. This might explain
recent reports of data corruption from Jeff Ross and others.

In passing, do a bit of editorialization on comments in visibilitymap.c.

oy:postgresql machack$ git describe --tag 7fc7a7c4d082bfbd579f49e92b046dd51f1faf5f
REL8_4_0-190-g7fc7a7c

If the flag got twiddled while running as 8.4.0, the incorrect PD_ALL_VISIBLE flag would (obviously) not be fixed by the upgrade to 8.4.4. (Is this a separate issue?)

Regards,

David
--
David Christensen
End Point Corporation
david@endpoint.com

  • Fujii Masao at Mar 1, 2011 at 3:19 am

    On Tue, Mar 1, 2011 at 10:43 AM, David Christensen wrote:
    Was this cluster upgraded to 8.4.4 from 8.4.0?  It sounds to me like a known bug in 8.4.0 which was fixed by this commit:

    commit 7fc7a7c4d082bfbd579f49e92b046dd51f1faf5f
    Author: Tom Lane <tgl@sss.pgh.pa.us>
    Date:   Mon Aug 24 02:18:32 2009 +0000

    Fix a violation of WAL coding rules in the recent patch to include an
    "all tuples visible" flag in heap page headers.  The flag update *must*
    be applied before calling XLogInsert, but heap_update and the tuple
    moving routines in VACUUM FULL were ignoring this rule.  A crash and
    replay could therefore leave the flag incorrectly set, causing rows
    to appear visible in seqscans when they should not be.  This might explain
    recent reports of data corruption from Jeff Ross and others.

    In passing, do a bit of editorialization on comments in visibilitymap.c.

    oy:postgresql machack$ git describe --tag 7fc7a7c4d082bfbd579f49e92b046dd51f1faf5f
    REL8_4_0-190-g7fc7a7c

    If the flag got twiddled while running as 8.4.0, the incorrect PD_ALL_VISIBLE flag would (obviously) not be fixed by the upgrade to 8.4.4.  (Is this a separate issue?)
    Yes, it's a different issue. I observed it on 8.4.2.

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Greg Stark at Mar 1, 2011 at 3:33 am

    On Tue, Mar 1, 2011 at 1:43 AM, David Christensen wrote:
    Was this cluster upgraded to 8.4.4 from 8.4.0?  It sounds to me like a known bug in 8.4.0 which was fixed by this commit:
    The reproduction script described was running vacuum repeatedly. A
    single vacuum ought to be sufficient to clean up the problem if it
    was left over.

    I wonder if it would help to write a regression test that runs 100 or
    so vacuums and see if the build farm turns up any examples of this
    behaviour.

    --
    greg
  • Robert Haas at Mar 1, 2011 at 1:40 pm

    On Mon, Feb 28, 2011 at 10:32 PM, Greg Stark wrote:
    On Tue, Mar 1, 2011 at 1:43 AM, David Christensen wrote:
    Was this cluster upgraded to 8.4.4 from 8.4.0?  It sounds to me like a known bug in 8.4.0 which was fixed by this commit:
    The reproduction script described was running vacuum repeatedly. A
    single vacuum ought to be sufficient to clean up the problem if it
    was left over.

    I wonder if it would help to write a regression test that runs 100 or
    so vacuums and see if the build farm turns up any examples of this
    behaviour.
    One other thing to keep in mind here is that the warning message we've
    chosen can be a bit misleading. The warning is:

    WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "test" page 1

    ...which implies that the state of the tuples is correct, and that the
    page-level bit is wrong in comparison. But I recently saw a case
    where the infomask got clobbered, resulting in this warning. The page
    level bit was correct, at least relative to the intended page
    contents; it was a tuple on the page that was screwed up. It
    might have been better to pick a more neutral phrasing, like "page is
    marked all-visible but some tuples are not visible".
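
    To make the comparison concrete, here is a small self-contained model of the
    check that emits this warning (an illustration only, not the actual
    vacuumlazy.c code; in the real thing per-tuple visibility comes from
    HeapTupleSatisfiesVacuum() against OldestXmin, reduced here to a precomputed
    boolean). The point is that the warning fires on any disagreement between
    the page-level bit and the per-tuple state, regardless of which side is
    actually corrupt:

    /*
     * Simplified model of the per-page check in lazy vacuum that produces the
     * PD_ALL_VISIBLE warning.  Not PostgreSQL source: per-tuple visibility is
     * just a precomputed boolean here.
     */
    #include <stdbool.h>
    #include <stdio.h>

    #define PD_ALL_VISIBLE 0x0004       /* page header flag bit (bufpage.h) */

    typedef struct
    {
        unsigned int flags;             /* models pd_flags */
        int          ntuples;
        bool         visible[16];       /* "visible to everyone", precomputed */
    } Page;

    static void vacuum_page(Page *page, int blkno)
    {
        bool all_visible = true;

        for (int i = 0; i < page->ntuples; i++)
            if (!page->visible[i])
                all_visible = false;

        if (all_visible)
            page->flags |= PD_ALL_VISIBLE;          /* set the hint bit */
        else if (page->flags & PD_ALL_VISIBLE)
        {
            /*
             * The page claims "all visible" but some tuple disagrees.  This
             * fires whether the page bit is wrong or a tuple header has been
             * clobbered: the message only reports the disagreement.
             */
            printf("WARNING: PD_ALL_VISIBLE flag was incorrectly set ... page %d\n",
                   blkno);
            page->flags &= ~PD_ALL_VISIBLE;
        }
    }

    int main(void)
    {
        /* Flag already set, but one tuple is not visible to everyone. */
        Page p = { PD_ALL_VISIBLE, 2, { true, false } };

        vacuum_page(&p, 1);             /* prints the warning */
        return 0;
    }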

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Daveg at Mar 3, 2011 at 7:12 am

    On Tue, Mar 01, 2011 at 08:40:37AM -0500, Robert Haas wrote:
    On Mon, Feb 28, 2011 at 10:32 PM, Greg Stark wrote:
    On Tue, Mar 1, 2011 at 1:43 AM, David Christensen wrote:
    Was this cluster upgraded to 8.4.4 from 8.4.0?  It sounds to me like a known bug in 8.4.0 which was fixed by this commit:
    The reproduction script described was running vacuum repeatedly. A
    single vacuum ought to be sufficient to clean up the problem if it
    was left over.

    I wonder if it would help to write a regression test that runs 100 or
    so vacuums and see if the build farm turns up any examples of this
    behaviour.
    One other thing to keep in mind here is that the warning message we've
    chosen can be a bit misleading. The warning is:

    WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "test" page 1

    ...which implies that the state of the tuples is correct, and that the
    page-level bit is wrong in comparison. But I recently saw a case
    where the infomask got clobbered, resulting in this warning. The page
    level bit was correct, at least relative to the intended page
    contents; it was a tuple on the page that was screwed up. It
    might have been better to pick a more neutral phrasing, like "page is
    marked all-visible but some tuples are not visible".
    Yeesh. Yikes. I hope that this is not the case as we are seeing thousands of
    these daily on each of 4 large production hosts. Mostly on catalogs,
    especially pg_statistic. However it does occur on some high delete/insert
    traffic user tables too.

    Question: what would be the consequence of simply patching out the setting
    of this flag? Assuming that the incorrect PD_ALL_VISIBLE flag is the only
    problem (big assumption perhaps) then simply never setting it would at least
    avoid the possibility of returning wrong answers, presumably at some
    performance cost. We possibly could live with that until we get a handle
    on the real cause and fix.

    I had a look and don't really see anything except lazy vacuum (vacuumlazy.c)
    that sets it, so it seems simple to disable.

    Or have I understood this incorrectly?

    Anything else I can be doing to try to track this down?

    -dg

    --
    David Gould daveg@sonic.net 510 536 1443 510 282 0869
    If simplicity worked, the world would be overrun with insects.
  • Heikki Linnakangas at Mar 3, 2011 at 8:17 am

    On 03.03.2011 09:12, daveg wrote:
    Question: what would be the consequence of simply patching out the setting
    of this flag? Assuming that the incorrect PD_ALL_VISIBLE flag is the only
    problem (big assumption perhaps) then simply never setting it would at least
    avoid the possibility of returning wrong answers, presumably at some
    performance cost. We possibly could live with that until we get a handle
    on the real cause and fix.
    Yes. With that assumption.

    If you really want to do that, I would suggest the attached patch
    instead. This just disables the optimization in seqscans to trust it, so
    an incorrectly set flag won't affect correctness of query results, but
    the flag is still set as usual and you still get the warnings so that we
    can continue to debug the issue.
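
    Roughly speaking, the optimization in question lets a sequential scan skip
    the per-tuple visibility test whenever the page is marked PD_ALL_VISIBLE.
    The sketch below is only a toy model of that idea and of what the patch
    disables (names and structures are invented; the real patch touches the
    heap access code, not this toy), but it shows why an incorrectly set flag
    can surface dead rows in a seqscan and why refusing to trust the flag makes
    the results correct again:

    /*
     * Toy model of the seqscan optimization being disabled.  Invented names;
     * in the real code heapgetpage() skips per-tuple visibility checks when
     * the page is marked PD_ALL_VISIBLE.
     */
    #include <stdbool.h>
    #include <stdio.h>

    /* false mimics the debugging patch: never trust PD_ALL_VISIBLE. */
    static const bool trust_all_visible_flag = false;

    typedef struct
    {
        bool pd_all_visible;            /* the page-level hint bit */
        int  ntuples;
        bool tuple_visible[8];          /* per-tuple visibility to this snapshot */
    } Page;

    static int scan_page(const Page *page)
    {
        bool all_visible = trust_all_visible_flag && page->pd_all_visible;
        int  returned = 0;

        for (int i = 0; i < page->ntuples; i++)
        {
            /* With the optimization enabled, the per-tuple check is skipped. */
            bool valid = all_visible ? true : page->tuple_visible[i];

            if (valid)
                returned++;
        }
        return returned;
    }

    int main(void)
    {
        /* Flag (incorrectly) set although the middle tuple is dead. */
        Page p = { true, 3, { true, false, true } };

        /* Prints 2; with trust_all_visible_flag = true it would print 3. */
        printf("tuples returned: %d\n", scan_page(&p));
        return 0;
    }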

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Daveg at Mar 3, 2011 at 10:54 am

    On Thu, Mar 03, 2011 at 10:16:29AM +0200, Heikki Linnakangas wrote:
    On 03.03.2011 09:12, daveg wrote:
    Question: what would be the consequence of simply patching out the setting
    of this flag? Assuming that the incorrect PD_ALL_VISIBLE flag is the only
    problem (big assumption perhaps) then simply never setting it would at
    least
    avoid the possibility of returning wrong answers, presumably at some
    performance cost. We possibly could live with that until we get a handle
    on the real cause and fix.
    Yes. With that assumption.

    If you really want to do that, I would suggest the attached patch
    instead. This just disables the optimization in seqscans to trust it, so
    an incorrectly set flag won't affect correctness of query results, but
    the flag is still set as usual and you still get the warnings so that we
    can continue to debug the issue.
    Thanks. I'll be applying this tomorrow and will send you some page images
    to look at assuming it still does it.

    I had a look at how this gets set and cleared and did not see anything obvious
    so I'm pretty mystified. Also, we are seeing thousands of these daily for at
    least a month on 4 large hosts and no-one has noticed any other issues,
    which surprises me. Very strange.

    -dg


    --
    David Gould daveg@sonic.net 510 536 1443 510 282 0869
    If simplicity worked, the world would be overrun with insects.
  • Merlin Moncure at Mar 3, 2011 at 3:04 pm

    On Thu, Mar 3, 2011 at 2:16 AM, Heikki Linnakangas wrote:
    On 03.03.2011 09:12, daveg wrote:

    Question: what would be the consequence of simply patching out the setting
    of this flag? Assuming that the incorrect PD_ALL_VISIBLE flag is the only
    problem (big assumption perhaps) then simply never setting it would at
    least
    avoid the possibility of returning wrong answers, presumably at some
    performance cost. We possibly could live with that until we get a handle
    on the real cause and fix.
    Yes. With that assumption.

    If you really want to do that, I would suggest the attached patch instead.
    This just disables the optimization in seqscans to trust it, so an
    incorrectly set flag won't affect correctness of query results, but the
    flag is still set as usual and you still get the warnings so that we can
    continue to debug the issue.
    This. The mis-set flag is likely a bug/concurrency issue etc.,
    but could also be a symptom of more sinister data corruption. I did
    various vacuum experiments all day yesterday on my Windows workstation
    and was not able to produce any mis-flags. I trust iSCSI more than
    NFS, but maybe there is a connection here that is hardware based. Hm.
    Do you think it would be helpful to know what is causing the
    all_visible flag to get flipped? If so, the attached patch shows
    which case is throwing it...

    merlin
  • Daveg at Mar 4, 2011 at 9:02 am

    On Thu, Mar 03, 2011 at 09:04:04AM -0600, Merlin Moncure wrote:
    On Thu, Mar 3, 2011 at 2:16 AM, Heikki Linnakangas
    wrote:
    On 03.03.2011 09:12, daveg wrote:

    Question: what would be the consequence of simply patching out the setting
    of this flag? Assuming that the incorrect PD_ALL_VISIBLE flag is the only
    problem (big assumption perhaps) then simply never setting it would at
    least
    avoid the possibility of returning wrong answers, presumably at some
    performance cost. We possibly could live with that until we get a handle
    on the real cause and fix.
    Yes. With that assumption.

    If you really want to do that, I would suggest the attached patch instead.
    This just disables the optimization in seqscans to trust it, so an
    incorrectly set flag won't affect correctness of query results, but the
    flag is still set as usual and you still get the warnings so that we can
    continue to debug the issue.
    This. The mis-set flag is likely a bug/concurrency issue etc.,
    but could also be a symptom of more sinister data corruption. I did
    various vacuum experiments all day yesterday on my Windows workstation
    and was not able to produce any mis-flags. I trust iSCSI more than
    NFS, but maybe there is a connection here that is hardware based. Hm.
    Do you think it would be helpful to know what is causing the
    all_visible flag to get flipped? If so, the attached patch shows
    which case is throwing it...
    I'll apply your patch and try it. Probably can only do it for a few minutes
    tomorrow evening though as the output is huge and we have only limited
    downtime availability.

    -dg

    --
    David Gould daveg@sonic.net 510 536 1443 510 282 0869
    If simplicity worked, the world would be overrun with insects.
  • Heikki Linnakangas at Mar 4, 2011 at 4:00 pm

    On 04.03.2011 11:00, daveg wrote:
    Thanks, I've applied both patches to one host. I'll probably have to back
    down on the debugging logging soon, as the output is pretty voluminous;
    it is producing 100MB of message log every few minutes. I'll try Merlin's
    patch to get the case setting the bit first though.

    Anyway, here is a snippet of log with the setting and unsetting of the
    bit on one page. Unfortunately, we don't know which of a dozen odd databases
    the page belongs to, but the timestamps are so close it seems likely to be
    the same one. I've added dbname to the patch and will get that next time
    I can switch binaries.

    2011-03-03 23:28:34.170 PST 2039 WARNING: debugging: setting PD_ALL_VISIBLE in relation "pg_statistic" on page 5963 (OldestXmin 331848998)
    ...
    /cv/logs/production_03-20110303_232519.log.gz:2011-03-03 23:29:34.194 PST 2115 WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic" page 5963 (OldestXmin 331677178)
    ...
    2011-03-03 23:42:38.323 PST 2808 WARNING: debugging: setting PD_ALL_VISIBLE in relation "pg_attribute" on page 5963 (OldestXmin 331677178)
    Hmm, if these all came from the same database, then it looks like OldestXmin
    has moved backwards. That would explain the warnings. First one vacuum
    determines that all the tuples are visible to everyone and sets the
    flag. Then another vacuum runs with an older OldestXmin, and determines
    that there's a tuple on the page with an xmin that is not yet visible to
    everyone, hence it thinks that the flag should not have been set yet.

    Looking at the code, I don't see how that situation could arise, though.
    The value calculated by GetOldestXmin() should never move backwards. And
    GetOldestXmin() is called in lazy_vacuum_rel(), after it has acquired a
    lock on the table, which should protect from a race condition where two
    vacuums could run on the table one after another, in a way where the
    later vacuum runs with an OldestXmin calculated before the first vacuum.

    Hmm, fiddling with vacuum_defer_cleanup_age on the fly could cause that,
    though. You don't do that, do you?
    Also, I've attached the relevant page image.
    Thanks. There seem to be two tuples on the page, both of which were HOT
    updated at some point, but now there's only one version of each left:

    postgres=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax,
                      t_ctid, t_infomask, t_infomask2, t_hoff
               FROM heap_page_items(loread(lo_open(29924, 262144), 8192))
               WHERE lp_flags <> 0;
     lp | lp_off | lp_flags | lp_len |  t_xmin   | t_xmax |  t_ctid  | t_infomask | t_infomask2 | t_hoff
    ----+--------+----------+--------+-----------+--------+----------+------------+-------------+--------
      1 |   7608 |        1 |    580 | 331250141 |      0 | (5963,1) |      10499 |      -32747 |     32
      3 |      1 |        2 |      0 |           |        |          |            |             |
      4 |   7528 |        1 |     76 | 331735553 |      0 | (5963,4) |      10497 |      -32747 |     32
     19 |      4 |        2 |      0 |           |        |          |            |             |
    (4 rows)

    Deciphering those infomasks, the first tuple at lp 1 has these flags set:
    HEAP_UPDATED | HEAP_XMAX_INVALID | HEAP_XMIN_COMMITTED | HEAP_HASNULL |
    HEAP_HASVARWIDTH

    And the 2nd one at lp 4:
    HEAP_UPDATED | HEAP_XMAX_INVALID | HEAP_XMIN_COMMITTED | HEAP_HASNULL

    So, both of those tuples are live.
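
    For reference, those two t_infomask values decode exactly as listed above.
    A tiny standalone program makes the bit arithmetic explicit (the flag values
    are the ones I believe appear in the 8.4-era src/include/access/htup.h;
    double-check against your sources):

    /*
     * Decode the two t_infomask values from the dump above.  The program is
     * only an illustration; the bit values are assumed from 8.4-era htup.h.
     */
    #include <stdio.h>

    #define HEAP_HASNULL         0x0001
    #define HEAP_HASVARWIDTH     0x0002
    #define HEAP_XMIN_COMMITTED  0x0100
    #define HEAP_XMIN_INVALID    0x0200
    #define HEAP_XMAX_COMMITTED  0x0400
    #define HEAP_XMAX_INVALID    0x0800
    #define HEAP_UPDATED         0x2000

    static void decode(unsigned int mask)
    {
        printf("t_infomask %u (0x%04x):", mask, mask);
        if (mask & HEAP_HASNULL)        printf(" HEAP_HASNULL");
        if (mask & HEAP_HASVARWIDTH)    printf(" HEAP_HASVARWIDTH");
        if (mask & HEAP_XMIN_COMMITTED) printf(" HEAP_XMIN_COMMITTED");
        if (mask & HEAP_XMIN_INVALID)   printf(" HEAP_XMIN_INVALID");
        if (mask & HEAP_XMAX_COMMITTED) printf(" HEAP_XMAX_COMMITTED");
        if (mask & HEAP_XMAX_INVALID)   printf(" HEAP_XMAX_INVALID");
        if (mask & HEAP_UPDATED)        printf(" HEAP_UPDATED");
        printf("\n");
    }

    int main(void)
    {
        decode(10499);                  /* tuple at lp 1: includes HASVARWIDTH */
        decode(10497);                  /* tuple at lp 4: same minus HASVARWIDTH */
        return 0;
    }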

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Daveg at Mar 7, 2011 at 11:53 pm

    On Fri, Mar 04, 2011 at 05:52:29PM +0200, Heikki Linnakangas wrote:
    Hmm, if these all came from the same database, then it looks like OldestXmin
    has moved backwards. That would explain the warnings. First one vacuum
    determines that all the tuples are visible to everyone and sets the
    flag. Then another vacuum runs with an older OldestXmin, and determines
    that there's a tuple on the page with an xmin that is not yet visible to
    everyone, hence it thinks that the flag should not have been set yet.

    Looking at the code, I don't see how that situation could arise, though.
    The value calculated by GetOldestXmin() should never move backwards. And
    GetOldestXmin() is called in lazy_vacuum_rel(), after it has acquired a
    lock on the table, which should protect from a race condition where two
    vacuums could run on the table one after another, in a way where the
    later vacuum runs with an OldestXmin calculated before the first vacuum.

    Hmm, fiddling with vacuum_defer_cleanup_age on the fly could cause that,
    though. You don't do that, do you?
    No.

    I've updated the patch to collect db and schema and added Merlin's patch as
    well and run it for a while. The attached log is all the debug messages
    for pg_statistic page 333 from one database. I've also attached the two
    most recent page images for that particular page, the last digits in the
    filename are the hour and minute of when the page was saved.

    What else can I be doing to help figure this out?

    Thanks

    -dg

    --
    David Gould daveg@sonic.net 510 536 1443 510 282 0869
    If simplicity worked, the world would be overrun with insects.
  • Greg Stark at Mar 8, 2011 at 2:08 am

    On Mon, Mar 7, 2011 at 11:53 PM, daveg wrote:
    Looking at the code, I don't see how that situation could arise, though.
    The value calculated by GetOldestXmin() should never move backwards. And
    GetOldestXmin() is called in lazy_vacuum_rel(), after it has acquired a
    lock on the table, which should protect from a race condition where two
    vacuums could run on the table one after another, in a way where the
    later vacuum runs with an OldestXmin calculated before the first vacuum.

    Hmm, fiddling with vacuum_defer_cleanup_age on the fly could cause that,
    though. You don't do that, do you?
    No.

    I've updated the patch to collect db and schema and added Merlin's patch as
    well and run it for a while. The attached log is all the debug messages
    for pg_statistic page 333 from one database. I've also attached the two
    most recent page images for that particular page, the last digits in the
    filename are the hour and minute of when the page was saved.

    Well, from that log you definitely have OldestXmin going backwards. And
    not by a little bit either. At 6:33 it set the all_visible flag and
    then at 7:01 it was almost 1.3 million transactions earlier. In fact
    to precisely the same value that was in use for a transaction at 1:38.
    That seems like a bit of a coincidence, though it's not repeated
    earlier.

    It also seems odd that it happens only with this one block of this one table.

    What does SHOW ALL show for the current settings in effect? And what
    was process 23896, are there any other log messages from it? When did
    it start?


    --
    greg
  • Daveg at Mar 8, 2011 at 2:51 am

    On Tue, Mar 08, 2011 at 02:07:53AM +0000, Greg Stark wrote:
    On Mon, Mar 7, 2011 at 11:53 PM, daveg wrote:
    Looking at the code, I don't see how that situation could arise, though.
    The value calculated by GetOldestXmin() should never move backwards. And
    GetOldestXmin() is called in lazy_vacuum_rel(), after it has acquired a
    lock on the table, which should protect from a race condition where two
    vacuums could run on the table one after another, in a way where the
    later vacuum runs with an OldestXmin calculated before the first vacuum.

    Hmm, fiddling with vacuum_defer_cleanup_age on the fly could cause that,
    though. You don't do that, do you?
    No.

    I've updated the patch to collect db and schema and added Merlin's patch as
    well and run it for a while. The attached log is all the debug messages
    for pg_statistic page 333 from one database. I've also attached the two
    most recent page images for that particular page, the last digits in the
    filename are the hour and minute of when the page was saved.

    Well from that log you definitely have OldestXmin going backwards. And
    not by a little bit either. at 6:33 it set the all_visible flag and
    then at 7:01 it was almost 1.3 million transactions earlier. In fact
    to precisely the same value that was in use for a transaction at 1:38.
    That seems like a bit of a coincidence though it's not repeated
    earlier.

    It also seems odd that it happens only with this one block of this one table.
    It happens on several tables on thousands of blocks in dozens of dbs daily
    on four separate but similar hosts. I have over 240MB of page images saved
    from one host so far but assumed that would be too much to send, so I just
    filtered it down to the one block.

    What does SHOW ALL show for the current settings in effect?
    Attached.
    And what was process 23896, are there any other log messages from it? When did
    it start?
    The process appears to be an autovacuum worker; all it logged was the attached
    messages from the debugging for this.

    Thanks

    -dg

    --
    David Gould daveg@sonic.net 510 536 1443 510 282 0869
    If simplicity worked, the world would be overrun with insects.
  • Heikki Linnakangas at Mar 8, 2011 at 8:00 am

    On 08.03.2011 04:07, Greg Stark wrote:
    Well from that log you definitely have OldestXmin going backwards. And
    not by a little bit either. at 6:33 it set the all_visible flag and
    then at 7:01 it was almost 1.3 million transactions earlier. In fact
    to precisely the same value that was in use for a transaction at 1:38.
    That seems like a bit of a coincidence though it's not repeated
    earlier.
    Yep. After staring at GetOldestXmin() again, it finally struck me how
    OldestXmin can move backwards. You need two databases for it, which
    probably explains why this has been so elusive.

    Here's how to reproduce that:

    CREATE DATABASE foodb;
    CREATE DATABASE bardb;

    session 1, in foodb:

    foodb=# begin isolation level serializable;
    BEGIN
    foodb=# CREATE TABLE foo (a int4); -- just something to force this xact
    to have an xid
    CREATE TABLE
    foodb=#

    (leave the transaction open)

    session 2, in bardb:

    bardb=# CREATE TABLE foo AS SELECT 1;
    SELECT
    bardb=# vacuum foo; -- to set the PD_ALL_VISIBLE flag
    VACUUM

    session 3, in bardb:
    bardb=# begin isolation level serializable;
    BEGIN
    bardb=# SELECT 1;
    ?column?
    ----------
    1
    (1 row)

    (leave transaction open)

    session 2, in bardb:

    bardb=# vacuum foo;
    WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "foo" page
    0 (OldestXmin 803)
    VACUUM
    bardb=#


    When there are no other transactions active in the same database,
    GetOldestXmin() returns just latestCompletedXid. When you open a
    transaction in the same database after that, its xid will be above
    latestCompletedXid, but its xmin includes transactions from all
    databases, and there might be a transaction in some other database with
    an xid that precedes the value that GetOldestXmin() returned earlier.
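
    A minimal standalone simulation of that scenario (invented structures and
    numbers, not the real procarray.c code) shows the value going backwards: the
    first call sees no other transaction in the local database, while the second
    call sees a new local backend whose snapshot xmin was computed across all
    databases:

    /*
     * Toy simulation of the two-database scenario.  Everything here is
     * invented for illustration, and xid wraparound is ignored entirely.
     */
    #include <stdio.h>

    typedef unsigned int TransactionId;
    #define InvalidTransactionId 0

    typedef struct
    {
        int           databaseId;
        TransactionId xid;              /* xid of the backend's transaction */
        TransactionId xmin;             /* snapshot xmin, computed globally */
    } Proc;

    static Proc procs[4];
    static int  nprocs = 0;
    static TransactionId latestCompletedXid = 800;

    /* Like GetOldestXmin(allDbs = false): only backends in "my" database are
     * examined, but their xmin may have been taken across all databases. */
    static TransactionId oldest_xmin(int mydb)
    {
        TransactionId result = latestCompletedXid;

        for (int i = 0; i < nprocs; i++)
        {
            if (procs[i].databaseId != mydb)
                continue;
            if (procs[i].xid != InvalidTransactionId && procs[i].xid < result)
                result = procs[i].xid;
            if (procs[i].xmin != InvalidTransactionId && procs[i].xmin < result)
                result = procs[i].xmin;
        }
        return result;
    }

    /* Snapshot xmin is global: the oldest xid running in *any* database. */
    static TransactionId snapshot_xmin(void)
    {
        TransactionId result = latestCompletedXid;

        for (int i = 0; i < nprocs; i++)
            if (procs[i].xid != InvalidTransactionId && procs[i].xid < result)
                result = procs[i].xid;
        return result;
    }

    int main(void)
    {
        /* Session 1: open transaction in foodb (db 1) with an old xid. */
        procs[nprocs++] = (Proc){ 1, 700, 700 };

        /* First vacuum in bardb (db 2): no other xacts there, value is high. */
        printf("first vacuum in bardb:  OldestXmin = %u\n", oldest_xmin(2));

        /* Session 3: new transaction in bardb; its snapshot xmin is computed
         * across all databases and therefore sees the foodb transaction. */
        procs[nprocs++] = (Proc){ 2, 801, snapshot_xmin() };

        /* Second vacuum in bardb now picks up that xmin: it went backwards. */
        printf("second vacuum in bardb: OldestXmin = %u\n", oldest_xmin(2));
        return 0;
    }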

    I'm not sure what to do about that. One idea is to track two xmin values in
    the proc array, one that includes transactions in all databases, and another
    that only includes transactions in the same database. GetOldestXmin()
    (when allDbs is false) would only pay attention to the latter. It would
    add a few instructions to GetSnapshotData(), though.

    Another idea is to give up on the warning when it appears that
    oldestxmin has moved backwards, and assume that it's actually fine. We
    could still warn in other cases where the flag appears to be incorrectly
    set, like if there is a deleted tuple on the page.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Heikki Linnakangas at Mar 8, 2011 at 8:37 am

    On 08.03.2011 10:00, Heikki Linnakangas wrote:
    Another idea is to give up on the warning when it appears that
    oldestxmin has moved backwards, and assume that it's actually fine. We
    could still warn in other cases where the flag appears to be incorrectly
    set, like if there is a deleted tuple on the page.
    This is probably a better idea at least in back-branches. It also
    handles the case of twiddling vacuum_defer_cleanup_age, which tracking
    two xmins per transaction would not handle.

    Here's a patch. I also changed the warning per Robert's suggestion.
    Anyone see a hole in this?
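
    In toy form, the relaxed check would look something like the sketch below
    (again just an illustration of the idea, not the committed patch; the
    warning text here follows Robert's suggested phrasing upthread). A live
    tuple whose xmin merely postdates a backwards-moving OldestXmin no longer
    warns; a genuinely dead tuple on an all-visible page still does:

    /*
     * Toy model of the relaxed check.  Illustration only: a tuple is reduced
     * to one of three states.
     */
    #include <stdbool.h>
    #include <stdio.h>

    typedef enum
    {
        TUPLE_DEAD,                     /* deleted and dead to everyone */
        TUPLE_LIVE_OLD,                 /* live, xmin older than OldestXmin */
        TUPLE_LIVE_RECENT               /* live, xmin newer than OldestXmin */
    } TupleState;

    typedef struct
    {
        bool       pd_all_visible;
        int        ntuples;
        TupleState t[8];
    } Page;

    static void vacuum_page(Page *page, int blkno)
    {
        bool all_visible = true;        /* visible to everyone per OldestXmin */
        bool has_dead_tuples = false;

        for (int i = 0; i < page->ntuples; i++)
        {
            if (page->t[i] == TUPLE_DEAD)
                has_dead_tuples = true;
            if (page->t[i] != TUPLE_LIVE_OLD)
                all_visible = false;
        }

        if (all_visible)
            page->pd_all_visible = true;
        else if (page->pd_all_visible && has_dead_tuples)
            printf("WARNING: page %d is marked all-visible, "
                   "but some tuples are not visible\n", blkno);
        /* else: flag set but no dead tuples; most likely OldestXmin moved
         * backwards, which is harmless, so stay quiet. */
    }

    int main(void)
    {
        Page quiet = { true, 1, { TUPLE_LIVE_RECENT } };            /* silent */
        Page noisy = { true, 2, { TUPLE_LIVE_OLD, TUPLE_DEAD } };   /* warns  */

        vacuum_page(&quiet, 333);
        vacuum_page(&noisy, 334);
        return 0;
    }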

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Daveg at Mar 8, 2011 at 8:50 am

    On Tue, Mar 08, 2011 at 10:37:24AM +0200, Heikki Linnakangas wrote:
    On 08.03.2011 10:00, Heikki Linnakangas wrote:
    Another idea is to give up on the warning when it appears that
    oldestxmin has moved backwards, and assume that it's actually fine. We
    could still warn in other cases where the flag appears to be incorrectly
    set, like if there is a deleted tuple on the page.
    This is probably a better idea at least in back-branches. It also
    handles the case of twiddling vacuum_defer_cleanup_age, which tracking
    two xmins per transaction would not handle.

    Here's a patch. I also changed the warning per Robert's suggestion.
    Anyone see a hole in this?
    It would be helpful to have the dbname and schema in the message in addition
    to the relname. I added those to the original diagnostic patch as it was not
    clear that the messages were all related to the same page/table/db.

    Also, in your comment you might mention that multiple databases are one way
    we could see oldestxmin move backwards.

    -dg


    --
    David Gould daveg@sonic.net 510 536 1443 510 282 0869
    If simplicity worked, the world would be overrun with insects.
  • Heikki Linnakangas at Mar 8, 2011 at 7:00 pm

    On 08.03.2011 10:49, daveg wrote:
    On Tue, Mar 08, 2011 at 10:37:24AM +0200, Heikki Linnakangas wrote:
    On 08.03.2011 10:00, Heikki Linnakangas wrote:
    Another idea is to give up on the warning when it appears that
    oldestxmin has moved backwards, and assume that it's actually fine. We
    could still warn in other cases where the flag appears to be incorrectly
    set, like if there is a deleted tuple on the page.
    This is probably a better idea at least in back-branches. It also
    handles the case of twiddling vacuum_defer_cleanup_age, which tracking
    two xmins per transaction would not handle.

    Here's a patch. I also changed the warning per Robert's suggestion.
    Anyone see a hole in this?
    It would be helpful to have the dbname and schema in the message in addition
    to the relname. I added those to the original diagnostic patch as it was not
    clear that the messages were all related to the same page/table/db.
    Hmm, we don't usually include database name and schema in messages like
    this. There's a couple of other warnings in vacuum, too, that only print
    the table name. I have to admit that the database name was crucial in
    tracking this down, but in hindsight you could've added database name to
    log_line_prefix for the same effect. If you have several databases with
    the same schema, that's a good idea anyway. So in the end, I decided not to
    include it.
    Also, in your comment you might mention that multiple databases are one way
    we could see oldestxmin move backwards.
    Ok, I added a comment to GetOldestXmin() explaining that.

    Committed. Thanks David for your help in debugging this! And thanks to
    everyone else for helping out too.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Daveg at Mar 8, 2011 at 8:38 am

    On Tue, Mar 08, 2011 at 10:00:01AM +0200, Heikki Linnakangas wrote:
    On 08.03.2011 04:07, Greg Stark wrote:
    Well from that log you definitely have OldestXmin going backwards. And
    not by a little bit either. at 6:33 it set the all_visible flag and
    then at 7:01 it was almost 1.3 million transactions earlier. In fact
    to precisely the same value that was in use for a transaction at 1:38.
    That seems like a bit of a coincidence though it's not repeated
    earlier.
    Yep. After staring at GetOldestXmin() again, it finally struck me how
    OldestXmin can move backwards. You need two databases for it, which
    probably explains why this has been so elusive. ...
    When there are no other transactions active in the same database,
    GetOldestXmin() returns just latestCompletedXid. When you open a
    transaction in the same database after that, its xid will be above
    latestCompletedXid, but its xmin includes transactions from all
    databases, and there might be a transaction in some other database with
    an xid that precedes the value that GetOldestXmin() returned earlier.

    I'm not sure what to do about that. One idea is to track two xmin values in
    the proc array, one that includes transactions in all databases, and another
    that only includes transactions in the same database. GetOldestXmin()
    (when allDbs is false) would only pay attention to the latter. It would
    add a few instructions to GetSnapshotData(), though.

    Another idea is to give up on the warning when it appears that
    oldestxmin has moved backwards, and assume that it's actually fine. We
    could still warn in other cases where the flag appears to be incorrectly
    set, like if there is a deleted tuple on the page.
    I read this to mean that it is safe to ignore this warning and that these
    databases are not at risk for data corruption or wrong results so long as
    the warning is due to oldestxmin. Please correct me if I have misunderstood.

    Thanks

    -dg

    --
    David Gould daveg@sonic.net 510 536 1443 510 282 0869
    If simplicity worked, the world would be overrun with insects.
  • Heikki Linnakangas at Mar 8, 2011 at 8:55 am

    On 08.03.2011 10:38, daveg wrote:
    I read this to mean that it is safe to ignore this warning and that these
    databases are not at risk for data corruption or wrong results so long as
    the warning is due to oldestxmin. Please correct me if I have misunderstood.
    Yes, that's correct.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Maxim Boguk at Mar 1, 2011 at 9:45 pm
    Hi
    On Tue, Mar 1, 2011 at 11:00 AM, Heikki Linnakangas wrote:
    On 28.02.2011 23:28, daveg wrote:
    On Wed, Jan 12, 2011 at 10:46:14AM +0200, Heikki Linnakangas wrote:

    We'll likely need to go back and forth a few times with various
    debugging patches until we get to the heart of this..
    Anything new on this? I'm seeing it on one of my clients' production boxes.
    I haven't heard anything from the OP since.


    Also, what is the significance, ie what is the risk or damage potential if
    this flag is set incorrectly?
    Sequential scans will honor the flag, so you might see some dead rows
    incorrectly returned by a sequential scan. That's the only "damage", but an
    incorrectly set flag could be a sign of something more sinister, like
    corrupt tuple headers. The flag should never be set incorrectly, so if you
    see that message you have hit a bug in PostgreSQL, or you have bad hardware.

    This flag is quite new, so a bug in PostgreSQL is quite possible. If you
    still have a backup that contains those incorrectly set flags, I'd like to
    see what the page looks like.


    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com

    Sorry, I have been a bit busy lately.
    This effect on my production servers seems to have a floating nature (some weeks
    it happens many times each day and some weeks not at all... the phase of the
    moon, maybe). For the last 2 weeks I haven't seen that error in the database logs,
    so I can't add anything more.

    What makes the situation even harder to debug is that I can't compile and install
    a patched version of the database engine (because it is a production server and I
    physically don't have access to change the software there). So I have been trying
    to reproduce the error on a single connection (so I can attach GDB and look inside)
    but without luck during the last week.

    --
    Maxim Boguk
    Senior Postgresql DBA.

    Skype: maxim.boguk
    Jabber: maxim.boguk@gmail.com

    LinkedIn profile: http://nz.linkedin.com/in/maximboguk
    If they can send one man to the moon... why can't they send them all?

    МойКруг: http://mboguk.moikrug.ru/
    Strength breaks straw, but not everything in our life is straw, and strength
    is far from everything.
  • Daveg at Mar 1, 2011 at 10:00 pm

    On Mon, Feb 28, 2011 at 07:43:39PM -0600, David Christensen wrote:
    On Feb 28, 2011, at 3:28 PM, daveg wrote:

    Anything new on this? I'm seeing it on one of my clients' production boxes.
    Also, what is the significance, ie what is the risk or damage potential if
    this flag is set incorrectly?

    Was this cluster upgraded to 8.4.4 from 8.4.0? It sounds to me like a known bug in 8.4.0 which was fixed by this commit:

    commit 7fc7a7c4d082bfbd579f49e92b046dd51f1faf5f
    Author: Tom Lane <tgl@sss.pgh.pa.us>
    Date: Mon Aug 24 02:18:32 2009 +0000

    Fix a violation of WAL coding rules in the recent patch to include an
    "all tuples visible" flag in heap page headers. The flag update *must*
    be applied before calling XLogInsert, but heap_update and the tuple
    moving routines in VACUUM FULL were ignoring this rule. A crash and
    replay could therefore leave the flag incorrectly set, causing rows
    to appear visible in seqscans when they should not be. This might explain
    recent reports of data corruption from Jeff Ross and others.

    In passing, do a bit of editorialization on comments in visibilitymap.c.

    oy:postgresql machack$ git describe --tag 7fc7a7c4d082bfbd579f49e92b046dd51f1faf5f
    REL8_4_0-190-g7fc7a7c

    If the flag got twiddled while running as 8.4.0, the incorrect PD_ALL_VISIBLE flag would (obviously) not be fixed by the upgrade to 8.4.4. (Is this a separate issue?)
    This cluster was installed with 8.4.4. So it is still an existing problem.
    Also, to my recollection, this cluster has never crashed.

    -dg


    --
    David Gould daveg@sonic.net 510 536 1443 510 282 0869
    If simplicity worked, the world would be overrun with insects.
  • Daveg at Mar 1, 2011 at 10:00 pm

    On Tue, Mar 01, 2011 at 12:00:54AM +0200, Heikki Linnakangas wrote:
    On 28.02.2011 23:28, daveg wrote:
    On Wed, Jan 12, 2011 at 10:46:14AM +0200, Heikki Linnakangas wrote:
    We'll likely need to go back and forth a few times with various
    debugging patches until we get to the heart of this..
    Anything new on this? I'm seeing it on one of my clients' production boxes.
    I haven't heard anything from the OP since.
    Also, what is the significance, ie what is the risk or damage potential if
    this flag is set incorrectly?
    Sequential scans will honor the flag, so you might see some dead rows
    incorrectly returned by a sequential scan. That's the only "damage", but
    an incorrectly set flag could be a sign of something more sinister, like
    corrupt tuple headers. The flag should never be set incorrectly, so if
    you see that message you have hit a bug in PostgreSQL, or you have bad
    hardware.

    This flag is quite new, so a bug in PostgreSQL is quite possible. If you
    still have a backup that contains those incorrectly set flags, I'd like
    to see what the page looks like.

    I ran vacuums on all the affected tables last night. I plan to take a downtime
    to clear the buffer cache and then to run vacuums on all the dbs in the
    cluster.

    Most but not all the tables involved are catalogs.

    However, I could probably pick up your old patch sometime next week if it
    recurs and send you page images.

    -dg

    --
    David Gould daveg@sonic.net 510 536 1443 510 282 0869
    If simplicity worked, the world would be overrun with insects.
  • Daveg at Mar 2, 2011 at 9:30 pm

    On Tue, Mar 01, 2011 at 01:20:43PM -0800, daveg wrote:
    On Tue, Mar 01, 2011 at 12:00:54AM +0200, Heikki Linnakangas wrote:
    On 28.02.2011 23:28, daveg wrote:
    On Wed, Jan 12, 2011 at 10:46:14AM +0200, Heikki Linnakangas wrote:
    We'll likely need to go back and forth a few times with various
    debugging patches until we get to the heart of this..
    Anything new on this? I'm seeing it on one of my clients' production boxes.
    I haven't heard anything from the OP since.
    Also, what is the significance, ie what is the risk or damage potential if
    this flag is set incorrectly?
    Sequential scans will honor the flag, so you might see some dead rows
    incorrectly returned by a sequential scan. That's the only "damage", but
    an incorrectly set flag could be a sign of something more sinister, like
    corrupt tuple headers. The flag should never be set incorrectly, so if
    you see that message you have hit a bug in PostgreSQL, or you have bad
    hardware.

    This flag is quite new, so a bug in PostgreSQL is quite possible. If you
    still have a backup that contains those incorrectly set flags, I'd like
    to see what the page looks like.

    I ran vacuums on all the affected tables last night. I plan to take a downtime
    to clear the buffer cache and then to run vacuums on all the dbs in the
    cluster.

    Most but not all the tables involved are catalogs.

    However, I could probably pick up your old patch sometime next week if it
    recurs and send you page images.
    After a restart and vacuum of all dbs with no other activity things were
    quiet for a couple hours and then we started seeing these PD_ALL_VISIBLE
    messages again.

    Going back through the logs we have been getting these since at least before
    mid-January. Oddly, this only happens on four systems, which are all new Dell
    32-core Nehalem 512GB machines using iSCSI partitions served off a NetApp.
    Our older 8-core 64GB hosts have never logged any of these errors. I'm not
    saying it is related to the hardware, as these hosts are doing a lot more work than
    the old hosts so it may be a concurrency problem that just never came up at
    lower levels before.

    Postgresql version is 8.4.4.

    I'll pick up Heikki's page-logging patch and run it for a bit to get some
    damaged page images. What else could I be doing to track this down?

    -dg

    --
    David Gould daveg@sonic.net 510 536 1443 510 282 0869
    If simplicity worked, the world would be overrun with insects.
  • Alvaro Herrera at Mar 2, 2011 at 9:45 pm

    Excerpts from daveg's message of Wed Mar 02 18:30:34 -0300 2011:

    After a restart and vacuum of all dbs with no other activity things were
    quiet for a couple hours and then we started seeing these PD_ALL_VISIBLE
    messages again.

    Going back through the logs we have been getting these since at least before
    mid January. Oddly, this only happens on four systems which are all new Dell
    32 core Nehalem 512GB machines using iscsi partitions served off a Netapp.
    Our older 8 core 64GB hosts have never logged any of these errors. I'm not
    saying it is related to the hw, as these hosts are doing a lot more work than
    the old hosts so it may be a concurrency problem that just never came up at
    lower levels before.

    Postgresql version is 8.4.4.
    I don't see how this could be related, but since you're running on NFS,
    maybe it is, somehow:
    http://archives.postgresql.org/message-id/4D40DDB7.1010000@credativ.com
    (for example what if the visibility map fork's last page is overwritten?)

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Daveg at Mar 2, 2011 at 11:54 pm

    On Wed, Mar 02, 2011 at 06:45:13PM -0300, Alvaro Herrera wrote:
    Excerpts from daveg's message of Wed Mar 02 18:30:34 -0300 2011:
    After a restart and vacuum of all dbs with no other activity things were
    quiet for a couple hours and then we started seeing these PD_ALL_VISIBLE
    messages again.

    Going back through the logs we have been getting these since at least before
    mid January. Oddly, this only happens on four systems which are all new Dell
    32 core Nehalem 512GB machines using iscsi partitions served off a Netapp.
    Our older 8 core 64GB hosts have never logged any of these errors. I'm not
    saying it is related to the hw, as these hosts are doing a lot more work than
    the old hosts so it may be a concurrency problem that just never came up at
    lower levels before.

    Postgresql version is 8.4.4.
    I don't see how this could be related, but since you're running on NFS,
    maybe it is, somehow:
    http://archives.postgresql.org/message-id/4D40DDB7.1010000@credativ.com
    (for example what if the visibility map fork's last page is overwritten?)
    Running on iSCSI, not NFS. But it is still a NetApp, so who knows. I'll look.
    Also, we are not seeing any of the "unexpected data beyond EOF" errors,
    just thousands per day of the PD_ALL_VISIBLE error.

    -dg


    --
    David Gould daveg@sonic.net 510 536 1443 510 282 0869
    If simplicity worked, the world would be overrun with insects.
  • Bricklen at Mar 3, 2011 at 12:20 am

    On Wed, Mar 2, 2011 at 3:53 PM, daveg wrote:
    Postgresql version is 8.4.4.
    I don't see how this could be related, but since you're running on NFS,
    maybe it is, somehow:
    http://archives.postgresql.org/message-id/4D40DDB7.1010000@credativ.com
    (for example what if the visibility map fork's last page is overwritten?)
    Running on ISCSI, not nfs. But it is still a Netapp, so who knows. I'll look.
    Also, we are not seeing any of the "unexpected data beyond EOF" errors,
    just thousands per day of the PD_ALL_VISIBLE error.

    -dg
    FWIW, we had a couple occurrences of that message about a month ago on 9.0.2

    http://archives.postgresql.org/pgsql-general/2011-01/msg00887.php

    Haven't seen it since we ran a cluster-wide vacuum.
  • Daveg at Mar 3, 2011 at 12:30 am

    On Wed, Mar 02, 2011 at 04:20:24PM -0800, bricklen wrote:
    On Wed, Mar 2, 2011 at 3:53 PM, daveg wrote:
    Postgresql version is 8.4.4.
    I don't see how this could be related, but since you're running on NFS,
    maybe it is, somehow:
    http://archives.postgresql.org/message-id/4D40DDB7.1010000@credativ.com
    (for example what if the visibility map fork's last page is overwritten?)
    Running on ISCSI, not nfs. But it is still a Netapp, so who knows. I'll look.
    Also, we are not seeing any of the "unexpected data beyond EOF" errors,
    just thousands per day of the PD_ALL_VISIBLE error.

    -dg
    FWIW, we had a couple occurrences of that message about a month ago on 9.0.2

    http://archives.postgresql.org/pgsql-general/2011-01/msg00887.php

    Haven't seen it since we ran a cluster-wide vacuum.
    We did a shutdown and restart to clear the buffer cache (but did not reboot
    the host) and a vacuum on all dbs in the cluster last night. That cleared it
    up for a couple hours, but we are still getting lots of these messages.

    Most of them are on pg_statistic, and we create and drop hundreds of thousands of
    temp tables daily, so there is a good chance there is a concurrency issue.

    -dg


    --
    David Gould daveg@sonic.net 510 536 1443 510 282 0869
    If simplicity worked, the world would be overrun with insects.
