Jesse Denardo wrote:

$ 9.2_dev/bin/pg_controldata data
Latest checkpoint's NextMultiXactId: 2982
Latest checkpoint's NextMultiOffset: 6479
So what's happening here is that the MultiXact 2982 lives in a SLRU page
that doesn't exist. pg_upgrade didn't copy the pg_multixact files from
the old cluster, because they are not compatible; instead it just sets
the values in pg_control. As soon as a new multixact is to be created,
things fail because the code is not prepared to deal with the
possibility that the underlying SLRU files have not been extended during
normal operation.
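
To make the arithmetic concrete, here is a minimal sketch of how a multixact ID maps to a page in pg_multixact/offsets. It assumes the default 8 kB block size, 4-byte offset entries and 32 pages per SLRU segment file; none of these figures is stated in this thread, and the function name is made up for illustration.

```python
# Sketch of the page arithmetic for pg_multixact/offsets.
# Assumptions (not stated in the thread): default 8 kB block size,
# 4-byte offset entries, 32 pages per SLRU segment file.
BLCKSZ = 8192
OFFSET_ENTRY_SIZE = 4
OFFSETS_PER_PAGE = BLCKSZ // OFFSET_ENTRY_SIZE
PAGES_PER_SEGMENT = 32


def offsets_location(multi):
    """Return (segment file name, page number, entry index) for a multixact ID."""
    page = multi // OFFSETS_PER_PAGE
    entry = multi % OFFSETS_PER_PAGE
    segment = page // PAGES_PER_SEGMENT
    # Segment files are named with four uppercase hex digits.
    return "%04X" % segment, page, entry


print(offsets_location(2982))
```

Under those assumptions, multixact 2982 falls on page 1 of segment 0000, so a new cluster whose offsets file holds only page 0 would have no page for it.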

I see two ways to deal with this:

1. On each multixact creation, verify whether the pages we're trying to
modify do in fact exist. If they don't, create them.

2. At startup, verify the "next" multixact values, and extend the files
if necessary.

I think (1) is not a very good idea because it will cause too large an
impact at runtime, when it is not really necessary. I lean more towards
(2). On IM, Bruce suggested instead:

2a. Same as (2), but only do it in pg_upgrade's usage of postgres'
binary-upgrade mode (postgres -b). Thus this will be done once during
the upgrade process and not every time the system starts up.


As it turns out, I have a patched slru.c that adds a new function to
verify whether a page exists on disk. I created this for the commit
timestamp module, for the BDR branch, but I think it's what we need
here.
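
For illustration only, the idea behind option (2) can be sketched against plain files: check whether the segment file is long enough to contain a page, and zero-fill it if not. This is not the slru.c patch mentioned above; the helper names are hypothetical, and the 8 kB block size and 32 pages per segment are assumptions.

```python
import os

# Assumed defaults, not taken from the thread.
BLCKSZ = 8192
PAGES_PER_SEGMENT = 32


def segment_path(slru_dir, page):
    """Path of the segment file that would hold the given page."""
    return os.path.join(slru_dir, "%04X" % (page // PAGES_PER_SEGMENT))


def page_exists(slru_dir, page):
    """True if the segment file is long enough to contain the page."""
    path = segment_path(slru_dir, page)
    if not os.path.exists(path):
        return False
    end_of_page = (page % PAGES_PER_SEGMENT + 1) * BLCKSZ
    return os.path.getsize(path) >= end_of_page


def ensure_page(slru_dir, page):
    """Zero-fill the segment file through the given page if it is missing.

    Returns True if the file was extended. Note that this simple version
    also zero-fills any earlier missing pages of the same segment.
    """
    if page_exists(slru_dir, page):
        return False
    path = segment_path(slru_dir, page)
    current = os.path.getsize(path) if os.path.exists(path) else 0
    end_of_page = (page % PAGES_PER_SEGMENT + 1) * BLCKSZ
    with open(path, "ab") as f:
        f.write(b"\0" * (end_of_page - current))
    return True
```

Running such a check once, at startup or only in binary-upgrade mode as in (2a), keeps the cost out of the multixact creation path, which is the objection to option (1).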

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

  • Alvaro Herrera at Aug 2, 2013 at 10:18 pm

    Alvaro Herrera wrote:

    As it turns out, I have a patched slru.c that adds a new function to
    verify whether a page exists on disk. I created this for the commit
    timestamp module, for the BDR branch, but I think it's what we need
    here.
    Here's a patch that should fix the problem. Jesse, if you're able to
    test it, please give it a run and let me know if it works for you. I
    was able to upgrade an installation containing a problem that should
    reproduce yours.

    --
    Álvaro Herrera http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Andres Freund at Aug 3, 2013 at 1:14 am

    On 2013-08-02 18:17:43 -0400, Alvaro Herrera wrote:
    Here's a patch that should fix the problem. Jesse, if you're able to
    test it, please give it a run and let me know if it works for you. I
    was able to upgrade an installation containing a problem that should
    reproduce yours.
    Wouldn't it be easier to make pg_upgrade fudge pg_control to have a safe
    NextMultiXactId/Offset using pg_resetxlog?

    Greetings,

    Andres Freund

    --
      Andres Freund http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Alvaro Herrera at Aug 3, 2013 at 2:25 am

    Andres Freund wrote:
    Wouldn't it be easier to make pg_upgrade fudge pg_control to have a safe
    NextMultiXactId/Offset using pg_resetxlog?
    I don't understand. pg_upgrade already fudges pg_control to have a safe
    next multi, namely the same value used by the old cluster. The reason
    to preserve this value is that we must ensure no older value is
    consulted in pg_multixact: those might be present in tuples that were
    locked in the old cluster. (To be precise, this is the value to set as
    oldest multi, not next multi. But of course, the next multi must be
    greater than that one.)

    --
    Álvaro Herrera http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Jesse Denardo at Aug 3, 2013 at 3:21 am
    Alvaro,

    I applied the patch and tried upgrading again, and everything seemed to
    work as expected. We are now up and running the beta!


    --
    Jesse Denardo

  • Bruce Momjian at Aug 3, 2013 at 3:47 am

    On Fri, Aug 2, 2013 at 11:20:37PM -0400, Jesse Denardo wrote:
    Alvaro,

    I applied the patch and tried upgrading again, and everything seemed to work as
    expected. We are now up and running the beta!
    Yeah, great, thanks everyone!

    --
       Bruce Momjian <bruce@momjian.us> http://momjian.us
       EnterpriseDB http://enterprisedb.com

       + It's impossible for everything to be true. +
  • Alvaro Herrera at Aug 19, 2013 at 4:57 pm

    Jesse Denardo wrote:
    Alvaro,

    I applied the patch and tried upgrading again, and everything seemed to
    work as expected. We are now up and running the beta!
    Pushed, thanks.


    --
    Álvaro Herrera http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Andres Freund at Aug 3, 2013 at 4:08 am

    On 2013-08-02 22:25:36 -0400, Alvaro Herrera wrote:
    I don't understand. pg_upgrade already fudges pg_control to have a safe
    next multi, namely the same value used by the old cluster. The reason
    to preserve this value is that we must ensure no older value is
    consulted in pg_multixact: those might be present in tuples that were
    locked in the old cluster. (To be precise, this is the value to set as
    oldest multi, not next multi. But of course, the next multi must be
    greater than that one.)
    I am suggesting to set them to a greater value than in the old cluster,
    computed so it's guaranteed that they are proper page boundaries. Then
    the situation described upthread shouldn't occur anymore, right?

    Greetings,

    Andres Freund

    --
      Andres Freund http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
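
The page-boundary suggestion in the last message can be sketched as a rounding step. The per-page figure assumes 8 kB blocks and 4-byte offset entries, the function name is hypothetical, and counter wraparound is ignored.

```python
# Assumes 8 kB blocks and 4-byte offset entries (2048 entries per page).
OFFSETS_PER_PAGE = 2048


def round_up_to_page_boundary(next_multi, per_page=OFFSETS_PER_PAGE):
    """Smallest multiple of per_page that is >= next_multi."""
    return ((next_multi + per_page - 1) // per_page) * per_page


print(round_up_to_page_boundary(2982))
```

With the value reported at the top of the thread, the next multixact would be moved from 2982 to 4096, the first entry of page 2. Presumably the point is that the normal extension path creates a page when its first entry is assigned; the member offset would need the same treatment with its own per-page count.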

Discussion Overview
group: pgsql-hackers @
categories: postgresql
posted: Jul 31, '13 at 8:55p
active: Aug 19, '13 at 4:57p
posts: 8
users: 4
website: postgresql.org...
irc: #postgresql
