Since Phase 1 is functioning and should soon be complete, we can
now start thinking about Phase 2: full recovery to a point in time.

Previous thinking was that a command-line switch would be used to
specify recovery to a given point in time, rather than the default, which
is to recover all the way to the end of the (available) xlogs.

Recovering to a specific point in time forces us to consider the
granularity of time.
We could recover:
1. to the end of a full transaction log file
2. to the end of a full transaction

Transaction log files currently have timestamps, so (1) is
straightforward, but probably not the best we can do. We would
roll forward until the xlog file time > desired point in time.

To make (2) work, we would need a timestamp associated with each
transaction. This could live in one of two places:
1. the transaction record in the clog
2. the log record in the xlog
We would then recover the xlog record by record, until we found a record
with a timestamp > desired point in time.

Currently, neither of these places has a timestamp. Hmmmm. We can't use
pg_control, because we are assuming that it needs recovery...

I can't see any general way of adding a timestamp in fewer than 2
bytes. We don't need a timezone. The timestamp could be the number of
seconds since the last checkpoint, since checkpoint frequency is already
bounded by a GUC. Although the code avoids a checkpoint when no updates
have taken place, we wouldn't be too remiss to force a checkpoint every
32,000 seconds (about 9 hours). That assumes point-in-time accuracy on
the order of seconds. If we went to 0.1-second accuracy, we would have
to force a checkpoint every 40 minutes or so. All of that seems too
restrictive. If we went to milliseconds, we could use a 4-byte value and
force a checkpoint every 284 hours, or about 1.5 weeks.
Thoughts?
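As a rough sanity check on the widths and intervals discussed above (plain arithmetic, not PostgreSQL code):

```python
def max_offset_seconds(num_bytes: int, resolution_s: float) -> float:
    """Longest offset an unsigned counter of the given width can hold."""
    return (2 ** (8 * num_bytes)) * resolution_s

# 2 bytes at 1-second resolution: ~18.2 hours of range, so a forced
# checkpoint every 32,000 s (~9 h) leaves comfortable headroom.
two_byte_seconds = max_offset_seconds(2, 1.0)    # 65536.0 s

# 2 bytes at 0.1-second resolution: ~109 minutes of range, hence the
# "checkpoint every 40 minutes or so" figure.
two_byte_tenths = max_offset_seconds(2, 0.1)     # 6553.6 s

# 4 bytes at 1-millisecond resolution: ~49.7 days of range; a forced
# checkpoint every 284 hours is well inside that.
four_byte_millis = max_offset_seconds(4, 0.001)  # ~4294967.3 s
```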

Clog uses 2 bits per transaction, so even 2 extra bytes per transaction
would make the clog 9 times larger than originally intended. This could
well cause it to segment more quickly, and I'm sure no one would be happy
with that. So, let's not add anything to the clog.
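The 9x figure follows directly from the bit counts (a back-of-envelope check):

```python
# Each clog entry is currently 2 bits of transaction status. Adding a
# 2-byte (16-bit) timestamp per transaction would make each entry
# 2 + 16 = 18 bits: a ninefold growth in clog size.
CLOG_BITS_PER_XACT = 2
TIMESTAMP_BITS = 16

growth_factor = (CLOG_BITS_PER_XACT + TIMESTAMP_BITS) / CLOG_BITS_PER_XACT
```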

The alternative is to make the last part of the XlogHeader record a
timestamp value, enlarging each xlog write. It might be possible to
make this part of the header optional, depending upon whether PITR
is required, but my preference is against such dynamic coding.

So, I propose:

- appending an 8-byte date/time value to the xlog file header record
- appending a 4-byte time offset to each xlog record
- altering the recovery logic to compare the calculated time of each
xlog record (file header + offset) against the desired point in time,
delivered to it by GUC.
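A minimal sketch of the proposed stop rule (names and record layout hypothetical, for illustration only):

```python
from dataclasses import dataclass

@dataclass
class XlogRecord:
    offset_ms: int   # hypothetical 4-byte offset from the file header's base time
    payload: bytes

def replay_until(header_time_ms: int, records, target_time_ms: int):
    """Replay records in order; stop before the first record whose
    calculated time (file header base + per-record offset) exceeds
    the desired point in time."""
    replayed = []
    for rec in records:
        if header_time_ms + rec.offset_ms > target_time_ms:
            break
        replayed.append(rec)
    return replayed
```

Recovering up to, versus just before, a given commit is then only the difference between > and >= in the comparison.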

Input is sought from anybody with detailed NTP knowledge, since the
workings of NTP drift correction may have some subtle interplay with
this proposal.

Also, while that code is being altered, some additional log records,
with timing information, need to be written as recovery of each new xlog
file starts, to allow DBAs watching a recovery to calculate expected
completion times; this is essential in long recovery situations.
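For the completion-time estimate, something as simple as the average replay rate so far would do (a sketch; times in seconds, names hypothetical):

```python
def estimated_completion(start_time: float, now: float,
                         xlogs_done: int, xlogs_total: int) -> float:
    """Naive ETA: assume the remaining xlog files replay at the average
    rate observed so far (requires xlogs_done > 0)."""
    per_file = (now - start_time) / xlogs_done
    return now + per_file * (xlogs_total - xlogs_done)
```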

I am also considering any changes that may be required to prepare the
way for a future implementation of parallel redo recovery.

Best regards, Simon Riggs, 2ndQuadrant
http://www.2ndquadrant.com


  • Bruce Momjian at Apr 26, 2004 at 9:05 pm

    Simon Riggs wrote:
    Transaction log files currently have timestamps, so that is
    straightforward, but probably not the best we can do. We would
    rollforward until the xlog file time > desired point in time.

    To make (2) work we would have to have a timestamp associated with each
    transaction. This could be in one of two places:
    1. the transaction record in the clog
    2. the log record in the xlog
    We would then recover the xlog record by record, until we found a record
    that had a timestamp > desired point-in-time.

    Currently, neither of these places have a timestamp. Hmmmm. We can't use
    pg_control because we are assuming that it needs recovery...

    I can't see any general way of adding a timestamp in any less than 2
    bytes. We don't need a timezone. The timestamp could refer to a number
    of seconds since last checkpoint; since this is limited already by a GUC
    to force checkpoints every so often. Although code avoids a checkpoint
    if no updates have taken place, we wouldn't be too remiss to use a
    forced checkpoint every 32,000 seconds (9 hours).
    Assuming that accuracy of the point-in-time was of the order of
    seconds?? If we went to 0.1 second accuracy, we could checkpoint (force)
    every 40 minutes or so. All of that seems too restrictive.
    If we went to milliseconds, then we could use a 4 byte value and use a
    checkpoint (force) every 284 hours or 1.5 weeks.
    Thoughts?
    I was thinking --- how would someone know the time to use for restore?
    Certainly they will not know it to subsecond accuracy. Probably
    second-level accuracy is enough, _except_ when they want everything
    restored up to a DROP TABLE transaction or some other major problem. Is
    there a way to give users a list of transactions on a log backup? Can we
    show them the username, database, or commands or something? Would they
    be able to restore up to a specific transaction in that case?

    Basically, we could give them sub-second recovery, but what value would
    it be?

    --
    Bruce Momjian | http://candle.pha.pa.us
    pgman@candle.pha.pa.us | (610) 359-1001
    + If your life is a hard drive, | 13 Roberts Road
    + Christ can be your backup. | Newtown Square, Pennsylvania 19073
  • Alvaro Herrera at Apr 26, 2004 at 10:06 pm

    On Mon, Apr 26, 2004 at 05:05:41PM -0400, Bruce Momjian wrote:
    Simon Riggs wrote:
    Transaction log files currently have timestamps, so that is
    straightforward, but probably not the best we can do. We would
    rollforward until the xlog file time > desired point in time.
    I was thinking --- how would someone know the time to use for restore?
    I think there should be a way to get a TransactionId and restore up to
    that point. It'd be cool, but not required, if the system showed what
    valid TransactionIds there are, and roughly what they did (the xlog
    code already has "describers" everywhere AFAICS).

    --
    Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
    "El realista sabe lo que quiere; el idealista quiere lo que sabe" (Anónimo)
  • Simon Riggs at Apr 26, 2004 at 10:20 pm

    On Mon, 2004-04-26 at 23:01, Alvaro Herrera wrote:
    On Mon, Apr 26, 2004 at 05:05:41PM -0400, Bruce Momjian wrote:
    Simon Riggs wrote:
    Transaction log files currently have timestamps, so that is
    straightforward, but probably not the best we can do. We would
    rollforward until the xlog file time > desired point in time.
    I was thinking --- how would someone know the time to use for restore?
    I think there should be a way to get a TransactionId and restore up to
    that point. It'd be cool, but not required, if the system showed what
    valid TransactionIds there are, and roughly what they did (the xlog
    code already has "describers" everywhere AFAICS).
    You're right, I think we should start by implementing the rollforward to
    a txnid before we consider the rollforward to a specified point-in-time.
    All the hooks for that are already there...

    Best regards, Simon Riggs
  • Bruce Momjian at Apr 26, 2004 at 11:32 pm

    Simon Riggs wrote:
    On Mon, 2004-04-26 at 23:01, Alvaro Herrera wrote:
    On Mon, Apr 26, 2004 at 05:05:41PM -0400, Bruce Momjian wrote:
    Simon Riggs wrote:
    Transaction log files currently have timestamps, so that is
    straightforward, but probably not the best we can do. We would
    rollforward until the xlog file time > desired point in time.
    I was thinking --- how would someone know the time to use for restore?
    I think there should be a way to get a TransactionId and restore up to
    that point. It'd be cool, but not required, if the system showed what
    valid TransactionIds there are, and roughly what they did (the xlog
    code already has "describers" everywhere AFAICS).
    You're right, I think we should start by implementing the rollforward to
    a txnid before we consider the rollforward to a specified point-in-time.
    All the hooks for that are already there...
    Yep, sounds like a plan.
    --
    Bruce Momjian | http://candle.pha.pa.us
    pgman@candle.pha.pa.us | (610) 359-1001
    + If your life is a hard drive, | 13 Roberts Road
    + Christ can be your backup. | Newtown Square, Pennsylvania 19073
  • Richard Huxton at Apr 27, 2004 at 9:38 am

    On Tuesday 27 April 2004 00:32, Bruce Momjian wrote:
    Simon Riggs wrote:
    On Mon, 2004-04-26 at 23:01, Alvaro Herrera wrote:
    On Mon, Apr 26, 2004 at 05:05:41PM -0400, Bruce Momjian wrote:
    I was thinking --- how would someone know the time to use for
    restore?
    I think there should be a way to get a TransactionId and restore up to
    that point. It'd be cool, but not required, if the system showed what
    valid TransactionIds there are, and roughly what they did (the xlog
    code already has "describers" everywhere AFAICS).
    You're right, I think we should start by implementing the rollforward to
    a txnid before we consider the rollforward to a specified point-in-time.
    All the hooks for that are already there...
    Yep, sounds like a plan.
    Speaking as a DBA, what I usually want to do is restore to "immediately before
    I started the payroll calculation". An actual wall-clock time is mostly
    irrelevant to me.

    Suggestion: How about a pg_trans_note table (crap name, I know) that only
    permits inserts - records (backend-pid, timestamp, notes). My app inserts
    "starting payroll calc" and "ending payroll calc" entries because those are
    the points I might wish to sync to. If I want to sync for each individual
    transaction in my calculations, my app can do that too.
    From a usability point of view you might want to automatically insert rows on
    client connection/table creation etc. You could also delete any rows more
    than a week old when archiving WAL files.

    --
    Richard Huxton
    Archonet Ltd
  • Bruno Wolff III at Apr 27, 2004 at 5:39 pm

    On Tue, Apr 27, 2004 at 10:38:45 +0100, Richard Huxton wrote:

    Speaking as a DBA, what I usually want to do is restore to "immediately before
    I started the payroll calculation". An actual wall-clock time is mostly
    irrelevant to me.
    For long running transactions where you want to recover as much as possible,
    one might also want to recover up until just before a specific transaction
    committed (as opposed to started).
  • Simon Riggs at Apr 27, 2004 at 8:23 pm

    On Tue, 2004-04-27 at 18:43, Bruno Wolff III wrote:
    On Tue, Apr 27, 2004 at 10:38:45 +0100,
    Richard Huxton wrote:
    Speaking as a DBA, what I usually want to do is restore to "immediately before
    I started the payroll calculation". An actual wall-clock time is mostly
    irrelevant to me.
    For long running transactions where you want to recover as much as possible,
    one might also want to recover up until just before a specific transaction
    committed (as opposed to started).
    Sounds like the difference between > and >=, so should be possible...

    Best Regards, Simon Riggs
  • Peter Galbavy at Apr 27, 2004 at 10:03 pm

    Bruno Wolff III wrote:
    For long running transactions where you want to recover as much as possible,
    one might also want to recover up until just before a specific transaction
    committed (as opposed to started).
    If your DB has died and you are recovering it, how do you re-establish a
    session so that a transaction can complete? Don't all client
    connections assume that a transaction has failed if the connection to
    the DB fails?

    Or am I just being naive, as usual?

    Peter
  • Simon Riggs at Apr 27, 2004 at 10:38 pm

    On Fri, 2004-05-28 at 00:02, Peter Galbavy wrote:
    Bruno Wolff III wrote:
    For long running transactions where you want to recover as much as possible,
    one might also want to recover up until just before a specific transaction
    committed (as opposed to started).
    If your DB has died and you are recovering it, how do you re-establish a
    session so that a transaction can complete? Don't all client
    connections assume that a transaction has failed if the connection to
    the DB fails?
    Reasonable question...

    You re-establish connection, but cannot resume the failed transaction.

    PostgreSQL already has crash recovery...this is for restore from backup
    scenarios.

    Best Regards, Simon Riggs
  • Bruno Wolff III at Apr 28, 2004 at 6:26 am

    On Thu, May 27, 2004 at 23:02:42 +0000, Peter Galbavy wrote:
    Bruno Wolff III wrote:
    For long running transactions where you want to recover as much as
    possible,
    one might also want to recover up until just before a specific transaction
    committed (as opposed to started).
    If your DB has died and you are recovering it, how do you re-establish a
    session so that a transaction can complete? Don't all client
    connections assume that a transaction has failed if the connection to
    the DB fails?
    The context of my suggestion was for recovering up until a transaction which
    messed things up was committed. I did not want the problem transaction to
    be committed. If the problem transaction ran for a long time, there might
    be other transactions that I want to keep, if possible, that committed
    after the problem transaction started and before it ended.
  • Peter Galbavy at Apr 28, 2004 at 8:28 am

    Bruno Wolff III wrote:
    The context of my suggestion was for recovering up until a transaction which
    messed things up was committed. I did not want the problem transaction to
    be committed. If the problem transaction ran for a long time, there might
    be other transactions that I want to keep, if possible, that committed
    after the problem transaction started and before it ended.
    Ah! followed by Eek! Now I see the light. It's very bright and painful.

    What I can see is that expressing this accurately and unambiguously is
    going to be _difficult_. How do you know, accurately, the point just
    before a transaction completed? There must be a good subset of
    candidates that can be labelled.

    Is there any way to label/name a transaction that can be kept somewhere?
    Like "begin transaction 'bigtransaction26';" - is there any allowance in
    the SQL standards for naming transactions?

    PS I have fixed my system clock - apologies to my earlier reply being a
    month ahead.

    rgds,
    --
    Peter
  • Jihuang at May 3, 2004 at 10:24 am
    I put 360,000+ rows in a table, and now any SELECT, UPDATE, ANALYZE,
    etc. command fails.
    The log shows "ERROR: heapgettup: failed ReadBuffer",
    but any INSERT command succeeds.


    the table schema is

    Column | Type | Modifiers
    ---------------+-----------------------------+----------
    test_id | integer | not null
    snapshot | timestamp without time zone |
    ip_client | inet |
    ip_server | inet |
    conn_time | integer |
    response_time | integer |
    response_head | character varying |
    Check constraints:
    "invalid_conn_time" CHECK (conn_time >= 0)
    "invalid_resp_time" CHECK (response_time >= 0)


    I didn't create any index.
    Does anyone know why, or have a suggestion for saving the unreadable
    database? Or is there anything I misconfigured?

    Thanks for your help.

    June-Yen
  • Tom Lane at May 3, 2004 at 2:39 pm

    jihuang writes:
    I put 360000+ rows in a table , and now any select , update , analyze
    ... command fail.
    the log shows "ERROR: heapgettup: failed ReadBuffer",
    What Postgres version is this? AFAICS that error has been impossible
    for quite some time ...

    regards, tom lane
  • Jihuang at May 5, 2004 at 7:46 am
    Thanks for your quick response!

    The problem was solved after I upgraded PostgreSQL from 7.3.4 to 7.4.2.

    By the way, is there any bug-tracking website for PostgreSQL?
    Following [HOMEPAGE] -> [DEVELOPERS] I find nothing resembling a
    Bugzilla-like item, and following [GBORG] leads to PostgreSQL-related
    projects, but not PostgreSQL itself?

    Let me show an advertisement... quoted from O'Reilly's Developer
    Weblogs:
    RT Foundry is being developed in Taiwan as part of the Open Foundry
    Project, which is aimed at encouraging FS/OSS development in Taiwan.
    The foundry is SF-like, except using better technologies (RT for
    bug/request tracking, Subversion for source control, etc.).
    The following link is the issue and comment log for solving the
    problem I described:
    http://rt.openfoundry.org/Foundry/Project/Tracker/Display.html?Queue=90&id=2653

    There are some Chinese characters mixed in, but I just want to show
    that hosting a dedicated issue/bug-tracking system may improve a
    software project's evolution.

    June-Yen

    Tom Lane wrote:
    jihuang <jihuang@iis.sinica.edu.tw> writes:

    I put 360000+ rows in a table , and now any select , update , analyze
    ... command fail.
    the log shows "ERROR: heapgettup: failed ReadBuffer",
    What Postgres version is this? AFAICS that error has been impossible
    for quite some time ...

    regards, tom lane
  • Simon Riggs at Apr 27, 2004 at 8:05 pm

    On Tue, 2004-04-27 at 10:38, Richard Huxton wrote:
    On Tuesday 27 April 2004 00:32, Bruce Momjian wrote:
    Simon Riggs wrote:
    On Mon, 2004-04-26 at 23:01, Alvaro Herrera wrote:
    On Mon, Apr 26, 2004 at 05:05:41PM -0400, Bruce Momjian wrote:
    I was thinking --- how would someone know the time to use for
    restore?
    I think there should be a way to get a TransactionId and restore up to
    that point. It'd be cool, but not required, if the system showed what
    valid TransactionIds there are, and roughly what they did (the xlog
    code already has "describers" everywhere AFAICS).
    You're right, I think we should start by implementing the rollforward to
    a txnid before we consider the rollforward to a specified point-in-time.
    All the hooks for that are already there...
    Yep, sounds like a plan.
    Speaking as a DBA, what I usually want to do is restore to "immediately before
    I started the payroll calculation". An actual wall-clock time is mostly
    irrelevant to me.

    Suggestion: How about a pg_trans_note table (crap name, I know) that only
    permits inserts - records (backend-pid, timestamp, notes). My app inserts
    "starting payroll calc" and "ending payroll calc" entries because those are
    the points I might wish to sync to. If I want to sync for each individual
    transaction in my calculations, my app can do that too.
    From a usability point of view you might want to automatically insert rows on
    client connection/table creation etc. You could also delete any rows more
    than a week old when archiving WAL files.
    Not sure I like your implementation, but the overall idea is great.

    I'd suggest extending the CHECKPOINT command so you can say:
    CHECKPOINT <text message>
    e.g. CHECKPOINT 'starting payroll Feb04';
    (I'm sure some other DBMS does this... head spinning, can't recall...)
    The text could just appear in the xlog record data packet...

    That could then be used as the target recovery point.

    Best Regards, Simon Riggs
  • Peter Eisentraut at Apr 29, 2004 at 3:09 pm

    Simon Riggs wrote:
    I'd suggest extending the CHECKPOINT command so you can say:
    CHECKPOINT <text message>
    e.g. CHECKPOINT 'starting payroll Feb04';
    (I'm sure some other DBMS does this... head spinning, can't recall...)
    the text could just appear in the xlog record data packet...
    I believe you are thinking of a savepoint rather than a checkpoint.
  • Alvaro Herrera at Apr 29, 2004 at 6:10 pm

    On Thu, Apr 29, 2004 at 05:09:19PM +0200, Peter Eisentraut wrote:
    Simon Riggs wrote:
    I'd suggest extending the CHECKPOINT command so you can say:
    CHECKPOINT <text message>
    e.g. CHECKPOINT 'starting payroll Feb04';
    (I'm sure some other DBMS does this... head spinning, can't recall...)
    the text could just appear in the xlog record data packet...
    I believe you are thinking of a savepoint rather than a checkpoint.
    But a "savepoint" has a very precise meaning in the SQL standard, which
    relates to points in a transaction you can roll back to. I don't think
    you want to overload with this other meaning, which I see as putting a
    special mark in the XLog -- completely unrelated.

    ISTM Checkpoint 'something' would flush all xlogs, insert some kind of
    special xlog record with 'something' on it, and then archive everything
    (up to that point?).

    --
    Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
    "XML!" Exclaimed C++. "What are you doing here? You're not a programming
    language."
    "Tell that to the people who use me," said XML.
  • Peter Eisentraut at Apr 29, 2004 at 8:02 pm

    Alvaro Herrera wrote:
    But a "savepoint" has a very precise meaning in the SQL standard,
    which relates to points in a transaction you can roll back to. I
    don't think you want to overload with this other meaning, which I see
    as putting a special mark in the XLog -- completely unrelated.
    They are completely unrelated because you're considering them on two
    different levels. From the user interface level they are both a place
    to roll back to. Whether they are inside the same transaction or not
    is like the difference between a normal and a holdable cursor, but it's
    the same interface.
  • Simon Riggs at Apr 29, 2004 at 6:39 pm

    On Thu, 2004-04-29 at 16:09, Peter Eisentraut wrote:
    Simon Riggs wrote:
    I'd suggest extending the CHECKPOINT command so you can say:
    CHECKPOINT <text message>
    e.g. CHECKPOINT 'starting payroll Feb04';
    (I'm sure some other DBMS does this... head spinning, can't recall...)
    the text could just appear in the xlog record data packet...
    I believe you are thinking of a savepoint rather than a checkpoint.
    Perhaps that was the inspiration, but no, I definitely meant a
    CHECKPOINT.

    But now you come to mention it, it would be better just to have a
    command that simply wrote a named record to the xlog, so it can be
    searched for later...

    COMMENT [IN TRANSACTION LOG] 'starting payroll Feb04'

    Best Regards, Simon Riggs
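A toy illustration of how such a named log record could serve as a recovery target (entirely hypothetical names; not how xlog records are actually structured):

```python
def recover_to_comment(records, label: str):
    """Replay (kind, data) records in order, stopping just before a
    'comment' record carrying the given label."""
    replayed = []
    for kind, data in records:
        if kind == "comment" and data == label:
            break
        replayed.append((kind, data))
    return replayed
```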
  • Mike Mascari at Apr 29, 2004 at 7:16 pm

    Simon Riggs wrote:

    On Thu, 2004-04-29 at 16:09, Peter Eisentraut wrote:

    Perhaps that was the inspiration, but no, I definitely meant a
    CHECKPOINT.

    But now you come to mention it, it would be better just to have a
    command that simply wrote a named record to the xlog, so it can
    be searched for later...

    COMMENT [IN TRANSACTION LOG] 'starting payroll Feb04'
    FWIW,

    Oracle's COMMIT syntax has an optional COMMENT specifier which is
    used for documenting a distributed transaction. In-doubt
    transactions can then be manually committed or aborted by
    referencing the transaction associated with the comment. Example:

    COMMIT WORK COMMENT 'A complex distributed Tx';

    Perhaps there is some common ground between the 2PC implementation
    and PITR?

    Mike Mascari
  • Simon Riggs at Apr 26, 2004 at 10:16 pm

    On Mon, 2004-04-26 at 22:05, Bruce Momjian wrote:
    Simon Riggs wrote:
    Transaction log files currently have timestamps, so that is
    straightforward, but probably not the best we can do. We would
    rollforward until the xlog file time > desired point in time.

    To make (2) work we would have to have a timestamp associated with each
    transaction. This could be in one of two places:
    1. the transaction record in the clog
    2. the log record in the xlog
    We would then recover the xlog record by record, until we found a record
    that had a timestamp > desired point-in-time.

    Currently, neither of these places have a timestamp. Hmmmm. We can't use
    pg_control because we are assuming that it needs recovery...

    I can't see any general way of adding a timestamp in any less than 2
    bytes. We don't need a timezone. The timestamp could refer to a number
    of seconds since last checkpoint; since this is limited already by a GUC
    to force checkpoints every so often. Although code avoids a checkpoint
    if no updates have taken place, we wouldn't be too remiss to use a
    forced checkpoint every 32,000 seconds (9 hours).
    Assuming that accuracy of the point-in-time was of the order of
    seconds?? If we went to 0.1 second accuracy, we could checkpoint (force)
    every 40 minutes or so. All of that seems too restrictive.
    If we went to milliseconds, then we could use a 4 byte value and use a
    checkpoint (force) every 284 hours or 1.5 weeks.
    Thoughts?
    I was thinking --- how would someone know the time to use for restore?
    Certainly they will not know it to subsecond accuracy. Probably
    second-level accuracy is enough, _except_ when they want everything
    restored up to a
    DROP TABLE transaction or some major problem. Is there a way to give
    users a list of transactions on a log backup? Can we show them the
    username, database, or commands or something? Would they be able to
    restore up to a specific transaction in that case?

    Basically, we could give them sub-second recovery, but what value would
    it be?
    Yes, you remind me of a whole train of thought...

    There should be a switch to allow you to specify the txnid you wish to
    recover up until as well.

    You raise the point of how you know what time to recover to. That is in
    fact the very hardest part of recovery for a DBA. That's a good reason
    for being able to list xlog contents, as you can with Oracle. Sounds
    like we need an XlogMiner utility...
    Can we show them the username, database, or commands or something?
    Yes, that sounds fairly straightforward, possibly using a modification
    of the ReadRecord functions at the bottom of xlog.c - which is why
    security of the xlogs is important.

    It's also a good reason for being able to pause and restart recovery, so
    you can see what it's like before continuing further.

    Usually you are trying to sync up the contents of the database with all
    of the other things that were being updated too. Often these will define
    either the required transaction contents, or give a time to use.

    Whatever level of time accuracy you choose, we would always need to
    handle the case where multiple transactions have been committed with
    exactly the same time (after rounding) and yet we may wish to split
    them. Rolling forward to a txnid would help there.

    Best regards, Simon
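The same-timestamp tie-breaking described above could be sketched as follows (hypothetical record shape; assumes txnids can order commits sharing one rounded timestamp):

```python
def stop_index(records, target_time: int, target_xid=None):
    """Find the index of the first (xid, commit_time) record to exclude.
    With second-level rounding, several commits may share one timestamp;
    an optional transaction id splits ties within that second."""
    for i, (xid, commit_time) in enumerate(records):
        if commit_time > target_time:
            return i
        if commit_time == target_time and target_xid is not None and xid > target_xid:
            return i
    return len(records)
```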
  • Hans-Jürgen Schönig at Apr 27, 2004 at 7:52 am

    Simon Riggs wrote:
    Since Phase1 is functioning and should hopefully soon complete, we can
    now start thinking about Phase 2: full recovery to a point-in-time.

    Previous thinking was that a command line switch would be used to
    specify recover to a given point in time, rather than the default, which
    will be recover all the way to end of (available) xlogs.

    Recovering to a specific point in time forces us to consider what the
    granularity is of time.
    We could recover:
    1.to end of a full transaction log file
    2.to end of a full transaction

    Transaction log files currently have timestamps, so that is
    straightforward, but probably not the best we can do. We would
    rollforward until the xlog file time > desired point in time.

    To make (2) work we would have to have a timestamp associated with each
    transaction. This could be in one of two places:
    1. the transaction record in the clog
    2. the log record in the xlog
    We would then recover the xlog record by record, until we found a record
    that had a timestamp > desired point-in-time.

    Currently, neither of these places have a timestamp. Hmmmm. We can't use
    pg_control because we are assuming that it needs recovery...

    I can't see any general way of adding a timestamp in any less than 2
    bytes. We don't need a timezone. The timestamp could refer to a number
    of seconds since last checkpoint; since this is limited already by a GUC
    to force checkpoints every so often. Although code avoids a checkpoint
    if no updates have taken place, we wouldn't be too remiss to use a
    forced checkpoint every 32,000 seconds (9 hours).
    Assuming that accuracy of the point-in-time was of the order of
    seconds?? If we went to 0.1 second accuracy, we could checkpoint (force)
    every 40 minutes or so. All of that seems too restrictive.
    If we went to milliseconds, then we could use a 4 byte value and use a
    checkpoint (force) every 284 hours or 1.5 weeks.
    Thoughts?

    Clog uses 2 bits per transaction, so even 2 bytes extra per transaction
    will make the clog 9 times larger than originally intended. This could
    well cause it to segment quicker, but I'm sure no one would be happy
    with that. So, lets not add anything to the clog.

    The alternative is to make the last part of the XlogHeader record a
    timestamp value, increasing each xlog write. It might be possible to
    make this part of the header optional depending upon whether or not PITR
    was required, but then my preference is against such dynamic coding.

    So, I propose:

    - appending 8 byte date/time data into xlog file header record
    - appending 4 bytes of time offset onto each xlog record
    - altering the recovery logic to compare the calculated time of each
    xlog record (file header + offset) against the desired point-in-time,
    delivered to it by GUC.

    Input is sought from anybody with detailed NTP knowledge, since the
    working of NTP drift correction may have some subtle interplay with this
    proposal.

    Also, while that code is being altered, some additional log records need
    to be added when recovery of each new xlog starts, with timing, to allow
    DBAs watching a recovery to calculate expected completion times for the
    recovery, which is essential for long recovery situations.

    I am also considering any changes that may be required to prepare the
    way for a future implementation of parallel redo recovery.

    Best regards, Simon Riggs, 2ndQuadrant
    http://www.2ndquadrant.com





    Simon,

    I have one question which might be important: if we use timestamps
    inside the WAL system to find out where to stop, what happens if
    somebody changes the system time (e.g. correcting the system clock by
    calling ntpdate)? Wouldn't that confuse the PITR system? How do you
    plan to handle that? Unfortunately, time is not something that can be
    used as a key (at least not from my point of view).

    Just some lousy ideas early in the morning ...

    Regards,

    Hans


    --
    Cybertec Geschwinde u Schoenig
    Schoengrabern 134, A-2020 Hollabrunn, Austria
    Tel: +43/2952/30706 or +43/664/233 90 75
    www.cybertec.at, www.postgresql.at, kernel.cybertec.at
  • Simon Riggs at Apr 27, 2004 at 8:01 pm

    On Tue, 2004-04-27 at 08:56, Hans-Jürgen Schönig wrote:
    Simon Riggs wrote:
    Since Phase1 is functioning and should hopefully soon complete, we can
    now start thinking about Phase 2: full recovery to a point-in-time.

    Previous thinking was that a command line switch would be used to
    specify recover to a given point in time, rather than the default, which
    will be recover all the way to end of (available) xlogs.

    Recovering to a specific point in time forces us to consider what the
    granularity is of time.
    We could recover:
    1. to end of a full transaction log file
    2. to end of a full transaction

    Transaction log files currently have timestamps, so that is
    straightforward, but probably not the best we can do. We would
    rollforward until the xlog file time > desired point in time.

    To make (2) work we would have to have a timestamp associated with each
    transaction.
    ...
    So, I propose:

    - appending 8 byte date/time data into xlog file header record
    - appending 4 bytes of time offset onto each xlog record
    - altering the recovery logic to compare the calculated time of each
    xlog record (file header + offset) against the desired point-in-time,
    delivered to it by GUC.

    Input is sought from anybody with detailed NTP knowledge, since the
    working of NTP drift correction may have some subtle interplay with this
    proposal.
    I have one question which might be important: if we use timestamps
    inside the WAL system to find out where to stop, what happens if
    somebody changes the time of the system (e.g. correcting the system
    clock by calling ntpdate)? Wouldn't it confuse the PITR system? How do
    you plan to handle that? Unfortunately, time is not something that can
    be used as a key (at least not from my point of view).
    Interesting question.

    The timestamps written within WAL would be "just data" once written. The
    recovery process would refer only to that timestamp data, so would not
    refer to any external notion of time. This is required to ensure that
    the same recovery will happen identically, no matter how many times you
    choose to re-run it (or are forced to by external circumstances).

    If you change the time of the system backwards, this might invalidate
    the transaction log history: it would effectively create two (or more)
    xlog records with the same timestamp, and it would be logically
    indeterminate which one should limit recovery. In practical terms, I
    would implement this as "apply all records <= PIT". Recovery would then
    stop only when the time became larger than the PIT, which would only
    occur at the second (or last) record that had a timestamp equal to the
    PIT. I guess I could put in a WARNING to say "time just went
    backwards...spoohw"
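The "apply all records <= PIT" rule can be sketched like this. It is a toy model with invented names, not PostgreSQL source; the point is that a backwards clock step only earns a warning, and recovery stops at the first record strictly past the target:

```python
def recover_to_pit(records, target_ms):
    """records: iterable of (time_ms, payload) in xlog order.
    Apply every record whose time is <= target_ms; stop at the first
    record strictly past the target, even when earlier timestamps repeat
    because the system clock stepped backwards."""
    applied = []
    prev = None
    for time_ms, payload in records:
        if prev is not None and time_ms < prev:
            print("WARNING: time just went backwards in the xlog stream")
        if time_ms > target_ms:
            break  # first record past the PIT ends recovery
        applied.append(payload)
        prev = time_ms
    return applied
```

Because the rule is "<= PIT", all duplicate-timestamp records produced by a backwards step still get applied, which keeps repeated runs of the same recovery deterministic.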

    Overall, I'd refer back to the points Bruce raised - you certainly do
    need a way of finding out the time to recover to, and as others have
    said also, time isn't the only desirable "recovery point".

    Best regards, Simon Riggs
  • Rod Taylor at Apr 27, 2004 at 8:56 pm

    Overall, I'd refer back to the points Bruce raised - you certainly do
    need a way of finding out the time to recover to, and as others have
    said also, time isn't the only desirable "recovery point".
    Wouldn't it be sufficient to simply use the transaction ID and ensure
    that all the parameters the user might want to use to find that ID can
    be made available in the log files?
  • Simon Riggs at Apr 27, 2004 at 9:36 pm

    On Tue, 2004-04-27 at 21:56, Rod Taylor wrote:
    Overall, I'd refer back to the points Bruce raised - you certainly do
    need a way of finding out the time to recover to, and as others have
    said also, time isn't the only desirable "recovery point".
    Wouldn't it be sufficient to simply use the transaction ID and ensure
    that all the parameters the user might want to use to find that ID can
    be made available in the log files?
    Yes, of course, all methods of locating a particular xlog file to stop
    at are effectively equivalent. The discussion is mostly about what is
    convenient for the user in a real recovery situation.
    From all that has been said so far, I would implement:
    1. Recovery to a specific txnid, which is fairly straightforward
    2. Recovery to a specific date/time
    a) by implementing a log inspection tool that shows the txnid for a
    PIT, or
    b) by implementing recovery to a PIT directly
    3. Recovery to a named checkpoint
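The three stopping rules above could be represented by a single recovery-target value. A minimal sketch under invented names (PostgreSQL's actual implementation is C and differs):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RecoveryTarget:
    """Exactly one field should be set; all names are hypothetical."""
    xid: Optional[int] = None         # 1. stop after this transaction id
    time_ms: Optional[int] = None     # 2. stop at this point in time
    checkpoint: Optional[str] = None  # 3. stop at this named checkpoint

def target_reached(t: RecoveryTarget, rec_xid: int, rec_time_ms: int,
                   rec_checkpoint: Optional[str]) -> bool:
    """True once the current xlog record lies past the recovery target."""
    if t.xid is not None:
        return rec_xid > t.xid
    if t.time_ms is not None:
        return rec_time_ms > t.time_ms
    return rec_checkpoint == t.checkpoint
```

All three targets reduce to the same question asked of each record in turn, which is why the methods are "effectively equivalent" as described above.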

    Best Regards, Simon Riggs
  • Rod Taylor at Apr 27, 2004 at 10:11 pm

    On Tue, 2004-04-27 at 17:36, Simon Riggs wrote:
    On Tue, 2004-04-27 at 21:56, Rod Taylor wrote:
    Overall, I'd refer back to the points Bruce raised - you certainly do
    need a way of finding out the time to recover to, and as others have
    said also, time isn't the only desirable "recovery point".
    Wouldn't it be sufficient to simply use the transaction ID and ensure
    that all the parameters the user might want to use to find that ID can
    be made available in the log files?
    Yes, of course, all methods of locating a particular xlog file to stop
    at are effectively equivalent. The discussion is mostly about what is
    convenient for the user in a real recovery situation.
    I see.. The first thing I would need to do is look at /var/log/pgsql. At
    that point it really doesn't matter what the identifier is so long as
    the identifier is there.
  • Simon Riggs at Apr 27, 2004 at 10:35 pm

    On Tue, 2004-04-27 at 23:11, Rod Taylor wrote:
    On Tue, 2004-04-27 at 17:36, Simon Riggs wrote:
    On Tue, 2004-04-27 at 21:56, Rod Taylor wrote:
    Overall, I'd refer back to the points Bruce raised - you certainly do
    need a way of finding out the time to recover to, and as others have
    said also, time isn't the only desirable "recovery point".
    Wouldn't it be sufficient to simply use the transaction ID and ensure
    that all the parameters the user might want to use to find that ID can
    be made available in the log files?
    Yes, of course, all methods of locating a particular xlog file to stop
    at are effectively equivalent. The discussion is mostly about what is
    convenient for the user in a real recovery situation.
    I see.. The first thing I would need to do is look at /var/log/pgsql. At
    that point it really doesn't matter what the identifier is so long as
    the identifier is there.
    PITR works on the assumption that /var/log/pgsql no longer exists at
    all. It is suitable for use in bare-metal recovery situations, as well
    as usage-induced situations.

    You pick up the pieces, work out what the best identifier is, then plan
    on using that... it might not be a pgsql log, it might be:
    i) literally the wall clock - "the power went off about 2"
    ii) other systems' logs
    iii) etc.

    Best Regards, Simon Riggs
  • Bruce Momjian at Apr 28, 2004 at 4:00 am

    Simon Riggs wrote:
    On Tue, 2004-04-27 at 21:56, Rod Taylor wrote:
    Overall, I'd refer back to the points Bruce raised - you certainly do
    need a way of finding out the time to recover to, and as others have
    said also, time isn't the only desirable "recovery point".
    Wouldn't it be sufficient to simply use the transaction ID and ensure
    that all the parameters the user might want to use to find that ID can
    be made available in the log files?
    Yes, of course, all methods of locating a particular xlog file to stop
    at are effectively equivalent. The discussion is mostly about what is
    convenient for the user in a real recovery situation.
    From all that has been said so far, I would implement:
    1. Recovery to a specific txnid, which is fairly straightforward
    2. Recovery to a specific date/time
    a) either by implementing a log inspection tool that shows the txnid for
    a PIT
    b) implementing recovery to a PIT directly
    3. Recovery to a named checkpoint
    What if we added transaction id to log_line_prefix? The user could then
    log all queries and find the xid where they want to stop, but of course
    that assumes they have enabled such logging, and they have access to the
    logs.
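For illustration, the configuration Bruce describes might look like the sketch below. The %x transaction-id escape is an assumption here — no such escape existed in log_line_prefix at the time of this thread (later PostgreSQL releases did add %x):

```ini
# postgresql.conf sketch (hypothetical: assumes a transaction-id escape
# %x in log_line_prefix, which did not exist when this was written)
log_statement = 'all'
log_line_prefix = '%t xid=%x '
```

With that in place, a DBA could grep the server log for the offending statement and read the xid to stop at directly from the line prefix.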

    --
    Bruce Momjian | http://candle.pha.pa.us
    pgman@candle.pha.pa.us | (610) 359-1001
    + If your life is a hard drive, | 13 Roberts Road
    + Christ can be your backup. | Newtown Square, Pennsylvania 19073
  • Simon Riggs at Apr 28, 2004 at 4:47 pm

    On Wed, 2004-04-28 at 05:00, Bruce Momjian wrote:
    Simon Riggs wrote:
    On Tue, 2004-04-27 at 21:56, Rod Taylor wrote:
    Overall, I'd refer back to the points Bruce raised - you certainly do
    need a way of finding out the time to recover to, and as others have
    said also, time isn't the only desirable "recovery point".
    Wouldn't it be sufficient to simply use the transaction ID and ensure
    that all the parameters the user might want to use to find that ID can
    be made available in the log files?
    Yes, of course, all methods of locating a particular xlog file to stop
    at are effectively equivalent. The discussion is mostly about what is
    convenient for the user in a real recovery situation.
    From all that has been said so far, I would implement:
    1. Recovery to a specific txnid, which is fairly straightforward
    2. Recovery to a specific date/time
    a) either by implementing a log inspection tool that shows the txnid for
    a PIT
    b) implementing recovery to a PIT directly
    3. Recovery to a named checkpoint
    What if we added transaction id to log_line_prefix? The user could then
    log all queries and find the xid where they want to stop, but of course
    that assumes they have enabled such logging, and they have access to the
    logs.
    Good thinking.

    I'll have a look at this and come back to you.

    Best Regards, Simon Riggs
  • Andrew Dunstan at Apr 28, 2004 at 5:34 pm

    Simon Riggs wrote:
    On Wed, 2004-04-28 at 05:00, Bruce Momjian wrote:

    What if we added transaction id to log_line_prefix? The user could then
    log all queries and find the xid where they want to stop, but of course
    that assumes they have enabled such logging, and they have access to the
    logs.
    Good thinking.

    I'll have a look at this and come back to you.

    log_statement-triggered logging happens very early in the process - if
    you are logging them all, it happens before the statements are even
    parsed. Would we have an xid to log sensibly at that stage?

    Perhaps with log_min_duration_statement = 0 we would (i.e. log statements
    when finished rather than when started).

    cheers

    andrew
  • Simon Riggs at Apr 28, 2004 at 8:49 pm

    On Wed, 2004-04-28 at 18:35, Andrew Dunstan wrote:
    Simon Riggs wrote:
    On Wed, 2004-04-28 at 05:00, Bruce Momjian wrote:

    What if we added transaction id to log_line_prefix? The user could then
    log all queries and find the xid where they want to stop, but of course
    that assumes they have enabled such logging, and they have access to the
    logs.
    Good thinking.

    I'll have a look at this and come back to you.
    log_statement triggered logging happens very early in the process - if
    you are logging them all it happens before the statements are even
    parsed. Would we have an xid to log sensibly at that stage?

    Perhaps with log_min_duration_statement = 0 we would (i.e. log statements
    when finished rather than when started).
    Let's call this XLogSpy.

    For starters, we only need to look at write transactions. Many read-only
    transactions would not need to be examined, just to know they were read
    only.

    Remembering that we're using xlogs for recovery, we perhaps should not
    assume that we have anything other than that which has been archived.
    Currently, that is just the xlogs. So really we need to work off what is
    within them and right now that isn't much at all.

    We might optionally include the parsed statement data into the xlog,
    since this might be smaller than the exact text itself and would also
    allow us to filter the xlogs based upon any aspect of the lists.
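A toy sketch of such an "XLogSpy" filter, with an invented record layout (real xlog records carry none of these fields in this form): scan the stream, remember which xids wrote anything, and report only the commits of write transactions:

```python
def xlog_spy(records):
    """records: iterable of dicts with 'xid', 'kind' and 'time_ms' keys
    (an invented layout for illustration). Yield (xid, time_ms) for the
    commit of every transaction that wrote something, skipping read-only
    transactions entirely."""
    writers = set()
    for rec in records:
        if rec["kind"] in ("insert", "update", "delete"):
            writers.add(rec["xid"])
        elif rec["kind"] == "commit" and rec["xid"] in writers:
            yield rec["xid"], rec["time_ms"]
```

This matches the observation that read-only transactions need no examination beyond knowing they were read-only: they never enter the writers set, so their commits are filtered out.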

    Not really happy with any of these ideas yet.

    Best Regards, Simon Riggs
  • Andrew Dunstan at Apr 29, 2004 at 2:07 am

    Simon Riggs said:
    On Wed, 2004-04-28 at 18:35, Andrew Dunstan wrote:
    Simon Riggs wrote:
    On Wed, 2004-04-28 at 05:00, Bruce Momjian wrote:

    What if we added transaction id to log_line_prefix? The user could
    then log all queries and find the xid where they want to stop, but
    of course that assumes they have enabled such logging, and they have
    access to the logs.
    Good thinking.

    I'll have a look at this and come back to you.
    log_statement triggered logging happens very early in the process - if
    you are logging them all it happens before the statements are even
    parsed. Would we have an xid to log sensibly at that stage?

    Perhaps with log_min_duration_statement = 0 we would (i.e. log
    statements when finished rather than when started).
    Let's call this XLogSpy.

    For starters, we only need to look at write transactions. Many
    read-only transactions would not need to be examined, just to know they
    were read only.

    Remembering that we're using xlogs for recovery, we perhaps should not
    assume that we have anything other than that which has been archived.
    Currently, that is just the xlogs. So really we need to work off what
    is within them and right now that isn't much at all.

    We might optionally include the parsed statement data into the xlog,
    since this might be smaller than the exact text itself and would also
    allow us to filter the xlogs based upon any aspect of the lists.

    Not really happy with any of these ideas yet.

    I don't see how this relates to the log_line_prefix idea, or are you
    abandoning that?

    cheers

    andrew
  • Jim C. Nasby at Apr 28, 2004 at 12:00 am
    Another idea would be to provide some means to roll a database forwards
    and backwards. If you're doing a recovery because you did something like
    an accidental UPDATE SET field = blah without a where clause, what you
    really care about is going up to the point right before that update. If
    there's a log viewer with enough detail, that would suffice; otherwise
    you'd need to test for some specific condition.

    Of course timestamps would still be useful in this scenario since they
    can get you close to the transaction in question.
    --
    Jim C. Nasby, Database Consultant jim@nasby.net
    Member: Triangle Fraternity, Sports Car Club of America
    Give your computer some brain candy! www.distributed.net Team #1828

    Windows: "Where do you want to go today?"
    Linux: "Where do you want to go tomorrow?"
    FreeBSD: "Are you guys coming, or what?"
