On 21.09.2012 14:18, Amit kapila wrote:
On Tuesday, September 18, 2012 6:02 PM Fujii Masao wrote:
On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila wrote:
Approach-2 :
Provide a variable wal_send_status_interval, such that if this is 0, then
the current behavior would prevail, and if it's non-zero then a KeepAlive
message would be sent at the latest after that interval.
The modified code of WALSendLoop will be as follows:
<snip>
Which way do you think is better, or do you have any other idea to handle this?
I think #2 is better because it's more intuitive to a user.
Please find a patch attached for implementation of Approach-2.
Hmm, I think we need to step back a bit. I've never liked the way
replication_timeout works, where it's the user's responsibility to set
wal_receiver_status_interval < replication_timeout. It's not very
user-friendly. I'd rather not copy that same design to this walreceiver
timeout. If there's two different timeouts like that, it's even worse,
because it's easy to confuse the two.

So let's think how this should ideally work from a user's point of view.
I think there should be just two settings: walsender_timeout and
walreceiver_timeout. walsender_timeout specifies how long a walsender
will keep a connection open if it doesn't hear from the walreceiver, and
walreceiver_timeout is the same for walreceiver. The system should
figure out itself how often to send keepalive messages so that those
timeouts are not reached.

In walsender, after half of walsender_timeout has elapsed and we haven't
received anything from the client, the walsender process should send a
"ping" message to the client. Whenever the client receives a Ping, it
replies. The walreceiver does the same; when half of walreceiver_timeout
has elapsed, send a Ping message to the server. Each Ping-Pong roundtrip
resets the timer in both ends, regardless of which side initiated it, so
if e.g walsender_timeout < walreceiver_timeout, the client will never
have to initiate a Ping message, because walsender will always reach the
walsender_timeout/2 point first and initiate the heartbeat message.
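
The half-timeout rule above can be illustrated with a minimal sketch. It assumes whole-second ticks, instant message delivery and an otherwise idle but healthy link; the function names are hypothetical and not taken from any patch in this thread.

```python
def next_action(idle, timeout):
    """Decide what one side does after `idle` seconds without hearing from its peer."""
    if idle >= timeout:
        return "terminate"
    if idle >= timeout / 2:
        return "ping"
    return "wait"


def simulate(sender_timeout, receiver_timeout, seconds):
    """Return which side initiated each ping on an idle, healthy connection."""
    sender_last = receiver_last = 0  # time each side last heard from its peer
    initiators = []
    for now in range(1, seconds + 1):
        if next_action(now - sender_last, sender_timeout) == "ping":
            initiators.append("walsender")
            receiver_last = now  # the ping arrives at the receiver
            sender_last = now    # the reply arrives back at the sender
        elif next_action(now - receiver_last, receiver_timeout) == "ping":
            initiators.append("walreceiver")
            sender_last = now    # the ping arrives at the sender
            receiver_last = now  # the reply arrives back at the receiver
    return initiators


# With the smaller timeout on the sender side, only walsender ever pings.
print(set(simulate(sender_timeout=10, receiver_timeout=20, seconds=60)))
```

Because each round trip resets both timers, the side with the larger timeout never reaches its own half-way point in this model.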

The Ping/Pong messages don't necessarily need to be new message types,
we can use the message types we currently have, perhaps with an
additional flag attached to them, to request the other side to reply
immediately.

- Heikki

  • Robert Haas at Oct 1, 2012 at 3:06 pm

    On Mon, Oct 1, 2012 at 6:38 AM, Heikki Linnakangas wrote:
    Hmm, I think we need to step back a bit. I've never liked the way
    replication_timeout works, where it's the user's responsibility to set
    wal_receiver_status_interval < replication_timeout. It's not very
    user-friendly. I'd rather not copy that same design to this walreceiver
    timeout. If there's two different timeouts like that, it's even worse,
    because it's easy to confuse the two.
    I agree, but also note that wal_receiver_status_interval serves
    another user-visible purpose as well.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Amit kapila at Oct 2, 2012 at 7:46 am
    On Monday, October 01, 2012 8:36 PM Robert Haas wrote:
    On Mon, Oct 1, 2012 at 6:38 AM, Heikki Linnakangas wrote:
    Hmm, I think we need to step back a bit. I've never liked the way
    replication_timeout works, where it's the user's responsibility to set
    wal_receiver_status_interval < replication_timeout. It's not very
    user-friendly. I'd rather not copy that same design to this walreceiver
    timeout. If there's two different timeouts like that, it's even worse,
    because it's easy to confuse the two.
    I agree, but also note that wal_receiver_status_interval serves
    another user-visible purpose as well.
    By the above, do you mean that wal_receiver_status_interval is used for the reply to data sent by the server, indicating up to what point the receiver has flushed data, or something else?

    With Regards,
    Amit Kapila.
  • Fujii Masao at Oct 1, 2012 at 4:57 pm

    On Mon, Oct 1, 2012 at 7:38 PM, Heikki Linnakangas wrote:
    Hmm, I think we need to step back a bit. I've never liked the way
    replication_timeout works, where it's the user's responsibility to set
    wal_receiver_status_interval < replication_timeout. It's not very
    user-friendly. I'd rather not copy that same design to this walreceiver
    timeout. If there's two different timeouts like that, it's even worse,
    because it's easy to confuse the two.
    Agreed.

    I'd like to specify the replication timeout like we do TCP keepalives, i.e.,
    what about introducing something like following parameters?

         walsender_keepalives_idle
         walsender_keepalives_interval
         walsender_keepalives_count
         walreceiver_keepalives_idle
         walreceiver_keepalives_interval
         walreceiver_keepalives_count

    I believe many users are basically familiar with TCP keepalives and how to
    specify it. So I think that this approach would be intuitive to users. Also
    this approach includes your proposal. If you specify

         walsender_keepalives_idle = walsender_timeout / 2
         walsender_keepalives_interval = -1 (disable; Ping is never sent
    again if there is no reply after first Ping is sent)
         walsender_keepalives_count = 1

    the replication timeout works as you proposed. But of course the downside
    of this approach is that the number of parameters for replication timeout
    increases from two (replication_timeout and wal_receiver_status_interval)
    to six, and those parameters are confusingly similar to the existing
    tcp_keepalives parameters, which might cause further confusion for users.
    One idea to solve this problem is to use the existing tcp_keepalives
    parameter values for the replication timeout.
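
For readers less familiar with the idle/interval/count model, here is a rough sketch of how the three values combine under conventional TCP keepalive semantics: the first probe goes out after `idle` seconds of silence, and the peer is declared dead after `count` unanswered probes spaced `interval` seconds apart. The function name is hypothetical, and the special `interval = -1` case from the message above is not modelled.

```python
def worst_case_detection(idle, interval, count):
    """Seconds from the last traffic until a dead peer is detected."""
    if idle < 0 or interval < 0 or count < 0:
        raise ValueError("idle, interval and count must be non-negative")
    # One idle period, then `count` probes that each wait `interval` seconds.
    return idle + interval * count


# Example: 30 s idle, probes every 10 s, give up after 3 unanswered probes.
print(worst_case_detection(idle=30, interval=10, count=3))
```

Three interacting values per side are what make this model easy to misconfigure, which is the concern raised in the replies.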

    Regards,

    --
    Fujii Masao
  • Robert Haas at Oct 2, 2012 at 12:02 am

    On Mon, Oct 1, 2012 at 12:57 PM, Fujii Masao wrote:
    I believe many users are basically familiar with TCP keepalives and how to
    specify it. So I think that this approach would be intuitive to users.
    My experience is that many users are unfamiliar with TCP keepalives
    and that when given the options they tend to do it wrong. I think a
    simpler system would be better.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Alvaro Herrera at Oct 2, 2012 at 3:50 am

    Excerpts from Robert Haas's message of Mon Oct 01 21:02:54 -0300 2012:
    On Mon, Oct 1, 2012 at 12:57 PM, Fujii Masao wrote:
    I believe many users are basically familiar with TCP keepalives and how to
    specify it. So I think that this approach would be intuitive to users.
    My experience is that many users are unfamiliar with TCP keepalives
    and that when given the options they tend to do it wrong. I think a
    simpler system would be better.
    +1

    --
    Álvaro Herrera http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Amit kapila at Oct 2, 2012 at 7:36 am

    On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote:
    On 21.09.2012 14:18, Amit kapila wrote:
    On Tuesday, September 18, 2012 6:02 PM Fujii Masao wrote:
    On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila wrote:
    Approach-2 :
    Provide a variable wal_send_status_interval, such that if this is 0, then
    the current behavior would prevail, and if it's non-zero then a KeepAlive
    message would be sent at the latest after that interval.
    The modified code of WALSendLoop will be as follows:
    <snip>
    Which way do you think is better, or do you have any other idea to handle this?
    I think #2 is better because it's more intuitive to a user.
    Please find a patch attached for implementation of Approach-2.
    So let's think how this should ideally work from a user's point of view.
    I think there should be just two settings: walsender_timeout and
    walreceiver_timeout. walsender_timeout specifies how long a walsender
    will keep a connection open if it doesn't hear from the walreceiver, and
    walreceiver_timeout is the same for walreceiver. The system should
    figure out itself how often to send keepalive messages so that those
    timeouts are not reached.
    This implies that we should remove wal_receiver_status_interval. Currently it is also used
    for the reply message to data sent by the sender, which contains the point up to which the receiver has flushed. So if we remove this variable,
    the receiver might start sending that message sooner than required.
    Is that okay behavior?
    In walsender, after half of walsender_timeout has elapsed and we haven't
    received anything from the client, the walsender process should send a
    "ping" message to the client. Whenever the client receives a Ping, it
    replies. The walreceiver does the same; when half of walreceiver_timeout
    has elapsed, send a Ping message to the server. Each Ping-Pong roundtrip
    resets the timer in both ends, regardless of which side initiated it, so
    if e.g walsender_timeout < walreceiver_timeout, the client will never
    have to initiate a Ping message, because walsender will always reach the
    walsender_timeout/2 point first and initiate the heartbeat message.
    Just to clarify, walsender should reset its timer after it gets a reply from the receiver to the message it sent.
    walreceiver should reset its timer after sending the reply to a heartbeat message.
    Similarly, the timers will be reset when the receiver sends the heartbeat message.
    The Ping/Pong messages don't necessarily need to be new message types,
    we can use the message types we currently have, perhaps with an
    additional flag attached to them, to request the other side to reply
    immediately.
    Can't we decide whether to send the reply immediately based on the message type, since these message types will be unique?

    To clarify my understanding,
    1. The heartbeat message from the walsender side will be the keepalive message ('k'), and from the walreceiver side it will be the Hot Standby feedback message ('h').
    2. The reply message from the walreceiver side will be the current reply message ('r').
    3. Currently there is no reply kind of message from walsender, so do we need to introduce a new message for it, or can we use some existing message?
         If new, do we need to send any additional information along with it? If existing, can we use the keepalive message itself as the reply message, but with an additional byte
         to indicate it is a reply?

    With Regards,
    Amit Kapila.
  • Heikki Linnakangas at Oct 2, 2012 at 8:26 am

    On 02.10.2012 10:36, Amit kapila wrote:
    On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote:
    So let's think how this should ideally work from a user's point of view.
    I think there should be just two settings: walsender_timeout and
    walreceiver_timeout. walsender_timeout specifies how long a walsender
    will keep a connection open if it doesn't hear from the walreceiver, and
    walreceiver_timeout is the same for walreceiver. The system should
    figure out itself how often to send keepalive messages so that those
    timeouts are not reached.
    This implies that we should remove wal_receiver_status_interval. Currently it is also used
    for the reply message to data sent by the sender, which contains the point up to which the receiver has flushed. So if we remove this variable,
    the receiver might start sending that message sooner than required.
    Is that okay behavior?
    I guess we should keep that setting, then, so that you can get status
    updates more often than would be required for heartbeat purposes.
    In walsender, after half of walsender_timeout has elapsed and we haven't
    received anything from the client, the walsender process should send a
    "ping" message to the client. Whenever the client receives a Ping, it
    replies. The walreceiver does the same; when half of walreceiver_timeout
    has elapsed, send a Ping message to the server. Each Ping-Pong roundtrip
    resets the timer in both ends, regardless of which side initiated it, so
    if e.g walsender_timeout < walreceiver_timeout, the client will never
    have to initiate a Ping message, because walsender will always reach the
    walsender_timeout/2 point first and initiate the heartbeat message.
    Just to clarify, walsender should reset its timer after it gets a reply from the receiver to the message it sent.
    Right.
    walreceiver should reset its timer after sending the reply to a heartbeat message.
    Similarly, the timers will be reset when the receiver sends the
    heartbeat message.

    walreceiver should reset the timer when it *receives* any message from
    walsender. If it sends the reply right away, I guess that's the same
    thing, but I'd phrase it so that it's the reception of a message from
    the other end that resets the timer.
    The Ping/Pong messages don't necessarily need to be new message types,
    we can use the message types we currently have, perhaps with an
    additional flag attached to them, to request the other side to reply
    immediately.
    Can't we decide whether to send the reply immediately based on the message type, since these message types will be unique?

    To clarify my understanding,
    1. The heartbeat message from the walsender side will be the keepalive message ('k'), and from the walreceiver side it will be the Hot Standby feedback message ('h').
    2. The reply message from the walreceiver side will be the current reply message ('r').
    Yep. I wonder why we need separate message types for Hot Standby Feedback
    'h' and Reply 'r', though. Seems it would be simpler to have just one
    message type that includes all the fields from both messages.
    3. Currently there is no reply kind of message from walsender, so do we need to introduce a new message for it, or can we use some existing message?
    If new, do we need to send any additional information along with it? If existing, can we use the keepalive message itself as the reply message, but with an additional byte
    to indicate it is a reply?
    Hmm, I think I'd prefer to use the existing Keepalive message 'k', with
    an additional flag.
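
One way to picture a keepalive message carrying such a flag is the sketch below. It assumes a fixed big-endian layout of a type byte 'k', an 8-byte WAL end position, an 8-byte send time and a one-byte reply-requested flag. That layout is an assumption for illustration only (the structs discussed in this thread were passed in native layout), and the helper names are hypothetical.

```python
import struct

# Assumed layout: type byte, WAL end position, send time, reply-requested flag.
# "!" selects network byte order with no padding between fields.
KEEPALIVE_FORMAT = "!cqqB"


def pack_keepalive(wal_end, send_time, reply_requested):
    """Encode a keepalive; reply_requested asks the peer to answer immediately."""
    return struct.pack(KEEPALIVE_FORMAT, b"k", wal_end, send_time, int(reply_requested))


def unpack_keepalive(data):
    """Decode a keepalive and return (wal_end, send_time, reply_requested)."""
    kind, wal_end, send_time, flag = struct.unpack(KEEPALIVE_FORMAT, data)
    if kind != b"k":
        raise ValueError("not a keepalive message")
    return wal_end, send_time, bool(flag)


message = pack_keepalive(wal_end=1234, send_time=5678, reply_requested=True)
print(len(message), unpack_keepalive(message))
```

Reusing the existing message type with one extra byte keeps the set of message types unchanged while still letting either side request an immediate answer.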

    - Heikki
  • Amit kapila at Oct 4, 2012 at 10:14 am

    On Tuesday, October 02, 2012 1:56 PM Heikki Linnakangas wrote:
    On 02.10.2012 10:36, Amit kapila wrote:
    On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote:
    So let's think how this should ideally work from a user's point of view.
    I think there should be just two settings: walsender_timeout and
    walreceiver_timeout. walsender_timeout specifies how long a walsender
    will keep a connection open if it doesn't hear from the walreceiver, and
    walreceiver_timeout is the same for walreceiver. The system should
    The Ping/Pong messages don't necessarily need to be new message types,
    we can use the message types we currently have, perhaps with an
    additional flag attached to them, to request the other side to reply
    immediately.
    Can't we decide whether to send the reply immediately based on the message type, since these message types will be unique?
    To clarify my understanding,
    1. The heartbeat message from the walsender side will be the keepalive message ('k'), and from the walreceiver side it will be the Hot Standby feedback message ('h').
    2. The reply message from the walreceiver side will be the current reply message ('r').
    Yep. I wonder why we need separate message types for Hot Standby Feedback
    'h' and Reply 'r', though. Seems it would be simpler to have just one
    message type that includes all the fields from both messages.
    Moved the contents of Hot Standby Feedback 'h' to Reply 'r' and used 'h' for heartbeat purposes.
    3. Currently there is no reply kind of message from walsender, so do we need to introduce a new message for it, or can we use some existing message?
    If new, do we need to send any additional information along with it? If existing, can we use the keepalive message itself as the reply message, but with an additional byte
    to indicate it is a reply?
    Hmm, I think I'd prefer to use the existing Keepalive message 'k', with an additional flag.
        Okay. I have done it in the patch.

    Thank you for suggestions.
    I have addressed your suggestions in patch attached with this mail.

    The following changes are made to support replication timeout in the sender as well as the receiver:

    1. A new configuration parameter, wal_receiver_timeout, is added to detect timeout in the receiver task.
    2. The existing parameter replication_timeout is renamed to wal_sender_timeout.
    3. The PrimaryKeepaliveMessage structure is modified to add one more field indicating whether the keep-alive is of type 'r' (i.e.
         reply) or 'h' (i.e. heart-beat).
    4. The sender now sends a keep-alive message to the standby if it has been idle for at least half of wal_sender_timeout.
         In this case it sends a keep-alive of type 'h'.
    5. Once the standby receives a keep-alive, it needs to send an immediate reply to the primary to indicate that the connection is alive.
    6. The Reply message (which sends the WAL offset) and the Feedback message (which sends the oldest transaction) are merged into a single Reply message.
         So the structure StandbyReplyMessage now has two more fields, xmin and epoch, and the StandbyHSFeedbackMessage
         structure no longer has the xmin and epoch fields (as these are moved to StandbyReplyMessage).
    7. Because of the changes in step 6, once the receiver task receives some data from the primary, it sends only the Reply message.
    8. The same Reply message is sent in step 5 and step 7, but in step 5 the reply is sent immediately, whereas in step 7 the reply is sent
          only if wal_receiver_status_interval has elapsed (this part is the same as before).
    9. Similarly to the sender, if the receiver finds itself idle for at least half of the configured wal_receiver_timeout, it sends the
          hot-standby heartbeat. This heart-beat has been modified to send only sendTime.
    10. Once the sender task receives a heart-beat message from the standby, it sends back the reply immediately. In this case a keep-alive message
            of type 'r' is sent.
    11. If no message is received from the standby even after wal_sender_timeout, the sender task treats it as a network break.
    12. If no message is received from the primary even after wal_receiver_timeout, the receiver task treats it as a network break.


    With Regards,
    Amit Kapila.
  • Amit Kapila at Oct 4, 2012 at 12:28 pm

    -----Original Message-----
    From: pgsql-bugs-owner@postgresql.org [mailto:pgsql-bugs-
    owner@postgresql.org] On Behalf Of Amit kapila
    Sent: Thursday, October 04, 2012 3:43 PM
    To: Heikki Linnakangas
    Cc: Fujii Masao; pgsql-bugs@postgresql.org; pgsql-hackers@postgresql.org
    Subject: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w
    breakdown

    On Tuesday, October 02, 2012 1:56 PM Heikki Linnakangas wrote:
    On 02.10.2012 10:36, Amit kapila wrote:
    On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote:
    So let's think how this should ideally work from a user's point of
    view.
    I think there should be just two settings: walsender_timeout and
    walreceiver_timeout. walsender_timeout specifies how long a
    walsender will keep a connection open if it doesn't hear from the
    Thank you for suggestions.
    I have addressed your suggestions in patch attached with this mail.

    Following changes are done to support replication timeout in sender as
    well as receiver:

    Testing Done for the Patch
    --------------------------------
    1. Verified the value of new configuration parameter and changed
    configuration parameter using the show command (using Show of specific
        parameter as well as show all).
    2. Verified the new configuration parameter in --describe-config.
    3. Verified the existing parameter replication_timeout's new name in
    --describe-config.
    4. Start primary and standby node with default timeout, leave it idle for
    some time.
        It should not error out due to network break error.
    5. a. Start primary and standby node with default timeout, bring down the
    network.
        b. Both sender and receiver should be able to detect network break-down
    almost at same time.
        c. Once the network is up again, connection should get re-established
    successfully.
    5. a. Start primary and standby node with wal_sender_timeout less than
    wal_receiver_timeout, bring down the network.
        b. Sender should be able to detect network break-down before receiver
    task.
        c. Once the network is up again, connection should get re-established
    successfully.
    6. a. Start primary and standby node with wal_receiver_timeout less than
    wal_sender_timeout, bring down the network.
        b. Receiver should be able to detect network break-down before sender
    task.
        c. Once the network is up again, connection should get re-established
    successfully.
    7. a. In 5th test case, change the value of wal_receiver_status_interval to
    more than wal_receiver_timeout and hence more than
           wal_sender_timeout.
        b. Then bring the network down.
        c. Sender task should be able to detect network break-down once
    wal_sender_timeout has lapsed.
        d. Once the network is up again, connection should get re-established
    successfully.
        Intent of this test is to check there is no dependency of
    wal_sender_timeout on wal_receiver_status_interval for detection of
        Network break.

    All the above tests are passed.

    With Regards,
    Amit Kapila.
  • Robert Haas at Oct 8, 2012 at 2:08 pm

    On Thu, Oct 4, 2012 at 6:12 AM, Amit kapila wrote:
    1. One new configuration parameter wal_receiver_timeout is added to detect timeout at receiver task.
    2. Existing parameter replication_timeout is renamed to wal_sender_timeout.
    -1 from me on a backward compatibility break here. I don't know what
    else to call the new GUC (replication_server_timeout?) but I'm not
    excited about breaking existing conf files, nor do I particularly like
    the proposed new names.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Amit Kapila at Oct 8, 2012 at 2:43 pm

    On Monday, October 08, 2012 7:38 PM Robert Haas wrote:
    On Thu, Oct 4, 2012 at 6:12 AM, Amit kapila wrote:
    1. One new configuration parameter wal_receiver_timeout is added to
    detect timeout at receiver task.
    2. Existing parameter replication_timeout is renamed to
    wal_sender_timeout.

    -1 from me on a backward compatibility break here. I don't know what
    else to call the new GUC (replication_server_timeout?) but I'm not
    excited about breaking existing conf files, nor do I particularly like
    the proposed new names.
    How about following:
    1. replication_client_timeout -- shouldn't it be client as new configuration
    is for wal receiver
    2. replication_standby_timeout

    If we introduce a new parameter for wal receiver, wouldn't
    replication_timeout be confusing for user?

    With Regards,
    Amit Kapila.
  • Robert Haas at Oct 9, 2012 at 12:29 pm

    On Mon, Oct 8, 2012 at 10:42 AM, Amit Kapila wrote:
    How about following:
    1. replication_client_timeout -- shouldn't it be client as new configuration
    is for wal receiver
    2. replication_standby_timeout
    ISTM that the client and the standby are the same thing.
    If we introduce a new parameter for wal receiver, wouldn't
    replication_timeout be confusing for user?
    Maybe. I actually don't think that I understand what problem we're
    trying to solve here. If the connection between the master and the
    standby is lost, shouldn't the standby realize that it's no longer
    receiving keepalives from the master and terminate the connection? I
    thought I had tested this at some point and it was working, so either
    it's subsequently gotten broken again or the scenario you're talking
    about is different in some way that I don't currently understand.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Amit Kapila at Oct 9, 2012 at 1:05 pm

    On Tuesday, October 09, 2012 6:00 PM Robert Haas wrote:
    On Mon, Oct 8, 2012 at 10:42 AM, Amit Kapila wrote:
    How about following:
    1. replication_client_timeout -- shouldn't it be client as new
    configuration
    is for wal receiver
    2. replication_standby_timeout
    ISTM that the client and the standby are the same thing.
    Yes, they are the same, but maybe one (replication_standby_timeout) is more
    easily understood by users.

    If we introduce a new parameter for wal receiver, wouldn't
    replication_timeout be confusing for user?
    Maybe.
    I actually don't think that I understand what problem we're
    trying to solve here. If the connection between the master and the
    standby is lost, shouldn't the standby realize that it's no longer
    receiving keepalives from the master and terminate the connection?
    For the wal receiver, keepalives are just another kind of message, so the
    behavior is that when it finds it has not received any message, it tries
    to send a reply/feedback message to the master after an interval of
    wal_receiver_status_interval.
    So after every wal_receiver_status_interval the wal receiver sends a reply,
    but the socket send still doesn't fail. It fails only after many send calls,
    because internally send() keeps accumulating data until the socket's
    internal buffer is full, even if the other side has not received the data.
    So that's the reason we decided to introduce a timeout parameter in the wal
    receiver, similar to what we currently have in walsender.
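
The buffering behaviour described above can be observed locally. This is a minimal sketch that uses a local socket pair as a stand-in for the replication connection (an assumption; a broken network link is not simulated, and the function name is hypothetical). The peer never calls recv(), yet send() keeps succeeding until the kernel's buffer is full.

```python
import socket


def bytes_accepted_without_reader(chunk_size=4096, limit=64 * 1024 * 1024):
    """Count how many bytes send() accepts while the peer never reads."""
    writer, idle_peer = socket.socketpair()
    writer.setblocking(False)  # report a full buffer instead of blocking
    sent = 0
    try:
        while sent < limit:
            try:
                sent += writer.send(b"x" * chunk_size)
            except BlockingIOError:
                break  # the kernel buffer is full; nothing was ever read
    finally:
        writer.close()
        idle_peer.close()
    return sent


print(bytes_accepted_without_reader(), "bytes accepted with no reader")
```

Since a successful send() only means the data was queued locally, it cannot serve as evidence that the other end is alive, which is why an explicit receive-side timeout is needed.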
    I
    thought I had tested this at some point and it was working, so either
    it's subsequently gotten broken again or the scenario you're talking
    about is different in some way that I don't currently understand.
    The standby takes much longer, around 15 minutes, to detect it, whereas the
    master is able to detect it much sooner, in 2-3 minutes, and the master
    detects it mainly due to the timeout functionality in wal sender.

    With Regards,
    Amit Kapila.
  • Heikki Linnakangas at Oct 10, 2012 at 3:44 pm

    On 04.10.2012 13:12, Amit kapila wrote:
    Following changes are done to support replication timeout in sender as well as receiver:

    1. One new configuration parameter wal_receiver_timeout is added to detect timeout at receiver task.
    2. Existing parameter replication_timeout is renamed to wal_sender_timeout.
    Ok. The other option would be to have just one GUC, I'm open to
    bikeshedding on this one. On one hand, there's no reason the timeouts
    have to be the same, so it would be nice to have separate settings, but on
    the other hand, I can't imagine a case where a single setting wouldn't
    work just as well.
    3. The PrimaryKeepaliveMessage structure is modified to add one more field indicating whether the keep-alive is of type 'r' (i.e.
    reply) or 'h' (i.e. heart-beat).
    4. The sender now sends a keep-alive message to the standby if it has been idle for at least half of wal_sender_timeout.
    In this case it sends a keep-alive of type 'h'.
    5. Once the standby receives a keep-alive, it needs to send an immediate reply to the primary to indicate that the connection is alive.
    6. The Reply message (which sends the WAL offset) and the Feedback message (which sends the oldest transaction) are merged into a single Reply message.
    So the structure StandbyReplyMessage now has two more fields, xmin and epoch, and the StandbyHSFeedbackMessage
    structure no longer has the xmin and epoch fields (as these are moved to StandbyReplyMessage).
    7. Because of the changes in step 6, once the receiver task receives some data from the primary, it sends only the Reply message.
    Oh I see. That's not what I meant by combining the keep-alive and hs
    feedback messages, I imagined that the heartbeats would *also* use the
    same message type. Ie. there would be only a single message type from
    standby to primary, used for:

    1. updating the receive/apply pointer
    2. HS feedback
    3. for pinging the server when wal_receiver_timeout is approaching
    4. to reply to pings from the server.

    Since we didn't quite achieve that, it seems best to leave out this merging
    of reply and HS feedback message types, to keep the patch small. We
    might still want to do that, but better do that as a separate patch.
    8. The same Reply message is sent in step 5 and step 7, but in step 5 the reply is sent immediately, whereas in step 7 the reply is sent
    only if wal_receiver_status_interval has elapsed (this part is the same as before).
    9. Similarly to the sender, if the receiver finds itself idle for at least half of the configured wal_receiver_timeout, it sends the
    hot-standby heartbeat. This heart-beat has been modified to send only sendTime.
    10. Once the sender task receives a heart-beat message from the standby, it sends back the reply immediately. In this case a keep-alive message
    of type 'r' is sent.
    11. If no message is received from the standby even after wal_sender_timeout, the sender task treats it as a network break.
    12. If no message is received from the primary even after wal_receiver_timeout, the receiver task treats it as a network break.
    Attached is an updated patch. I reverted the merging of message types
    and fixed a bunch of cosmetic issues. There was one bug: in the main
    loop of walreceiver, you send the "ping" message on every wakeup after
    enough time has passed since last reception. That means that if the
    server doesn't reply promptly, you send a new ping message every 100 ms
    (NAPTIME_PER_CYCLE), until it gets a reply. Walsender had the same
    issue, but it was not quite as severe there because the naptime was
    longer. Fixed that.
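
The repeated-ping problem and one way to avoid it can be sketched as follows. This is an illustration under assumptions, not the committed code: the class and its flag are hypothetical, and the idea is simply to remember that a ping is outstanding until something arrives from the peer.

```python
class PingState:
    """Tracks whether a ping has already been sent for the current idle period."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_received = 0.0
        self.ping_sent = False

    def on_receive(self, now):
        # Any message from the peer resets the timer and clears the flag.
        self.last_received = now
        self.ping_sent = False

    def should_ping(self, now):
        # Without the ping_sent check, every wakeup past the half-way point
        # would send another ping until a reply arrives.
        if self.ping_sent:
            return False
        if now - self.last_received >= self.timeout / 2:
            self.ping_sent = True
            return True
        return False


def count_pings(state, wakeups):
    """Number of pings sent across a series of wakeup times."""
    return sum(state.should_ping(t) for t in wakeups)


state = PingState(timeout=60)
wakeups = [i * 0.1 for i in range(400)]  # a wakeup every 100 ms for 40 s
print(count_pings(state, wakeups))
```

In this model, roughly ten seconds of wakeups pass after the half-way point without a reply, yet only a single ping is sent.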

    How does this look now?

    - Heikki
  • Amit Kapila at Oct 11, 2012 at 10:19 am

    On Wednesday, October 10, 2012 9:15 PM Heikki Linnakangas wrote:
    On 04.10.2012 13:12, Amit kapila wrote:
    Following changes are done to support replication timeout in sender as
    well as receiver:
    1. One new configuration parameter wal_receiver_timeout is added to
    detect timeout at receiver task.
    2. Existing parameter replication_timeout is renamed to
    wal_sender_timeout.

    Ok. The other option would be to have just one GUC, I'm open to
    bikeshedding on this one. On one hand, there's no reason the timeouts
    have to be the same, so it would be nice to have separate settings, but on
    the other hand, I can't imagine a case where a single setting wouldn't
    work just as well.
    I think for the case below, they are required to be separate:

    1. M1 (Master), S1 (Standby 1), S2 (Standby 2)
    2. S1 is standby for M1, and S2 is standby for S1. Basically a simple case
    of cascaded replication
    3. M1 and S1 are on local network but S2 is placed at geographically
    different location.
       (what I want to say is n/w between M1-S1 is of good speed and S1-S2 is
    very slow)
    4. In above case, user might want to configure different timeouts for sender
    and receiver on S1.
    Attached is an updated patch. I reverted the merging of message types
    and fixed a bunch of cosmetic issues. There was one bug: in the main
    loop of walreceiver, you send the "ping" message on every wakeup after
    enough time has passed since last reception. That means that if the
    server doesn't reply promptly, you send a new ping message every 100 ms
    (NAPTIME_PER_CYCLE), until it gets a reply. Walsender had the same
    issue, but it was not quite as severe there because the naptime was
    longer. Fixed that.
    Thanks.
    How does this look now?
    The Patch is fine and test results are also fine.

    With Regards,
    Amit Kapila.
  • Heikki Linnakangas at Oct 11, 2012 at 2:52 pm

    On 11.10.2012 13:17, Amit Kapila wrote:
    How does this look now?
    The Patch is fine and test results are also fine.
    Ok, thanks. Committed.

    - Heikki
  • Amit kapila at Oct 11, 2012 at 3:53 pm

    On Thursday, October 11, 2012 8:22 PM Heikki Linnakangas wrote:
    On 11.10.2012 13:17, Amit Kapila wrote:
    How does this look now?
    The Patch is fine and test results are also fine.
    Ok, thanks. Committed.
        Thank you very much.

    With Regards,
    Amit Kapila.
  • Fujii Masao at Oct 13, 2012 at 4:35 pm

    On Thu, Oct 11, 2012 at 11:52 PM, Heikki Linnakangas wrote:
    On 11.10.2012 13:17, Amit Kapila wrote:

    How does this look now?

    The Patch is fine and test results are also fine.

    Ok, thanks. Committed.
    I found one typo. The attached patch fixes that typo.

    ISTM you need to update the protocol.sgml because you added
    the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage.

    Is it worth adding the same mechanism (send back the reply immediately
    if walsender requests a reply) into pg_basebackup and pg_receivexlog?

    Regards,

    --
    Fujii Masao
  • Heikki Linnakangas at Oct 15, 2012 at 10:13 am

    On 13.10.2012 19:35, Fujii Masao wrote:
    On Thu, Oct 11, 2012 at 11:52 PM, Heikki Linnakangas
    wrote:
    Ok, thanks. Committed.
    I found one typo. The attached patch fixes that typo.
    Thanks, fixed.
    ISTM you need to update the protocol.sgml because you added
    the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage.
    Oh, I didn't remember that we've documented the specific structs that we
    pass around. It's quite bogus anyway to explain the messages the way we
    do currently, as they are actually dependent on the underlying
    architecture's endianness and padding. I think we should refactor the
    protocol to not transmit raw structs, but use pq_sendint and friends to
    construct the messages. This was discussed earlier (see
    http://archives.postgresql.org/message-id/4FE2279C.2070506@enterprisedb.com),
    I think there's consensus that 9.3 would be a good time to do that as we
    changed the XLogRecPtr format anyway.

    I'll look into doing that..
    Is it worth adding the same mechanism (send back the reply immediately
    if walsender request a reply) into pg_basebackup and pg_receivexlog?
    Good catch. Yes, they should be taught about this too. I'll look into
    doing that too.

    - Heikki
  • Heikki Linnakangas at Oct 15, 2012 at 2:28 pm

    On 15.10.2012 13:13, Heikki Linnakangas wrote:
    On 13.10.2012 19:35, Fujii Masao wrote:
    ISTM you need to update the protocol.sgml because you added
    the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage.
    Oh, I didn't remember that we've documented the specific structs that we
    pass around. It's quite bogus anyway to explain the messages the way we
    do currently, as they are actually dependent on the underlying
    architecture's endianness and padding. I think we should refactor the
    protocol to not transmit raw structs, but use pq_sendint and friends to
    construct the messages. This was discussed earlier (see
    http://archives.postgresql.org/message-id/4FE2279C.2070506@enterprisedb.com),
    I think there's consensus that 9.3 would be a good time to do that as we
    changed the XLogRecPtr format anyway.
    This is what I came up with. The replication protocol is now
    architecture-independent. The WAL format itself is still
    architecture-dependent, of course, but this is useful if you want to
    e.g use pg_receivexlog to back up a server that runs on a different
    platform.

    I chose the int64 format to transmit timestamps, even when compiled with
    --disable-integer-datetimes.

    Please review if you have the time..

    - Heikki
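
To illustrate what "architecture-independent" means for the wire format, here is a minimal standalone sketch that writes a 64-bit value byte by byte in network byte order instead of copying a raw struct. It does not use the backend's pq_sendint family; put_int64_be and get_int64_be are hypothetical helper names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write v into buf[0..7], most significant byte first (network order). */
void
put_int64_be(unsigned char *buf, int64_t v)
{
    uint64_t u = (uint64_t) v;

    assert(buf != NULL);
    for (int i = 7; i >= 0; i--)
    {
        buf[i] = (unsigned char) (u & 0xFF);
        u >>= 8;
    }
}

/*
 * Read a value written by put_int64_be. The result does not depend on the
 * host's byte order or struct padding, because only shifts are used.
 * (The final cast assumes the usual two's-complement int64_t.)
 */
int64_t
get_int64_be(const unsigned char *buf)
{
    uint64_t u = 0;

    assert(buf != NULL);
    for (int i = 0; i < 8; i++)
        u = (u << 8) | buf[i];
    return (int64_t) u;
}
```

A timestamp or WAL location sent this way decodes to the same value on a client with a different endianness, which is the case mentioned above of backing up a server that runs on a different platform.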
  • Fujii Masao at Oct 15, 2012 at 4:31 pm

    On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas wrote:
    On 15.10.2012 13:13, Heikki Linnakangas wrote:
    On 13.10.2012 19:35, Fujii Masao wrote:

    ISTM you need to update the protocol.sgml because you added
    the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage.

    Oh, I didn't remember that we've documented the specific structs that we
    pass around. It's quite bogus anyway to explain the messages the way we
    do currently, as they are actually dependent on the underlying
    architecture's endianness and padding. I think we should refactor the
    protocol to not transmit raw structs, but use pq_sendint and friends to
    construct the messages. This was discussed earlier (see

    http://archives.postgresql.org/message-id/4FE2279C.2070506@enterprisedb.com),
    I think there's consensus that 9.3 would be a good time to do that as we
    changed the XLogRecPtr format anyway.

    This is what I came up with. The replication protocol is now
    architecture-independent. The WAL format itself is still
    architecture-dependent, of course, but this is useful if you want to e.g
    use pg_receivexlog to back up a server that runs on a different platform.

    I chose the int64 format to transmit timestamps, even when compiled with
    --disable-integer-datetimes.

    Please review if you have the time..
    Thanks for the patch!

    When I ran pg_receivexlog, I encountered the following error.

    $ pg_receivexlog -D hoge
    pg_receivexlog: unexpected termination of replication stream: ERROR:
    no data left in message

    pg_basebackup -X stream caused the same error.

    $ pg_basebackup -D hoge -X stream -c fast
    pg_basebackup: could not send feedback packet: no COPY in progress
    pg_basebackup: child process exited with error 1

    In walreceiver.c, tmpbuf is allocated for every XLogWalRcvProcessMsg() call.
    Shouldn't it be allocated just once and reused until the end, to reduce
    palloc overhead?

    + hdrlen = sizeof(int64) + sizeof(int64) + sizeof(int64);
    + hdrlen = sizeof(int64) + sizeof(int64) + sizeof(char);

    Shouldn't these be macros, to avoid calculation overhead?

    + /* Construct the message and send it. */
    + resetStringInfo(&reply_message);
    + pq_sendbyte(&reply_message, 'h');
    + pq_sendint(&reply_message, xmin, 4);
    + pq_sendint(&reply_message, nextEpoch, 4);
    + walrcv_send(reply_message.data, reply_message.len);

    You seem to have forgotten to send the sendTime.

    Regards,

    --
    Fujii Masao
  • Heikki Linnakangas at Oct 16, 2012 at 12:31 pm

    On 15.10.2012 19:31, Fujii Masao wrote:
    On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas
    wrote:
    On 15.10.2012 13:13, Heikki Linnakangas wrote:

    Oh, I didn't remember that we've documented the specific structs that we
    pass around. It's quite bogus anyway to explain the messages the way we
    do currently, as they are actually dependent on the underlying
    architecture's endianness and padding. I think we should refactor the
    protocol to not transmit raw structs, but use pq_sendint and friends to
    construct the messages. This was discussed earlier (see

    http://archives.postgresql.org/message-id/4FE2279C.2070506@enterprisedb.com),
    I think there's consensus that 9.3 would be a good time to do that as we
    changed the XLogRecPtr format anyway.

    This is what I came up with. The replication protocol is now
    architecture-independent. The WAL format itself is still
    architecture-dependent, of course, but this is useful if you want to e.g
    use pg_receivexlog to back up a server that runs on a different platform.

    I chose the int64 format to transmit timestamps, even when compiled with
    --disable-integer-datetimes.

    Please review if you have the time..
    Thanks for the patch!

    When I ran pg_receivexlog, I encountered the following error.
    Yeah, clearly I didn't test this nearly enough...

    I fixed the bugs you bumped into, new version attached.
    + hdrlen = sizeof(int64) + sizeof(int64) + sizeof(int64);
    + hdrlen = sizeof(int64) + sizeof(int64) + sizeof(char);

    These should be macro, to avoid calculation overhead?
    The compiler will calculate this at compilation time, it's going to be a
    constant at runtime.

    - Heikki
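
A small sketch of the point about compile-time evaluation: sizeof expressions are integer constant expressions in C, so their sum can be used where the language requires a constant, such as an enumerator. The enumerator and function names are made up for this example, and int64_t stands in for PostgreSQL's int64.

```c
#include <assert.h>
#include <stdint.h>

/*
 * An enumerator must be an integer constant expression, so this only
 * compiles because the compiler evaluates the sizeof sums itself.
 */
enum
{
    EXAMPLE_HDRLEN_THREE_INT64 =
        sizeof(int64_t) + sizeof(int64_t) + sizeof(int64_t),
    EXAMPLE_HDRLEN_TWO_INT64_ONE_CHAR =
        sizeof(int64_t) + sizeof(int64_t) + sizeof(char)
};

/* Returns a constant; no addition is performed at run time. */
int
example_hdrlen(void)
{
    /* Sanity check, assuming the usual 8-bit bytes. */
    assert(EXAMPLE_HDRLEN_TWO_INT64_ONE_CHAR == 17);
    return EXAMPLE_HDRLEN_TWO_INT64_ONE_CHAR;
}
```

A macro would therefore change readability only, not the generated code.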
  • Fujii Masao at Oct 18, 2012 at 4:48 pm

    On Tue, Oct 16, 2012 at 9:31 PM, Heikki Linnakangas wrote:
    On 15.10.2012 19:31, Fujii Masao wrote:

    On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas
    wrote:
    On 15.10.2012 13:13, Heikki Linnakangas wrote:


    Oh, I didn't remember that we've documented the specific structs that we
    pass around. It's quite bogus anyway to explain the messages the way we
    do currently, as they are actually dependent on the underlying
    architecture's endianness and padding. I think we should refactor the
    protocol to not transmit raw structs, but use pq_sendint and friends to
    construct the messages. This was discussed earlier (see


    http://archives.postgresql.org/message-id/4FE2279C.2070506@enterprisedb.com),
    I think there's consensus that 9.3 would be a good time to do that as we
    changed the XLogRecPtr format anyway.


    This is what I came up with. The replication protocol is now
    architecture-independent. The WAL format itself is still
    architecture-dependent, of course, but this is useful if you want to
    e.g
    use pg_receivexlog to back up a server that runs on a different platform.

    I chose the int64 format to transmit timestamps, even when compiled with
    --disable-integer-datetimes.

    Please review if you have the time..

    Thanks for the patch!

    When I ran pg_receivexlog, I encountered the following error.

    Yeah, clearly I didn't test this near enough...

    I fixed the bugs you bumped into, new version attached.
    Thanks for updating the patch!

    Shouldn't we remove the integer_datetimes check done by the pg_basebackup
    background process and pg_receivexlog? Currently, they always check
    it, and fail if the setting differs between the client and the
    server. Thanks to the patch, ISTM this check is no longer
    required.

    + pq_sendint64(&reply_message, GetCurrentIntegerTimestamp());

    In XLogWalRcvSendReply() and XLogWalRcvSendHSFeedback(),
    GetCurrentTimestamp() is called twice. I think that we can skip the
    latter call if integer-datetime is enabled because the return values of
    GetCurrentTimestamp() and GetCurrentIntegerTimestamp() are in the
    same format. It's worth reducing the number of GetCurrentTimestamp()
    calls, I think.

       elog(DEBUG2, "sending write %X/%X flush %X/%X apply %X/%X",
    - (uint32) (reply_message.write >> 32), (uint32) reply_message.write,
    - (uint32) (reply_message.flush >> 32), (uint32) reply_message.flush,
    - (uint32) (reply_message.apply >> 32), (uint32) reply_message.apply);
    + (uint32) (writePtr >> 32), (uint32) writePtr,
    + (uint32) (flushPtr >> 32), (uint32) flushPtr,
    + (uint32) (applyPtr >> 32), (uint32) applyPtr);

       elog(DEBUG2, "write %X/%X flush %X/%X apply %X/%X",
    - (uint32) (reply.write >> 32), (uint32) reply.write,
    - (uint32) (reply.flush >> 32), (uint32) reply.flush,
    - (uint32) (reply.apply >> 32), (uint32) reply.apply);
    + (uint32) (writePtr >> 32), (uint32) writePtr,
    + (uint32) (flushPtr >> 32), (uint32) flushPtr,
    + (uint32) (applyPtr >> 32), (uint32) applyPtr);

    Isn't it worth logging not only the WAL locations but also the
    replyRequested flag in these debug messages?
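
As an illustration of this suggestion, the sketch below formats the three WAL locations in the %X/%X style used by the quoted elog() calls and appends a reply-requested marker. It uses plain snprintf rather than elog, and format_reply_debug is a hypothetical helper, not something from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format "write X/X flush X/X apply X/X" plus an optional marker into buf.
 * Each 64-bit location is printed as its high and low 32-bit halves.
 * Returns the number of characters snprintf would have written.
 */
int
format_reply_debug(char *buf, size_t buflen,
                   uint64_t writePtr, uint64_t flushPtr, uint64_t applyPtr,
                   bool replyRequested)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen,
                    "write %X/%X flush %X/%X apply %X/%X%s",
                    (unsigned int) (writePtr >> 32),
                    (unsigned int) (writePtr & 0xFFFFFFFF),
                    (unsigned int) (flushPtr >> 32),
                    (unsigned int) (flushPtr & 0xFFFFFFFF),
                    (unsigned int) (applyPtr >> 32),
                    (unsigned int) (applyPtr & 0xFFFFFFFF),
                    replyRequested ? " (reply requested)" : "");
}
```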

    The rest of the patch looks good to me.
    + hdrlen = sizeof(int64) + sizeof(int64) + sizeof(int64);
    + hdrlen = sizeof(int64) + sizeof(int64) + sizeof(char);

    These should be macro, to avoid calculation overhead?

    The compiler will calculate this at compilation time, it's going to be a
    constant at runtime.
    Yes, you're right.

    Regards,

    --
    Fujii Masao
  • Heikki Linnakangas at Nov 7, 2012 at 5:22 pm

    On 16.10.2012 15:31, Heikki Linnakangas wrote:
    On 15.10.2012 19:31, Fujii Masao wrote:
    On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas
    wrote:
    On 15.10.2012 13:13, Heikki Linnakangas wrote:

    Oh, I didn't remember that we've documented the specific structs
    that we
    pass around. It's quite bogus anyway to explain the messages the way we
    do currently, as they are actually dependent on the underlying
    architecture's endianness and padding. I think we should refactor the
    protocol to not transmit raw structs, but use pq_sendint and friends to
    construct the messages. This was discussed earlier (see

    http://archives.postgresql.org/message-id/4FE2279C.2070506@enterprisedb.com),

    I think there's consensus that 9.3 would be a good time to do that
    as we changed the XLogRecPtr format anyway.
    This is what I came up with. The replication protocol is now
    architecture-independent. The WAL format itself is still
    architecture-dependent, of course, but this is useful if you want
    to e.g
    use pg_receivexlog to back up a server that runs on a different
    platform.

    I chose the int64 format to transmit timestamps, even when compiled with
    --disable-integer-datetimes.

    Please review if you have the time..
    Thanks for the patch!

    When I ran pg_receivexlog, I encountered the following error.
    Yeah, clearly I didn't test this near enough...

    I fixed the bugs you bumped into, new version attached.
    Committed this now, after fixing a few more bugs that came up during
    testing. Next, I'll take a look at the patch you sent for adding
    timeouts to pg_basebackup and pg_receivexlog
    (http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382853BBED@szxeml509-mbs)

    - Heikki
  • Fujii Masao at Nov 8, 2012 at 4:40 pm

    On Thu, Nov 8, 2012 at 2:22 AM, Heikki Linnakangas wrote:
    On 16.10.2012 15:31, Heikki Linnakangas wrote:
    On 15.10.2012 19:31, Fujii Masao wrote:

    On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas
    wrote:
    On 15.10.2012 13:13, Heikki Linnakangas wrote:


    Oh, I didn't remember that we've documented the specific structs
    that we
    pass around. It's quite bogus anyway to explain the messages the way we
    do currently, as they are actually dependent on the underlying
    architecture's endianness and padding. I think we should refactor the
    protocol to not transmit raw structs, but use pq_sendint and friends to
    construct the messages. This was discussed earlier (see


    http://archives.postgresql.org/message-id/4FE2279C.2070506@enterprisedb.com),

    I think there's consensus that 9.3 would be a good time to do that
    as we changed the XLogRecPtr format anyway.

    This is what I came up with. The replication protocol is now
    architecture-independent. The WAL format itself is still
    architecture-dependent, of course, but this is useful if you want
    to e.g
    use pg_receivexlog to back up a server that runs on a different
    platform.

    I chose the int64 format to transmit timestamps, even when compiled with
    --disable-integer-datetimes.

    Please review if you have the time..

    Thanks for the patch!

    When I ran pg_receivexlog, I encountered the following error.

    Yeah, clearly I didn't test this near enough...

    I fixed the bugs you bumped into, new version attached.

    Committed this now, after fixing a few more bugs that came up during
    testing.
    As I suggested upthread, pg_basebackup and pg_receivexlog no longer
    need to check integer_datetimes before establishing the connection,
    thanks to this commit. If this is right, the attached patch should be applied.
    The patch just removes the check of integer_datetimes by pg_basebackup
    and pg_receivexlog.

    Regards,

    --
    Fujii Masao
  • Fujii Masao at Nov 8, 2012 at 4:56 pm

    On Fri, Nov 9, 2012 at 1:40 AM, Fujii Masao wrote:
    On Thu, Nov 8, 2012 at 2:22 AM, Heikki Linnakangas
    wrote:
    On 16.10.2012 15:31, Heikki Linnakangas wrote:
    On 15.10.2012 19:31, Fujii Masao wrote:

    On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas
    wrote:
    On 15.10.2012 13:13, Heikki Linnakangas wrote:


    Oh, I didn't remember that we've documented the specific structs
    that we
    pass around. It's quite bogus anyway to explain the messages the way we
    do currently, as they are actually dependent on the underlying
    architecture's endianness and padding. I think we should refactor the
    protocol to not transmit raw structs, but use pq_sendint and friends to
    construct the messages. This was discussed earlier (see


    http://archives.postgresql.org/message-id/4FE2279C.2070506@enterprisedb.com),

    I think there's consensus that 9.3 would be a good time to do that
    as we changed the XLogRecPtr format anyway.

    This is what I came up with. The replication protocol is now
    architecture-independent. The WAL format itself is still
    architecture-dependent, of course, but this is useful if you want
    to e.g
    use pg_receivexlog to back up a server that runs on a different
    platform.

    I chose the int64 format to transmit timestamps, even when compiled with
    --disable-integer-datetimes.

    Please review if you have the time..

    Thanks for the patch!

    When I ran pg_receivexlog, I encountered the following error.

    Yeah, clearly I didn't test this near enough...

    I fixed the bugs you bumped into, new version attached.

    Committed this now, after fixing a few more bugs that came up during
    testing.
    As I suggested upthread, pg_basebackup and pg_receivexlog no longer
    need to check integer_datetimes before establishing the connection,
    thanks to this commit. If this is right, the attached patch should be applied.
    The patch just removes the check of integer_datetimes by pg_basebackup
    and pg_receivexlog.
    Another comment that I made upthread is:

    --------
    In XLogWalRcvSendReply() and XLogWalRcvSendHSFeedback(),
    GetCurrentTimestamp() is called twice. I think that we can skip the
    latter call if integer-datetime is enabled because the return values of
    GetCurrentTimestamp() and GetCurrentIntegerTimestamp() are in the
    same format. It's worth reducing the number of GetCurrentTimestamp()
    calls, I think.
    --------

    The attached patch removes the redundant GetCurrentTimestamp() call
    from XLogWalRcvSendReply() and XLogWalRcvSendHSFeedback()
    when built with --enable-integer-datetimes.

    Regards,

    --
    Fujii Masao
  • Amit Kapila at Oct 17, 2012 at 11:47 am

    On Monday, October 15, 2012 3:43 PM Heikki Linnakangas wrote:
    On 13.10.2012 19:35, Fujii Masao wrote:
    On Thu, Oct 11, 2012 at 11:52 PM, Heikki Linnakangas
    wrote:
    Ok, thanks. Committed.
    I found one typo. The attached patch fixes that typo.
    Thanks, fixed.
    ISTM you need to update the protocol.sgml because you added
    the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage.
    Is it worth adding the same mechanism (send back the reply immediately
    if walsender request a reply) into pg_basebackup and pg_receivexlog?
    Good catch. Yes, they should be taught about this too. I'll look into
    doing that too.
    If you have not started yet and have no objection, I can pick this up and
    complete it.

    For both pg_basebackup and pg_receivexlog, we need to get a timeout
    parameter from the user on the command line, as there is no conf file
    here. The new option could be -t (the parameter name could be
    recvtimeout).

    The main changes will be in the function ReceiveXlogStream(), which is
    common to both pg_basebackup and pg_receivexlog. The handling will be
    done in the same way as in walreceiver.

    Suggestions/Comments?

    With Regards,
    Amit Kapila.
  • Amit Kapila at Oct 17, 2012 at 1:09 pm

    On Wednesday, October 17, 2012 5:16 PM Amit Kapila wrote:
    On Monday, October 15, 2012 3:43 PM Heikki Linnakangas wrote:
    On 13.10.2012 19:35, Fujii Masao wrote:
    On Thu, Oct 11, 2012 at 11:52 PM, Heikki Linnakangas
    wrote:
    Ok, thanks. Committed.
    I found one typo. The attached patch fixes that typo.
    Thanks, fixed.
    ISTM you need to update the protocol.sgml because you added
    the field 'replyRequested' to WalSndrMessage and
    StandbyReplyMessage.

    Is it worth adding the same mechanism (send back the reply
    immediately
    if walsender request a reply) into pg_basebackup and pg_receivexlog?
    Good catch. Yes, they should be taught about this too. I'll look into
    doing that too.
    If you have not started and you don't have objection, I can pickup this
    to
    complete it.

    For both (pg_basebackup and pg_receivexlog), we need to get a timeout
    parameter from user in command line, as
    there is no conf file here. New Option can be -t (parameter name can be
    recvtimeout).

    The main changes will be in function ReceiveXlogStream(), it is a common
    function for both
    Pg_basebackup and pg_receivexlog. Handling will be done in same way as
    we
    have done in walreceiver.
    Some other functions in pg_basebackup, the ones that receive the data
    files, also need similar handling.

    With Regards,
    Amit Kapila.
  • Fujii Masao at Oct 18, 2012 at 3:19 pm

    On Wed, Oct 17, 2012 at 8:46 PM, Amit Kapila wrote:
    On Monday, October 15, 2012 3:43 PM Heikki Linnakangas wrote:
    On 13.10.2012 19:35, Fujii Masao wrote:
    On Thu, Oct 11, 2012 at 11:52 PM, Heikki Linnakangas
    wrote:
    Ok, thanks. Committed.
    I found one typo. The attached patch fixes that typo.
    Thanks, fixed.
    ISTM you need to update the protocol.sgml because you added
    the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage.
    Is it worth adding the same mechanism (send back the reply immediately
    if walsender request a reply) into pg_basebackup and pg_receivexlog?
    Good catch. Yes, they should be taught about this too. I'll look into
    doing that too.
    If you have not started and you don't have objection, I can pickup this to
    complete it.

    For both (pg_basebackup and pg_receivexlog), we need to get a timeout
    parameter from user in command line, as
    there is no conf file here. New Option can be -t (parameter name can be
    recvtimeout).

    The main changes will be in function ReceiveXlogStream(), it is a common
    function for both
    Pg_basebackup and pg_receivexlog. Handling will be done in same way as we
    have done in walreceiver.

    Suggestions/Comments?
    Before implementing the timeout parameter, I think that it's better to change
    both pg_basebackup background process and pg_receivexlog so that they
    send back the reply message immediately when they receive the keepalive
    message requesting the reply. Currently, they always ignore such keepalive
    messages, so the status interval parameter (-s) in them must always be set
    to a value less than the replication timeout. We can avoid this troublesome
    parameter setting by introducing the same logic as walreceiver into both
    pg_basebackup background process and pg_receivexlog.

    Regards,

    --
    Fujii Masao
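
The behaviour proposed here can be sketched as a small parser for the keepalive payload. The layout assumed below (two 64-bit integers followed by a one-byte reply-requested flag) is an assumption based on the second hdrlen expression quoted earlier in the thread; the struct and function names are hypothetical, and this is not the actual ReceiveXlogStream() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed keepalive payload: int64 walEnd, int64 sendTime, 1-byte flag. */
#define KEEPALIVE_LEN (8 + 8 + 1)

typedef struct
{
    int64_t walEnd;
    int64_t sendTime;
    bool    replyRequested;
} Keepalive;

/* Decode 8 bytes in network byte order. */
static int64_t
read_int64_be(const unsigned char *p)
{
    uint64_t u = 0;

    for (int i = 0; i < 8; i++)
        u = (u << 8) | p[i];
    return (int64_t) u;
}

/*
 * Parse a keepalive payload (without the leading message-type byte).
 * Returns false if the length is wrong. When out->replyRequested is set,
 * the caller should send its status reply right away instead of waiting
 * for the next status interval.
 */
bool
parse_keepalive(const unsigned char *payload, size_t len, Keepalive *out)
{
    assert(payload != NULL && out != NULL);

    if (len != KEEPALIVE_LEN)
        return false;

    out->walEnd = read_int64_be(payload);
    out->sendTime = read_int64_be(payload + 8);
    out->replyRequested = (payload[16] != 0);
    return true;
}
```

A client that acts on replyRequested no longer depends on -s being smaller than the server's timeout, which is the troublesome setting mentioned above.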
  • Amit kapila at Oct 19, 2012 at 11:44 am

    On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
    On Wed, Oct 17, 2012 at 8:46 PM, Amit Kapila wrote:
    On Monday, October 15, 2012 3:43 PM Heikki Linnakangas wrote:
    On 13.10.2012 19:35, Fujii Masao wrote:
    On Thu, Oct 11, 2012 at 11:52 PM, Heikki Linnakangas
    wrote:
    Ok, thanks. Committed.
    I found one typo. The attached patch fixes that typo.
    Thanks, fixed.
    ISTM you need to update the protocol.sgml because you added
    the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage.
    Is it worth adding the same mechanism (send back the reply immediately
    if walsender request a reply) into pg_basebackup and pg_receivexlog?
    Good catch. Yes, they should be taught about this too. I'll look into
    doing that too.
    If you have not started and you don't have objection, I can pickup this to
    complete it.

    For both (pg_basebackup and pg_receivexlog), we need to get a timeout
    parameter from user in command line, as
    there is no conf file here. New Option can be -t (parameter name can be
    recvtimeout).

    The main changes will be in function ReceiveXlogStream(), it is a common
    function for both
    Pg_basebackup and pg_receivexlog. Handling will be done in same way as we
    have done in walreceiver.

    Suggestions/Comments?
    Before implementing the timeout parameter, I think that it's better to change
    both pg_basebackup background process and pg_receivexlog so that they
    send back the reply message immediately when they receive the keepalive
    message requesting the reply. Currently, they always ignore such keepalive
    message, so status interval parameter (-s) in them always must be set to
    the value less than replication timeout. We can avoid this troublesome
    parameter setting by introducing the same logic of walreceiver into both
    pg_basebackup background process and pg_receivexlog.
    Please find attached a patch that addresses the modification you mentioned
    (send an immediate reply to a keepalive).
    Both pg_basebackup and pg_receivexlog use the same function,
    ReceiveXLogStream, so a single change addresses the issue for both.


    Now, on introducing a timeout in pg_basebackup and pg_receivexlog:
    We can have a mechanism similar to the wal receiver timeout while streaming
    data from the server, but the same logic cannot be used if the network goes
    down while fetching the other database files from the server.
    The reason is that, to receive the data files, PQgetCopyData() is called in
    synchronous mode, so it waits indefinitely until it gets some data.
    To solve this issue, I can think of the following options:
    1. Make this call asynchronous as well (but I am not sure about the impact
       of this).
    2. In the function pqWait, instead of passing the hard-coded value -1 (i.e.
       infinite wait), pass some finite time. This time can be received as a
       command-line argument by the respective utility and stored in the PGconn
       structure.
       To have the timeout value in PGconn, we can either:
           a. Add a new parameter to PGconn to indicate the receive timeout.
           b. Use the existing parameter connect_timeout for the receive
              timeout as well, but this may lead to confusion.
    3. Any other, better option?
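
Option 1 can be illustrated without libpq: wait for the descriptor to become readable with a finite timeout, and only then read. The sketch below uses POSIX poll() on an arbitrary file descriptor; wait_readable and read_with_timeout are hypothetical helpers. In a libpq client the descriptor would typically come from PQsocket(), with PQconsumeInput() and an asynchronous PQgetCopyData() call following a successful wait, but that wiring is an assumption and is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait until fd is readable or timeout_ms elapses.
 * Returns 1 if poll reports activity (data, EOF, or a socket error that a
 * following read will surface), 0 on timeout, -1 on error.
 */
int
wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int           rc;

    assert(timeout_ms >= 0);    /* never fall back to an infinite wait */

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Retry if interrupted by a signal; the timeout restarts in that case. */
    do
    {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    return (rc > 0) ? 1 : 0;
}

/*
 * Read up to len bytes from fd, but give up after timeout_ms.
 * Returns bytes read (0 means EOF), -1 on error, -2 on timeout.
 */
ssize_t
read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    int rc = wait_readable(fd, timeout_ms);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return -2;
    return read(fd, buf, len);
}
```

The timeout value itself would come from the proposed command-line option; how it is passed down is left open, as in the options above.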

    Apart from the above issue, there is a possibility of a hang if the network
    goes down during connect, because connect_timeout is NULL by default and
    connectDBComplete will wait indefinitely for the connection to succeed.
    So shall we have a separate command-line argument for this as well, or
    handle it some other way that you suggest?

    Suggestions/Comments?

    With Regards,
    Amit Kapila.
  • Heikki Linnakangas at Nov 8, 2012 at 8:33 am

    On 19.10.2012 14:42, Amit kapila wrote:
    On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
    Before implementing the timeout parameter, I think that it's better to change
    both pg_basebackup background process and pg_receivexlog so that they
    send back the reply message immediately when they receive the keepalive
    message requesting the reply. Currently, they always ignore such keepalive
    message, so status interval parameter (-s) in them always must be set to
    the value less than replication timeout. We can avoid this troublesome
    parameter setting by introducing the same logic of walreceiver into both
    pg_basebackup background process and pg_receivexlog.
    Please find the patch attached to address the modification mentioned by you (send immediate reply for keepalive).
    Both basebackup and pg_receivexlog uses the same function ReceiveXLogStream, so single change for both will address the issue.
    Thanks, committed this one after shuffling it around the changes I
    committed yesterday. I also updated the docs to not claim that -s option
    is required to avoid timeout disconnects anymore.

    - Heikki
  • Amit Kapila at Nov 8, 2012 at 8:55 am

    On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
    On 19.10.2012 14:42, Amit kapila wrote:
    On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
    Before implementing the timeout parameter, I think that it's better
    to change
    both pg_basebackup background process and pg_receivexlog so that they
    send back the reply message immediately when they receive the
    keepalive
    message requesting the reply. Currently, they always ignore such
    keepalive
    message, so status interval parameter (-s) in them always must be set
    to
    the value less than replication timeout. We can avoid this
    troublesome
    parameter setting by introducing the same logic of walreceiver into
    both
    pg_basebackup background process and pg_receivexlog.
    Please find the patch attached to address the modification mentioned
    by you (send immediate reply for keepalive).
    Both basebackup and pg_receivexlog uses the same function
    ReceiveXLogStream, so single change for both will address the issue.

    Thanks, committed this one after shuffling it around the changes I
    committed yesterday. I also updated the docs to not claim that -s option
    is required to avoid timeout disconnects anymore.
    Thank you.
    However, I think the issue is still not completely solved:
    pg_basebackup/pg_receivexlog can still take a long time to
    detect a network break, as they have no timeout concept. For that I have
    sent a proposal, which is mentioned at the end of this mail chain:
    http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382853BBED@szxeml509-mbs

    Do you think there is any need to introduce such mechanism in
    pg_basebackup/pg_receivexlog?

    With Regards,
    Amit Kapila.
  • Fujii Masao at Nov 8, 2012 at 5:12 pm

    On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila wrote:
    On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
    On 19.10.2012 14:42, Amit kapila wrote:
    On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
    Before implementing the timeout parameter, I think that it's better
    to change
    both pg_basebackup background process and pg_receivexlog so that they
    send back the reply message immediately when they receive the
    keepalive
    message requesting the reply. Currently, they always ignore such
    keepalive
    message, so status interval parameter (-s) in them always must be set
    to
    the value less than replication timeout. We can avoid this
    troublesome
    parameter setting by introducing the same logic of walreceiver into
    both
    pg_basebackup background process and pg_receivexlog.
    Please find the patch attached to address the modification mentioned
    by you (send immediate reply for keepalive).
    Both basebackup and pg_receivexlog uses the same function
    ReceiveXLogStream, so single change for both will address the issue.

    Thanks, committed this one after shuffling it around the changes I
    committed yesterday. I also updated the docs to not claim that -s option
    is required to avoid timeout disconnects anymore.
    Thank you.
    However I think still the issue will not be completely solved.
    pg_basebackup/pg_receivexlog can still take long time to
    detect network break as they don't have timeout concept. To do that I have
    sent one proposal which is mentioned at end of mail chain:
    http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382853BBED@szxeml509-mbs

    Do you think there is any need to introduce such mechanism in
    pg_basebackup/pg_receivexlog?
    Are you planning to introduce the timeout mechanism in pg_basebackup
    main process? Or background process? It's useful to implement both.

    BTW, IIRC the walsender has no timeout mechanism during sending
    backup data to pg_basebackup. So it's also useful to implement the
    timeout mechanism for the walsender during backup.

    Regards,

    --
    Fujii Masao
  • Amit Kapila at Nov 9, 2012 at 6:04 am

    On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote:
    On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila wrote:
    On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
    On 19.10.2012 14:42, Amit kapila wrote:
    On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
    Before implementing the timeout parameter, I think that it's better to
    change both the pg_basebackup background process and pg_receivexlog so
    that they send back the reply message immediately when they receive a
    keepalive message requesting a reply. Currently, they always ignore such
    keepalive messages, so the status interval parameter (-s) in them must
    always be set to a value less than the replication timeout. We can avoid
    this troublesome parameter setting by introducing the same logic as
    walreceiver into both the pg_basebackup background process and
    pg_receivexlog.
    Please find the patch attached to address the modification mentioned
    by you (send immediate reply for keepalive).
    Both pg_basebackup and pg_receivexlog use the same function,
    ReceiveXlogStream, so a single change addresses the issue for both.

    Thanks, committed this one after shuffling it around the changes I
    committed yesterday. I also updated the docs to not claim that -s option
    is required to avoid timeout disconnects anymore.
    Thank you.
    However, I think the issue is still not completely solved:
    pg_basebackup/pg_receivexlog can still take a long time to detect a
    network break, as they have no timeout concept. To address that, I have
    sent a proposal, which is mentioned at the end of this mail chain:
    http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382853BBED@szxeml509-mbs

    Do you think there is any need to introduce such mechanism in
    pg_basebackup/pg_receivexlog?
    Are you planning to introduce the timeout mechanism in pg_basebackup
    main process? Or background process? It's useful to implement both.
    By background process, you mean ReceiveXlogStream?
    For both.

    I think for the background process, it can be done in a way similar to
    what we have done for walreceiver.
    But I have some doubts about how to do it for the main process:

    Logic similar to walreceiver cannot be used in case the network goes down
    while getting the other database files from the server.
    The reason is that, to receive the data files, PQgetCopyData() is called
    in synchronous mode, so it keeps waiting indefinitely until it gets some
    data.
    To solve this issue, I can think of the following options:
    1. Make this call asynchronous as well (but not sure about the impact of
       this).
    2. In function pqWait, instead of passing the hard-coded value -1 (i.e.
       infinite wait), pass some finite time. This time can be received as a
       command line argument from the respective utility and set in the
       PGconn structure.
       To have the timeout value in PGconn, we can either:
           a. Add a new parameter in PGconn to indicate the receive timeout.
           b. Use the existing parameter connect_timeout for the receive
              timeout as well, but this may lead to confusion.
    3. Any other better option?

    Apart from the above issue, there is a possibility that if the network
    goes down during connect, it might hang, because connect_timeout is
    NULL by default and connectDBComplete will start waiting indefinitely
    for the connection to succeed.
    So shall we have a separate command line argument for this as well, or
    any other way you suggest?
    BTW, IIRC the walsender has no timeout mechanism during sending
    backup data to pg_basebackup. So it's also useful to implement the
    timeout mechanism for the walsender during backup.
    Yes, it's useful, but for walsender the main problem is that it uses a
    blocking send call to send the data.
    I have tried using the tcp_keepalive settings, but the send call doesn't
    return in case of a network break.
    The only way I could get it to return is to change the value in
    /proc/sys/net/ipv4/tcp_retries2 with the command
                             echo "8" > /proc/sys/net/ipv4/tcp_retries2
    As per the recommendation, its value should be at least 8 (equivalent to
    100 sec).
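
    The "equivalent to 100 sec" figure can be reproduced with a little
    arithmetic. The following is only a rough sketch, assuming (these values
    are not stated in this thread) that the retransmission timeout starts at
    200 ms, doubles after every unacknowledged transmission and is capped at
    120 s. The function name is made up for illustration.

```c
#include <assert.h>

/*
 * Hypothetical upper bound, in milliseconds, on how long a blocked send()
 * keeps retrying when tcp_retries2 is set to `retries`.
 *
 * Assumptions (illustrative, not taken from the thread): the retransmission
 * timeout (RTO) starts at 200 ms, doubles after each unacknowledged
 * transmission, and is capped at 120 s. Real connections derive their RTO
 * from the measured round-trip time, so actual figures will differ.
 */
long
tcp_retries2_timeout_ms(int retries)
{
    const long rto_min_ms = 200;
    const long rto_max_ms = 120000;
    long       rto = rto_min_ms;
    long       total = 0;
    int        i;

    assert(retries >= 0);

    /* One wait for the original transmission plus one per retransmission. */
    for (i = 0; i <= retries; i++)
    {
        total += rto;
        rto *= 2;
        if (rto > rto_max_ms)
            rto = rto_max_ms;
    }
    return total;
}
```

    Under these assumptions a value of 8 gives about 102 seconds, and the
    usual Linux default of 15 gives about 924.6 seconds, which is why a
    blocked send can appear to hang for many minutes.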

    Do you have any idea, how it can be achieved?

    With Regards,
    Amit Kapila.
  • Fujii Masao at Nov 12, 2012 at 2:54 pm

    On Fri, Nov 9, 2012 at 3:03 PM, Amit Kapila wrote:
    On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote:
    On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila <amit.kapila@huawei.com>
    wrote:
    On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
    On 19.10.2012 14:42, Amit kapila wrote:
    On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
    Before implementing the timeout parameter, I think that it's better to
    change both the pg_basebackup background process and pg_receivexlog so
    that they send back the reply message immediately when they receive a
    keepalive message requesting a reply. Currently, they always ignore such
    keepalive messages, so the status interval parameter (-s) in them must
    always be set to a value less than the replication timeout. We can avoid
    this troublesome parameter setting by introducing the same logic as
    walreceiver into both the pg_basebackup background process and
    pg_receivexlog.
    Please find the patch attached to address the modification mentioned
    by you (send immediate reply for keepalive).
    Both pg_basebackup and pg_receivexlog use the same function,
    ReceiveXlogStream, so a single change addresses the issue for both.

    Thanks, committed this one after shuffling it around the changes I
    committed yesterday. I also updated the docs to not claim that -s option
    is required to avoid timeout disconnects anymore.
    Thank you.
    However, I think the issue is still not completely solved:
    pg_basebackup/pg_receivexlog can still take a long time to detect a
    network break, as they have no timeout concept. To address that, I have
    sent a proposal, which is mentioned at the end of this mail chain:
    http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382853BBED@szxeml509-mbs

    Do you think there is any need to introduce such mechanism in
    pg_basebackup/pg_receivexlog?
    Are you planning to introduce the timeout mechanism in pg_basebackup
    main process? Or background process? It's useful to implement both.
    By background process, you mean ReceiveXlogStream?
    For both.

    I think for the background process, it can be done in a way similar to
    what we have done for walreceiver.
    Yes.
    But I have some doubts about how to do it for the main process:

    Logic similar to walreceiver cannot be used in case the network goes down
    while getting the other database files from the server.
    The reason is that, to receive the data files, PQgetCopyData() is called
    in synchronous mode, so it keeps waiting indefinitely until it gets some
    data.
    To solve this issue, I can think of the following options:
    1. Make this call asynchronous as well (but not sure about the impact of
    this).
    +1

    Walreceiver already calls PQgetCopyData() asynchronously. ISTM you can
    solve the issue in the similar way to walreceiver's.
    2. In function pqWait, instead of passing the hard-coded value -1 (i.e.
    infinite wait), pass some finite time. This time can be received as a
    command line argument from the respective utility and set in the PGconn
    structure.
    To have the timeout value in PGconn, we can either:
    a. Add a new parameter in PGconn to indicate the receive timeout.
    b. Use the existing parameter connect_timeout for the receive timeout
    as well, but this may lead to confusion.
    3. Any other better option?

    Apart from the above issue, there is a possibility that if the network
    goes down during connect, it might hang, because connect_timeout is
    NULL by default and connectDBComplete will start waiting indefinitely
    for the connection to succeed.
    So shall we have a separate command line argument for this as well, or
    any other way you suggest?
    Yes, I think that we should add something like a --conninfo option to
    pg_basebackup and pg_receivexlog. We can easily set not only
    connect_timeout but also sslmode, application_name, ... by using such
    an option accepting a conninfo string.
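
    To make the conninfo idea concrete, here is a minimal sketch of
    assembling such a string. connect_timeout and application_name are
    existing libpq connection keywords; the helper name build_conninfo and
    the host value in the usage note are hypothetical, and nothing here is
    taken from an actual patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Assemble a libpq-style conninfo string ("keyword=value keyword=value").
 * Returns 0 on success, -1 if the result does not fit into buf.
 *
 * Simplification: values are written unquoted, so they must not contain
 * spaces, quotes or backslashes.
 */
int
build_conninfo(char *buf, size_t buflen, const char *host,
               int connect_timeout_s, const char *application_name)
{
    int n;

    assert(buf != NULL && host != NULL && application_name != NULL);
    assert(connect_timeout_s >= 0);

    n = snprintf(buf, buflen,
                 "host=%s connect_timeout=%d application_name=%s",
                 host, connect_timeout_s, application_name);

    /* snprintf reports the length it wanted; detect truncation. */
    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

    A string such as "host=primary.example.com connect_timeout=10
    application_name=pg_receivexlog" could then be handed to
    PQconnectdb(), so that a connection attempt gives up after 10 seconds
    instead of waiting indefinitely.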
    BTW, IIRC the walsender has no timeout mechanism during sending
    backup data to pg_basebackup. So it's also useful to implement the
    timeout mechanism for the walsender during backup.
    Yes, it's useful, but for walsender the main problem is that it uses a
    blocking send call to send the data.
    I have tried using the tcp_keepalive settings, but the send call doesn't
    return in case of a network break.
    The only way I could get it to return is to change the value in
    /proc/sys/net/ipv4/tcp_retries2 with the command
    echo "8" > /proc/sys/net/ipv4/tcp_retries2
    As per the recommendation, its value should be at least 8 (equivalent to
    100 sec).

    Do you have any idea, how it can be achieved?
    What about using pq_putmessage_noblock()?

    Regards,

    --
    Fujii Masao
  • Amit kapila at Nov 13, 2012 at 4:07 am

    On Monday, November 12, 2012 8:23 PM Fujii Masao wrote:
    On Fri, Nov 9, 2012 at 3:03 PM, Amit Kapila wrote:
    On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote:
    On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila <amit.kapila@huawei.com>
    wrote:
    On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
    On 19.10.2012 14:42, Amit kapila wrote:
    On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
    Before implementing the timeout parameter, I think that it's
    better
    to change
    both pg_basebackup background process and pg_receivexlog so that
    BTW, IIRC the walsender has no timeout mechanism during sending
    backup data to pg_basebackup. So it's also useful to implement the
    timeout mechanism for the walsender during backup.
    Yes, it's useful, but for walsender the main problem is that it uses a
    blocking send call to send the data.
    I have tried using the tcp_keepalive settings, but the send call doesn't
    return in case of a network break.
    The only way I could get it to return is to change the value in
    /proc/sys/net/ipv4/tcp_retries2 with the command
    echo "8" > /proc/sys/net/ipv4/tcp_retries2
    As per the recommendation, its value should be at least 8 (equivalent to
    100 sec).
    Do you have any idea, how it can be achieved?
    What about using pq_putmessage_noblock()?
    I will try this, but do you know why blocking mode is used to send the files in the first place?
    I am asking because I am a little worried that it might break some design consideration behind making the file send blocking.

    With Regards,
    Amit Kapila.
  • Fujii Masao at Nov 13, 2012 at 4:02 pm

    On Tue, Nov 13, 2012 at 1:06 PM, Amit kapila wrote:
    On Monday, November 12, 2012 8:23 PM Fujii Masao wrote:
    On Fri, Nov 9, 2012 at 3:03 PM, Amit Kapila wrote:
    On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote:
    On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila <amit.kapila@huawei.com>
    wrote:
    On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
    On 19.10.2012 14:42, Amit kapila wrote:
    On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
    Before implementing the timeout parameter, I think that it's
    better
    to change
    both pg_basebackup background process and pg_receivexlog so that
    BTW, IIRC the walsender has no timeout mechanism during sending
    backup data to pg_basebackup. So it's also useful to implement the
    timeout mechanism for the walsender during backup.
    Yes, it's useful, but for walsender the main problem is that it uses a
    blocking send call to send the data.
    I have tried using the tcp_keepalive settings, but the send call doesn't
    return in case of a network break.
    The only way I could get it to return is to change the value in
    /proc/sys/net/ipv4/tcp_retries2 with the command
    echo "8" > /proc/sys/net/ipv4/tcp_retries2
    As per the recommendation, its value should be at least 8 (equivalent to
    100 sec).
    Do you have any idea, how it can be achieved?
    What about using pq_putmessage_noblock()?
    I will try this, but do you know why blocking mode is used to send the files in the first place?
    I am asking because I am a little worried that it might break some design consideration behind making the file send blocking.
    I'm afraid I don't know why. I guess that using non-blocking mode complicates
    the code, so in the first version of pg_basebackup the blocking mode
    was adopted.

    Regards,

    --
    Fujii Masao
  • Amit kapila at Nov 15, 2012 at 1:59 pm

    On Monday, November 12, 2012 8:23 PM Fujii Masao wrote:
    On Fri, Nov 9, 2012 at 3:03 PM, Amit Kapila wrote:
    On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote:
    On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila <amit.kapila@huawei.com>
    wrote:
    On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
    On 19.10.2012 14:42, Amit kapila wrote:
    On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
    Are you planning to introduce the timeout mechanism in pg_basebackup
    main process? Or background process? It's useful to implement both.
    By background process, you mean ReceiveXlogStream?
    For both.
    I think for the background process, it can be done in a way similar to
    what we have done for walreceiver.
    Yes.
    But I have some doubts about how to do it for the main process:
    Logic similar to walreceiver cannot be used in case the network goes down
    while getting the other database files from the server.
    The reason is that, to receive the data files, PQgetCopyData() is called
    in synchronous mode, so it keeps waiting indefinitely until it gets some
    data.
    To solve this issue, I can think of the following options:
    1. Make this call asynchronous as well (but not sure about the impact of
    this).
    +1
    Walreceiver already calls PQgetCopyData() asynchronously. ISTM you can
    solve the issue in the similar way to walreceiver's.
    2. In function pqWait, instead of passing the hard-coded value -1 (i.e.
    infinite wait), pass some finite time. This time can be received as a
    command line argument from the respective utility and set in the PGconn
    structure.
    Yes, I think that we should add something like a --conninfo option to
    pg_basebackup and pg_receivexlog. We can easily set not only
    connect_timeout but also sslmode, application_name, ... by using such
    an option accepting a conninfo string.
    I have prepared the attached patch to make pg_basebackup and pg_receivexlog non-blocking.
    To do so, I had to add new command line parameters to pg_basebackup and pg_receivexlog;
    for now I have added two command line arguments:
             a. "-r" for pg_basebackup and pg_receivexlog to take the receive timeout value. The default value for this parameter is 60 sec.
             b. "-t" for pg_basebackup and pg_receivexlog to take the initial connection timeout value. The default is to wait indefinitely.
    We can change this to accept --conninfo as well.

    Apart from the above, I feel the remaining problem is the function call PQgetResult():
    1. Wherever a query is sent from BaseBackup, it calls PQgetResult to receive the result of the query.
         As PQgetResult() is a blocking function (it calls pqWait, which can hang), if the network goes down before the query is sent,
         there will never be a result, so it will keep hanging in PQgetResult.
    IMO, it can be solved in the following ways:
    a. Create a corresponding non-blocking function. But this function is called from inside some
          other libpq functions (PQexec->PQexecFinish->PQgetResult), so it can be a little tricky to solve this way.
    b. Add a receive_timeout variable to the PGconn structure and use it in pqWait as the timeout whenever it is set.
    c. Any other better way?

    BTW, IIRC the walsender has no timeout mechanism during sending
    backup data to pg_basebackup. So it's also useful to implement the
    timeout mechanism for the walsender during backup.
    What about using pq_putmessage_noblock()?
    I think some more functions may also need to be made non-blocking. I am still evaluating.

    I will upload the attached patch to the commitfest if you don't have any objections.

    More Suggestions/Comments?

    With Regards,
    Amit Kapila.
  • Amit Kapila at Nov 16, 2012 at 11:40 am

    On Thursday, November 15, 2012 7:29 PM Amit kapila wrote:
    On Monday, November 12, 2012 8:23 PM Fujii Masao wrote:
    On Fri, Nov 9, 2012 at 3:03 PM, Amit Kapila wrote:
    On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote:
    On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila <amit.kapila@huawei.com>
    wrote:
    On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
    On 19.10.2012 14:42, Amit kapila wrote:
    On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
    Are you planning to introduce the timeout mechanism in pg_basebackup
    Apart from the above, I feel the remaining problem is the function call
    PQgetResult():
    1. Wherever a query is sent from BaseBackup, it calls PQgetResult to
    receive the result of the query.
    As PQgetResult() is a blocking function (it calls pqWait, which can
    hang), if the network goes down before the query is sent, there will
    never be a result, so it will keep hanging in PQgetResult.
    IMO, it can be solved in the following ways:
    a. Create a corresponding non-blocking function. But this function is
    called from inside some other libpq functions
    (PQexec->PQexecFinish->PQgetResult), so it can be a little tricky to
    solve this way.
    b. Add a receive_timeout variable to the PGconn structure and use it in
    pqWait as the timeout whenever it is set.
    c. Any other better way?

    BTW, IIRC the walsender has no timeout mechanism during sending
    backup data to pg_basebackup. So it's also useful to implement the
    timeout mechanism for the walsender during backup.
    What about using pq_putmessage_noblock()?
    I think some more functions may also need to be made non-blocking. I
    am still evaluating.
    I have done the analysis, and it seems that the APIs below also need
    non-blocking equivalents; otherwise the same problem can happen, as they
    are also used in this flow.
             a. pq_endmessage
             b. EndCommand
             c. pq_puttextmessage
             d. pq_putemptymessage
             e. ReadyForQuery - for this, because walsender and a normal
                backend are now the same.
             f. ReadCommand - for this, because walsender and a normal
                backend are now the same. The solution for it may be tricky,
                as pq_getbyte is not called from a first-level function.

    Suggestions/Thoughts?


    With Regards,
    Amit Kapila.
  • Boszormenyi Zoltan at Jan 1, 2013 at 4:49 pm
    Hi,

    On 2012-11-15 14:59, Amit kapila wrote:
    On Monday, November 12, 2012 8:23 PM Fujii Masao wrote:
    On Fri, Nov 9, 2012 at 3:03 PM, Amit Kapila wrote:
    On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote:
    On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila <amit.kapila@huawei.com>
    wrote:
    On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
    On 19.10.2012 14:42, Amit kapila wrote:
    On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
    Are you planning to introduce the timeout mechanism in pg_basebackup
    main process? Or background process? It's useful to implement both.
    By background process, you mean ReceiveXlogStream?
    For both.
    I think for the background process, it can be done in a way similar to
    what we have done for walreceiver.
    Yes.
    But I have some doubts about how to do it for the main process:
    Logic similar to walreceiver cannot be used in case the network goes down
    while getting the other database files from the server.
    The reason is that, to receive the data files, PQgetCopyData() is called
    in synchronous mode, so it keeps waiting indefinitely until it gets some
    data.
    To solve this issue, I can think of the following options:
    1. Make this call asynchronous as well (but not sure about the impact of
    this).
    +1
    Walreceiver already calls PQgetCopyData() asynchronously. ISTM you can
    solve the issue in the similar way to walreceiver's.
    2. In function pqWait, instead of passing the hard-coded value -1 (i.e.
    infinite wait), pass some finite time. This time can be received as a
    command line argument from the respective utility and set in the PGconn
    structure.
    Yes, I think that we should add something like a --conninfo option to
    pg_basebackup and pg_receivexlog. We can easily set not only
    connect_timeout but also sslmode, application_name, ... by using such
    an option accepting a conninfo string.
    I have prepared the attached patch to make pg_basebackup and pg_receivexlog non-blocking.
    To do so, I had to add new command line parameters to pg_basebackup and pg_receivexlog;
    for now I have added two command line arguments:
    a. "-r" for pg_basebackup and pg_receivexlog to take the receive timeout value. The default value for this parameter is 60 sec.
    b. "-t" for pg_basebackup and pg_receivexlog to take the initial connection timeout value. The default is to wait indefinitely.
    We can change this to accept --conninfo as well.

    Apart from the above, I feel the remaining problem is the function call PQgetResult():
    1. Wherever a query is sent from BaseBackup, it calls PQgetResult to receive the result of the query.
    As PQgetResult() is a blocking function (it calls pqWait, which can hang), if the network goes down before the query is sent,
    there will never be a result, so it will keep hanging in PQgetResult.
    IMO, it can be solved in the following ways:
    a. Create a corresponding non-blocking function. But this function is called from inside some
    other libpq functions (PQexec->PQexecFinish->PQgetResult), so it can be a little tricky to solve this way.
    b. Add a receive_timeout variable to the PGconn structure and use it in pqWait as the timeout whenever it is set.
    c. Any other better way?

    BTW, IIRC the walsender has no timeout mechanism during sending
    backup data to pg_basebackup. So it's also useful to implement the
    timeout mechanism for the walsender during backup.
    What about using pq_putmessage_noblock()?
    I think some more functions may also need to be made non-blocking. I am still evaluating.

    I will upload the attached patch to the commitfest if you don't have any objections.

    More Suggestions/Comments?

    With Regards,
    Amit Kapila.
    I am reviewing your patch.

       * Is the patch in context diff format <http://en.wikipedia.org/wiki/Diff#Context_format>?


    Yes.

       * Does it apply cleanly to the current git master?


    Not quite cleanly but it doesn't produce rejects or fuzz, only offset warnings:

    [zozo@localhost postgresql]$ cat ../noblock_basebackup_and_receivexlog.patch | patch -p1
    patching file src/bin/pg_basebackup/pg_basebackup.c
    Hunk #1 succeeded at 41 (offset -6 lines).
    Hunk #2 succeeded at 123 (offset -6 lines).
    Hunk #3 succeeded at 239 (offset -6 lines).
    Hunk #4 succeeded at 292 (offset -6 lines).
    Hunk #5 succeeded at 470 (offset -6 lines).
    Hunk #6 succeeded at 588 (offset -6 lines).
    Hunk #7 succeeded at 601 (offset -6 lines).
    Hunk #8 succeeded at 727 (offset -6 lines).
    Hunk #9 succeeded at 779 (offset -6 lines).
    Hunk #10 succeeded at 797 (offset -6 lines).
    Hunk #11 succeeded at 811 (offset -6 lines).
    Hunk #12 succeeded at 879 (offset -6 lines).
    Hunk #13 succeeded at 1080 (offset -6 lines).
    Hunk #14 succeeded at 1381 (offset -6 lines).
    Hunk #15 succeeded at 1409 (offset -6 lines).
    Hunk #16 succeeded at 1521 (offset -6 lines).
    patching file src/bin/pg_basebackup/pg_receivexlog.c
    Hunk #1 succeeded at 35 (offset -6 lines).
    Hunk #2 succeeded at 65 (offset -6 lines).
    Hunk #3 succeeded at 224 (offset -6 lines).
    Hunk #4 succeeded at 281 (offset -6 lines).
    Hunk #5 succeeded at 314 (offset -6 lines).
    Hunk #6 succeeded at 341 (offset -5 lines).
    Hunk #7 succeeded at 379 (offset -5 lines).
    patching file src/bin/pg_basebackup/receivelog.c
    Hunk #1 succeeded at 181 (offset -9 lines).
    Hunk #2 succeeded at 201 (offset -9 lines).
    Hunk #3 succeeded at 223 (offset -9 lines).
    Hunk #4 succeeded at 333 (offset -9 lines).
    Hunk #5 succeeded at 342 (offset -9 lines).
    Hunk #6 succeeded at 397 (offset -9 lines).
    Hunk #7 succeeded at 484 (offset -9 lines).
    Hunk #8 succeeded at 533 (offset -9 lines).
    Hunk #9 succeeded at 550 (offset -9 lines).
    patching file src/bin/pg_basebackup/receivelog.h
    patching file src/bin/pg_basebackup/streamutil.c
    Hunk #1 succeeded at 66 (offset -6 lines).
    Hunk #2 succeeded at 87 (offset -6 lines).
    Hunk #3 succeeded at 118 (offset -6 lines).
    patching file src/bin/pg_basebackup/streamutil.h

       * Does it include reasonable tests, necessary doc patches, etc?


    The test cases are not applicable. There is no test framework for
    testing network outage in "make check".

    There are no documentation patches for the new --recvtimeout=INTERVAL
    and --conntimeout=INTERVAL options for either pg_basebackup or
    pg_receivexlog.

       * Does the patch actually implement that?


    It seems so, the patch adds the connect_timeout parameter to
    the connection options and uses PQgetCopyData(..., 1) to get
    the data asynchronously and uses select(2) to watch for incoming
    data.
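
    For readers unfamiliar with the technique, waiting on a socket with
    select(2) and a timeout can be sketched roughly as below. This is not
    code from the patch: the function name and return codes are invented
    for illustration, and a plain read() stands in for the libpq calls. In
    a libpq client the descriptor would come from PQsocket(conn), and once
    it is readable one would call PQconsumeInput(conn) followed by
    PQgetCopyData(conn, &buf, 1).

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

#define READ_TIMED_OUT (-2)

/*
 * Read from fd, but give up if nothing arrives within timeout_ms.
 * Returns the number of bytes read, 0 on end of file, -1 on error,
 * or READ_TIMED_OUT if the timeout expired first.
 */
ssize_t
read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    fd_set         readfds;
    struct timeval tv;
    int            rc;

    assert(fd >= 0 && fd < FD_SETSIZE);
    assert(timeout_ms >= 0);

    do
    {
        /* select() may modify both arguments, so reinitialise each time. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    } while (rc < 0 && errno == EINTR);  /* simplification: restarts the full wait */

    if (rc < 0)
        return -1;
    if (rc == 0)
        return READ_TIMED_OUT;

    /* The descriptor is readable, so this read() will not block. */
    return read(fd, buf, len);
}
```

    The caller decides what a timeout means: with a receive timeout of the
    kind the patch introduces, it would report a network error and exit
    rather than wait forever.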

       * Do we want that?


    It can speed up detecting network breakdown so yes.

       * Do we already have it?


    No.

       * Does it follow SQL spec, or the community-agreed behavior?


    There's no such SQL spec. The behaviour is desired.

       * Does it include pg_dump support (if applicable)?


    Not applicable.

       * Are there dangers?


    The patch author researched more functions that need
    to be extended in a nonblocking way.
    http://archives.postgresql.org/pgsql-hackers/2012-11/msg00863.php

       * Have all the bases been covered?


    For pg_basebackup/pg_receivexlog (for PQgetCopyData and
    PQconnect), yes.

    Per the previous comment, no. But those are for the backend
    to notice network breakdowns and as such, they need a
    separate patch.

       * Does the feature work as advertised?


    Yes.

    I tested it between two machines and pulled the ethernet
    plug while pg_basebackup was running. With "-r 2", pg_basebackup
    detected the timeout after 2 seconds. Without the patch, I lost
    patience after two minutes and pressed Ctrl-C in pg_basebackup.

    I also tested pg_receivexlog and it also noticed the network error
    in the specified timeout.

       * Are there corner cases the author has failed to consider?


    As far as I can see in the client-side libpq code flow, no.

       * Are there any assertion failures or crashes?


    Not applicable, the patch is for client applications.

       * Does the patch slow down simple tests?


    No.

       * If it claims to improve performance, does it?


    Not applicable, not a performance patch. But it really
    improves detecting network breakdown.

       * Does it slow down other things?


    No.

       * Does it follow the project coding guidelines
         <http://developer.postgresql.org/pgdocs/postgres/source.html>?


    Yes.

       * Are there portability issues?


    No. It introduces atoi() and select() as new calls; both are portable.

       * Will it work on Windows/BSD etc?


    It should.

       * Are the comments sufficient and accurate?


    This chunk below removes a comment which seems obvious enough
    so it's not needed:

    ***************
    *** 518,524 **** ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
                              goto error;
                      }

    ! /* Check the message type. */
                      if (copybuf[0] == 'k')
                      {
                              int pos;
    --- 559,568 ----
                              goto error;
                      }

    ! /* Set the last reply timestamp */
    ! last_recv_timestamp = localGetCurrentTimestamp();
    ! ping_sent = false;
    !
                      if (copybuf[0] == 'k')
                      {
                              int pos;
    ***************


    Other comments are sufficient and accurate.
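
    The two variables touched in the chunk above, last_recv_timestamp and
    ping_sent, suggest a timer that is reset on every received message. How
    the patch actually uses them is not visible in this chunk, so the
    following is only a guess at the logic, modelled on the design
    discussed earlier in this thread (send a ping after half of the timeout,
    give up after the full timeout). The enum and function names are
    hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum
{
    ACTION_NONE,        /* keep waiting */
    ACTION_SEND_PING,   /* ask the peer for an immediate reply */
    ACTION_TIMEOUT      /* treat the connection as dead */
} TimeoutAction;

/*
 * Decide what to do given the current time, the time of the last received
 * message, the configured timeout, and whether a ping is already pending.
 * All times use the same unit (for example microseconds).
 */
TimeoutAction
check_recv_timeout(int64_t now, int64_t last_recv_timestamp,
                   int64_t timeout, bool ping_sent)
{
    int64_t idle;

    assert(timeout > 0);
    assert(now >= last_recv_timestamp);

    idle = now - last_recv_timestamp;

    if (idle >= timeout)
        return ACTION_TIMEOUT;
    if (!ping_sent && idle >= timeout / 2)
        return ACTION_SEND_PING;
    return ACTION_NONE;
}
```

    Whenever any message arrives, the caller would set last_recv_timestamp
    to the current time and clear ping_sent, which matches the two
    assignments added in the chunk.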

       * Does it do what it says, correctly?


    This question is redundant with the above "Does the feature work as advertised?"
    So yes.

       * Does it produce compiler warnings?


    No.

       * Can you make it crash?


    No.

       * Is everything done in a way that fits together coherently with other features/modules?


    Yes.

       * Are there interdependencies that can cause problems?


    No.


    Best regards,
    Zoltán Böszörményi


    --
    ----------------------------------
    Zoltán Böszörményi
    Cybertec Schönig & Schönig GmbH
    Gröhrmühlgasse 26
    A-2700 Wiener Neustadt, Austria
    Web: http://www.postgresql-support.de
           http://www.postgresql.at/
  • Hari Babu at Jan 2, 2013 at 7:11 am

    On January 01, 2013 10:19 PM Boszormenyi Zoltan wrote:
    I am reviewing your patch.
    • Is the patch in context diff format?
    Yes.
    Thanks for reviewing the patch.
    • Does it apply cleanly to the current git master?
    Not quite cleanly but it doesn't produce rejects or fuzz, only offset
    warnings:

    Will rebase the patch to head.
    • Does it include reasonable tests, necessary doc patches, etc?
    The test cases are not applicable. There is no test framework for
    testing network outage in "make check".

    There are no documentation patches for the new --recvtimeout=INTERVAL
    and --conntimeout=INTERVAL options for either pg_basebackup or
    pg_receivexlog.
    I will add the documentation for the same.

    Per the previous comment, no. But those are for the backend
    to notice network breakdowns and as such, they need a
    separate patch.
    I also think it is better to handle it as a separate patch for walsender.
    • Are the comments sufficient and accurate?
    This chunk below removes a comment which seems obvious enough
    so it's not needed:
    ***************
    *** 518,524 **** ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos,
    uint32 timeline,
    goto error;
    }

    !               /* Check the message type. */
    if (copybuf[0] == 'k')
    {
    int             pos;
    --- 559,568 ----
    goto error;
    }

    !               /* Set the last reply timestamp */
    !               last_recv_timestamp = localGetCurrentTimestamp();
    !               ping_sent = false;
    !
    if (copybuf[0] == 'k')
    {
    int             pos;
    ***************

    Other comments are sufficient and accurate.
    I will fix and update the patch.

    Please let me know if anything apart from above needs to be taken care.

    Regards,
    Hari babu.
  • Hari Babu at Jan 4, 2013 at 12:43 pm

    On January 02, 2013 12:41 PM Hari Babu wrote:
    I will fix and update the patch.
    The attached V2 patch in the mail handles all the review comments identified
    above.

    Regards,
    Hari babu.
  • Boszormenyi Zoltan at Jan 7, 2013 at 2:23 pm

    On 2013-01-04 13:43, Hari Babu wrote:
    The attached V2 patch in the mail handles all the review comments identified
    above.

    Regards,
    Hari babu.
    Since my other patch against pg_basebackup is now committed,
    this patch doesn't apply cleanly, patch rejects 2 hunks.
    The fixed up patch is attached.

    Best regards,
    Zoltán Böszörményi

    --
    ----------------------------------
    Zoltán Böszörményi
    Cybertec Schönig & Schönig GmbH
    Gröhrmühlgasse 26
    A-2700 Wiener Neustadt, Austria
    Web: http://www.postgresql-support.de
           http://www.postgresql.at/
  • Hari Babu at Jan 9, 2013 at 4:04 am

    On January 07, 2013 7:53 PM Boszormenyi Zoltan wrote:
    Since my other patch against pg_basebackup is now committed,
    this patch doesn't apply cleanly, patch rejects 2 hunks.
    The fixed up patch is attached.
    Patch is verified. Thanks for rebasing the patch.

    Regards,
    Hari babu.
  • Abhijit Menon-Sen at Jan 16, 2013 at 7:48 am
    Hi.

    This patch was marked "Needs review" with no reviewers in the ongoing
    CF, so I decided to take a look at it. I see that Zoltan has posted a
    review, so I've added him to the list.

    But I took a look at the latest patch in any case. Here are some
    comments, mostly cosmetic ones.
    diff -dcrpN postgresql.orig/doc/src/sgml/ref/pg_basebackup.sgml postgresql/doc/src/sgml/ref/pg_basebackup.sgml
    *** postgresql.orig/doc/src/sgml/ref/pg_basebackup.sgml 2013-01-05 17:34:30.742135371 +0100
    --- postgresql/doc/src/sgml/ref/pg_basebackup.sgml 2013-01-07 15:11:40.787007890 +0100
    *************** PostgreSQL documentation
    *** 400,405 ****
    --- 400,425 ----
    </varlistentry>

    <varlistentry>
    + <term><option>-r <replaceable class="parameter">interval</replaceable></option></term>
    + <term><option>--recvtimeout=<replaceable class="parameter">interval</replaceable></option></term>
    + <listitem>
    + <para>
    + time that receiver waits for communication from server (in seconds).
    + </para>
    + </listitem>
    + </varlistentry>
    I would reword this as "The maximum time (in seconds) to wait for data
    from the server (default: wait forever)".
    + <varlistentry>
    + <term><option>-t <replaceable class="parameter">interval</replaceable></option></term>
    + <term><option>--conntimeout=<replaceable class="parameter">interval</replaceable></option></term>
    + <listitem>
    + <para>
    + time that client wait for connection to establish with server (in seconds).
    + </para>
    + </listitem>
    + </varlistentry>
    Likewise, "The maximum time (in seconds) to wait for a connection to the
    server to succeed (default: wait forever)".

    Same thing in pg_receivexlog.sgml. Also, there's trailing whitespace in
    various places in these files (and elsewhere in the patch), which should
    be fixed.
    diff -dcrpN postgresql.orig/src/bin/pg_basebackup/pg_basebackup.c postgresql/src/bin/pg_basebackup/pg_basebackup.c
    *** postgresql.orig/src/bin/pg_basebackup/pg_basebackup.c 2013-01-05 17:34:30.778135625 +0100
    --- postgresql/src/bin/pg_basebackup/pg_basebackup.c 2013-01-07 15:16:24.610037886 +0100
    *************** bool streamwal = false;
    *** 45,50 ****
    --- 45,54 ----
    bool fastcheckpoint = false;
    bool writerecoveryconf = false;
    int standby_message_timeout = 10 * 1000; /* 10 sec = default */
    + int standby_recv_timeout = 60*1000; /* 60 sec = default */
    + char *standby_connect_timeout = NULL;
    I don't really like standby_recv_timeout being an int and
    standby_connect_timeout being a char *. I understand that it's so that
    it can be assigned to "values[i]" in GetConnection(), but that reason is
    very distant, and not obvious from this code at all.

    That said, I don't know if it's really worth bothering with.
    + #define NAPTIME_PER_CYCLE 100 /* max sleep time between cycles (100ms) */
    This probably needs a better comment. Why are we sleeping between
    cycles? What cycles?
    + printf(_(" -r, --recvtimeout=INTERVAL time that receiver waits for communication from\n"
    + " server (in seconds)\n"));
    + printf(_(" -t, --conntimeout=INTERVAL time that client wait for connection to establish\n"
    + " with server (in seconds)\n"));
    Same comments about wording apply, but perhaps there's no need to
    mention the default.
    ! if (r == 0 || (r < 0 && errno == EINTR))
    ! {
    ! /*
    ! * Got a timeout or signal. Before Continuing the loop, check for timeout.
    ! */
    ! if (standby_recv_timeout > 0)
    ! {
    ! now = localGetCurrentTimestamp();
    I'd make "now" local to this block, and get rid of the comment. The two
    "if"s are perfectly clear. This applies to the same pattern in other
    places in the patch as well.
    ! if (localTimestampDifferenceExceeds(last_recv_timestamp, now, standby_recv_timeout))
    ! {
    ! fprintf(stderr, _("%s: terminating DB File receive due to timeout\n"),
    Better wording? "DB File receive" is confusing. Even something like
    "Closing connection due to read timeout" would be better. Or perhaps
    you can make it like the following message, slightly lower:
    ! if (PQconsumeInput(conn) == 0)
    ! {
    ! fprintf(stderr,
    ! _("%s: could not receive data from WAL Sender: %s"),
    ! progname, PQerrorMessage(conn));
    …and in the former case, say "read timeout" instead of PQerrorMessage().
    ! /* Set the last reply timestamp */
    ! last_recv_timestamp = localGetCurrentTimestamp();
    !
    ! /* Some data is received, so go back read them in buffer*/
    ! continue;
    No need for these comments.
    + /* Set the last reply timestamp */
    + last_recv_timestamp = localGetCurrentTimestamp();
    Likewise (in various places).
    /*
    ! * Connect in replication mode to the server, Sending connect_timeout
    ! * as configured, there is no need for rw_timeout.
    */
    ! conn = GetConnection(standby_connect_timeout);
    This comment is pretty confusing.
    * Connect to the server. Returns a valid PGconn pointer if connected,
    * or NULL on non-permanent error. On permanent error, the function will
    * call exit(1) directly.
    + * Set conn_timeout to PGconn structure if their value
    + * is not NULL.
    */
    PGconn *
    ! GetConnection(char *conn_timeout)
    And this comment is just wrong.

    The patch looks OK otherwise. Zoltan indicated that his tests were
    successful, so I didn't retest. Marking "Waiting on author" again.

    -- Abhijit
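
A minimal sketch of the receive-timeout pattern discussed in the review above, with "now" kept local to the timeout branch. This is not the patch's code: now_ms(), recv_timeout_exceeded(), read_with_timeout() and READ_TIMED_OUT are hypothetical names standing in for the patch's localGetCurrentTimestamp() and localTimestampDifferenceExceeds(), and the naptime_ms argument plays the role of NAPTIME_PER_CYCLE (the longest single select() sleep before the timeout is rechecked).

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/select.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define READ_TIMED_OUT (-2)		/* hypothetical return code for a read timeout */

/* Hypothetical helper: monotonic clock reading in milliseconds. */
int64_t
now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* True once more than timeout_ms has elapsed; timeout_ms <= 0 disables it. */
bool
recv_timeout_exceeded(int64_t last_recv, int64_t now, int timeout_ms)
{
	return timeout_ms > 0 && now - last_recv > timeout_ms;
}

/*
 * Read from fd, sleeping at most naptime_ms per select() call.
 * Returns the read() result, READ_TIMED_OUT if nothing arrived within
 * timeout_ms, or -1 if select() failed for a reason other than EINTR.
 */
ssize_t
read_with_timeout(int fd, void *buf, size_t len, int naptime_ms, int timeout_ms)
{
	int64_t		last_recv = now_ms();

	assert(naptime_ms > 0);

	for (;;)
	{
		fd_set		rfds;
		struct timeval tv;
		int			r;

		FD_ZERO(&rfds);
		FD_SET(fd, &rfds);
		tv.tv_sec = naptime_ms / 1000;
		tv.tv_usec = (naptime_ms % 1000) * 1000;

		r = select(fd + 1, &rfds, NULL, NULL, &tv);
		if (r > 0)
			return read(fd, buf, len);

		if (r == 0 || (r < 0 && errno == EINTR))
		{
			int64_t		now = now_ms();

			if (recv_timeout_exceeded(last_recv, now, timeout_ms))
				return READ_TIMED_OUT;
			continue;
		}
		return -1;
	}
}
```

With timeout_ms set to 0 the loop waits indefinitely, which corresponds to the "wait forever" default suggested for the documentation wording.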
  • Heikki Linnakangas at Jan 16, 2013 at 10:31 am

    On 07.01.2013 16:23, Boszormenyi Zoltan wrote:
    Since my other patch against pg_basebackup is now committed,
    this patch doesn't apply cleanly, patch rejects 2 hunks.
    The fixed up patch is attached.
    Now that I look at this from a high-level perspective, why are we only
    worried about timeouts in the Copy-mode and when connecting? The initial
    checkpoint could take a long time too, and if the server turns into a
    black hole while the checkpoint is running, pg_basebackup will still
    hang. Then again, a short timeout on that phase would be a bad idea,
    because the checkpoint can indeed take a long time.

    In streaming replication, the keep-alive messages carry additional
    information, the timestamps and WAL locations, so a keepalive makes
    sense at that level. But otherwise, aren't we just trying to reimplement
    TCP keepalives? TCP keepalives are not perfect, but if we want to have
    an application level timeout, it should be implemented in the FE/BE
    protocol.

    I don't think we need to do anything specific to pg_basebackup. The user
    can simply specify TCP keepalive settings in the connection string, like
    with any libpq program.

    - Heikki
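
For readers unfamiliar with what "TCP keepalive settings" amount to at the socket level, here is a rough sketch, assuming a POSIX system. enable_tcp_keepalive() is a hypothetical helper, not part of libpq or of the patch; libpq users would normally set the equivalent connection parameters instead of calling setsockopt() themselves. The TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are not available on every platform, hence the #ifdef guards.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: turn on TCP keepalives for an existing socket.
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
int
enable_tcp_keepalive(int sock, int idle_secs, int interval_secs, int count)
{
	int			on = 1;

	assert(idle_secs > 0 && interval_secs > 0 && count > 0);

	/* Ask the kernel to probe an idle connection at all. */
	if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;

	/* Tuning knobs: applied only where the platform defines them. */
#ifdef TCP_KEEPIDLE
	/* Seconds of idleness before the first probe is sent. */
	if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE,
				   &idle_secs, sizeof(idle_secs)) < 0)
		return -1;
#endif
#ifdef TCP_KEEPINTVL
	/* Seconds between unanswered probes. */
	if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL,
				   &interval_secs, sizeof(interval_secs)) < 0)
		return -1;
#endif
#ifdef TCP_KEEPCNT
	/* Unanswered probes before the connection is declared dead. */
	if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT,
				   &count, sizeof(count)) < 0)
		return -1;
#endif
	return 0;
}
```

On a platform that defines all three options, a dead peer is noticed after roughly idle_secs + interval_secs * count seconds, which is why this mechanism can stand in for a receive timeout at the cost of coarser control.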
  • Amit Kapila at Jan 18, 2013 at 6:51 am

    On Wednesday, January 16, 2013 4:02 PM Heikki Linnakangas wrote:
    On 07.01.2013 16:23, Boszormenyi Zoltan wrote:
    Since my other patch against pg_basebackup is now committed,
    this patch doesn't apply cleanly, patch rejects 2 hunks.
    The fixed up patch is attached.
    Now that I look at this from a high-level perspective, why are we only
    worried about timeouts in the Copy-mode and when connecting? The initial
    checkpoint could take a long time too, and if the server turns into a
    black hole while the checkpoint is running, pg_basebackup will still
    hang. Then again, a short timeout on that phase would be a bad idea,
    because the checkpoint can indeed take a long time.
    True, but IMO, if somebody wants to take a base backup, they should do
    that when the server is not loaded.
    In streaming replication, the keep-alive messages carry additional
    information, the timestamps and WAL locations, so a keepalive makes
    sense at that level. But otherwise, aren't we just trying to reimplement
    TCP keepalives? TCP keepalives are not perfect, but if we want to have
    an application level timeout, it should be implemented in the FE/BE
    protocol.

    I don't think we need to do anything specific to pg_basebackup. The user
    can simply specify TCP keepalive settings in the connection string, like
    with any libpq program.
    I think currently user has no way to specify TCP keepalive settings from
    pg_basebackup, please let me know if there is any such existing way?

    I think specifying TCP settings is very cumbersome for most users; that's
    the reason most standard interfaces (ODBC/JDBC) have such an
    application-level timeout mechanism.

    Implementing it in the FE/BE protocol (do you mean making this
    non-blocking behavior part of libpq, or something else?) would be generic
    and usable by other clients as well, but it might need a few interface
    changes.

    IMHO, if such low-impact changes make pg_basebackup sensitive to network
    failures, the current approach can also be considered.


    With Regards,
    Amit Kapila.
  • Heikki Linnakangas at Jan 18, 2013 at 10:15 am

    On 18.01.2013 08:50, Amit Kapila wrote:
    I think currently user has no way to specify TCP keepalive settings from
    pg_basebackup, please let me know if there is any such existing way?
    I was going to say you can just use "keepalives_idle=30" in the
    connection string. But there's no way to pass a connection string to
    pg_basebackup on the command line! The usual way to pass a connection
    string is to pass it as the database name, and PQconnect will expand it,
    but that doesn't work with pg_basebackup because it hardcodes the
    database name as "replication". D'oh.

    You could still use environment variables and a service file to do it,
    but it's certainly more cumbersome. It clearly should be possible to
    pass a full connection string to pg_basebackup, that's an obvious oversight.

    - Heikki
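
To make the "connection string" idea concrete, here is a small sketch of assembling a libpq-style keyword/value string that carries keepalive settings. build_conninfo() is a hypothetical helper written for illustration; it is not how pg_basebackup builds its connection parameters. The keywords keepalives, keepalives_idle and connect_timeout are libpq connection parameters; the host name in the usage note is made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a keyword/value connection string with TCP
 * keepalives enabled. Returns 0 on success, or -1 if the result does not
 * fit in buf or the host would need quoting (space, quote or backslash),
 * which this sketch does not handle.
 */
int
build_conninfo(char *buf, size_t buflen, const char *host,
			   int keepalives_idle, int connect_timeout)
{
	int			n;

	assert(buf != NULL && buflen > 0 && host != NULL);

	if (strpbrk(host, " '\\") != NULL)
		return -1;

	n = snprintf(buf, buflen,
				 "host=%s keepalives=1 keepalives_idle=%d connect_timeout=%d",
				 host, keepalives_idle, connect_timeout);

	return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
}
```

Calling build_conninfo(buf, sizeof(buf), "primary.example.com", 30, 10) yields a string that any libpq program accepting a conninfo string could use; the point of the message above is that pg_basebackup offered no command-line way to accept one.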
  • Amit Kapila at Jan 18, 2013 at 11:42 am

    On Friday, January 18, 2013 3:46 PM Heikki Linnakangas wrote:
    On 18.01.2013 08:50, Amit Kapila wrote:
    I think currently user has no way to specify TCP keepalive settings from
    pg_basebackup, please let me know if there is any such existing way?
    I was going to say you can just use "keepalives_idle=30" in the
    connection string. But there's no way to pass a connection string to
    pg_basebackup on the command line! The usual way to pass a connection
    string is to pass it as the database name, and PQconnect will expand it,
    but that doesn't work with pg_basebackup because it hardcodes the
    database name as "replication". D'oh.

    You could still use environment variables and a service file to do it,
    but it's certainly more cumbersome. It clearly should be possible to
    pass a full connection string to pg_basebackup, that's an obvious oversight.
    So to solve this problem, the following can be done:
    1. Support a connection string in pg_basebackup, so that keepalives or
    connect_timeout can be specified in it
    2. Support recv_timeout separately, as an option for users who are not
    comfortable with TCP keepalives

    a. 1 can be done alone
    b. 2 can be done alone
    c. both 1 and 2.

    With Regards,
    Amit Kapila.
  • Heikki Linnakangas at Jan 18, 2013 at 12:17 pm

    On 18.01.2013 13:41, Amit Kapila wrote:
    On Friday, January 18, 2013 3:46 PM Heikki Linnakangas wrote:
    On 18.01.2013 08:50, Amit Kapila wrote:
    I think currently user has no way to specify TCP keepalive settings from
    pg_basebackup, please let me know if there is any such existing way?
    I was going to say you can just use "keepalives_idle=30" in the
    connection string. But there's no way to pass a connection string to
    pg_basebackup on the command line! The usual way to pass a connection
    string is to pass it as the database name, and PQconnect will expand it,
    but that doesn't work with pg_basebackup because it hardcodes the
    database name as "replication". D'oh.

    You could still use environment variables and a service file to do it,
    but it's certainly more cumbersome. It clearly should be possible to
    pass a full connection string to pg_basebackup, that's an obvious oversight.
    So to solve this problem, the following can be done:
    1. Support a connection string in pg_basebackup, so that keepalives or
    connect_timeout can be specified in it
    2. Support recv_timeout separately, as an option for users who are not
    comfortable with TCP keepalives

    a. 1 can be done alone
    b. 2 can be done alone
    c. both 1 and 2.
    Right. Let's do just 1 for now. A general application-level, non-TCP
    keepalive message at the libpq level might be a good idea, but that's a
    much larger patch, definitely not 9.3 material.

    - Heikki

Discussion Overview
group: pgsql-hackers @
categories: postgresql
posted: Oct 1, '12 at 10:39a
active: Jan 28, '13 at 9:45a
posts: 65
users: 12
website: postgresql.org...
irc: #postgresql