Now that we have the wonderful latch facility, let's use it to reduce
the delay between receiving a piece of WAL and applying it in the standby.
Currently, the startup process polls every 100 ms to see if new WAL has
arrived, which adds an average 50 ms delay between a transaction
commit in the master and it appearing as committed in a hot standby
server. The latch patch already eliminated a similar polling delay in
walsender; the attached patch does the same for walreceiver.

After this patch, there are no unnecessary delays in the streaming
replication code path. Note that this is all still asynchronous, just
with reduced latency.
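
To illustrate the change, here's a minimal sketch (not the patch itself;
newWALAvailable() stands in for the real availability check, and WaitLatch
here takes its timeout in microseconds, as in the 9.1-era latch API):

    /* Before: the startup process polls on a fixed 100 ms interval */
    while (!newWALAvailable())
        pg_usleep(100000L);     /* ~50 ms added latency on average */

    /* After: walreceiver calls SetLatch() on a shared latch right after
     * fsyncing newly received WAL, and the startup process sleeps on the
     * latch instead of polling */
    for (;;)
    {
        ResetLatch(&WalRcv->receivedLatch); /* clear before re-checking */
        if (newWALAvailable())
            break;
        WaitLatch(&WalRcv->receivedLatch, 5000000L);    /* 5 s cap */
    }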

This is pretty straightforward, but any comments?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

  • Thom Brown at Sep 13, 2010 at 11:48 am

    On 13 September 2010 12:40, Heikki Linnakangas wrote:
    Now that we have the wonderful latch facility, let's use it to reduce the
    delay between receiving a piece of WAL and applying it in the standby.
    Currently, the startup process polls every 100 ms to see if new WAL has
    arrived, which adds an average 50 ms delay between a transaction commit in
    the master and it appearing as committed in a hot standby server. The latch
    patch already eliminated a similar polling delay in walsender; the attached
    patch does the same for walreceiver.

    After this patch, there are no unnecessary delays in the streaming
    replication code path. Note that this is all still asynchronous, just with
    reduced latency.

    This is pretty straightforward, but any comments?
    Is that supposed to be waiting 5000ms?

    --
    Thom Brown
    Twitter: @darkixion
    IRC (freenode): dark_ixion
    Registered Linux user: #516935
  • Thom Brown at Sep 13, 2010 at 11:53 am

    On 13 September 2010 12:47, Thom Brown wrote:
    On 13 September 2010 12:40, Heikki Linnakangas wrote:
    Now that we have the wonderful latch facility, let's use it to reduce the
    delay between receiving a piece of WAL and applying it in the standby.
    Currently, the startup process polls every 100 ms to see if new WAL has
    arrived, which adds an average 50 ms delay between a transaction commit in
    the master and it appearing as committed in a hot standby server. The latch
    patch already eliminated a similar polling delay in walsender; the attached
    patch does the same for walreceiver.

    After this patch, there are no unnecessary delays in the streaming
    replication code path. Note that this is all still asynchronous, just with
    reduced latency.

    This is pretty straightforward, but any comments?
    Is that supposed to be waiting 5000ms?
    Ignore me, I can see that it's right.

    --
    Thom Brown
    Twitter: @darkixion
    IRC (freenode): dark_ixion
    Registered Linux user: #516935
  • Heikki Linnakangas at Sep 13, 2010 at 11:54 am

    On 13/09/10 14:47, Thom Brown wrote:
    On 13 September 2010 12:40, Heikki Linnakangas wrote:
    Now that we have the wonderful latch facility, let's use it to reduce the
    delay between receiving a piece of WAL and applying it in the standby.
    Currently, the startup process polls every 100 ms to see if new WAL has
    arrived, which adds an average 50 ms delay between a transaction commit in
    the master and it appearing as committed in a hot standby server. The latch
    patch already eliminated a similar polling delay in walsender; the attached
    patch does the same for walreceiver.

    After this patch, there are no unnecessary delays in the streaming
    replication code path. Note that this is all still asynchronous, just with
    reduced latency.

    This is pretty straightforward, but any comments?
    Is that supposed to be waiting 5000ms?
    Yes, it gets interrupted as soon as WAL arrives; that timeout is there to
    poll for the standby trigger file to appear, or for SIGTERM.
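
    To make that concrete, the wait loop is shaped roughly like this (a
    simplified sketch, not the actual patch; newWALAvailable() stands in
    for the real check, and the timeout is in microseconds):

        for (;;)
        {
            ResetLatch(&WalRcv->receivedLatch);

            if (CheckForStandbyTrigger())   /* trigger file appeared? */
                break;                      /* end recovery and promote */
            HandleStartupProcInterrupts();  /* react to SIGTERM etc. */

            if (newWALAvailable())
                break;                      /* apply the new records */

            /* Interrupted immediately when walreceiver calls SetLatch();
             * the 5 s timeout only bounds how long the checks above can
             * be deferred when no WAL is arriving. */
            WaitLatch(&WalRcv->receivedLatch, 5000000L);
        }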

    BTW, I noticed that I missed incrementing the latch count in
    win32_latch.c, and that the owning/disowning of the latch was not done
    correctly: you get an error if you restart the master and reconnect. I'll
    post an updated patch shortly.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Heikki Linnakangas at Sep 13, 2010 at 12:13 pm

    On 13/09/10 14:54, Heikki Linnakangas wrote:
    BTW, I noticed that I missed incrementing the latch count in
    win32_latch.c, and that the owning/disowning of the latch was not done
    correctly: you get an error if you restart the master and reconnect. I'll
    post an updated patch shortly.
    Here's an updated patch with those bugs fixed.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Fujii Masao at Sep 14, 2010 at 2:03 am

    On Mon, Sep 13, 2010 at 9:13 PM, Heikki Linnakangas wrote:
    Here's an updated patch with those bugs fixed.
    Great!

    + /*
    + * Walreceiver sets this latch every time new WAL has been received and
    + * fsync'd to disk, allowing the startup process to wait for new WAL to
    + * arrive.
    + */
    + Latch receivedLatch;

    I think that this latch should be available for more than just
    walreceiver-to-startup-process communication. For example,
    backend-to-startup-process communication, which could be used in the
    future to let users request a failover via an SQL function. What about
    putting the latch in XLogCtl instead of WalRcv, and calling OwnLatch at
    the beginning of the startup process instead of in RequestXLogStreaming?
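
    In other words, something like this (just a sketch of the idea, not a
    patch; the exact placement is illustrative):

        /* In XLogCtlData (shared memory), so that any process can wake the
         * startup process, not only walreceiver.  Needs InitSharedLatch()
         * at shared-memory initialization. */
        Latch		recoveryWakeupLatch;

        /* At the beginning of the startup process, instead of in
         * RequestXLogStreaming: */
        OwnLatch(&XLogCtl->recoveryWakeupLatch);

        /* Walreceiver -- or, in the future, a backend requesting a
         * failover -- then wakes the startup process with: */
        SetLatch(&XLogCtl->recoveryWakeupLatch);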

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
  • Heikki Linnakangas at Sep 14, 2010 at 8:51 am

    On 14/09/10 05:02, Fujii Masao wrote:
    + /*
    + * Walreceiver sets this latch every time new WAL has been received and
    + * fsync'd to disk, allowing the startup process to wait for new WAL to
    + * arrive.
    + */
    + Latch receivedLatch;

    I think that this latch should be available for more than just
    walreceiver-to-startup-process communication. For example,
    backend-to-startup-process communication, which could be used in the
    future to let users request a failover via an SQL function. What about
    putting the latch in XLogCtl instead of WalRcv, and calling OwnLatch at
    the beginning of the startup process instead of in RequestXLogStreaming?
    Yes, good point. I updated the patch along those lines, attached.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Fujii Masao at Sep 14, 2010 at 1:46 pm

    On Tue, Sep 14, 2010 at 5:51 PM, Heikki Linnakangas wrote:
    On 14/09/10 05:02, Fujii Masao wrote:

    +       /*
    +        * Walreceiver sets this latch every time new WAL has been received and
    +        * fsync'd to disk, allowing the startup process to wait for new WAL to
    +        * arrive.
    +        */
    +       Latch           receivedLatch;

    I think that this latch should be available for more than just
    walreceiver-to-startup-process communication. For example,
    backend-to-startup-process communication, which could be used in the
    future to let users request a failover via an SQL function. What about
    putting the latch in XLogCtl instead of WalRcv, and calling OwnLatch at
    the beginning of the startup process instead of in RequestXLogStreaming?
    Yes, good point. I updated the patch along those lines, attached.
    Looks good.

    + /*
    + * Take ownership of the wakeup latch if we're going to sleep during
    + * recovery.
    + */
    + if (StandbyMode)
    +     OwnLatch(&XLogCtl->recoveryWakeupLatch);

    Since an automatic restart after a backend crash always performs a normal
    crash recovery, the startup process will never call OwnLatch more than
    once. So there might be no harm even if the startup process doesn't disown
    the shared latch. But... what about calling DisownLatch at the end of
    recovery, just in case?
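
    Something along these lines at the end of recovery (sketch only):

        /* Mirror the OwnLatch call above: release the shared latch once
         * we're done sleeping in recovery, in case a later owner needs it. */
        if (StandbyMode)
            DisownLatch(&XLogCtl->recoveryWakeupLatch);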

    Regards,

    --
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
