On Sat, Jan 23, 2010 at 4:37 PM, Simon Riggs wrote:
> max_standby_delay = -1 option removed to prevent deadlock.
This seems unacceptable to me. It means it's impossible to configure a
reporting slave so it doesn't spuriously signal errors if your reports
run too long.

Recall that I am still of the opinion that the only reasonable default
value for this parameter is actually -1. I don't think we should
signal errors for correctly working systems unless the user requests
them in some way.

Was there any discussion about this change? I don't recall seeing it
proposed on hackers.
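To make the disagreement concrete, here is a minimal Python sketch of the policy under discussion. It uses a much-simplified model: when replaying WAL conflicts with a running standby query, replay waits up to max_standby_delay (assumed here to be in seconds) and then cancels the query, while -1 means it waits forever. The function name `should_cancel_query` is hypothetical, and this is not PostgreSQL's implementation.

```python
def should_cancel_query(waited_seconds: float, max_standby_delay: int) -> bool:
    """Decide whether replay should cancel a conflicting standby query.

    Hypothetical helper modelling the setting under discussion, not
    PostgreSQL code: -1 means "wait forever", so replay never cancels
    the query; any other value is a limit in seconds (assumed unit).
    """
    if max_standby_delay == -1:
        return False  # wait forever: replay falls behind instead
    if max_standby_delay < 0:
        raise ValueError("max_standby_delay must be -1 or >= 0")
    return waited_seconds > max_standby_delay


# An hour-long report on a reporting slave that conflicts with replay:
print(should_cancel_query(3600, 30))  # True: the report gets an error
print(should_cancel_query(3600, -1))  # False: replay waits for the report
```

With -1 the cost of a long report is replication lag rather than an error, which is the trade-off Greg prefers as the default.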

--
greg


  • Simon Riggs at Jan 23, 2010 at 8:28 pm

    On Sat, 2010-01-23 at 17:35 +0000, Greg Stark wrote:
    > On Sat, Jan 23, 2010 at 4:37 PM, Simon Riggs wrote:
    >> max_standby_delay = -1 option removed to prevent deadlock.
    > This seems unacceptable to me. It means it's impossible to configure a
    > reporting slave so it doesn't spuriously signal errors if your reports
    > run too long.
    >
    > Recall that I am still of the opinion that the only reasonable default
    > value for this parameter is actually -1. I don't think we should
    > signal errors for correctly working systems unless the user requests
    > them in some way.
    What is your proposed way of handling buffer pin deadlocks? That will be
    acceptable and working to some extent in the next week?

    Wait forever isn't always a good idea, anymore, if it ever was.

    Lots of things still on the TODO, if you are looking for a project.
    http://wiki.postgresql.org/wiki/Hot_Standby_TODO

    --
    Simon Riggs www.2ndQuadrant.com
  • Greg Stark at Jan 23, 2010 at 9:41 pm

    On Sat, Jan 23, 2010 at 8:28 PM, Simon Riggs wrote:
    > What is your proposed way of handling buffer pin deadlocks? That will be
    > acceptable and working to some extent in the next week?
    >
    > Wait forever isn't always a good idea, anymore, if it ever was.
    I've never said it was always a good idea. But killing correctly
    running queries isn't always a good idea either. I'm interested in
    using HS for running read-only replicas for load balancing. It would
    be pretty sad if queries dispatched to a read-only replica received
    spurious, unpredictable errors for reasons the application programmer
    cannot control.

    I'll look at the buffer pin deadlock problem again, but I didn't
    realize the situation was so dire. And what were the downsides of the
    "stop gap"?


    --
    greg
  • Simon Riggs at Jan 23, 2010 at 9:59 pm

    On Sat, 2010-01-23 at 21:40 +0000, Greg Stark wrote:
    > On Sat, Jan 23, 2010 at 8:28 PM, Simon Riggs wrote:
    >> What is your proposed way of handling buffer pin deadlocks? That will be
    >> acceptable and working to some extent in the next week?
    >>
    >> Wait forever isn't always a good idea, anymore, if it ever was.
    > I've never said it was always a good idea. But killing correctly
    > running queries isn't always a good idea either. I'm interested in
    > using HS for running read-only replicas for load balancing. It would
    > be pretty sad if queries dispatched to a read-only replica received
    > spurious, unpredictable errors for reasons the application programmer
    > cannot control.
    I understand your concern and seek to provide the best way forwards in
    the time available. Hopefully you have a better way, but we can do
    little about the time. Your input is welcome, and your code also.
    > I'll look at the buffer pin deadlock problem again, but I didn't
    > realize the situation was so dire. And what were the downsides of the
    > "stop gap"?
    Any query that attempted to wait for a lock threw an ERROR.

    Since the -1 setting would never resolve a deadlock itself, if we
    allowed it we would have to either use the stop gap or use a full
    deadlock detector.

    Given the stop gap does what -1 says it will never do, ISTM that having
    -1 would be contradictory. I did not wish to remove it, but it seemed
    safer to do so. Putting it back is straightforward, if it makes sense.

    We would need to detect deadlock from both directions: when the Startup
    process begins to wait while users sleep, and when users begin to wait
    while Startup sleeps. Full deadlock detection is too much code for too
    small a problem.
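The two directions in the paragraph above can be pictured as a wait-for graph with only two kinds of edge: Startup waiting for a buffer pin held by a backend, and a backend waiting for a lock held by Startup. Buffer pins are not heavyweight locks, so a wait for a pin is not visible to the regular lock-manager deadlock detector. The Python sketch below is a toy two-party model with hypothetical names, not PostgreSQL code, and not the general detector Simon considers too much code; it only shows why a check has to run whenever either side begins to wait.

```python
from typing import Dict, List, Optional


def find_cycle(waits_for: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow 'waiter -> holder' edges from `start`; return the cycle if we get back."""
    path = [start]
    node = start
    while node in waits_for:
        node = waits_for[node]
        if node == start:
            return path  # closed a cycle through `start`
        if node in path:
            return None  # a cycle exists, but not through `start`
        path.append(node)
    return None  # the chain ends at a process that is still running


def begin_wait(waits_for: Dict[str, str], waiter: str, holder: str) -> bool:
    """Record a new wait and check for deadlock at that moment.

    Running the check whenever *either* party starts waiting catches both
    orderings: Startup waiting on a pin after a backend already sleeps on
    Startup's lock, and a backend sleeping on a lock after Startup already
    waits for that backend's pin.
    """
    waits_for[waiter] = holder
    return find_cycle(waits_for, waiter) is not None


# Direction 1: the backend sleeps on a lock first, then Startup waits for its pin.
g1: Dict[str, str] = {}
print(begin_wait(g1, "backend", "startup"))  # False
print(begin_wait(g1, "startup", "backend"))  # True

# Direction 2: Startup waits for the pin first, then the backend sleeps on a lock.
g2: Dict[str, str] = {}
print(begin_wait(g2, "startup", "backend"))  # False
print(begin_wait(g2, "backend", "startup"))  # True
```

A check that ran only on the Startup side would report False at the moment the deadlock actually forms in direction 2.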

    --
    Simon Riggs www.2ndQuadrant.com
  • Heikki Linnakangas at Jan 25, 2010 at 7:52 am

    Simon Riggs wrote:
    > On Sat, 2010-01-23 at 21:40 +0000, Greg Stark wrote:
    >> On Sat, Jan 23, 2010 at 8:28 PM, Simon Riggs wrote:
    >>> What is your proposed way of handling buffer pin deadlocks? That will be
    >>> acceptable and working to some extent in the next week?
    >>>
    >>> Wait forever isn't always a good idea, anymore, if it ever was.
    >> I've never said it was always a good idea. But killing correctly
    >> running queries isn't always a good idea either. I'm interested in
    >> using HS for running read-only replicas for load balancing. It would
    >> be pretty sad if queries dispatched to a read-only replica received
    >> spurious, unpredictable errors for reasons the application programmer
    >> cannot control.
    > I understand your concern and seek to provide the best way forwards in
    > the time available. Hopefully you have a better way, but we can do
    > little about the time. Your input is welcome, and your code also.
    I just woke up to this thread too. I have to agree with Greg, we must
    think harder.

    Can you summarize the problem again? I don't immediately see how the
    deadlock could happen.

    Would this simple scheme work:

    When the startup process has waited for a short while (ie
    deadlock_timeout), it sends the signal "please check if you're holding a
    pin on buffer X" to all backends. When a backend receives that signal,
    it checks if it is holding a pin on the given buffer *and* waiting on a
    lock. If it is, abort the transaction. Assuming that a backend can only
    block waiting on a lock held by the startup process, deadlock detection
    is as simple as that.
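A minimal Python sketch of the scheme proposed above, under its stated assumption that a standby backend can only block on a lock held by the startup process. Signal delivery is modelled as a plain method call, and the class and function names are hypothetical rather than PostgreSQL's.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Backend:
    """Toy standby backend; fields are hypothetical, not PostgreSQL's PGPROC."""
    pid: int
    pinned_buffers: Set[int] = field(default_factory=set)
    waiting_on_lock: bool = False
    aborted: bool = False

    def on_pin_check_signal(self, buffer_id: int) -> None:
        """Handle "please check if you're holding a pin on buffer X"."""
        if buffer_id in self.pinned_buffers and self.waiting_on_lock:
            # Assumption from the proposal: the only lock a backend can block
            # on here is one held by the startup process, so this is a deadlock.
            self.aborted = True
            self.pinned_buffers.clear()  # aborting releases the pin
            self.waiting_on_lock = False


def startup_pin_check(backends: List[Backend], buffer_id: int) -> List[int]:
    """Run by the startup process after waiting deadlock_timeout for a pin.

    Signals every backend and returns the pids that aborted as a result.
    """
    aborted = []
    for b in backends:
        was_aborted = b.aborted
        b.on_pin_check_signal(buffer_id)
        if b.aborted and not was_aborted:
            aborted.append(b.pid)
    return aborted


backends = [
    Backend(101, {7}, waiting_on_lock=True),  # holds the pin and sleeps on a lock
    Backend(102, {7}),                        # holds the pin but is still running
    Backend(103, {9}, waiting_on_lock=True),  # sleeps on a lock, different buffer
]
print(startup_pin_check(backends, 7))  # [101]
```

Backend 102 is left alone: it still holds the pin but is not blocked, so this is an ordinary pin wait rather than a deadlock.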
    > Given the stop gap does what -1 says it will never do, ISTM that having
    > -1 would be contradictory. I did not wish to remove it, but it seemed
    > safer to do so. Putting it back is straightforward, if it makes sense.
    For all practical purposes, INT_MAX, which is the upper limit for
    max_standby_delay, is the same as infinity. So removing -1 doesn't
    really get you out of jail. And no, let's not make the upper limit
    smaller, there's no natural upper limit for that setting.
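The arithmetic behind "for all practical purposes ... the same as infinity", assuming the setting is measured in seconds and that INT_MAX is the usual 32-bit limit (2^31 - 1):

```python
INT_MAX = 2**31 - 1  # largest value of a signed 32-bit integer
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

years = INT_MAX / SECONDS_PER_YEAR
print(f"INT_MAX seconds is about {years:.1f} years")  # about 68.0 years
```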

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Simon Riggs at Jan 25, 2010 at 9:57 am

    On Mon, 2010-01-25 at 09:52 +0200, Heikki Linnakangas wrote:
    > Simon Riggs wrote:
    >> On Sat, 2010-01-23 at 21:40 +0000, Greg Stark wrote:
    >>> On Sat, Jan 23, 2010 at 8:28 PM, Simon Riggs wrote:
    >>>> What is your proposed way of handling buffer pin deadlocks? That will be
    >>>> acceptable and working to some extent in the next week?
    >>>>
    >>>> Wait forever isn't always a good idea, anymore, if it ever was.
    >>> I've never said it was always a good idea. But killing correctly
    >>> running queries isn't always a good idea either. I'm interested in
    >>> using HS for running read-only replicas for load balancing. It would
    >>> be pretty sad if queries dispatched to a read-only replica received
    >>> spurious, unpredictable errors for reasons the application programmer
    >>> cannot control.
    >> I understand your concern and seek to provide the best way forwards in
    >> the time available. Hopefully you have a better way, but we can do
    >> little about the time. Your input is welcome, and your code also.
    > I just woke up to this thread too. I have to agree with Greg, we must
    > think harder.
    Must is a word I would disagree with. There are other, bigger usability
    issues to resolve at present and I'm not personally going to be
    distracted away from addressing them. I have no problem with
    contributions from others.
    > Can you summarize the problem again? I don't immediately see how the
    > deadlock could happen.
    >
    > Would this simple scheme work:
    >
    > When the startup process has waited for a short while (ie
    > deadlock_timeout), it sends the signal "please check if you're holding a
    > pin on buffer X" to all backends. When a backend receives that signal,
    > it checks if it is holding a pin on the given buffer *and* waiting on a
    > lock. If it is, abort the transaction. Assuming that a backend can only
    > block waiting on a lock held by the startup process, deadlock detection
    > is as simple as that.
    No, it won't work. A deadlock could occur after the startup process has
    already been waiting for longer than the deadlock timeout.

    Better ideas welcome, but solutions may not be forthcoming in the time
    available.

    --
    Simon Riggs www.2ndQuadrant.com
  • Heikki Linnakangas at Jan 25, 2010 at 9:58 am

    Simon Riggs wrote:
    > On Mon, 2010-01-25 at 09:52 +0200, Heikki Linnakangas wrote:
    >> Would this simple scheme work:
    >>
    >> When the startup process has waited for a short while (ie
    >> deadlock_timeout), it sends the signal "please check if you're holding a
    >> pin on buffer X" to all backends. When a backend receives that signal,
    >> it checks if it is holding a pin on the given buffer *and* waiting on a
    >> lock. If it is, abort the transaction. Assuming that a backend can only
    >> block waiting on a lock held by the startup process, deadlock detection
    >> is as simple as that.
    > No, it won't work. A deadlock could occur after the startup process has
    > already been waiting for longer than the deadlock timeout.
    Retry every deadlock_timeout seconds?
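Simon's objection and the retry answer can be checked with a small timeline model in Python. It is a sketch under simplified assumptions (a single backend, checks only on the startup side, instantaneous signals, a finite demo horizon), and all names are hypothetical.

```python
from typing import List, Optional


def first_detection(deadlock_formed_at: float, check_times: List[float]) -> Optional[float]:
    """Return the time of the first startup-side check that sees the deadlock.

    Toy model: the startup process starts waiting for a buffer pin at t=0.
    The pin holder begins waiting on a lock held by the startup process at
    `deadlock_formed_at`; a check detects the deadlock only if it runs at or
    after that moment. Returns None if no check catches it.
    """
    for t in sorted(check_times):
        if t >= deadlock_formed_at:
            return t
    return None


deadlock_timeout = 1.0
one_shot = [deadlock_timeout]                            # the original proposal
retries = [deadlock_timeout * k for k in range(1, 11)]   # re-check every deadlock_timeout

print(first_detection(3.5, one_shot))  # None: the only check ran at t=1.0
print(first_detection(3.5, retries))   # 4.0: caught at the next retry
```

With retries, a deadlock that forms late is caught at most one deadlock_timeout after it forms, at the cost of signalling every backend repeatedly for as long as the startup process keeps waiting.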

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Heikki Linnakangas at Jan 25, 2010 at 9:58 am

    Heikki Linnakangas wrote:
    > Simon Riggs wrote:
    >> On Mon, 2010-01-25 at 09:52 +0200, Heikki Linnakangas wrote:
    >>> Would this simple scheme work:
    >>>
    >>> When the startup process has waited for a short while (ie
    >>> deadlock_timeout), it sends the signal "please check if you're holding a
    >>> pin on buffer X" to all backends. When a backend receives that signal,
    >>> it checks if it is holding a pin on the given buffer *and* waiting on a
    >>> lock. If it is, abort the transaction. Assuming that a backend can only
    >>> block waiting on a lock held by the startup process, deadlock detection
    >>> is as simple as that.
    >> No, it won't work. A deadlock could occur after the startup process has
    >> already been waiting for longer than the deadlock timeout.
    > Retry every deadlock_timeout seconds?
    Or better yet, also check if the current backend is holding the
    waited-for pin in CheckDeadLock().
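A rough Python model of why also testing for the pin in the backend's own deadlock check helps, assuming one-shot timers on each side and instantaneous signals. Only CheckDeadLock() is a real PostgreSQL function name here; the code is a hypothetical model and does not call it.

```python
from typing import Optional


def detected_at(startup_wait_start: float, backend_wait_start: float,
                deadlock_timeout: float, check_in_backend: bool) -> Optional[float]:
    """Toy model: one-shot checks on one or both sides of a pin/lock deadlock.

    The deadlock exists once both parties are waiting. The startup process
    checks once, deadlock_timeout after it began waiting for the pin; if
    `check_in_backend` is set, the backend also checks once, deadlock_timeout
    after it began waiting for the lock (the extra test in CheckDeadLock()).
    A check sees the deadlock only if it already exists at that moment.
    """
    formed = max(startup_wait_start, backend_wait_start)
    checks = [startup_wait_start + deadlock_timeout]
    if check_in_backend:
        checks.append(backend_wait_start + deadlock_timeout)
    hits = [t for t in checks if t >= formed]
    return min(hits) if hits else None


# The backend blocks on the lock well after the startup process began waiting:
print(detected_at(0.0, 5.0, 1.0, check_in_backend=False))  # None
print(detected_at(0.0, 5.0, 1.0, check_in_backend=True))   # 6.0
```

Whichever party starts waiting second also runs a check deadlock_timeout later, so with checks on both sides the deadlock is always found without repeated broadcasts; that is why the check is needed in both places.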

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Simon Riggs at Jan 25, 2010 at 10:01 am

    On Mon, 2010-01-25 at 10:59 +0200, Heikki Linnakangas wrote:
    > Heikki Linnakangas wrote:
    >> Simon Riggs wrote:
    >>> On Mon, 2010-01-25 at 09:52 +0200, Heikki Linnakangas wrote:
    >>>> Would this simple scheme work:
    >>>>
    >>>> When the startup process has waited for a short while (ie
    >>>> deadlock_timeout), it sends the signal "please check if you're holding a
    >>>> pin on buffer X" to all backends. When a backend receives that signal,
    >>>> it checks if it is holding a pin on the given buffer *and* waiting on a
    >>>> lock. If it is, abort the transaction. Assuming that a backend can only
    >>>> block waiting on a lock held by the startup process, deadlock detection
    >>>> is as simple as that.
    >>> No, it won't work. A deadlock could occur after the startup process has
    >>> already been waiting for longer than the deadlock timeout.
    >> Retry every deadlock_timeout seconds?
    > Or better yet, also check if the current backend is holding the
    > waited-for pin in CheckDeadLock().
    The deadlock can be caused by either party. As long as the check occurs
    in both places, it can be done.

    The logic for the startup process must be enhanced to allow for both
    deadlock checks and normal buffer pin checks happening at different
    times without confusion. The SIGUSR1 message received by the backend
    would need to indicate whether it was a deadlock check timeout or a
    normal buffer pin timeout.
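Why the two timeouts need distinguishable messages can be shown with a small Python sketch, assuming the signal carries a reason code. The enum and function names are hypothetical, not PostgreSQL's. With a single undifferentiated signal, a running query that merely holds the pin would be cancelled by a deadlock check.

```python
from enum import Enum, auto


class PinSignalReason(Enum):
    """Hypothetical reasons attached to the startup process's SIGUSR1."""
    BUFFER_PIN_TIMEOUT = auto()  # normal case: the pin wait exceeded its limit
    DEADLOCK_CHECK = auto()      # startup waited deadlock_timeout: check for deadlock


def should_cancel(reason: PinSignalReason, holds_pin: bool, waiting_on_lock: bool) -> bool:
    """Backend-side handling: decide whether to cancel the current transaction."""
    if reason is PinSignalReason.BUFFER_PIN_TIMEOUT:
        # The startup process has given up waiting: any holder of the pin must go.
        return holds_pin
    if reason is PinSignalReason.DEADLOCK_CHECK:
        # Only a pin holder that is itself blocked on a lock is deadlocked;
        # a pin holder that is still running should presumably release it in time.
        return holds_pin and waiting_on_lock
    raise ValueError(f"unknown reason: {reason}")


print(should_cancel(PinSignalReason.DEADLOCK_CHECK, True, False))      # False
print(should_cancel(PinSignalReason.BUFFER_PIN_TIMEOUT, True, False))  # True
```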

    It can be done, though it will require very careful testing. It's clearly a
    lower priority than other code based upon feedback from the Hot Standby
    user group. My assessment is: too much code, too rare a case and too
    little time; so it is a relative, not an absolute, judgement.

    I would not personally argue this is something worth delaying for,
    though you and Greg may wish to do that. If you insisted it was me that
    did this, I would not be in a position to start it for about 10 days.

    --
    Simon Riggs www.2ndQuadrant.com
  • Heikki Linnakangas at Jan 25, 2010 at 2:22 pm

    Simon Riggs wrote:
    > It's clearly a
    > lower priority than other code based upon feedback from the Hot Standby
    > user group.
    What's "the Hot Standby user group"?

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Simon Riggs at Jan 25, 2010 at 2:26 pm

    On Mon, 2010-01-25 at 16:22 +0200, Heikki Linnakangas wrote:

    > Simon Riggs wrote:
    >> It's clearly a
    >> lower priority than other code based upon feedback from the Hot Standby
    >> user group.
    > What's "the Hot Standby user group"?
    A group of people who have an interest in using Hot Standby, as
    advertised on postgresql.org and Weekly News.

    --
    Simon Riggs www.2ndQuadrant.com
  • Josh Berkus at Jan 25, 2010 at 6:50 pm

    > A group of people who have an interest in using Hot Standby, as
    > advertised on postgresql.org and Weekly News.
    There are pg users who won't be using HS/SR? ;-)

    --Josh

Discussion Overview
group: pgsql-hackers @
categories: postgresql
posted: Jan 23, '10 at 5:36p
active: Jan 25, '10 at 6:50p
posts: 12
users: 4
website: postgresql.org...
irc: #postgresql
