[PostgreSQL-Hackers] LWLock Queue Jumping

Simon Riggs
Aug 28, 2009 at 7:06 pm
WALInsertLock is heavily contended and likely always will be even if we
apply some of the planned fixes.

Some callers of WALInsertLock are more important than others

* Writing new Clog or Multixact pages (serialized by ClogControlLock)
* For Hot Standby, writing SnapshotData (serialized by ProcArrayLock)

In these cases it seems like we can skip straight to the front of the
WALInsertLock queue without problem.

Most other items cannot be safely reordered; possibly no other items can.

We already re-order the lock queues when we hold shared locks, so we
know in principle it is OK to do so. This is an extension of that
thought.

Implementing this would do much to remove my objection to performance
issues associated with simplifying the Hot Standby patch, as recently
suggested by Heikki.

Possible? If so, we can discuss implementation. No worries if not, but
just a side thought that may be fruitful.
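A minimal sketch of the idea, using POSIX threads for illustration only: an exclusive lock whose waiters can join the wait queue at the head instead of the tail. This is not the LWLock implementation, and all names here are invented.

```c
/* Illustrative sketch only -- not PostgreSQL's LWLock code. */
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct waiter {
    pthread_cond_t cv;
    struct waiter *next;
} waiter;

typedef struct {
    pthread_mutex_t mtx;
    bool held;                  /* is the lock currently owned? */
    waiter *head, *tail;        /* FIFO wait queue */
} prio_lock;

void
prio_lock_init(prio_lock *l)
{
    pthread_mutex_init(&l->mtx, NULL);
    l->held = false;
    l->head = l->tail = NULL;
}

/* Acquire exclusively; if jump_queue, wait at the head of the queue. */
void
prio_lock_acquire(prio_lock *l, bool jump_queue)
{
    pthread_mutex_lock(&l->mtx);
    if (l->held)
    {
        waiter w;

        pthread_cond_init(&w.cv, NULL);
        w.next = NULL;
        if (jump_queue)
        {                       /* queue jump: push at the head */
            w.next = l->head;
            l->head = &w;
            if (l->tail == NULL)
                l->tail = &w;
        }
        else
        {                       /* normal path: append at the tail */
            if (l->tail)
                l->tail->next = &w;
            else
                l->head = &w;
            l->tail = &w;
        }
        while (l->held || l->head != &w)
            pthread_cond_wait(&w.cv, &l->mtx);
        l->head = w.next;       /* dequeue ourselves */
        if (l->head == NULL)
            l->tail = NULL;
        pthread_cond_destroy(&w.cv);
    }
    l->held = true;
    pthread_mutex_unlock(&l->mtx);
}

void
prio_lock_release(prio_lock *l)
{
    pthread_mutex_lock(&l->mtx);
    l->held = false;
    if (l->head)
        pthread_cond_signal(&l->head->cv);  /* wake the queue head */
    pthread_mutex_unlock(&l->mtx);
}
```

With two waiters queued, a jump_queue caller is granted the lock before a normal caller that arrived earlier, which is the behaviour proposed above for the rare "important" WALInsertLock callers.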

--
Simon Riggs www.2ndQuadrant.com

11 responses

  • Jeff Janes at Aug 28, 2009 at 9:44 pm

    ---------- Forwarded message ----------
    From: Simon Riggs <simon@2ndQuadrant.com>
    To: pgsql-hackers <pgsql-hackers@postgresql.org>
    Date: Fri, 28 Aug 2009 20:07:32 +0100
    Subject: LWLock Queue Jumping

    WALInsertLock is heavily contended and likely always will be even if we
    apply some of the planned fixes.

    Some callers of WALInsertLock are more important than others

    * Writing new Clog or Multixact pages (serialized by ClogControlLock)
    * For Hot Standby, writing SnapshotData (serialized by ProcArrayLock)

    In these cases it seems like we can skip straight to the front of the
    WALInsertLock queue without problem.

    Most other items cannot be safely reordered, possibly no other items.

    We already re-order the lock queues when we hold shared locks, so we
    know in principle it is OK to do so. This is an extension of that
    thought.

    Implementing this would do much to remove my objection to performance
    issues associated with simplifying the Hot Standby patch, as recently
    suggested by Heikki.

    Possible? If so, we can discuss implementation. No worries if not, but
    just a side thought that may be fruitful.

    I'd previously implemented this just by copying and pasting and making some
    changes, perhaps not the most desirable way but I thought adding another
    parameter to all existing invocations would be a bit excessive. The
    attached patch will convert the existing LWLockAcquire into
    LWLockAcquire_head, rather than adding a new function. Sorry if that is not
    the optimal way to send this, I wanted to make it easy to see just the
    changes, even though the functions aren't technically the same thing
    anymore.

    I've tested it fairly thoroughly, in the context of using it in
    AdvanceXLInsertBuffer for acquiring the WALWriteLock.

    Jeff
  • Simon Riggs at Aug 29, 2009 at 11:00 am

    On Fri, 2009-08-28 at 14:44 -0700, Jeff Janes wrote:

    I'd previously implemented this just by copying and pasting and making
    some changes, perhaps not the most desirable way but I thought adding
    another parameter to all existing invocations would be a bit
    excessive.
    That's the way I would implement it also, but would call it
    LWLockAcquireWithPriority() so that its purpose is clear, rather than
    referring to its implementation, which may change.
    I've tested it fairly thoroughly,
    Please send the tested patch, if this isn't it. What tests were made?
    in the context of using it in AdvanceXLInsertBuffer for acquiring the
    WALWriteLock.
    Apologies if you'd already suggested that recently. I read a few of your
    posts but not all of them.

    I don't think WALWriteLock from AdvanceXLInsertBuffer is an important
    area, but I don't see any harm from it either.

    --
    Simon Riggs www.2ndQuadrant.com
  • Jeff Janes at Aug 29, 2009 at 6:57 pm
    On Sat, Aug 29, 2009 at 4:02 AM, Simon Riggs wrote:
    On Fri, 2009-08-28 at 14:44 -0700, Jeff Janes wrote:

    I'd previously implemented this just by copying and pasting and making
    some changes, perhaps not the most desirable way but I thought adding
    another parameter to all existing invocations would be a bit
    excessive.
    That's the way I would implement it also, but would call it
    LWLockAcquireWithPriority() so that its purpose is clear, rather than
    referring to its implementation, which may change.


    Yes, good idea. But thinking about it as a patch to be applied, rather than
    a proof of concept, I think the best solution would be to add a third
    argument (boolean Priority) to LWLockAcquire, and hunt down all existing
    invocations and change them to include false as the extra argument. Copying
    160 lines of code to change 4 of them in the copy is temporarily easier, but
    not a good solution for the long term.
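For concreteness, the mechanics are cheap either way; as a sketch (stand-in types and invented names, not the actual 8.4 source), the flag just has to reach one core routine, whether call sites are edited or wrapped:

```c
/* Sketch only: stand-in types, not the real PostgreSQL declarations. */
#include <stdbool.h>

typedef int LWLockId;
typedef enum { LW_SHARED, LW_EXCLUSIVE } LWLockMode;

static bool last_priority;      /* records what the core routine received */

/* One core routine carries the extra flag... */
static void
LWLockAcquireExtended(LWLockId lockid, LWLockMode mode, bool priority)
{
    (void) lockid;
    (void) mode;
    last_priority = priority;
    /* ...real head-vs-tail queue insertion would happen here... */
}

/* ...and thin wrappers keep the two spellings distinct at call sites. */
static void
LWLockAcquire(LWLockId lockid, LWLockMode mode)
{
    LWLockAcquireExtended(lockid, mode, false);
}

static void
LWLockAcquireWithPriority(LWLockId lockid, LWLockMode mode)
{
    LWLockAcquireExtended(lockid, mode, true);
}
```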

    I've tested it fairly thoroughly,
    Please send the tested patch, if this isn't it. What tests were made?

    I'd have a hard time coming up with the full original patch, as my changes
    for files other than lwlock.c were blown away by parallel efforts and an
    rsync to the repo. The above was just an exploratory tool, not proposed as
    an actual patch to be applied to HEAD. If we want to add a parameter to the
    existing LWLockAcquire, I'll work on coming up with a tested patch for
    that. My testing was to run the normal regression test (which often failed
    to detect my early buggy implementations), then load testing with pgbench
    (which, as far as I know, always found them when -c > 1) and a custom Perl
    script I use. Since WALWriteLock is heavily used and contended under
    pgbench -c 20, and lwlock is agnostic to the exact identity of the
    underlying lock, I think this test was pretty thorough for the
    implementation. But not of course for starvation issues, which would have
    to be tested on a case by case basis when a specific Acquire invocation is
    changed to be high priority.

    If you have ideas for other tests to do, or corner cases that are likely to
    be overlooked by my tests, I'll try to work tests for them in too.

    in the context of using it in AdvanceXLInsertBuffer for acquiring the
    WALWriteLock.
    Apologies if you'd already suggested that recently. I read a few of your
    posts but not all of them.
    I don't think WALWriteLock from AdvanceXLInsertBuffer is an important
    area, but I don't see any harm from it either.


    I had not mentioned it before. The change helped by roughly 50% when
    wal_buffers was undersized (kept at the default setting) but did not help
    significantly when wal_buffers was generously sized. I didn't think we
    would be interested in introducing a new locking procedure just to optimize
    performance for a poorly configured server. But if we are to introduce this
    method for other reasons, I think it should be used for
    AdvanceXLInsertBuffer as well.

    Cheers,

    Jeff
  • Greg Stark at Aug 29, 2009 at 11:28 pm

    On Fri, Aug 28, 2009 at 8:07 PM, Simon Riggs wrote:
    WALInsertLock is heavily contended and likely always will be even if we
    apply some of the planned fixes.
    I've lost any earlier messages, could you resend the raw data on which
    this is based?
    Some callers of WALInsertLock are more important than others

    * Writing new Clog or Multixact pages (serialized by ClogControlLock)
    * For Hot Standby, writing SnapshotData (serialized by ProcArrayLock)

    In these cases it seems like we can skip straight to the front of the
    WALInsertLock queue without problem.
    How does re-ordering reduce the contention? We reorder shared lockers
    ahead of exclusive lockers because they can all hold the lock at the
    same time so we can reduce the amount of time the lock is held.

    Reordering some exclusive lockers ahead of other exclusive lockers
    won't reduce the amount of time the lock is held at all. Are you
    saying the reason to do it is to reduce time spent waiting on this
    lock while holding other critical locks? Do we have tools to measure
    how long is being spent waiting on one lock while holding another lock
    so we can see if there's a problem and whether this helps?
  • Heikki Linnakangas at Aug 30, 2009 at 6:09 am

    Greg Stark wrote:
    On Fri, Aug 28, 2009 at 8:07 PM, Simon Riggs wrote:
    WALInsertLock is heavily contended and likely always will be even if we
    apply some of the planned fixes.
    I've lost any earlier messages, could you resend the raw data on which
    this is based?
    I don't have any pointers right now, but WALInsertLock does often show
    up as a bottleneck in write-intensive benchmarks.
    Some callers of WALInsertLock are more important than others

    * Writing new Clog or Multixact pages (serialized by ClogControlLock)
    * For Hot Standby, writing SnapshotData (serialized by ProcArrayLock)

    In these cases it seems like we can skip straight to the front of the
    WALInsertLock queue without problem.
    Reordering some exclusive lockers ahead of other exclusive lockers
    won't reduce the amount of time the lock is held at all. Are you
    saying the reason to do it is to reduce time spent waiting on this
    lock while holding other critical locks?
    That's what I thought. I don't know about the clog/multixact issue, it
    doesn't seem like it would be too bad, given how seldom new clog or
    multixact pages are written.

    The Hot Standby thing has been discussed, but no-one has actually posted
    a patch which does the locking correctly, where the ProcArrayLock is
    held while the SnapshotData WAL record is inserted. So there is no
    evidence that it's actually a problem, we might be making a mountain out
    of a molehill. It will have practically no effect on throughput, given
    how seldom SnapshotData records are written (once per checkpoint), but
    if it causes a significant bump to response times, that might be a problem.

    This is a good idea to keep in mind, but right now it feels like a
    solution in search of a problem.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Stefan Kaltenbrunner at Aug 30, 2009 at 9:49 am

    Heikki Linnakangas wrote:
    Greg Stark wrote:
    On Fri, Aug 28, 2009 at 8:07 PM, Simon Riggs wrote:
    WALInsertLock is heavily contended and likely always will be even if we
    apply some of the planned fixes.
    I've lost any earlier messages, could you resend the raw data on which
    this is based?
    I don't have any pointers right now, but WALInsertLock does often show
    up as a bottleneck in write-intensive benchmarks.
    yeah I recently ran across that issue with testing concurrent COPY
    performance:

    http://www.kaltenbrunner.cc/blog/index.php?/archives/27-Benchmarking-8.4-Chapter-2bulk-loading.html
    discussed here:

    http://archives.postgresql.org/pgsql-hackers/2009-06/msg01019.php


    and (iirc) also here:

    http://archives.postgresql.org/pgsql-hackers/2009-06/msg01133.php


    however the general issue is easily visible in almost any write-intensive
    concurrent workload on a fast I/O subsystem (e.g. pgbench, sysbench, ...).


    Stefan
  • Simon Riggs at Aug 30, 2009 at 12:21 pm

    On Sun, 2009-08-30 at 09:03 +0300, Heikki Linnakangas wrote:

    The Hot Standby thing has been discussed, but no-one has actually posted
    a patch which does the locking correctly, where the ProcArrayLock is
    held while the SnapshotData WAL record is inserted. So there is no
    evidence that it's actually a problem, we might be making a mountain out
    of a molehill. It will have practically no effect on throughput, given
    how seldom SnapshotData records are written (once per checkpoint), but
    if it causes a significant bump to response times, that might be a problem.

    This is a good idea to keep in mind, but right now it feels like a
    solution in search of a problem.
    The most important thing is to get HS committed and to do that I think
    it is important that I show you I am willing to respond to review
    comments. So I will implement it the way you propose and defer any
    further discussion about lock contention. The idea here is a simple fix
    and easy enough to return to later, if we need it.

    --
    Simon Riggs www.2ndQuadrant.com
  • Jeff Janes at Aug 30, 2009 at 5:45 pm

    ---------- Forwarded message ----------
    From: Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>
    To: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
    Date: Sun, 30 Aug 2009 11:48:47 +0200
    Subject: Re: LWLock Queue Jumping
    Heikki Linnakangas wrote:
    Greg Stark wrote:
    On Fri, Aug 28, 2009 at 8:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    WALInsertLock is heavily contended and likely always will be even if we
    apply some of the planned fixes.
    I've lost any earlier messages, could you resend the raw data on which
    this is based?
    I don't have any pointers right now, but WALInsertLock does often show
    up as a bottleneck in write-intensive benchmarks.
    yeah I recently ran across that issue with testing concurrent COPY
    performance:


    http://www.kaltenbrunner.cc/blog/index.php?/archives/27-Benchmarking-8.4-Chapter-2bulk-loading.html
    discussed here:

    http://archives.postgresql.org/pgsql-hackers/2009-06/msg01019.php


    It looks like this is the bulk loading of data into unindexed tables. How
    good is that as a target for optimization? I can see several (quite
    difficult to code and maintain) ways to make bulk loading into unindexed
    tables faster, but they would not speed up the more general cases.



    I played around a little with this, parallel bulk loads into an unindexed,
    very skinny table. If I hacked XLogInsert so that it did nothing but take
    the WALInsertLock, release it, then return a fake RecPtr, it scaled better
    but still not very well. So giant leaps in throughput would need to involve
    calling XLogInsert less often (or at least taking the WALInsertLock less
    often). You could nibble around the edges by tweaking what happens under
    the WALInsertLock, but I don't think that will get you big wins for this
    case. But again, how important is this case? Are bulk loads into skinny
    unindexed tables the best test-bed for improving XLogInsert?

    (Sorry, I think I forgot to change the subject on the previous message.
    Digests are great if you only read, but for contributing I guess I have to
    change to receiving each message.)

    Jeff
  • Stefan Kaltenbrunner at Aug 30, 2009 at 6:02 pm

    Jeff Janes wrote:
    ---------- Forwarded message ----------
    From: Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>
    To: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
    Date: Sun, 30 Aug 2009 11:48:47 +0200
    Subject: Re: LWLock Queue Jumping
    Heikki Linnakangas wrote:

    Greg Stark wrote:

    On Fri, Aug 28, 2009 at 8:07 PM, Simon Riggs wrote:

    WALInsertLock is heavily contended and likely always will be even if we
    apply some of the planned fixes.

    I've lost any earlier messages, could you resend the raw data on which
    this is based?

    I don't have any pointers right now, but WALInsertLock does often show
    up as a bottleneck in write-intensive benchmarks.


    yeah I recently ran across that issue with testing concurrent COPY
    performance:

    http://www.kaltenbrunner.cc/blog/index.php?/archives/27-Benchmarking-8.4-Chapter-2bulk-loading.html
    discussed here:

    http://archives.postgresql.org/pgsql-hackers/2009-06/msg01019.php



    It looks like this is the bulk loading of data into unindexed tables.
    How good is that as a target for optimization? I can see several (quite
    difficult to code and maintain) ways to make bulk loading into unindexed
    tables faster, but they would not speed up the more general cases.
    well bulk loading into unindexed tables is quite a common workload -
    apart from dump/restore cycles (which we can now do in parallel) a lot
    of analytic workloads are that way.
    Import tons of data from various sources every night/week/month, index,
    analyze & aggregate, drop again.


    and (iirc) also here:

    http://archives.postgresql.org/pgsql-hackers/2009-06/msg01133.php



    I played around a little with this, parallel bulk loads into an
    unindexed, very skinny table. If I hacked XLogInsert so that it did
    nothing but take the WALInsertLock, release it, then return a fake
    RecPtr, it scaled better but still not very well. So giant leaps in
    throughput would need to involve calling XLogInsert less often (or at
    least taking the WALInsertLock less often). You could nibble around the
    edges by tweaking what happens under the WALInsertLock, but I don't
    think that will get you big wins for this case. But again, how important
    is this case? Are bulk loads into skinny unindexed tables the best
    test-bed for improving XLogInsert?
    well you can get similar-looking profiles from other workloads (say
    pgbench) as well. Pretty sure the archives have examples of those too.


    Stefan
  • Jeff Janes at Aug 30, 2009 at 8:47 pm

    On Sun, Aug 30, 2009 at 11:01 AM, Stefan Kaltenbrunner wrote:

    Jeff Janes wrote:
    ---------- Forwarded message ----------
    From: Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>
    To: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
    Date: Sun, 30 Aug 2009 11:48:47 +0200
    Subject: Re: LWLock Queue Jumping
    Heikki Linnakangas wrote:


    I don't have any pointers right now, but WALInsertLock does often show
    up as a bottleneck in write-intensive benchmarks.

    yeah I recently ran across that issue with testing concurrent COPY
    performance:


    http://www.kaltenbrunner.cc/blog/index.php?/archives/27-Benchmarking-8.4-Chapter-2bulk-loading.html
    discussed here:

    http://archives.postgresql.org/pgsql-hackers/2009-06/msg01019.php


    It looks like this is the bulk loading of data into unindexed tables. How
    good is that as a target for optimization? I can see several (quite
    difficult to code and maintain) ways to make bulk loading into unindexed
    tables faster, but they would not speed up the more general cases.
    well bulk loading into unindexed tables is quite a common workload - apart
    from dump/restore cycles (which we can now do in parallel) a lot of analytic
    workloads are that way.
    Import tons of data from various sources every night/week/month, index,
    analyze & aggregate, drop again.

    In those cases where you end by dropping the tables, we should be willing to
    bypass WAL altogether, right? Is the problem that we can bypass WAL (by
    doing the COPY in the same transaction that created or truncated the table),
    or we can COPY in parallel, but we can't do both simultaneously?


    Jeff
  • Stefan Kaltenbrunner at Aug 31, 2009 at 8:49 am

    Jeff Janes wrote:
    On Sun, Aug 30, 2009 at 11:01 AM, Stefan Kaltenbrunner
    wrote:

    Jeff Janes wrote:

    ---------- Forwarded message ----------
    From: Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>
    To: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
    Date: Sun, 30 Aug 2009 11:48:47 +0200
    Subject: Re: LWLock Queue Jumping
    Heikki Linnakangas wrote:


    I don't have any pointers right now, but WALInsertLock does often show
    up as a bottleneck in write-intensive benchmarks.

    yeah I recently ran across that issue with testing concurrent COPY
    performance:


    http://www.kaltenbrunner.cc/blog/index.php?/archives/27-Benchmarking-8.4-Chapter-2bulk-loading.html
    discussed here:

    http://archives.postgresql.org/pgsql-hackers/2009-06/msg01019.php


    It looks like this is the bulk loading of data into unindexed
    tables. How good is that as a target for optimization? I can
    see several (quite difficult to code and maintain) ways to make
    bulk loading into unindexed tables faster, but they would not
    speed up the more general cases.


    well bulk loading into unindexed tables is quite a common workload -
    apart from dump/restore cycles (which we can now do in parallel) a
    lot of analytic workloads are that way.
    Import tons of data from various sources every night/weeek/month,
    index, analyze & aggregate, drop again.


    In those cases where you end by dropping the tables, we should be
    willing to bypass WAL altogether, right? Is the problem that we can
    bypass WAL (by doing the COPY in the same transaction that created or
    truncated the table), or we can COPY in parallel, but we can't do both
    simultaneously?
    well yes that is part of the problem - if you bulk load into one or a few
    tables concurrently you can only sometimes make use of the WAL bypass
    optimization. This is especially interesting if you consider that COPY
    alone is more or less CPU-bottlenecked these days so using multiple
    cores makes sense to get higher load rates.


    Stefan
