We've got various facilities now for looking at LWLock waits, but I'd
like to have more information about the *reasons* for lock waits.

I know it's possible to get backtraces in DTrace at various tracepoints,
but that can be fairly hard to interpret.

I'm thinking of adding an extra parameter onto every call to
LockBuffer() and LWLockAcquire() to explain the reason for the lock
request.

Then when we analyze lock waits we can apportion the lock wait reason
correctly and determine not just where the waits are happening, but
*why* the waits are happening. Activity Based Costing, as the beanies call
it.

This will be especially helpful with transitory events that might (or
might not) occur fairly frequently, yet the point of contention moves
around within the server. There are a few hotspot reasons that move
around, plus I'm certain there are one or two we have no idea about.

This won't help much for highly contended locks where a lock queue of
100 might have 99 fast lock holders and 1 slow one. But it will help for
the transitory locking where a lock is frequently not held, yet we want
to explain what was happening when the lock *was* held.

Reason would be an enum value and reporting would probably be covered by
LWLOCK_STATS.
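
For illustration only, a minimal sketch of the shape this could take; the
reason names, the extra parameter and the counters below are all invented,
not taken from the PostgreSQL source:

/*
 * Hypothetical sketch: attribute each lock wait to a caller-supplied
 * "reason" so that wait counts and wait time can be apportioned per
 * reason, in the same spirit as the LWLOCK_STATS reporting.
 */
#include <stdio.h>

typedef enum LWLockWaitReason
{
    LWR_UNKNOWN = 0,
    LWR_WAL_INSERT,             /* example reasons, invented for illustration */
    LWR_CLOG_LOOKUP,
    LWR_PROCARRAY_SNAPSHOT,
    LWR_NUM_REASONS
} LWLockWaitReason;

static long wait_count[LWR_NUM_REASONS];
static long wait_time_ns[LWR_NUM_REASONS];

/*
 * Stand-in for the extra bookkeeping an LWLockAcquire(lock, mode, reason)
 * call could do whenever it actually has to wait.
 */
static void
record_lock_wait(LWLockWaitReason reason, long waited_ns)
{
    wait_count[reason]++;
    wait_time_ns[reason] += waited_ns;
}

int
main(void)
{
    /* Pretend two call sites waited on the same lock for different reasons. */
    record_lock_wait(LWR_PROCARRAY_SNAPSHOT, 120000);
    record_lock_wait(LWR_WAL_INSERT, 450000);

    for (int r = 0; r < LWR_NUM_REASONS; r++)
        if (wait_count[r] > 0)
            printf("reason %d: %ld waits, %ld ns waited\n",
                   r, wait_count[r], wait_time_ns[r]);
    return 0;
}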

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

  • Tom Lane at Feb 4, 2008 at 10:06 pm

    Simon Riggs writes:
    I'm thinking of adding an extra parameter onto every call to
    LockBuffer() and LWLockAcquire() to explain the reason for the lock
    request.
    Maybe I'm missing something, but I don't see what this would buy us,
    except for being able to track which call site resulted in a wait;
    which can already be measured with dtrace, no?

    I'm hesitant to decorate such widely-used functions with extra tracing
    information. You'd be breaking every third-party module and pending
    patch that uses either function, and imposing some performance penalty
    (probably not much, but it's hard to be sure) into the indefinite
    future, for a performance measurement need that might be fleeting.

    Basically I'd rather try to attack the problem with dtrace ...

    regards, tom lane
  • Simon Riggs at Feb 5, 2008 at 10:10 am

    On Mon, 2008-02-04 at 17:06 -0500, Tom Lane wrote:

    Basically I'd rather try to attack the problem with dtrace ...
    OK. I'll switch to Solaris. Or do you know something I don't about dtrace
    on Linux?

    --
    Simon Riggs
    2ndQuadrant http://www.2ndQuadrant.com
  • Heikki Linnakangas at Feb 5, 2008 at 2:15 pm

    Simon Riggs wrote:
    On Mon, 2008-02-04 at 17:06 -0500, Tom Lane wrote:

    Basically I'd rather try to attack the problem with dtrace ...
    OK. I'll switch to Solaris. Or do you know something I don't about dtrace
    on Linux?
    One idea would be to add new arguments to LWLockAcquire as you suggest,
    but instead of modifying all call sites, decorate it with a macro that
    passes __FILE__ and __LINE__ as the extra arguments. The good thing
    about that is that it's a relatively small patch and you can easily
    switch it on/off with a #ifdef. And there's no need to push for
    inclusion of that into CVS, because it would be an easy patch to
    maintain yourself.
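
    For illustration, the macro trick could look roughly like this; the
    #ifdef symbol and the function names are invented here, not the real
    lwlock.h/lwlock.c definitions:

    #include <stdio.h>

    #define LWLOCK_TRACE_CALLSITE       /* imagine a configure-time switch */

    #ifdef LWLOCK_TRACE_CALLSITE
    static void
    LWLockAcquire_Traced(int lockid, int mode, const char *file, int line)
    {
        printf("acquire lock %d (mode %d) from %s:%d\n",
               lockid, mode, file, line);
        /* ... the real lock acquisition would happen here ... */
    }

    /*
     * Existing call sites keep writing LWLockAcquire(lock, mode); the macro
     * quietly appends the call-site information.
     */
    #define LWLockAcquire(lockid, mode) \
        LWLockAcquire_Traced((lockid), (mode), __FILE__, __LINE__)
    #else
    static void
    LWLockAcquire(int lockid, int mode)
    {
        /* ... the real lock acquisition, with no tracing overhead ... */
    }
    #endif

    int
    main(void)
    {
        LWLockAcquire(4, 0);  /* expands to LWLockAcquire_Traced(4, 0, __FILE__, __LINE__) */
        return 0;
    }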

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Simon Riggs at Feb 5, 2008 at 2:56 pm

    On Tue, 2008-02-05 at 14:14 +0000, Heikki Linnakangas wrote:
    Simon Riggs wrote:
    On Mon, 2008-02-04 at 17:06 -0500, Tom Lane wrote:

    Basically I'd rather try to attack the problem with dtrace ...
    OK. I'll switch to Solaris. Or do you know something I don't about dtrace
    on Linux?
    One idea would be to add new arguments to LWLockAcquire as you suggest,
    but instead of modifying all call sites, decorate it with a macro that
    passes __FILE__ and __LINE__ as the extra arguments. The good thing
    about that is that it's a relatively small patch and you can easily
    switch it on/off with a #ifdef. And there's no need to push for
    inclusion of that into CVS, because it would be an easy patch to
    maintain yourself.
    Thanks for the idea. It had occurred to me to make a private patch, but
    I prefer my patches to be open, so it's easier to discuss the results
    arising from them.

    --
    Simon Riggs
    2ndQuadrant http://www.2ndQuadrant.com
  • Tom Lane at Feb 5, 2008 at 4:55 pm

    Simon Riggs writes:
    On Mon, 2008-02-04 at 17:06 -0500, Tom Lane wrote:
    Basically I'd rather try to attack the problem with dtrace ...
    OK. I'll switch to Solaris. Or do you know something I don't about dtrace
    on Linux?
    Nope :-(. The SystemTap guys keep promising support for userspace
    probes but it's not there yet.

    regards, tom lane
  • Staale Smedseng at Feb 6, 2008 at 2:31 pm

    On Mon, 2008-02-04 at 19:46, Simon Riggs wrote:
    We've got various facilities now for looking at LWLock waits, but I'd
    like to have more information about the *reasons* for lock waits.

    I know it's possible to get backtraces in DTrace at various tracepoints,
    but that can be fairly hard to interpret.
    Simon,

    A couple of guys here at Sun have started looking at trying to improve
    the scaling on Sun's T2000 Niagara servers (single chip, 32 HW threads).

    We mostly use various DTrace scripts to monitor locking usage. We'd be
    happy to share results, probes and DTrace scripts for monitoring if you
    like.

    So far we've found lock contention (especially for the ProcArrayLock) to
    be the likely source for our inability to saturate the CPU with a TPC-C
    like OLTP load with >1000 users.

    The following shows the average time spent in and waiting for the exclusive
    ProcArrayLock vs. the time used in the transaction through commit (i.e.,
    up until the exclusive ProcArrayLock acquire that updates the PGPROC entry
    to mark the transaction as no longer running):

    # ./pg-lwlock-procarray.d $(pgrep -n postgres)

    postgres`LWLockAcquire
    postgres`ProcArrayEndTransaction+0x10
    postgres`CommitTransaction+0xf0
    postgres`CommitTransactionCommand+0x90
    postgres`finish_xact_command+0x60
    postgres`exec_execute_message+0x3b4
    postgres`PostgresMain+0x13a0
    postgres`BackendRun+0x27c
    postgres`BackendStartup+0xe0
    postgres`ServerLoop+0x1a0
    postgres`PostmasterMain+0xcbc
    postgres`main+0x1d8
    postgres`_start+0x108
    23
    avg lwlock acquire service time [ns] 193051989
    transaction-commit [count] 23
    transaction-start [count] 23
    avg transaction time [ns] 12763079

    The stack trace shows that the only time the lock is acquired
    exclusively is from the call to ProcArrayEndTransaction() in
    CommitTransaction().

    Also, an interesting observation is that the hot locks seem to have
    changed from v8.2 to v8.3, making the ProcArrayLock more contended. See
    the following outputs:

    PostgreSQL 8.2 (32-bit):

    -bash-3.00# ./825_lwlock_wait.d

    Lock Id Mode Count
    FirstLockMgrLock Shared 2
    ProcArrayLock Shared 209
    XidGenLock Exclusive 1030
    XidGenLock Shared 1215
    WALInsertLock Exclusive 3942
    CLogControlLock Shared 4113
    ProcArrayLock Exclusive 6929
    WALWriteLock Exclusive 17155
    CLogControlLock Exclusive 128182

    Lock Id Combined Time (ns)
    FirstLockMgrLock 24705
    WALInsertLock 79644210
    XidGenLock 179886846
    ProcArrayLock 1486950738
    WALWriteLock 18425400504
    CLogControlLock 1507388036453



    PostgreSQL 8.3 (64-bit):

    -bash-3.00# ./83_lwlock_wait.d

    Lock Id Mode Count
    SInvalLock Exclusive 1
    WALWriteLock Exclusive 1
    SInvalLock Shared 20
    CLogControlLock Shared 534
    CLogControlLock Exclusive 845
    XidGenLock Exclusive 1140
    ProcArrayLock Shared 1821
    WALInsertLock Exclusive 17043
    ProcArrayLock Exclusive 49762

    Lock Id Mode Combined Time (ns)
    SInvalLock Exclusive 17216
    SInvalLock Shared 531129
    WALWriteLock Exclusive 2003064
    CLogControlLock Shared 61325819
    CLogControlLock Exclusive 73509195
    XidGenLock Exclusive 929477929
    WALInsertLock Exclusive 17941476941
    ProcArrayLock Shared 31114676303
    ProcArrayLock Exclusive 888356047549



    Regards,
    Staale Smedseng
    Database Technology Group, Sun Microsystems
  • Simon Riggs at Feb 6, 2008 at 2:48 pm

    On Wed, 2008-02-06 at 15:30 +0100, Staale Smedseng wrote:
    On Mon, 2008-02-04 at 19:46, Simon Riggs wrote:
    We've got various facilities now for looking at LWLock waits, but I'd
    like to have more information about the *reasons* for lock waits.

    I know it's possible to get backtraces in DTrace at various tracepoints,
    but that can be fairly hard to interpret.
    Thanks for this Staale.
    Lock Id Mode Count
    ProcArrayLock Shared 1821
    ProcArrayLock Exclusive 49762

    Lock Id Mode Combined Time (ns)
    ProcArrayLock Shared 31114676303
    ProcArrayLock Exclusive 888356047549
    Which looks like a mean service time of 17ms for X lock waiters.

    What is the frequency distribution of lock wait time on ProcArrayLock?
    Does the distribution vary over time?

    --
    Simon Riggs
    2ndQuadrant http://www.2ndQuadrant.com
  • Staale Smedseng at Feb 6, 2008 at 5:45 pm
    What is the frequency distribution of lock wait time on ProcArrayLock?
    See below for wait time distributions for ProcArrayLock (both shared and
    exclusive). The time measured is from entry into LWLockAcquire() until
    return. I've recorded the same data in two different resolutions (ms,
    and us for the lower part of the distribution). The DTrace script is at
    the bottom.

    These results are for 1000 TPC-C like clients, and measured over the
    1000 PostgreSQL processes over a period of 10 seconds.
    Does the distribution vary over time?
    Hmm... I will have to get back to you on that one.

    Staale

    CPU ID FUNCTION:NAME
    6 71245 :tick-10sec

    Total LW_EXCLUSIVE 25178
    Total Transaction Starts 25679
    Total LW_SHARED 107211
    LW_SHARED [ms]
    value ------------- Distribution ------------- count
    < 0 | 0
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 100565
    10 | 1
    20 | 0
    30 | 0
    40 | 0
    50 | 0
    60 | 0
    70 | 6
    80 | 304
    90 |@ 1370
    100 |@ 2685
    110 |@ 1731
    120 | 307
    130 | 13
    140 | 0
    150 | 0
    160 | 5
    170 | 4
    180 | 26
    190 | 36
    200 | 24
    210 | 61
    220 | 49
    230 | 15
    240 | 0
    250 | 0
    260 | 0
    270 | 0
    280 | 2
    290 | 2
    300 | 0
    310 | 2
    320 | 1
    330 | 1
    340 | 0
    350 | 0
    360 | 0
    370 | 0
    380 | 0
    390 | 0
    400 | 1
    410 | 0

    LW_EXCLUSIVE [ms]
    value ------------- Distribution ------------- count
    < 0 | 0
    0 |@@ 1565
    10 | 0
    20 | 0
    30 | 0
    40 | 0
    50 | 0
    60 | 0
    70 | 16
    80 |@ 894
    90 |@@@@@@@ 4108
    100 |@@@@@@@@@@@@@ 8090
    110 |@@@@@@@@ 4863
    120 |@ 595
    130 | 64
    140 | 14
    150 | 0
    160 | 4
    170 | 93
    180 | 262
    190 |@ 549
    200 |@ 854
    210 |@@ 976
    220 |@ 770
    230 | 268
    240 | 40
    250 | 2
    260 | 9
    270 | 29
    280 | 46
    290 | 94
    300 | 134
    310 | 169
    320 | 164
    330 | 146
    340 | 94
    350 | 13
    360 | 2
    370 | 8
    380 | 15
    390 | 17
    400 | 20
    410 | 37
    420 | 27
    430 | 32
    440 | 29
    450 | 8
    460 | 4
    470 | 2
    480 | 1
    490 | 3
    >= 500 | 48
    LW_SHARED [us]
    value ------------- Distribution ------------- count
    0 | 0
    10 | 39
    20 |@@@@@@@@@@@@ 31038
    30 |@@@@@@@@@@@@@@@@@@@@@@@@@ 66010
    40 |@ 2409
    50 | 103
    60 | 145
    70 | 85
    80 | 68
    90 | 57
    100 | 40
    110 | 33
    120 | 37
    130 | 24
    140 | 15
    150 | 35
    160 | 26
    170 | 35
    180 | 24
    190 | 16
    200 | 12
    210 | 13
    220 | 13
    230 | 8
    240 | 9
    250 | 3
    260 | 4
    270 | 3
    280 | 2
    290 | 4
    300 | 3
    310 | 3
    320 | 3
    330 | 2
    340 | 0
    350 | 1
    360 | 2
    370 | 0
    380 | 0
    390 | 2
    400 | 1
    410 | 2
    420 | 0
    430 | 0
    440 | 1
    450 | 0
    460 | 5
    470 | 1
    480 | 1
    490 | 4
    500 | 6
    510 | 1
    520 | 1
    530 | 0
    540 | 1
    550 | 1
    560 | 2
    570 | 0
    580 | 0
    590 | 0
    600 | 0
    610 | 1
    620 | 0
    630 | 0
    640 | 0
    650 | 0
    660 | 0
    670 | 1
    680 | 0
    690 | 2
    700 | 1
    710 | 1
    720 | 3
    730 | 0
    740 | 1
    750 | 0
    760 | 0
    770 | 0
    780 | 0
    790 | 0
    800 | 0
    810 | 0
    820 | 0
    830 | 0
    840 | 0
    850 | 0
    860 | 1
    870 | 0
    880 | 0
    890 | 0
    900 | 0
    910 | 0
    920 | 1
    930 | 0
    940 | 0
    950 | 2
    960 | 0
    970 | 0
    980 | 1
    990 | 0
    >= 1000 |@@@ 6843
    LW_EXCLUSIVE [us]
    value ------------- Distribution ------------- count
    10 | 0
    20 |@ 357
    30 |@@ 1166
    40 | 35
    50 | 1
    60 | 1
    70 | 1
    80 | 1
    90 | 0
    100 | 0
    110 | 0
    120 | 0
    130 | 1
    140 | 0
    150 | 0
    160 | 0
    170 | 0
    180 | 0
    190 | 0
    200 | 0
    210 | 0
    220 | 0
    230 | 0
    240 | 0
    250 | 0
    260 | 0
    270 | 0
    280 | 0
    290 | 0
    300 | 0
    310 | 0
    320 | 0
    330 | 0
    340 | 0
    350 | 0
    360 | 0
    370 | 0
    380 | 0
    390 | 0
    400 | 0
    410 | 0
    420 | 0
    430 | 0
    440 | 0
    450 | 0
    460 | 0
    470 | 0
    480 | 0
    490 | 0
    500 | 0
    510 | 0
    520 | 0
    530 | 0
    540 | 0
    550 | 0
    560 | 0
    570 | 0
    580 | 0
    590 | 0
    600 | 0
    610 | 0
    620 | 0
    630 | 0
    640 | 0
    650 | 0
    660 | 0
    670 | 0
    680 | 0
    690 | 0
    700 | 0
    710 | 0
    720 | 0
    730 | 0
    740 | 0
    750 | 0
    760 | 0
    770 | 0
    780 | 0
    790 | 0
    800 | 0
    810 | 0
    820 | 0
    830 | 0
    840 | 0
    850 | 0
    860 | 0
    870 | 0
    880 | 0
    890 | 0
    900 | 0
    910 | 0
    920 | 0
    930 | 0
    940 | 0
    950 | 0
    960 | 0
    970 | 0
    980 | 0
    990 | 0
    >= 1000 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 23615

    --------------------------------------------------------------

    #!/usr/sbin/dtrace -s

    /*
     * Show ProcArrayLock wait time
     *
     * $Id$
     */

    BEGIN
    {
        self->ts = 0;
    }

    tick-10sec
    {
        exit(0);
    }

    /*
     * Accumulate average time spent from start of LWLockAcquire()
     * until return from LWLockAcquire().
     */
    postgresql$1:::lwlock-acquire-begin
    / arg0 == 4 /
    {
        self->ts = timestamp;
        self->mode = arg1;
    }

    postgresql$1:::lwlock-acquire-end
    / self->ts /
    {
        @acqcount[self->mode == 0 ? "Total LW_EXCLUSIVE" : "Total LW_SHARED"]
            = count();

        /* whole frequency distribution 0-500ms */
        @wholedist[self->mode == 0 ? "LW_EXCLUSIVE [ms]" : "LW_SHARED [ms]"]
            = lquantize((timestamp - self->ts) / 1000000, 0, 500, 10);

        /* the lower part of the distribution in detail 0-1000us */
        @lowdist[self->mode == 0 ? "LW_EXCLUSIVE [us]" : "LW_SHARED [us]"]
            = lquantize((timestamp - self->ts) / 1000, 0, 1000, 10);

        self->ts = 0;
    }

    postgresql$1:::transaction-start
    {
        @acqcount["Total Transaction Starts"] = count();
    }
  • Simon Riggs at Feb 6, 2008 at 6:02 pm

    On Wed, 2008-02-06 at 18:44 +0100, Staale Smedseng wrote:
    What is the frequency distribution of lock wait time on ProcArrayLock?
    See below for wait time distributions for ProcArrayLock (both shared and
    exclusive). The time measured is from entry into LWLockAcquire() until
    return. I've recorded the same data in two different resolutions (ms,
    and us for the lower part of the distribution). The DTrace script is at
    the bottom.

    These results are for 1000 TPC-C like clients, and measured over the
    1000 PostgreSQL processes over a period of 10 seconds.
    Thanks! Interesting resonance pattern.

    --
    Simon Riggs
    2ndQuadrant http://www.2ndQuadrant.com
  • Gregory Stark at Feb 6, 2008 at 2:56 pm

    "Staale Smedseng" <Staale.Smedseng@Sun.COM> writes:

    The stack trace shows that the only time the lock is acquired
    exclusively is from the call to ProcArrayEndTransaction() in
    CommitTransaction().
    I'm not sure, but I think that's only true in 8.3. As I understood it, in 8.2
    transaction start also needed the exclusive lock.
    Also, an interesting observation is that the hot locks seem to have
    changed from v8.2 to v8.3, making the ProcArrayLock more contended. See
    the following outputs:

    PostgreSQL 8.2 (32-bit):
    ...
    PostgreSQL 8.3 (64-bit):
    ...
    I'm not sure 32-bit and 64-bit cases are going to be directly comparable. We
    could have a problem with cache line aliasing on only one or the other for
    example.

    But that is a pretty striking difference. Does the 8.3 run complete more
    transactions in that time?

    --
    Gregory Stark
    EnterpriseDB http://www.enterprisedb.com
    Ask me about EnterpriseDB's 24x7 Postgres support!
  • Staale Smedseng at Feb 6, 2008 at 6:35 pm

    I'm not sure 32-bit and 64-bit cases are going to be directly comparable.
    We could have a problem with cache line aliasing on only one or the other
    for example.
    Agreed, this is likely comparing apples and oranges. I'll see if I can
    get a one-to-one comparison done (these were the numbers I had close
    by when writing the email).
    But that is a pretty striking difference. Does the 8.3 run complete more
    transactions in that time?
    I'll make sure to include those numbers as well. :)

    Staale
  • Tom Lane at Feb 6, 2008 at 6:56 pm

    Gregory Stark writes:
    "Staale Smedseng" <Staale.Smedseng@Sun.COM> writes:
    Also, an interesting observation is that the hot locks seem to have
    changed from v8.2 to v8.3, making the ProcArrayLock more contended. See
    the following outputs:

    PostgreSQL 8.2 (32-bit):
    ...
    PostgreSQL 8.3 (64-bit):
    ...
    I'm not sure 32-bit and 64-bit cases are going to be directly comparable. We
    could have a problem with cache line aliasing on only one or the other for
    example.
    Yeah, I find these numbers highly dubious. AFAIR we didn't do anything
    that would have reduced CLogControlLock contention, and we definitely
    did work to reduce ProcArrayLock contention, so the claimed results seem
    directly opposite to expectation. I am wondering if the waits are being
    attributed to the right locks --- I remember such an error in a previous
    set of dtrace results, and some of the other details such as claiming
    shared lock delays but no exclusive lock delays for FirstLockMgrLock
    seem less than credible as well.

    regards, tom lane
  • Simon Riggs at Feb 6, 2008 at 7:20 pm

    On Wed, 2008-02-06 at 13:55 -0500, Tom Lane wrote:
    Gregory Stark <stark@enterprisedb.com> writes:
    "Staale Smedseng" <Staale.Smedseng@Sun.COM> writes:
    Also, an interesting observation is that the hot locks seem to have
    changed from v8.2 to v8.3, making the ProcArrayLock more contended. See
    the following outputs:

    PostgreSQL 8.2 (32-bit):
    ...
    PostgreSQL 8.3 (64-bit):
    ...
    I'm not sure 32-bit and 64-bit cases are going to be directly comparable. We
    could have a problem with cache line aliasing on only one or the other for
    example.
    Yeah, I find these numbers highly dubious. AFAIR we didn't do anything
    that would have reduced CLogControlLock contention, and we definitely
    did work to reduce ProcArrayLock contention, so the claimed results seem
    directly opposite to expectation. I am wondering if the waits are being
    attributed to the right locks --- I remember such an error in a previous
    set of dtrace results, and some of the other details such as claiming
    shared lock delays but no exclusive lock delays for FirstLockMgrLock
    seem less than credible as well.
    There were only 2 lock delays for FirstLockMgrLock in SHARED mode, so it
    seems believable that there were 0 lock delays in EXCLUSIVE mode.

    I assume that Staale is running with the clog buffers tweaked. Can you
    confirm, Staale?

    --
    Simon Riggs
    2ndQuadrant http://www.2ndQuadrant.com
  • Tom Lane at Feb 6, 2008 at 7:42 pm

    Simon Riggs writes:
    There were only 2 lock delays for FirstLockMgrLock in SHARED mode, so it
    seems believable that there were 0 lock delays in EXCLUSIVE mode.
    Not really, considering the extremely limited use of LW_SHARED in lock.c
    (GetLockConflicts is used only by CREATE INDEX CONCURRENTLY, and
    GetLockStatusData only by the pg_locks view). For the type of benchmark
    that I gather this is, there should be *zero* LW_SHARED acquisitions at
    all. And even if there are some, they could only be blocking against
    the (undoubtedly much more frequent) LW_EXCLUSIVE acquisitions; it's not
    very credible that there is zero contention among the LW_EXCLUSIVE locks
    yet a few shared acquirers manage to get burnt.

    regards, tom lane
  • Simon Riggs at Feb 6, 2008 at 7:49 pm

    On Wed, 2008-02-06 at 14:42 -0500, Tom Lane wrote:
    Simon Riggs <simon@2ndquadrant.com> writes:
    There were only 2 lock delays for FirstLockMgrLock in SHARED mode, so it
    seems believable that there were 0 lock delays in EXCLUSIVE mode.
    Not really, considering the extremely limited use of LW_SHARED in lock.c
    (GetLockConflicts is used only by CREATE INDEX CONCURRENTLY, and
    GetLockStatusData only by the pg_locks view). For the type of benchmark
    that I gather this is, there should be *zero* LW_SHARED acquisitions at
    all. And even if there are some, they could only be blocking against
    the (undoubtedly much more frequent) LW_EXCLUSIVE acquisitions; it's not
    very credible that there is zero contention among the LW_EXCLUSIVE locks
    yet a few shared acquirers manage to get burnt.
    ...but the total wait time on those lock waits was 24 microseconds. I
    hardly call that burnt.

    --
    Simon Riggs
    2ndQuadrant http://www.2ndQuadrant.com
  • Tom Lane at Feb 6, 2008 at 8:01 pm

    Simon Riggs writes:
    On Wed, 2008-02-06 at 14:42 -0500, Tom Lane wrote:
    Not really, considering the extremely limited use of LW_SHARED in lock.c
    (GetLockConflicts is used only by CREATE INDEX CONCURRENTLY, and
    GetLockStatusData only by the pg_locks view). For the type of benchmark
    that I gather this is, there should be *zero* LW_SHARED acquisitions at
    all. And even if there are some, they could only be blocking against
    the (undoubtedly much more frequent) LW_EXCLUSIVE acquisitions; it's not
    very credible that there is zero contention among the LW_EXCLUSIVE locks
    yet a few shared acquirers manage to get burnt.
    ...but the total wait time on those lock waits was 24 microseconds. I
    hardly call that burnt.
    What you are failing to grasp is that the data is simply not credible
    (unless perhaps Staale fesses up that his benchmark includes a whole lot
    of pg_locks monitoring, in which case I'd want to see it redone without
    anyway).

    regards, tom lane
  • Staale Smedseng at Feb 7, 2008 at 3:30 pm

    On Wed, 2008-02-06 at 19:55, Tom Lane wrote:
    I am wondering if the waits are being
    attributed to the right locks --- I remember such an error in a previous
    set of dtrace results, and some of the other details such as claiming
    shared lock delays but no exclusive lock delays for FirstLockMgrLock
    seem less than credible as well.
    Good catch. We've checked the DTrace scripts against the respective
    versions of lwlock.h, and the FirstLockMgrLock is off (these are actually
    the results for FirstBufMappingLock).

    However, this is the last lock in the enum that we trace; the other,
    lower-numbered lock enums were correctly mapped. (In particular the
    ProcArrayLock, which we've been puzzled by.)

    Also, there was no clog buffer tweaking or pg_locks monitoring done
    when this benchmark was run, AFAIK.

    We'll redo the benchmarks and post new scripts and results.

    Staale
  • Tom Lane at Feb 7, 2008 at 3:38 pm

    Staale Smedseng writes:
    Good catch. We've checked the DTrace scripts against the respective
    versions of lwlock.h, and the FirstLockMgrLock is off (this is actually
    the results for FirstBufMappingLock).
    However, this is the last lock in the enum that we trace, the other
    lower-numbered lock enums were correctly mapped. (In particular the
    ProcArrayLock which we've been puzzled by.)
    Hmm, do you mean that you're just neglecting to collect any stats about
    all the dynamically assigned locks? That doesn't seem like it's going
    to offer a very good view of what's going on. I think the most useful
    stats would include aggregate wait data for all the lockmgr locks,
    all the bufmapping locks, all the buffer content locks, and all the
    buffer I/O locks. We'd like to think that contention for those would
    be low because of the partitioning ... but the point of measuring is
    to find out, not just hope.
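
    As a rough sketch of that kind of grouping (all the boundary values below
    are placeholders, not the real lwlock.h numbers), each numeric lock id
    could be folded into an aggregate bucket before reporting:

    #include <stdio.h>

    #define NUM_BUFFER_PARTITIONS   16  /* placeholder partition counts */
    #define NUM_LOCK_PARTITIONS     16
    #define FIRST_BUF_MAPPING_LOCK  25  /* placeholder boundary */
    #define FIRST_LOCK_MGR_LOCK     (FIRST_BUF_MAPPING_LOCK + NUM_BUFFER_PARTITIONS)
    #define FIRST_DYNAMIC_LOCK      (FIRST_LOCK_MGR_LOCK + NUM_LOCK_PARTITIONS)

    /*
     * Map a numeric lock id onto a group name so that the partitioned and
     * per-buffer locks can be aggregated rather than reported one by one.
     */
    static const char *
    lock_group(int lockid)
    {
        if (lockid < FIRST_BUF_MAPPING_LOCK)
            return "individually named lock";
        if (lockid < FIRST_LOCK_MGR_LOCK)
            return "BufMappingLock partition";
        if (lockid < FIRST_DYNAMIC_LOCK)
            return "LockMgrLock partition";
        return "buffer content/IO lock";
    }

    int
    main(void)
    {
        int samples[] = { 4, 30, 45, 120 };

        for (int i = 0; i < 4; i++)
            printf("lock %d -> %s\n", samples[i], lock_group(samples[i]));
        return 0;
    }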

    regards, tom lane
  • Simon Riggs at Feb 7, 2008 at 5:12 pm

    On Thu, 2008-02-07 at 16:29 +0100, Staale Smedseng wrote:
    On Wed, 2008-02-06 at 19:55, Tom Lane wrote:
    I am wondering if the waits are being
    attributed to the right locks --- I remember such an error in a previous
    set of dtrace results, and some of the other details such as claiming
    shared lock delays but no exclusive lock delays for FirstLockMgrLock
    seem less than credible as well.
    Good catch. We've checked the DTrace scripts against the respective
    versions of lwlock.h, and the FirstLockMgrLock is off (this is actually
    the results for FirstBufMappingLock).
    I just realised you are using a lookup to get the text for the name of
    the lock. You used the same lookup table for both releases?

    --
    Simon Riggs
    2ndQuadrant http://www.2ndQuadrant.com
  • Jignesh K. Shah at Feb 7, 2008 at 5:57 pm
    I don't think my earlier message got through...

    We use separate lookup tables for 8.2.5 and 8.3, based on the respective
    lwlock.h for each version.


    -Jignesh


    Simon Riggs wrote:
    On Thu, 2008-02-07 at 16:29 +0100, Staale Smedseng wrote:
    On Wed, 2008-02-06 at 19:55, Tom Lane wrote:

    I am wondering if the waits are being
    attributed to the right locks --- I remember such an error in a previous
    set of dtrace results, and some of the other details such as claiming
    shared lock delays but no exclusive lock delays for FirstLockMgrLock
    seem less than credible as well.
    Good catch. We've checked the DTrace scripts against the respective
    versions of lwlock.h, and the FirstLockMgrLock is off (this is actually
    the results for FirstBufMappingLock).
    I just realised you are using a lookup to get the text for the name of
    the lock. You used the same lookup table for both releases?
  • Staale Smedseng at Feb 7, 2008 at 6:07 pm

    On Thu, 2008-02-07 at 18:12, Simon Riggs wrote:

    I just realised you are using a lookup to get the text for the name of
    the lock. You used the same lookup table for both releases?
    Oh, it wasn't quite that bad. :-) The two DTrace scripts had been
    revised to correspond with the two different declarations of LWLockId in
    lwlock.h (for 8.2.5 and 8.3 respectively). But somehow the value for the
    last lock in our lookup table corresponding to

    FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,

    in the enum got turned into n+1 instead of n+NUM_BUFFER_PARTITIONS. The
    other locks should have been output correctly, however.
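
    In other words, with placeholder values standing in for the real enum
    positions (the constants below are invented for illustration), the slip
    amounts to this:

    #include <stdio.h>

    #define NUM_BUFFER_PARTITIONS 16    /* placeholder; the real value comes from lwlock.h */

    enum LWLockId
    {
        /* ... the individually named locks come first ... */
        FirstBufMappingLock = 25,       /* placeholder value for illustration */
        FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS
    };

    int
    main(void)
    {
        /* The lookup table assumed the next value, n + 1, rather than
         * n + NUM_BUFFER_PARTITIONS. */
        printf("assumed FirstLockMgrLock = %d, actual = %d\n",
               FirstBufMappingLock + 1, FirstLockMgrLock);
        return 0;
    }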

    But as Tom pointed out, the dynamic locks were not in the equation. So
    now we're measuring all lock waits instead of assuming. :-)

    Staale
  • Jignesh K. Shah at Feb 7, 2008 at 9:21 pm
    Last try for the script/results (truncating less significant portions of
    output which are too big)



    Staale Smedseng wrote:
    The other locks should have been output correctly, however.

    But as Tom pointed out, the dynamic locks were not in the equation. So
    now we're measuring all lock waits instead of assuming. :-)

    Staale


    Here are some recent runs on 4 quad-core Xeons using PostgreSQL 8.3.0,
    with 10-second runs.

    Hope it gets through this time

    -Jignesh

    For about 500 users:
    Lock Id Mode Count
    64 Shared 754
    53 Exclusive 846
    43 Exclusive 1379
    11 Shared 2038
    3 Exclusive 2376
    47 Exclusive 2900
    4 Shared 3194
    42 Exclusive 3389
    50 Exclusive 3689
    41 Exclusive 5280
    11 Exclusive 6560
    48 Exclusive 10837
    7 Exclusive 20599
    44 Exclusive 29193
    46 Exclusive 30299
    4 Exclusive 66303

    Lock Id Mode Combined Time (ns)
    43 Exclusive 3440644758
    47 Exclusive 6392480847
    50 Exclusive 7199215298
    42 Exclusive 7865126413
    11 Exclusive 8503940437
    4 Shared 9142071412
    61 Shared 10613733376
    41 Exclusive 14380504651
    48 Exclusive 16631996177
    7 Exclusive 17485716438
    46 Exclusive 61584135053
    44 Exclusive 68889040292
    4 Exclusive 177458486936



    For about 700 users:


    Lock Id Mode Count
    3 Exclusive 2873
    4 Shared 3436
    47 Exclusive 3555
    11 Shared 4098
    50 Exclusive 4392
    42 Exclusive 4573
    11 Exclusive 6209
    41 Exclusive 7552
    48 Exclusive 12335
    7 Exclusive 22469
    44 Exclusive 36987
    46 Exclusive 38467
    4 Exclusive 81453

    Lock Id Mode Combined Time (ns)
    7 Exclusive 12033761450
    43 Exclusive 12217953092
    50 Exclusive 15454797539
    47 Exclusive 15684068953
    42 Exclusive 19365161311
    4 Shared 27038955376
    48 Exclusive 32990318986
    41 Exclusive 48350689916
    46 Exclusive 193727946721
    44 Exclusive 212025745038
    4 Exclusive 713263386624

    At 1000 users


    Lock Id Mode Count
    59 Shared 10475
    58 Shared 11062
    57 Shared 13726
    7 Exclusive 18548
    44 Exclusive 29714
    46 Exclusive 33886
    4 Exclusive 74773
    11 Exclusive 79596

    Lock Id Mode Combined Time (ns)
    43 Exclusive 9067359926
    47 Exclusive 12259067669
    50 Exclusive 13239372833
    42 Exclusive 16534292830
    7 Exclusive 23505490039
    48 Exclusive 24991948402
    41 Exclusive 33874749560
    11 Shared 43963854482
    4 Shared 64098606143
    63 Shared 130988696365
    64 Shared 137865936811
    61 Shared 140978086498
    59 Shared 161661023016
    62 Shared 163857754020
    58 Shared 167946299990
    44 Exclusive 220719773416
    57 Shared 245170386594
    46 Exclusive 326272389569
    11 Exclusive 1312009781095
    4 Exclusive 2403450038504



    4 is ProcArrayLock.
    CLogControlLock is not completely out, but I don't see it consistently
    like ProcArrayLock. It would show up in a few cases if I continue taking
    different snapshots of the output (maybe when a checkpoint kicks in or
    something, I don't know).


    Lock Id Mode Count
    3 Exclusive 160
    7 Exclusive 396
    11 Shared 828
    44 Exclusive 991
    46 Exclusive 1093
    4 Exclusive 1550
    62 Shared 20332
    57 Shared 21244
    64 Shared 21492
    63 Shared 21791
    61 Shared 22132
    58 Shared 23747
    60 Shared 26230
    11 Exclusive 158168

    Lock Id Mode Combined Time (ns)
    64864 Shared 3743818402
    10830 Shared 3835685469
    89688 Shared 4164484448
    84354 Shared 4268747714
    870 Shared 4376161411
    17078 Shared 4712788740
    11 Shared 17718901870
    62 Shared 331016914134
    57 Shared 340929880655
    64 Shared 355793689545
    63 Shared 362464073193
    61 Shared 362561852786
    58 Shared 388597307969
    60 Shared 430378269549
    11 Exclusive 2641836280591



    Now the script


    # cat 83_all_wait.d
    #!/usr/sbin/dtrace -qs

    /*
     * Usage       : 83_all_wait.d
     * Description : Show number and total time of LWLock waits by lock id
     *               and mode
     * Author      : Jignesh Shah
     */

    dtrace:::BEGIN
    {
        lckmode[0] = "Exclusive";
        lckmode[1] = "Shared";

        /* Only for reference not used... Refer to src/include/storage/lwlock.h */
        lckid[0]  = "BufFreelistLock";
        lckid[1]  = "ShmemIndexLock";
        lckid[2]  = "OidGenLock";
        lckid[3]  = "XidGenLock";
        lckid[4]  = "ProcArrayLock";
        lckid[5]  = "SInvalLock";
        lckid[6]  = "FreeSpaceLock";
        lckid[7]  = "WALInsertLock";
        lckid[8]  = "WALWriteLock";
        lckid[9]  = "ControlFileLock";
        lckid[10] = "CheckpointLock";
        lckid[11] = "CLogControlLock";
        lckid[12] = "SubtransControlLock";
        lckid[13] = "MultiXactGenLock";
        lckid[14] = "MultiXactOffsetControlLock";
        lckid[15] = "MultiXactMemberControlLock";
        lckid[16] = "RelCacheInitLock";
        lckid[17] = "BgWriterCommLock";
        lckid[18] = "TwoPhaseStateLock";
        lckid[19] = "TablespaceCreateLock";
        lckid[20] = "BtreeVacuumLock";
        lckid[21] = "AddinShmemInitLock";
        lckid[22] = "AutovacuumLock";
        lckid[23] = "AutovacuumScheduleLock";
        lckid[24] = "SyncScanLock";
        lckid[25] = "FirstBufMappingLock";
        lckid[26] = "FirstLockMgrLock";
    }

    postgresql*:::lwlock-startwait
    {
        self->ts[arg0] = timestamp;
        @count[arg0, lckmode[arg1]] = count();
    }

    postgresql*:::lwlock-endwait
    /self->ts[arg0]/
    {
        @time[arg0, lckmode[arg1]] = sum(timestamp - self->ts[arg0]);
        self->ts[arg0] = 0;
    }

    dtrace:::END
    {
        printf("\n%20s %15s %15s\n", "Lock Id", "Mode", "Count");
        printa("%20d %15s %@15d\n", @count);

        printf("\n%20s %15s %20s\n", "Lock Id", "Mode", "Combined Time (ns)");
        printa("%20d %15s %@20d\n", @time);
    }

    tick-10sec
    {
        exit(0);
    }
  • Gregory Stark at Feb 7, 2008 at 10:56 pm

    "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:

    for about 500users :
    For about 700 Users:
    At 1000 users
    This is a tangent, but are these actual Postgres processes? What's the logic
    behind trying to run 1,000 processes on a box with 16 CPUs? They're all just
    going to be queuing up for I/O requests or sitting runnable waiting for a
    timeslice.

    Was this with your patch to raise the size of the clog lru?

    What is MaxBackends actually set to for the runs? Is it the same even when
    you're actually running with fewer backends?


    --
    Gregory Stark
    EnterpriseDB http://www.enterprisedb.com
    Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
  • Tom Lane at Feb 7, 2008 at 11:23 pm

    Gregory Stark writes:
    This is a tangent but are these actual Postgres processes? What's the logic
    behind trying to run a 1,000 processes on a box with 16 cpus?
    We should certainly be careful about trying to eliminate contention in
    this scenario at the cost of making things slower in more normal cases,
    but it seems interesting to stress the system just to see what happens.
    Was this with your patch to raise the size of the clog lru?
    That's an important question.
    What is MaxBackends actually set to for the runs?
    That I think is not. I'm fairly sure there are no performance-relevant
    paths in which cost is driven by MaxBackends rather than the actual
    current number of live backends. Certainly nothing in or around the
    ProcArray would act that way.

    regards, tom lane
  • Jignesh K. Shah at Feb 8, 2008 at 3:15 am

    Tom Lane wrote:
    Gregory Stark <stark@enterprisedb.com> writes:
    This is a tangent but are these actual Postgres processes? What's the logic
    behind trying to run a 1,000 processes on a box with 16 cpus?
    We should certainly be careful about trying to eliminate contention in
    this scenario at the cost of making things slower in more normal cases,
    but it seems interesting to stress the system just to see what happens.

    Was this with your patch to raise the size of the clog lru?
    That's an important question.

    What is MaxBackends actually set to for the runs.
    That I think is not. I'm fairly sure there are no performance-relevant
    paths in which cost is driven by MaxBackends rather than the actual
    current number of live backends. Certainly nothing in or around the
    ProcArray would act that way.


    regards, tom lane

    I guess I was not clear: it was PostgreSQL 8.3.0 (with no source code
    changes), compiled 64-bit with DTrace support enabled.
    MaxBackends was set to 1500, but I don't think that causes anything to
    work slowly. The connections are "pre-opened", though, in the sense that
    when 500 users are actively doing work there are about 1006 postgresql
    processes running.

    Yes, I think I am taking the database to the extreme. But generally there
    is some think time of 50ms involved, so there are time slices available
    for other users. Commercial databases can also do pretty well on such
    systems, so it's not unrealistic to expect PostgreSQL to perform well
    here too.

    The old idea of stress testing it is to prove that it can go beyond
    these 16 cores; in fact our target is about 64 cores soon.

    Regards,
    Jignesh
