I've long harbored a suspicion, based on some testing I did on my home
machine, that WALInsertLock is a big performance bottleneck. But I
just did some benchmarking that doesn't entirely support that
contention. This is on Nate Boley's 32-core machine, with the
following settings:

max_connections = 100
shared_buffers = 8GB
synchronous_commit = off
checkpoint_segments = 100
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9

I did 5-minute pgbench runs with unlogged tables and with permanent
tables, restarting the database server and reinitializing the tables
between each run. The number at the beginning of each line is the
number of clients, while the p/u indicates which type of tables were
used:

01p tps = 628.626815 (including connections establishing)
01p tps = 635.394288 (including connections establishing)
01p tps = 634.972789 (including connections establishing)
08p tps = 3342.787325 (including connections establishing)
08p tps = 3883.876813 (including connections establishing)
08p tps = 3941.253567 (including connections establishing)
32p tps = 5597.472192 (including connections establishing)
32p tps = 5738.139895 (including connections establishing)
32p tps = 5794.490934 (including connections establishing)
80p tps = 4499.685286 (including connections establishing)
80p tps = 4917.060441 (including connections establishing)
80p tps = 5050.931933 (including connections establishing)

01u tps = 672.469142 (including connections establishing)
01u tps = 671.256686 (including connections establishing)
01u tps = 670.421003 (including connections establishing)
08u tps = 4087.749529 (including connections establishing)
08u tps = 3797.750851 (including connections establishing)
08u tps = 4181.393560 (including connections establishing)
32u tps = 8956.346905 (including connections establishing)
32u tps = 8898.442517 (including connections establishing)
32u tps = 8971.591569 (including connections establishing)
80u tps = 7287.550952 (including connections establishing)
80u tps = 7266.816989 (including connections establishing)
80u tps = 7255.968109 (including connections establishing)
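
The run procedure described above can be scripted roughly as follows. This is
a minimal sketch rather than the harness actually used for these numbers: the
data directory path and the --unlogged-tables initialization flag are
assumptions, while the scale factor and the -c/-j settings come from later in
the thread.

import subprocess

PGDATA = "/path/to/data"        # hypothetical data directory
CLIENTS = [1, 8, 32, 80]

def run(cmd):
    subprocess.run(cmd, shell=True, check=True)

for unlogged in (False, True):
    for clients in CLIENTS:
        for trial in range(3):
            # Restart the server and reinitialize the tables between runs,
            # as described above.
            run(f"pg_ctl -D {PGDATA} -w restart")
            init_flags = "--unlogged-tables" if unlogged else ""
            run(f"pgbench -i -s 100 {init_flags}")
            # 5-minute run; client and thread counts as given later in the thread.
            result = subprocess.run(f"pgbench -T 300 -c {clients} -j {clients}",
                                    shell=True, capture_output=True, text=True)
            print(("u" if unlogged else "p"), clients, result.stdout)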

The speed-up from using unlogged tables was not as large as I
expected. Turning off synchronous_commit here removes commit rate as
the bottleneck, and I think perhaps the main benefit of unlogged
tables in that case is the avoidance of I/O, and apparently this
machine has enough I/O bandwidth, and just enough memory in general,
that that's not an issue.

With either type of table, the 8-client results are about 6.1 times
the single-client results - not great, but not terrible, either. With
32 clients, there is some improvement: 13.4x (unlogged) vs. 9.1x (permanent), but even 13.4x
is a long way from linear. vmstat reveals that CPU usage is
substantially less than 100%. After some investigation, I found that
using unlogged tables wasn't actually getting rid of all the
write-ahead logging - the commit records were still being issued. So
I hacked up RecordTransactionCommit() not to emit transaction commit
records in that case. That doesn't actually completely eliminate the
WAL activity, because it still emits records for zeroing new SLRU
pages for CLOG, but it takes a big chunk out of it. The upshot is
that this improved both raw performance and scalability, but not
dramatically. Unlogged table results, with this change:

01h tps = 708.189337 (including connections establishing)
01h tps = 704.030175 (including connections establishing)
01h tps = 701.644199 (including connections establishing)
08h tps = 5196.615955 (including connections establishing)
08h tps = 5126.162200 (including connections establishing)
08h tps = 5067.568727 (including connections establishing)
32h tps = 10661.275636 (including connections establishing)
32h tps = 10621.085226 (including connections establishing)
32h tps = 10575.267197 (including connections establishing)
80h tps = 7557.965666 (including connections establishing)
80h tps = 7545.697547 (including connections establishing)
80h tps = 7556.379921 (including connections establishing)

Now the 32-client numbers have risen to 15.1x the single-client
numbers, but that's still not great.
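
As a cross-check, the scaling figures above can be recomputed from the listed
tps values. A small sketch, using the median of each set of three runs (an
editorial choice; the figures in the text may have been derived slightly
differently):

from statistics import median

# tps values copied from the listings above (1-client and 32-client runs).
tps = {
    "permanent": {1: [628.63, 635.39, 634.97], 32: [5597.47, 5738.14, 5794.49]},
    "unlogged":  {1: [672.47, 671.26, 670.42], 32: [8956.35, 8898.44, 8971.59]},
    "no commit record": {1: [708.19, 704.03, 701.64],
                         32: [10661.28, 10621.09, 10575.27]},
}

for kind, runs in tps.items():
    ratio = median(runs[32]) / median(runs[1])
    print(f"{kind}: 32 clients = {ratio:.1f}x single-client")

# Prints roughly 9.0x, 13.3x and 15.1x, in line with the 9.1x, 13.4x and
# 15.1x figures discussed in the text.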

What does this mean?  Well, XLogInsert strikes me as an awfully
complex piece of code to be running with a hot LWLock held in
exclusive mode. But even so, I think this certainly means that
WALInsertLock, at least on this workload, is not the whole problem...
in this test, I'm not only eliminating the overhead of inserting the
WAL, but also the overhead of writing it, flushing it, and generating
it. So there is something, other than WAL insertion, which is taking
a big bite out of performance here.

As to what that something might be, I reran this last test with
LWLOCK_STATS enabled and here are the top things that are blocking:

lwlock 310: shacq 96846 exacq 108433 blk 16275
lwlock 3: shacq 64 exacq 2628381 blk 36027
lwlock 7: shacq 0 exacq 2628615 blk 85220
lwlock 11: shacq 84913908 exacq 4539551 blk 2119423
lwlock 4: shacq 28667307 exacq 2628524 blk 3356651

During this 5-minute test run, an LWLock acquisition blocked 6180335
times. As you can see from the above results, ProcArrayLock accounts
for 54% of that blocking, and CLogControlLock accounts for another
34%. lwlock 7 is WALInsertLock, which manages to account for more
than 1% of the blocking despite the fact that WAL has been largely
eliminated in this test; lwlock 3 is XidGenLock. I dunno what LWLock
310 is but it doesn't matter. After that there is a long tail of
other stuff with gradually decreasing amounts of blocking. Experience
with the read scalability stuff has taught me also to look at which
LWLocks have the most shared acquisitions, as that can cause spinlock
and cache line contention. The top few culprits are:

lwlock 504: shacq 5315030 exacq 0 blk 0
lwlock 45: shacq 5967317 exacq 13284 blk 1722
lwlock 39: shacq 8219988 exacq 13342 blk 2291
lwlock 5: shacq 26306020 exacq 0 blk 0
lwlock 4: shacq 28667307 exacq 2628524 blk 3356651
lwlock 11: shacq 84913908 exacq 4539551 blk 2119423

In all, there were 238777533 shared LWLock acquisitions during this
test: 35% CLogControlLock, 12% ProcArrayLock, 11% SInvalReadLock (soon
to be dealt with, as discussed elsewhere on-list), and then it gets
down into the lock manager locks and a few others.
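
For anyone who wants to reproduce this kind of summary: with LWLOCK_STATS
compiled in, each backend dumps per-lock counters at exit in a format like the
lines shown above, and percentages like the ones quoted here follow from
summing those counters across backends and dividing by the totals. A rough
sketch of that aggregation (the id-to-name mapping covers only the locks
identified in this thread, and feeding it the server log on stdin is an
assumption about where the output ends up):

import re
import sys
from collections import defaultdict

NAMES = {3: "XidGenLock", 4: "ProcArrayLock", 5: "SInvalReadLock",
         7: "WALInsertLock", 11: "CLogControlLock"}

pat = re.compile(r"lwlock (\d+): shacq (\d+) exacq (\d+) blk (\d+)")
totals = defaultdict(lambda: [0, 0, 0])        # lock id -> [shacq, exacq, blk]

for line in sys.stdin:
    m = pat.search(line)
    if m:
        lock, shacq, exacq, blk = map(int, m.groups())
        totals[lock][0] += shacq
        totals[lock][1] += exacq
        totals[lock][2] += blk

blk_sum = sum(v[2] for v in totals.values()) or 1
sh_sum = sum(v[0] for v in totals.values()) or 1
for lock, (shacq, exacq, blk) in sorted(totals.items(),
                                        key=lambda kv: -kv[1][2])[:5]:
    name = NAMES.get(lock, f"lwlock {lock}")
    print(f"{name}: {100.0 * blk / blk_sum:.0f}% of blocking, "
          f"{100.0 * shacq / sh_sum:.0f}% of shared acquisitions")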

I'm mulling over what to do about all of this, but thought I'd share
the raw numbers first, in case anyone's interested.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

  • Merlin Moncure at Jul 25, 2011 at 9:20 pm

    On Mon, Jul 25, 2011 at 3:07 PM, Robert Haas wrote:
    Experience with the read scalability stuff has taught me also to look at which
    LWLocks have the most shared acquisitions, as that can cause spinlock
    and cache line contention.  The top few culprits are:

    lwlock 504: shacq 5315030 exacq 0 blk 0
    lwlock 45: shacq 5967317 exacq 13284 blk 1722
    lwlock 39: shacq 8219988 exacq 13342 blk 2291
    lwlock 5: shacq 26306020 exacq 0 blk 0
    lwlock 4: shacq 28667307 exacq 2628524 blk 3356651
    lwlock 11: shacq 84913908 exacq 4539551 blk 2119423

    In all, there were 238777533 shared LWLock acquisitions during this
    test: 35% CLogControlLock, 12% ProcArrayLock, 11% SInvalReadLock (soon
    to be dealt with, as discussed elsewhere on-list), and then it gets
    down into the lock manager locks and a few others.
    hm, all the CLogControlLock acquisitions in clog.c appear to be
    exclusive...or did you mean shared in some other sense?
    TransactionIdGetStatus is taking an exclusive lock which is a red flag
    and a good optimization target, I think.

    merlin
  • Alvaro Herrera at Jul 25, 2011 at 10:06 pm

    Excerpts from Merlin Moncure's message of Mon Jul 25 17:19:58 -0400 2011:
    On Mon, Jul 25, 2011 at 3:07 PM, Robert Haas wrote:

    Experience
    with the read scalability stuff has taught me also to look at which
    LWLocks have the most shared acquisitions, as that can cause spinlock
    and cache line contention.  The top few culprits are:

    lwlock 504: shacq 5315030 exacq 0 blk 0
    lwlock 45: shacq 5967317 exacq 13284 blk 1722
    lwlock 39: shacq 8219988 exacq 13342 blk 2291
    lwlock 5: shacq 26306020 exacq 0 blk 0
    lwlock 4: shacq 28667307 exacq 2628524 blk 3356651
    lwlock 11: shacq 84913908 exacq 4539551 blk 2119423

    In all, there were 238777533 shared LWLock acquisitions during this
    test: 35% CLogControlLock, 12% ProcArrayLock, 11% SInvalReadLock (soon
    to be dealt with, as discussed elsewhere on-list), and then it gets
    down into the lock manager locks and a few others.
    hm, all the CLogControlLock acquisitions in clog.c appear to be
    exclusive...or did you mean shared in some other sense?
    SLRU control locks are also acquired indirectly by slru.c, see
    SimpleLruReadPage_ReadOnly.
    TransactionIdGetStatus is taking an exclusive lock which is a red flag
    and a good optimization target, I think.
    In fact, if the page that TransactionIdGetStatus is looking for is in
    the buffers, it'll only take a shared lock.

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Robert Haas at Jul 26, 2011 at 1:41 am

    On Mon, Jul 25, 2011 at 4:07 PM, Robert Haas wrote:
    As to what that something might be, I reran this last test with
    LWLOCK_STATS enabled and here are the top things that are blocking:

    lwlock 310: shacq 96846 exacq 108433 blk 16275
    lwlock 3: shacq 64 exacq 2628381 blk 36027
    lwlock 7: shacq 0 exacq 2628615 blk 85220
    lwlock 11: shacq 84913908 exacq 4539551 blk 2119423
    lwlock 4: shacq 28667307 exacq 2628524 blk 3356651

    During this 5-minute test run, an LWLock acquisition blocked 6180335
    times.  As you can see from the above results, ProcArrayLock accounts
    for 54% of that blocking, and CLogControlLock accounts for another
    34%.  lwlock 7 is WALInsertLock, which manages to account for more
    than 1% of the blocking despite the fact that WAL has been largely
    eliminated in this test...
    I reran this test on master with permanent (not unlogged) tables, and
    got the following stats for blocking:

    lwlock 11: shacq 58376164 exacq 2122076 blk 1130974
    lwlock 4: shacq 15168924 exacq 1367303 blk 1555799
    lwlock 7: shacq 0 exacq 8440615 blk 1726896
    grand total: shacq 144563929 exacq 41618873 blk 4886107

    So that's 35% WALInsertLock, 32% ProcArrayLock, and 23%
    CLogControlLock. No other single lock accounted for more than 1% of
    the blocking. It's a good guess we're going to have to fix more than
    one thing to really make this zippy.

    Also note that CLogControlLock accounts for better than 40% of the
    shared-lock acquisitions across all LWLocks. The next-most-frequently
    share-locked LWLock is ProcArrayLock, with 10% of the total
    shared-lock acquisitions, followed by SInvalReadLock, at 9%.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Greg Smith at Jul 26, 2011 at 2:14 am

    On 07/25/2011 04:07 PM, Robert Haas wrote:
    I did 5-minute pgbench runs with unlogged tables and with permanent
    tables, restarting the database server and reinitializing the tables
    between each run.
    Database scale? One or multiple pgbench worker threads? A reminder on
    the amount of RAM in the server would be helpful for interpreting the
    results too.

    The other thing I'd recommend if you're running more write-heavy tests
    is to turn off autovacuum. Whether or not it kicks in depends on the
    test duration and the TPS rate, which adds a source of variability
    better avoided here. It also means that faster tests end up getting
    penalized by having it run near their end, which makes them no longer
    look like fast results.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
  • Robert Haas at Jul 26, 2011 at 1:07 pm

    On Mon, Jul 25, 2011 at 10:14 PM, Greg Smith wrote:
    On 07/25/2011 04:07 PM, Robert Haas wrote:

    I did 5-minute pgbench runs with unlogged tables and with permanent
    tables, restarting the database server and reinitializing the tables
    between each run.
    Database scale?  One or multiple pgbench worker threads?  A reminder on the
    amount of RAM in the server would be helpful for interpreting the results
    too.
    Ah, sorry. scale = 100, so small. pgbench invocation is:

    pgbench -T $time -c $clients -j $clients

    128GB RAM.
    The other thing I'd recommend if you're running more write-heavy tests is to
    turn off autovacuum.  Whether or not it kicks in depends on the test
    duration and the TPS rate, which adds a source of variability better avoided
    here. It also means that faster tests end up getting penalized by having it
    run near their end, which makes them no longer look like fast results.
    Well, I kind of don't want to do that, because surely you wouldn't run
    that way in production. Anyway, it seems clear that nearly all the
    contention is in CLogControlLock, ProcArrayLock, and WALInsertLock,
    and I don't think shutting off autovacuum is going to change that.
    The point of my benchmarking here may be a bit different from the
    benchmarking you normally do: I'm not trying to figure out whether
    I've got the best hardware, or how good it is, so I don't really care
    about squeezing out every last tps. I just need numbers that can be
    reproduced quickly with different patches to see whether a given code
    change helps and by how much. I wasn't sure that 5-minute pgbench
    runs would be good enough for that, but so far the results have been
    stable enough that I'm fairly confident I'm seeing real effects rather
    than random noise.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Pavan Deolasee at Jul 26, 2011 at 3:41 pm

    On Tue, Jul 26, 2011 at 9:07 AM, Robert Haas wrote:
    Ah, sorry.  scale = 100, so small.  pgbench invocation is:
    It might be worthwhile to test only with the accounts and history
    table and also increasing the number of statements in a transaction.
    Otherwise the tiny tables can quickly become a bottleneck.

    Thanks,
    Pavan

    --
    Pavan Deolasee
    EnterpriseDB     http://www.enterprisedb.com
  • Robert Haas at Jul 26, 2011 at 4:24 pm

    On Tue, Jul 26, 2011 at 11:40 AM, Pavan Deolasee wrote:
    It might be worthwhile to test only with the accounts and history
    table and also increasing the number of statements in a transaction.
    Otherwise the tiny tables can quickly become a bottleneck.
    What kind of bottleneck?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Pavan Deolasee at Jul 26, 2011 at 4:29 pm

    On Tue, Jul 26, 2011 at 12:24 PM, Robert Haas wrote:
    On Tue, Jul 26, 2011 at 11:40 AM, Pavan Deolasee
    wrote:
    It might be worthwhile to test only with the accounts and history
    table and also increasing the number of statements in a transaction.
    Otherwise the tiny tables can quickly become a bottleneck.
    What kind of bottleneck?
    So many transactions trying to update a small set of rows in a table.
    Is that what we really want to measure? My thinking is that we might
    see different results if they are updating different parts of the table
    and the transaction start/stop overhead is spread across a few
    statements.

    Thanks,
    Pavan

    --
    Pavan Deolasee
    EnterpriseDB     http://www.enterprisedb.com
  • Pavan Deolasee at Jul 26, 2011 at 4:33 pm

    On Tue, Jul 26, 2011 at 12:29 PM, Pavan Deolasee wrote:


    So many transactions trying to update a small set of rows in a table.
    Is that what we really want to measure? My thinking is that we might
    see different results if they are updating different parts of the table
    and the transaction start/stop overhead is spread across a few
    statements.
    I think what I am suggesting is that the default pgbench test setup
    would probably not give you good scalability as the number of clients
    is increased, and one reason could be the contention in the small
    tables. So it might be a good idea to get rid of that and see if we get
    much better scalability and understand the other bottlenecks.

    Thanks,
    Pavan

    --
    Pavan Deolasee
    EnterpriseDB     http://www.enterprisedb.com
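
As an illustration of this suggestion (an editorial sketch, not something from
the thread): a custom pgbench script that touches only pgbench_accounts and
pgbench_history, and packs several statements into each transaction, might be
written out and run like this. The \setrandom meta-command syntax is the
pgbench style of that era, and the hard-coded 10000000 assumes scale factor
100; run parameters mirror the earlier tests.

import subprocess
import tempfile

lines = [
    "\\setrandom aid1 1 10000000",
    "\\setrandom aid2 1 10000000",
    "\\setrandom delta -5000 5000",
    "BEGIN;",
    "UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid1;",
    "UPDATE pgbench_accounts SET abalance = abalance - :delta WHERE aid = :aid2;",
    "SELECT abalance FROM pgbench_accounts WHERE aid = :aid1;",
    "INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)"
    " VALUES (1, 1, :aid1, :delta, CURRENT_TIMESTAMP);",
    "END;",
]

# Write the script to a temporary file and point pgbench at it with -f.
with tempfile.NamedTemporaryFile("w", suffix=".sql", delete=False) as f:
    f.write("\n".join(lines) + "\n")
    path = f.name

subprocess.run(f"pgbench -T 300 -c 32 -j 32 -f {path}", shell=True, check=True)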
  • Greg Smith at Jul 26, 2011 at 6:02 pm

    On 07/26/2011 12:33 PM, Pavan Deolasee wrote:
    I think what I am suggesting is that the default pgbench test setup
    would probably not give you good scalability as the number of clients
    is increased, and one reason could be the contention in the small
    tables. So it might be a good idea to get rid of that and see if we get
    much better scalability and understand the other bottlenecks.
    You can easily see this form of contention pulling down results when the
    database itself is really small and the overall transaction rate is very
    high. With Robert using scale=100, no more than 80 clients, and
    transaction rates peaking at <10K TPS on a 32-core box, I wouldn't
    expect this form of contention to be a large issue. It may be dropping
    results a few percent, but I'd be surprised if it was any more than that.

    It's easy enough to use "-N" to skip the updates to the smaller tellers
    and branches tables to pull that out of the way. TPS will go up, because
    it's doing less per transaction. That's not necessarily a better test
    case though, it's just a different one. The regular case includes a lot
    of overwriting the same blocks in the hot branches and tellers tables.
    That effectively pushes a lot more I/O toward being likely to happen at
    checkpoint time. Those tables rarely have any significant I/O outside
    of the checkpoint in the standard "TPC-B like" scenario, because their
    usage counts stay high most of the time.

    Contention for small tables that are being heavily updated is still
    important to optimize too though. Which type of test makes more sense
    depends on what aspect of performance you're trying to focus on. I'll
    sometimes do a full pgbench-tools "sweep" (range of multiple scales and
    clients at each scale) in both regular write and "-N" write modes, just
    to collect the slightly different data each provides. The form of
    checkpoint I/O spike you see at sync time is very different in the two
    cases, but both usage profiles are important to consider and optimize.
    That Robert has started with the regular case doesn't worry me too much
    now that I've seen the parameters he's using; he's not running it in a
    way where I'd expect branch/teller contention to be a major limiting
    factor on the results.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
  • Simon Riggs at Jul 26, 2011 at 4:55 pm

    On Tue, Jul 26, 2011 at 4:40 PM, Pavan Deolasee wrote:
    It might be worthwhile to test only with the accounts and history
    table and also increasing the number of statements in a transaction.
    Otherwise the tiny tables can quickly become a bottleneck.
    +1

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
