Greetings,
I have a 4 server postgresql-9.1.3 cluster (one master doing streaming
replication to 3 hot standby servers).  All of them are running
Fedora-16-x86_64.

http://wiki.postgresql.org/wiki/Lock_Monitoring

I'm finding that I cannot run pg_basebackup without it blocking (or
drastically slowing) all SQL queries until the backup has completed,
and the load on the box takes off to over 30.00.  By "blocks" I mean
that any query that is submitted just hangs and does not return for
seconds, or sometimes even minutes, until pg_basebackup has stopped.
I'm assuming that this isn't expected behavior, so I'm rather confused
about what is going on.  The command that I'm issuing is:
pg_basebackup -v -D /mnt/backups/backups/tmp0 -x -Ft -U postgres

Can someone provide some guidance on how to debug this?  Or is there
some way to reduce the priority of pg_basebackup so that it has much
less impact on overall performance?

thanks!


  • Fabricio at Jun 7, 2012 at 6:02 pm
    Hi.

    I have this problem:

    I have PostgreSQL 9.1.3 and last night it crashed.

    This was the first error after an autovacuum (the night before last):

    <2012-06-06 00:59:07 MDT 814 4fceffbb.32e >LOG: autovacuum: found orphan temp table "(null)"."tmpmuestadistica" in database "dbRX"
    <2012-06-06 01:05:26 MDT 1854 4fc7d1eb.73e >LOG: could not rename temporary statistics file "pg_stat_tmp/pgstat.tmp" to "pg_stat_tmp/pgstat.stat": No such file or directory
    <2012-06-06 01:05:28 MDT 1383 4fcf0136.567 >ERROR: tuple concurrently updated
    <2012-06-06 01:05:28 MDT 1383 4fcf0136.567 >CONTEXT: automatic vacuum of table "global.pg_catalog.pg_attrdef"
    <2012-06-06 01:06:09 MDT 1851 4fc7d1eb.73b >ERROR: xlog flush request 4/E29EE490 is not satisfied --- flushed only to 3/13527A10
    <2012-06-06 01:06:09 MDT 1851 4fc7d1eb.73b >CONTEXT: writing block 0 of relation base/311360/12244_vm
    <2012-06-06 01:06:10 MDT 1851 4fc7d1eb.73b >ERROR: xlog flush request 4/E29EE490 is not satisfied --- flushed only to 3/13527A10
    <2012-06-06 01:06:10 MDT 1851 4fc7d1eb.73b >CONTEXT: writing block 0 of relation base/311360/12244_vm
    <2012-06-06 01:06:10 MDT 1851 4fc7d1eb.73b >WARNING: could not write block 0 of base/311360/12244_vm
    <2012-06-06 01:06:10 MDT 1851 4fc7d1eb.73b >DETAIL: Multiple failures --- write error might be permanent.


    Last night it was terminated by signal 6.

    <2012-06-07 01:36:44 MDT 2509 4fd05a0c.9cd >LOG: startup process (PID 2525) was terminated by signal 6: Aborted
    <2012-06-07 01:36:44 MDT 2509 4fd05a0c.9cd >LOG: aborting startup due to startup process failure
    <2012-06-07 01:37:37 MDT 2680 4fd05a41.a78 >LOG: database system shutdown was interrupted; last known up at 2012-06-07 01:29:40 MDT
    <2012-06-07 01:37:37 MDT 2680 4fd05a41.a78 >LOG: could not open file "pg_xlog/000000010000000300000013" (log file 3, segment 19): No such file or directory
    <2012-06-07 01:37:37 MDT 2680 4fd05a41.a78 >LOG: invalid primary checkpoint record

    And the only option was pg_resetxlog.

    After this, a lot of queries showed this error:
    <2012-06-07 09:24:22 MDT 1306 4fd0c7a6.51a >ERROR: missing chunk number 0 for toast value 393330 in pg_toast_2619
    <2012-06-07 09:24:31 MDT 1306 4fd0c7a6.51a >ERROR: missing chunk number 0 for toast value 393332 in pg_toast_2619

    I lost some databases.

    I recreated the cluster with initdb and then restored the databases that I could back up (for the others I restored an old backup).

    There was no disk-space or permissions problem, and no filesystem or disk error.

    Can you help me understand what happened?

    Thanks and regards...
  • Magnus Hagander at Jun 12, 2012 at 2:27 pm

    On Thu, Jun 7, 2012 at 8:01 PM, Fabricio wrote:
    Hi.

    I have this problem:

    I have PostgreSQL 9.1.3 and the last night crash it.

    This was the first error after an autovacuum (the night before last):

    <2012-06-06 00:59:07 MDT    814 4fceffbb.32e >LOG:  autovacuum: found orphan
    temp table "(null)"."tmpmuestadistica" in database "dbRX"
    <2012-06-06 01:05:26 MDT    1854 4fc7d1eb.73e >LOG:  could not rename
    temporary statistics file "pg_stat_tmp/pgstat.tmp" to
    "pg_stat_tmp/pgstat.stat": No such file or directory
    <2012-06-06 01:05:28 MDT    1383 4fcf0136.567 >ERROR:  tuple concurrently
    updated
    <2012-06-06 01:05:28 MDT    1383 4fcf0136.567 >CONTEXT:  automatic vacuum of
    table "global.pg_catalog.pg_attrdef"
    <2012-06-06 01:06:09 MDT    1851 4fc7d1eb.73b >ERROR:  xlog flush request
    4/E29EE490 is not satisfied --- flushed only to 3/13527A10
    <2012-06-06 01:06:09 MDT    1851 4fc7d1eb.73b >CONTEXT:  writing block 0 of
    relation base/311360/12244_vm
    <2012-06-06 01:06:10 MDT    1851 4fc7d1eb.73b >ERROR:  xlog flush request
    4/E29EE490 is not satisfied --- flushed only to 3/13527A10
    <2012-06-06 01:06:10 MDT    1851 4fc7d1eb.73b >CONTEXT:  writing block 0 of
    relation base/311360/12244_vm
    <2012-06-06 01:06:10 MDT    1851 4fc7d1eb.73b >WARNING:  could not write
    block 0 of base/311360/12244_vm
    <2012-06-06 01:06:10 MDT    1851 4fc7d1eb.73b >DETAIL:  Multiple failures
    --- write error might be permanent.


    Last night it was terminated by signal 6.

    <2012-06-07 01:36:44 MDT    2509 4fd05a0c.9cd >LOG:  startup process (PID
    2525) was terminated by signal 6: Aborted
    <2012-06-07 01:36:44 MDT    2509 4fd05a0c.9cd >LOG:  aborting startup due to
    startup process failure
    <2012-06-07 01:37:37 MDT    2680 4fd05a41.a78 >LOG:  database system
    shutdown was interrupted; last known up at 2012-06-07 01:29:40 MDT
    <2012-06-07 01:37:37 MDT    2680 4fd05a41.a78 >LOG:  could not open file
    "pg_xlog/000000010000000300000013" (log file 3, segment 19): No such file or
    directory
    <2012-06-07 01:37:37 MDT    2680 4fd05a41.a78 >LOG:  invalid primary
    checkpoint record

    And the only option was pg_resetxlog.

    After this a lot of querys showed me this error:
    <2012-06-07 09:24:22 MDT 1306 4fd0c7a6.51a >ERROR: missing chunk number 0
    for toast value 393330 in pg_toast_2619
    <2012-06-07 09:24:31 MDT 1306 4fd0c7a6.51a >ERROR: missing chunk number 0
    for toast value 393332 in pg_toast_2619

    I lost some databases.

    I restarted the cluster again with initdb and then I restored  the databases
    that I could backup (for the other I restored an old backup)

    no space or permissions problem. No filesystem or disk error.

    Can you help me to know what happened?
    I'd say that everything still points to a filesystem error. Have you
    tried unmounting it and running an offline check?
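
    For example (a rough sketch only; the device and mount point below
    are hypothetical, so adjust them to wherever your data directory
    actually lives, and stop PostgreSQL before unmounting):

    pg_ctl -D /var/lib/pgsql/data stop
    umount /var/lib/pgsql/data
    fsck -f /dev/sdb1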
  • Lonni J Friedman at Jun 7, 2012 at 6:05 pm

    On Thu, Jun 7, 2012 at 10:41 AM, Lonni J Friedman wrote:
    Greetings,
    I have a 4 server postgresql-9.1.3 cluster (one master doing streaming
    replication to 3 hot standby servers).  All of them are running
    Fedora-16-x86_64.

    http://wiki.postgresql.org/wiki/Lock_Monitoring
    Err, I included that URL but neglected to explain why. On a different
    list someone suggested that I verify that there were no locks that
    were blocking things, and I did so, and found no locks.

    So I'm still at a loss why pg_basebackup is killing perf, and would
    appreciate pointers on how to debug it or at least reduce its impact
    on performance if that is possible.

    thanks
  • Magnus Hagander at Jun 7, 2012 at 7:41 pm

    On Thu, Jun 7, 2012 at 8:04 PM, Lonni J Friedman wrote:
    On Thu, Jun 7, 2012 at 10:41 AM, Lonni J Friedman wrote:
    Greetings,
    I have a 4 server postgresql-9.1.3 cluster (one master doing streaming
    replication to 3 hot standby servers).  All of them are running
    Fedora-16-x86_64.

    http://wiki.postgresql.org/wiki/Lock_Monitoring
    err, i included that URL but neglected to explain why.  On a different
    list someone suggested that I verify that there were no locks that
    were blocking things, and I did so, and found no locks.

    So I'm still at a loss why pg_basebackup is killing perf, and would
    appreciate pointers on how to debug it or at least reduce its impact
    on performance if that is possible.
    My guess would be that you are overloading your I/O system. You should
    look at values from iostat and vmstat from when the system works fine
    and when you run pg_basebackup, that should give you a hint in the
    right direction.
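
    For example (a minimal sketch, assuming the sysstat package is
    installed), take a few samples in both states and compare them:

    iostat -xm 5 12    # extended per-device stats in MB/s, 5-second intervals
    vmstat 5 12        # swapping, run queue and CPU activity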
  • Lonni J Friedman at Jun 7, 2012 at 8:08 pm

    On Thu, Jun 7, 2012 at 12:40 PM, Magnus Hagander wrote:
    On Thu, Jun 7, 2012 at 8:04 PM, Lonni J Friedman wrote:
    On Thu, Jun 7, 2012 at 10:41 AM, Lonni J Friedman wrote:
    Greetings,
    I have a 4 server postgresql-9.1.3 cluster (one master doing streaming
    replication to 3 hot standby servers).  All of them are running
    Fedora-16-x86_64.

    http://wiki.postgresql.org/wiki/Lock_Monitoring
    err, i included that URL but neglected to explain why.  On a different
    list someone suggested that I verify that there were no locks that
    were blocking things, and I did so, and found no locks.

    So I'm still at a loss why pg_basebackup is killing perf, and would
    appreciate pointers on how to debug it or at least reduce its impact
    on performance if that is possible.
    My guess would be that you are overloading your I/O system. You should
    look at values from iostat and vmstat from when the system works fine
    and when you run pg_basebackup, that should give you a hint in the
    right direction.
    OK, thanks, I'll take a look at that. If this turns out to be the
    issue, is there some way to get pg_basebackup to run more slowly, so
    that it has less impact? Or could I do this with ionice on the
    pg_basebackup process?
  • Jerry Sievers at Jun 8, 2012 at 12:07 am

    Lonni J Friedman writes:
    On Thu, Jun 7, 2012 at 12:40 PM, Magnus Hagander wrote:
    On Thu, Jun 7, 2012 at 8:04 PM, Lonni J Friedman wrote:
    On Thu, Jun 7, 2012 at 10:41 AM, Lonni J Friedman wrote:
    Greetings,
    I have a 4 server postgresql-9.1.3 cluster (one master doing streaming
    replication to 3 hot standby servers).  All of them are running
    Fedora-16-x86_64.

    http://wiki.postgresql.org/wiki/Lock_Monitoring
    err, i included that URL but neglected to explain why.  On a different
    list someone suggested that I verify that there were no locks that
    were blocking things, and I did so, and found no locks.

    So I'm still at a loss why pg_basebackup is killing perf, and would
    appreciate pointers on how to debug it or at least reduce its impact
    on performance if that is possible.
    My guess would be that you are overloading your I/O system. You should
    look at values from iostat and vmstat from when the system works fine
    and when you run pg_basebackup, that should give you a hint in the
    right direction.
    ok, thanks. i'll take a look at that. If this turns out to be the
    issue, is there some way to get pg_basebackup to run more slowly, so
    that it has less impact? Or could I do this with ionice on the
    pg_basebackup process?
    You might try pausing pg_basebackup in place with SIGSTOP and checking
    whether the problem goes away. Then SIGCONT it, and you should start
    seeing the sluggishness again.

    If verified, then any sort of throttling mechanism should work.
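
    Roughly (the PID below is hypothetical; substitute the actual
    pg_basebackup process id):

    kill -STOP 12345    # pause pg_basebackup; check whether queries recover
    kill -CONT 12345    # resume it; the sluggishness should come back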
    --
    Jerry Sievers
    Postgres DBA/Development Consulting
    e: postgres.consulting@comcast.net
    p: 732.216.7255
  • Lonni J Friedman at Jun 8, 2012 at 1:02 am

    On Thu, Jun 7, 2012 at 5:07 PM, Jerry Sievers wrote:
    Lonni J Friedman <netllama@gmail.com> writes:
    On Thu, Jun 7, 2012 at 12:40 PM, Magnus Hagander wrote:
    On Thu, Jun 7, 2012 at 8:04 PM, Lonni J Friedman wrote:
    On Thu, Jun 7, 2012 at 10:41 AM, Lonni J Friedman wrote:
    Greetings,
    I have a 4 server postgresql-9.1.3 cluster (one master doing streaming
    replication to 3 hot standby servers).  All of them are running
    Fedora-16-x86_64.

    http://wiki.postgresql.org/wiki/Lock_Monitoring
    err, i included that URL but neglected to explain why.  On a different
    list someone suggested that I verify that there were no locks that
    were blocking things, and I did so, and found no locks.

    So I'm still at a loss why pg_basebackup is killing perf, and would
    appreciate pointers on how to debug it or at least reduce its impact
    on performance if that is possible.
    My guess would be that you are overloading your I/O system. You should
    look at values from iostat and vmstat from when the system works fine
    and when you run pg_basebackup, that should give you a hint in the
    right direction.
    ok, thanks.  i'll take a look at that.  If this turns out to be the
    issue, is there some way to get pg_basebackup to run more slowly, so
    that it has less impact?  Or could I do this with ionice on the
    pg_basebackup process?
    You might try stopping pg_basebackup in place with SIGSTOP and check
    if problem goes away.  SIGCONT and you should  start having
    sluggishness again.

    If verified, then any sort of throttling mechanism should work.
    I'm certain that the problem is triggered only when pg_basebackup is
    running. It's very predictable, and goes away as soon as pg_basebackup
    finishes running. What do you mean by a throttling mechanism?
  • Craig Ringer at Jun 8, 2012 at 6:05 am

    On 06/08/2012 09:01 AM, Lonni J Friedman wrote:
    On Thu, Jun 7, 2012 at 5:07 PM, Jerry Sieverswrote:
    You might try stopping pg_basebackup in place with SIGSTOP and check
    if problem goes away. SIGCONT and you should start having
    sluggishness again.

    If verified, then any sort of throttling mechanism should work.
    I'm certain that the problem is triggered only when pg_basebackup is
    running. Its very predictable, and goes away as soon as pg_basebackup
    finishes running. What do you mean by a throttling mechanism?
    Sure, it only happens when pg_basebackup is running. But if you *pause*
    pg_basebackup, so it's still running but not currently doing work, does
    the problem go away? Does it come back when you unpause pg_basebackup?
    That's what Jerry was telling you to try.

    If the problem goes away when you pause pg_basebackup and comes back
    when you unpause it, it's probably a system load problem.

    If it doesn't go away, it's more likely to be a locking issue or
    something _other_ than simple load.

    SIGSTOP ("kill -STOP") pauses a process, and SIGCONT ("kill -CONT")
    resumes it, so on Linux you can use these to try and find out. When you
    SIGSTOP pg_basebackup then the postgres backend associated with it
    should block shortly afterwards as its buffers fill up and it can't send
    more data, so the load should come off the server.

    A "throttling mechanism" refers to anything that limits the rate or
    speed of a thing. In this case, what you want to do if your problem is
    system overload is to limit the speed at which pg_basebackup does its
    work so other things can still get work done. In other words you want to
    throttle it. Typical throttling mechanisms include the "ionice" and
    "renice" commands to change I/O and CPU priority, respectively.

    Note that you may need to change the priority of the *backend* that
    pg_basebackup is using, not necessarily the pg_basebackup command
    itself. I haven't done enough with Pg's replication to know how that
    works, so someone else will have to fill that bit in.
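
    A rough sketch of what that might look like (the PID is made up, and
    the column name is an assumption: on 9.1 the walsender serving a base
    backup should show up in pg_stat_replication with a procpid column
    and a 'backup' state, so double-check on your version):

    psql -U postgres -c "SELECT procpid, state FROM pg_stat_replication;"
    ionice -c3 -p 12345    # put that backend in the idle I/O class
    renice +10 -p 12345    # and lower its CPU priority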

    --
    Craig Ringer
  • Lonni J Friedman at Jun 8, 2012 at 7:31 pm

    On Thu, Jun 7, 2012 at 11:04 PM, Craig Ringer wrote:
    On 06/08/2012 09:01 AM, Lonni J Friedman wrote:

    On Thu, Jun 7, 2012 at 5:07 PM, Jerry Sievers<gsievers19@comcast.net>
    wrote:
    You might try stopping pg_basebackup in place with SIGSTOP and check

    if problem goes away.  SIGCONT and you should  start having
    sluggishness again.

    If verified, then any sort of throttling mechanism should work.

    I'm certain that the problem is triggered only when pg_basebackup is
    running.  Its very predictable, and goes away as soon as pg_basebackup
    finishes running.  What do you mean by a throttling mechanism?

    Sure, it only happens when pg_basebackup is running. But if you *pause*
    pg_basebackup, so it's still running but not currently doing work, does the
    problem go away? Does it come back when you unpause pg_basebackup? That's
    what Jerry was telling you to try.

    If the problem goes away when you pause pg_basebackup and comes back when
    you unpause it, it's probably a system load problem.

    If it doesn't go away, it's more likely to be a locking issue or something
    _other_ than simple load.

    SIGSTOP ("kill -STOP") pauses a process, and SIGCONT ("kill -CONT") resumes
    it, so on Linux you can use these to try and find out. When you SIGSTOP
    pg_basebackup then the postgres backend associated with it should block
    shortly afterwards as its buffers fill up and it can't send more data, so
    the load should come off the server.

    A "throttling mechanism" refers to anything that limits the rate or speed of
    a thing. In this case, what you want to do if your problem is system
    overload is to limit the speed at which pg_basebackup does its work so other
    things can still get work done. In other words you want to throttle it.
    Typical throttling mechanisms include the "ionice" and "renice" commands to
    change I/O and CPU priority, respectively.

    Note that you may need to change the priority of the *backend* that
    pg_basebackup is using, not necessarily the pg_basebackup command its self.
    I haven't done enough with Pg's replication to know how that works, so
    someone else will have to fill that bit in.
    Thanks for your reply. I've confirmed that issuing a SIGSTOP does
    eliminate the thrashing, and issuing a SIGCONT brings it right back.

    I've looked at iostat output both before & during pg_basebackup runs,
    and I'm not seeing any indication that the problem is due to disk IO
    bottlenecks. The numbers don't vary very much at all between the good
    & bad times. This is typical when pg_basebackup is running:
    ########
    Device:  rrqm/s  wrqm/s    r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
    md0        0.00    0.00  67.76  68.62   4.42   1.46     88.34      0.00   0.00     0.00     0.00   0.00   0.00
    ########

    and this is when the system is ok:
    ########
    Device:  rrqm/s  wrqm/s    r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
    md0        0.00    0.00  68.04  68.56   4.44   1.46     88.39      0.00   0.00     0.00     0.00   0.00   0.00
    ########


    I looked at vmstat output, but nothing is jumping out at me as being
    dramatically different when pg_basebackup is running. swap in and
    swap out are zero 100% of the time for the good & bad perf cases. I
    can post example output if someone is interested, or if there's
    something specific that I should be looking at as a potential problem,
    let me know.

    thanks
  • Fujii Masao at Jun 9, 2012 at 2:29 am

    On Sat, Jun 9, 2012 at 4:30 AM, Lonni J Friedman wrote:
    On Thu, Jun 7, 2012 at 11:04 PM, Craig Ringer wrote:
    On 06/08/2012 09:01 AM, Lonni J Friedman wrote:

    On Thu, Jun 7, 2012 at 5:07 PM, Jerry Sievers<gsievers19@comcast.net>
    wrote:
    You might try stopping pg_basebackup in place with SIGSTOP and check

    if problem goes away.  SIGCONT and you should  start having
    sluggishness again.

    If verified, then any sort of throttling mechanism should work.

    I'm certain that the problem is triggered only when pg_basebackup is
    running.  Its very predictable, and goes away as soon as pg_basebackup
    finishes running.  What do you mean by a throttling mechanism?

    Sure, it only happens when pg_basebackup is running. But if you *pause*
    pg_basebackup, so it's still running but not currently doing work, does the
    problem go away? Does it come back when you unpause pg_basebackup? That's
    what Jerry was telling you to try.

    If the problem goes away when you pause pg_basebackup and comes back when
    you unpause it, it's probably a system load problem.

    If it doesn't go away, it's more likely to be a locking issue or something
    _other_ than simple load.

    SIGSTOP ("kill -STOP") pauses a process, and SIGCONT ("kill -CONT") resumes
    it, so on Linux you can use these to try and find out. When you SIGSTOP
    pg_basebackup then the postgres backend associated with it should block
    shortly afterwards as its buffers fill up and it can't send more data, so
    the load should come off the server.

    A "throttling mechanism" refers to anything that limits the rate or speed of
    a thing. In this case, what you want to do if your problem is system
    overload is to limit the speed at which pg_basebackup does its work so other
    things can still get work done. In other words you want to throttle it.
    Typical throttling mechanisms include the "ionice" and "renice" commands to
    change I/O and CPU priority, respectively.

    Note that you may need to change the priority of the *backend* that
    pg_basebackup is using, not necessarily the pg_basebackup command its self.
    I haven't done enough with Pg's replication to know how that works, so
    someone else will have to fill that bit in.
    Thanks for your reply.  I've confirmed that issuing a SIGSTOP does
    eliminate the thrashing, and issuing a SIGCONT resumes the thrash.

    I've looked at iostat output both before & during pg_basebackup runs,
    and I'm not seeing any indication that the problem is due to disk IO
    bottlenecks.  The numbers don't vary very much at all between the good
    & bad times.  This is typical when pg_basebackup is running:
    ########
    Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
    avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    md0
    0.00     0.00   67.76   68.62     4.42     1.46
    88.34     0.00    0.00    0.00    0.00   0.00   0.00
    ########

    and this is when the system is ok:
    ########
    Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
    avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    md0
    0.00     0.00   68.04   68.56     4.44     1.46
    88.39     0.00    0.00    0.00    0.00   0.00   0.00
    ########


    I looked at vmstat output, but nothing is jumping out at me as being
    dramatically different when pg_basebackup is running.  swap in and
    swap out are zero 100% of the time for the good & bad perf cases.  I
    can post example output if someone is interested, or if there's
    something specific that I should be looking at as a potential problem,
    let me know.
    Did you set synchronous_standby_names to '*'? If so, the problem you
    encountered can happen.

    When synchronous_standby_names is '*', you cannot control which
    standbys take the role of synchronous standby. A standby which you
    expect to run as an asynchronous one might become the synchronous one.
    So my guess is that at first one of your three standbys was running as
    the synchronous standby, and all queries were executed normally. But
    when you started pg_basebackup, pg_basebackup unexpectedly took over
    the role of synchronous standby from that standby. Since pg_basebackup
    doesn't send information about its replication progress back to the
    master, all queries (more precisely, transaction commits) got stuck,
    waiting for a reply from the synchronous standby.

    You can avoid this problem by setting synchronous_standby_names
    to the names of your standbys instead of '*'.
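
    For example, in postgresql.conf on the master (the standby names here
    are hypothetical; they must match the application_name each standby
    sets in its primary_conninfo, and the config needs a reload
    afterwards):

    synchronous_standby_names = 'standby1, standby2, standby3'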

    This seems a bug. I think we should prevent pg_basebackup from
    becoming synchronous standby. Thought?

    Regards,

    --
    Fujii Masao
  • Tom Lane at Jun 9, 2012 at 12:51 pm

    Fujii Masao writes:
    This seems a bug. I think we should prevent pg_basebackup from
    becoming synchronous standby. Thought?
    Absolutely. If we have replication clients that are not actually
    capable of being standbys, there *must* be a way for the master
    to know that.

    regards, tom lane
  • Magnus Hagander at Jun 10, 2012 at 10:43 am

    On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane wrote:
    Fujii Masao <masao.fujii@gmail.com> writes:
    This seems a bug. I think we should prevent pg_basebackup from
    becoming synchronous standby. Thought?
    Absolutely.  If we have replication clients that are not actually
    capable of being standbys, there *must* be a way for the master
    to know that.
    I thought we fixed this already by sending InvalidXlogRecPtr as flush
    location? And that this only applied in 9.2?

    Are you saying we picked pg_basebackup *in backup mode* (not log
    streaming) as synchronous standby? If so then yes, that is
    *definitely* a bug that should be fixed. We should never select a
    connection that's not even streaming log as standby!
  • Fujii Masao at Jun 10, 2012 at 12:25 pm

    On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander wrote:
    On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane wrote:
    Fujii Masao <masao.fujii@gmail.com> writes:
    This seems a bug. I think we should prevent pg_basebackup from
    becoming synchronous standby. Thought?
    Absolutely.  If we have replication clients that are not actually
    capable of being standbys, there *must* be a way for the master
    to know that.
    I thought we fixed this already by sending InvalidXlogRecPtr as flush
    location? And that this only applied in 9.2?

    Are you saying we picked pg_basebackup *in backup mode* (not log
    streaming) as synchronous standby? Yes.
    If so then yes, that is
    *definitely* a bug that should be fixed. We should never select a
    connection that's not even streaming log as standby!
    Agreed. The attached patch prevents pg_basebackup from becoming a sync
    standby. This patch also fixes another problem: currently only a
    walsender which has reached the STREAMING state can become the sync
    walsender. OTOH, the sync walsender assumes that a walsender with
    higher priority will become the sync one regardless of whether its
    state is STREAMING, and demotes itself to a potential sync walsender.
    So when a standby with higher priority connects to the master, we
    might have no sync standby until it reaches the STREAMING state.
    To fix this, the patch switches a walsender's state from sync to
    potential only *after* the walsender with higher priority has reached
    the STREAMING state.

    We also should not select (1) the background stream process forked from
    pg_basebackup or (2) pg_receivexlog as a sync standby, because they
    don't send back replication progress. To address this, I'm thinking of
    introducing a new "NOSYNC" option in the "START_REPLICATION" command,
    as follows, and changing (1) and (2) so that they specify NOSYNC.

    START_REPLICATION XXX/XXX [NOSYNC]

    If the standby specifies the NOSYNC option, it's never assigned as a
    sync standby even if its name is in synchronous_standby_names. Thoughts?

    BTW, we are discussing changing pg_receivexlog so that it sends back
    replication progress, in another thread. So if that change gets
    applied, we probably won't need to make pg_receivexlog use the NOSYNC
    option.

    Regards,

    --
    Fujii Masao
  • Fujii Masao at Jun 10, 2012 at 1:34 pm

    On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao wrote:
    On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander wrote:
    On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane wrote:
    Fujii Masao <masao.fujii@gmail.com> writes:
    This seems a bug. I think we should prevent pg_basebackup from
    becoming synchronous standby. Thought?
    Absolutely.  If we have replication clients that are not actually
    capable of being standbys, there *must* be a way for the master
    to know that.
    I thought we fixed this already by sending InvalidXlogRecPtr as flush
    location? And that this only applied in 9.2?

    Are you saying we picked pg_basebackup *in backup mode* (not log
    streaming) as synchronous standby? Yes.
    If so then yes, that is
    *definitely* a bug that should be fixed. We should never select a
    connection that's not even streaming log as standby!
    Agreed. Attached patch prevents pg_basebackup from becoming sync
    standby. Also this patch fixes another problem: currently only walsender
    which reaches STREAMING state can become sync walsender. OTOH,
    sync walsender thinks that walsender with higher priority will be sync one
    whether its state is STREAMING, and switches to potential sync walsender.
    So when the standby with higher priority connects to the master, we
    might have no sync standby until it reaches the STREAMING state.
    To fix this problem, the patch switches walsender's state from sync to
    potential *after* walsender with higher priority has reached the
    STREAMING state.

    We also should not select (1) background stream process forked from
    pg_basebackup and (2) pg_receivexlog as sync standby because they
    don't send back replication progress. To address this, I'm thinking to
    introduce new option "NOSYNC" in "START_REPLICATION" command
    as follows, and to change (1) and (2) so that they specify NOSYNC.

    START_REPLICATION XXX/XXX [NOSYNC]

    If the standby specifies NOSYNC option, it's never assigned as sync
    standby even if its name is in synchronous_standby_names. Thought?
    A standby which always sends back InvalidXLogRecPtr should not become
    the sync one. So instead of a NOSYNC option, we can avoid a
    problematic sync standby by checking whether InvalidXLogRecPtr is sent.

    Regards,

    --
    Fujii Masao
  • Fujii Masao at Jun 10, 2012 at 2:09 pm

    On Sun, Jun 10, 2012 at 10:34 PM, Fujii Masao wrote:
    On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao wrote:
    On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander wrote:
    On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane wrote:
    Fujii Masao <masao.fujii@gmail.com> writes:
    This seems a bug. I think we should prevent pg_basebackup from
    becoming synchronous standby. Thought?
    Absolutely.  If we have replication clients that are not actually
    capable of being standbys, there *must* be a way for the master
    to know that.
    I thought we fixed this already by sending InvalidXlogRecPtr as flush
    location? And that this only applied in 9.2?

    Are you saying we picked pg_basebackup *in backup mode* (not log
    streaming) as synchronous standby? Yes.
    If so then yes, that is
    *definitely* a bug that should be fixed. We should never select a
    connection that's not even streaming log as standby!
    Agreed. Attached patch prevents pg_basebackup from becoming sync
    standby. Also this patch fixes another problem: currently only walsender
    which reaches STREAMING state can become sync walsender. OTOH,
    sync walsender thinks that walsender with higher priority will be sync one
    whether its state is STREAMING, and switches to potential sync walsender.
    So when the standby with higher priority connects to the master, we
    might have no sync standby until it reaches the STREAMING state.
    To fix this problem, the patch switches walsender's state from sync to
    potential *after* walsender with higher priority has reached the
    STREAMING state.

    We also should not select (1) background stream process forked from
    pg_basebackup and (2) pg_receivexlog as sync standby because they
    don't send back replication progress. To address this, I'm thinking to
    introduce new option "NOSYNC" in "START_REPLICATION" command
    as follows, and to change (1) and (2) so that they specify NOSYNC.

    START_REPLICATION XXX/XXX [NOSYNC]

    If the standby specifies NOSYNC option, it's never assigned as sync
    standby even if its name is in synchronous_standby_names. Thought?
    The standby which always sends InvalidXLogRecPtr back should not
    become sync one. So instead of NOSYNC option, by checking whether
    InvalidXLogRecPtr is sent, we can avoid problematic sync standby.
    We should not do this, because Magnus is proposing a patch
    (http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php)
    which breaks that assumption entirely. So we should introduce
    something like a NOSYNC option.

    Regards,

    --
    Fujii Masao
  • Magnus Hagander at Jun 10, 2012 at 2:11 pm

    On Sun, Jun 10, 2012 at 4:08 PM, Fujii Masao wrote:
    On Sun, Jun 10, 2012 at 10:34 PM, Fujii Masao wrote:
    On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao wrote:
    On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander wrote:
    On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane wrote:
    Fujii Masao <masao.fujii@gmail.com> writes:
    This seems a bug. I think we should prevent pg_basebackup from
    becoming synchronous standby. Thought?
    Absolutely.  If we have replication clients that are not actually
    capable of being standbys, there *must* be a way for the master
    to know that.
    I thought we fixed this already by sending InvalidXlogRecPtr as flush
    location? And that this only applied in 9.2?

    Are you saying we picked pg_basebackup *in backup mode* (not log
    streaming) as synchronous standby? Yes.
    If so then yes, that is
    *definitely* a bug that should be fixed. We should never select a
    connection that's not even streaming log as standby!
    Agreed. Attached patch prevents pg_basebackup from becoming sync
    standby. Also this patch fixes another problem: currently only walsender
    which reaches STREAMING state can become sync walsender. OTOH,
    sync walsender thinks that walsender with higher priority will be sync one
    whether its state is STREAMING, and switches to potential sync walsender.
    So when the standby with higher priority connects to the master, we
    might have no sync standby until it reaches the STREAMING state.
    To fix this problem, the patch switches walsender's state from sync to
    potential *after* walsender with higher priority has reached the
    STREAMING state.

    We also should not select (1) background stream process forked from
    pg_basebackup and (2) pg_receivexlog as sync standby because they
    don't send back replication progress. To address this, I'm thinking to
    introduce new option "NOSYNC" in "START_REPLICATION" command
    as follows, and to change (1) and (2) so that they specify NOSYNC.

    START_REPLICATION XXX/XXX [NOSYNC]

    If the standby specifies NOSYNC option, it's never assigned as sync
    standby even if its name is in synchronous_standby_names. Thought?
    The standby which always sends InvalidXLogRecPtr back should not
    become sync one. So instead of NOSYNC option, by checking whether
    InvalidXLogRecPtr is sent, we can avoid problematic sync standby.
    We should not do this because Magnus is proposing the patch
    (http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php)
    which breaks the above assumption at all. So we should introduce
    something like NOSYNC option.
    Wouldn't the better choice in that case be to give pg_receivexlog a
    switch for when you *want* it to be able to become a sync replica,
    and disallow it by default? And then keep the backend just treating
    InvalidXlogRecPtr as don't-become-sync-replica.
  • Fujii Masao at Jun 10, 2012 at 2:30 pm

    On Sun, Jun 10, 2012 at 11:10 PM, Magnus Hagander wrote:
    On Sun, Jun 10, 2012 at 4:08 PM, Fujii Masao wrote:
    On Sun, Jun 10, 2012 at 10:34 PM, Fujii Masao wrote:
    On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao wrote:
    On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander wrote:
    On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane wrote:
    Fujii Masao <masao.fujii@gmail.com> writes:
    This seems a bug. I think we should prevent pg_basebackup from
    becoming synchronous standby. Thought?
    Absolutely.  If we have replication clients that are not actually
    capable of being standbys, there *must* be a way for the master
    to know that.
    I thought we fixed this already by sending InvalidXlogRecPtr as flush
    location? And that this only applied in 9.2?

    Are you saying we picked pg_basebackup *in backup mode* (not log
    streaming) as synchronous standby? Yes.
    If so then yes, that is
    *definitely* a bug that should be fixed. We should never select a
    connection that's not even streaming log as standby!
    Agreed. Attached patch prevents pg_basebackup from becoming sync
    standby. Also this patch fixes another problem: currently only walsender
    which reaches STREAMING state can become sync walsender. OTOH,
    sync walsender thinks that walsender with higher priority will be sync one
    whether its state is STREAMING, and switches to potential sync walsender.
    So when the standby with higher priority connects to the master, we
    might have no sync standby until it reaches the STREAMING state.
    To fix this problem, the patch switches walsender's state from sync to
    potential *after* walsender with higher priority has reached the
    STREAMING state.

    We also should not select (1) background stream process forked from
    pg_basebackup and (2) pg_receivexlog as sync standby because they
    don't send back replication progress. To address this, I'm thinking to
    introduce new option "NOSYNC" in "START_REPLICATION" command
    as follows, and to change (1) and (2) so that they specify NOSYNC.

    START_REPLICATION XXX/XXX [NOSYNC]

    If the standby specifies NOSYNC option, it's never assigned as sync
    standby even if its name is in synchronous_standby_names. Thought?
    The standby which always sends InvalidXLogRecPtr back should not
    become sync one. So instead of NOSYNC option, by checking whether
    InvalidXLogRecPtr is sent, we can avoid problematic sync standby.
    We should not do this because Magnus is proposing the patch
    (http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php)
    which breaks the above assumption at all. So we should introduce
    something like NOSYNC option.
    Wouldn't the better choice there in that case be to give a switch to
    pg_receivexlog if you *want* it to be able to become a sync replica,
    and by default disallow it? And then keep the backend just treating
    InvalidXlogRecPtr as don't-become-sync-replica.
    I don't object to making pg_receivexlog a sync standby at all, so at
    least for me that switch is not necessary. What I'm worried about is
    the background stream process forked from pg_basebackup. I think it
    should not run as a sync standby, but sending back its replication
    progress seems helpful because a user can see the progress in
    pg_stat_replication. So I'm thinking that something like a NOSYNC
    option is required.

    Regards,

    --
    Fujii Masao
  • Magnus Hagander at Jun 10, 2012 at 2:46 pm

    On Sun, Jun 10, 2012 at 4:29 PM, Fujii Masao wrote:
    On Sun, Jun 10, 2012 at 11:10 PM, Magnus Hagander wrote:
    On Sun, Jun 10, 2012 at 4:08 PM, Fujii Masao wrote:
    On Sun, Jun 10, 2012 at 10:34 PM, Fujii Masao wrote:
    On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao wrote:
    On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander wrote:
    On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane wrote:
    Fujii Masao <masao.fujii@gmail.com> writes:
    This seems a bug. I think we should prevent pg_basebackup from
    becoming synchronous standby. Thought?
    Absolutely.  If we have replication clients that are not actually
    capable of being standbys, there *must* be a way for the master
    to know that.
    I thought we fixed this already by sending InvalidXlogRecPtr as flush
    location? And that this only applied in 9.2?

    Are you saying we picked pg_basebackup *in backup mode* (not log
    streaming) as synchronous standby? Yes.
    If so then yes, that is
    *definitely* a bug that should be fixed. We should never select a
    connection that's not even streaming log as standby!
    Agreed. Attached patch prevents pg_basebackup from becoming sync
    standby. Also this patch fixes another problem: currently only walsender
    which reaches STREAMING state can become sync walsender. OTOH,
    sync walsender thinks that walsender with higher priority will be sync one
    whether its state is STREAMING, and switches to potential sync walsender.
    So when the standby with higher priority connects to the master, we
    might have no sync standby until it reaches the STREAMING state.
    To fix this problem, the patch switches walsender's state from sync to
    potential *after* walsender with higher priority has reached the
    STREAMING state.

    We also should not select (1) background stream process forked from
    pg_basebackup and (2) pg_receivexlog as sync standby because they
    don't send back replication progress. To address this, I'm thinking to
    introduce new option "NOSYNC" in "START_REPLICATION" command
    as follows, and to change (1) and (2) so that they specify NOSYNC.

    START_REPLICATION XXX/XXX [NOSYNC]

    If the standby specifies NOSYNC option, it's never assigned as sync
    standby even if its name is in synchronous_standby_names. Thought?
    The standby which always sends InvalidXLogRecPtr back should not
    become sync one. So instead of NOSYNC option, by checking whether
    InvalidXLogRecPtr is sent, we can avoid problematic sync standby.
    We should not do this because Magnus is proposing the patch
    (http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php)
    which breaks the above assumption at all. So we should introduce
    something like NOSYNC option.
    Wouldn't the better choice there in that case be to give a switch to
    pg_receivexlog if you *want* it to be able to become a sync replica,
    and by default disallow it? And then keep the backend just treating
    InvalidXlogRecPtr as don't-become-sync-replica.
    I don't object to making pg_receivexlog as sync standby at all. So at least
    for me, that switch is not necessary. What I'm worried about is the
    background stream process forked from pg_basebackup. I think that
    it should not run as sync standby but sending back its replication progress
    seems helpful because a user can see the progress from pg_stat_replication.
    So I'm thinking that something like NOSYNC option is required.
    On principle, no. By default, yes.

    How about:
    pg_basebackup background: *never* sends the flush location, and
    therefore won't become a sync replica
    pg_receivexlog: *optionally* sends the flush location; by default it
    won't become a sync replica, but can be made so with a switch

    (This is on top of the "make sure pg_basebackup in *non-streaming*
    mode can never be picked" fix, of course.)
  • Magnus Hagander at Jun 11, 2012 at 1:19 pm

    On Sun, Jun 10, 2012 at 2:25 PM, Fujii Masao wrote:
    On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander wrote:
    On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane wrote:
    Fujii Masao <masao.fujii@gmail.com> writes:
    This seems a bug. I think we should prevent pg_basebackup from
    becoming synchronous standby. Thought?
    Absolutely.  If we have replication clients that are not actually
    capable of being standbys, there *must* be a way for the master
    to know that.
    I thought we fixed this already by sending InvalidXlogRecPtr as flush
    location? And that this only applied in 9.2?

    Are you saying we picked pg_basebackup *in backup mode* (not log
    streaming) as synchronous standby? Yes.
    If so then yes, that is
    *definitely* a bug that should be fixed. We should never select a
    connection that's not even streaming log as standby!
    Agreed. Attached patch prevents pg_basebackup from becoming sync
    standby. Also this patch fixes another problem: currently only walsender
    which reaches STREAMING state can become sync walsender. OTOH,
    sync walsender thinks that walsender with higher priority will be sync one
    whether its state is STREAMING, and switches to potential sync walsender.
    So when the standby with higher priority connects to the master, we
    might have no sync standby until it reaches the STREAMING state.
    To fix this problem, the patch switches walsender's state from sync to
    potential *after* walsender with higher priority has reached the
    STREAMING state.
    This fix needs to be applied independently of the other discussions,
    since it affects 9.1 and needs to be backpatched.

    So - applied, and backpatched.

    The issues wrt the pg_basebackup background process and pg_receivexlog
    are only for 9.2...
  • Lonni J Friedman at Jun 11, 2012 at 5:37 pm

    On Fri, Jun 8, 2012 at 7:29 PM, Fujii Masao wrote:
    On Sat, Jun 9, 2012 at 4:30 AM, Lonni J Friedman wrote:
    On Thu, Jun 7, 2012 at 11:04 PM, Craig Ringer wrote:
    On 06/08/2012 09:01 AM, Lonni J Friedman wrote:

    On Thu, Jun 7, 2012 at 5:07 PM, Jerry Sievers<gsievers19@comcast.net>
    wrote:
    You might try stopping pg_basebackup in place with SIGSTOP and check

    if problem goes away.  SIGCONT and you should  start having
    sluggishness again.

    If verified, then any sort of throttling mechanism should work.

    I'm certain that the problem is triggered only when pg_basebackup is
    running.  Its very predictable, and goes away as soon as pg_basebackup
    finishes running.  What do you mean by a throttling mechanism?

    Sure, it only happens when pg_basebackup is running. But if you *pause*
    pg_basebackup, so it's still running but not currently doing work, does the
    problem go away? Does it come back when you unpause pg_basebackup? That's
    what Jerry was telling you to try.

    If the problem goes away when you pause pg_basebackup and comes back when
    you unpause it, it's probably a system load problem.

    If it doesn't go away, it's more likely to be a locking issue or something
    _other_ than simple load.

    SIGSTOP ("kill -STOP") pauses a process, and SIGCONT ("kill -CONT") resumes
    it, so on Linux you can use these to try and find out. When you SIGSTOP
    pg_basebackup then the postgres backend associated with it should block
    shortly afterwards as its buffers fill up and it can't send more data, so
    the load should come off the server.

    A "throttling mechanism" refers to anything that limits the rate or speed of
    a thing. In this case, what you want to do if your problem is system
    overload is to limit the speed at which pg_basebackup does its work so other
    things can still get work done. In other words you want to throttle it.
    Typical throttling mechanisms include the "ionice" and "renice" commands to
    change I/O and CPU priority, respectively.

    Note that you may need to change the priority of the *backend* that
    pg_basebackup is using, not necessarily the pg_basebackup command its self.
    I haven't done enough with Pg's replication to know how that works, so
    someone else will have to fill that bit in.
    Thanks for your reply.  I've confirmed that issuing a SIGSTOP does
    eliminate the thrashing, and issuing a SIGCONT resumes the thrash.

    I've looked at iostat output both before & during pg_basebackup runs,
    and I'm not seeing any indication that the problem is due to disk IO
    bottlenecks.  The numbers don't vary very much at all between the good
    & bad times.  This is typical when pg_basebackup is running:
    ########
    Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
    avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    md0
    0.00     0.00   67.76   68.62     4.42     1.46
    88.34     0.00    0.00    0.00    0.00   0.00   0.00
    ########

    and this is when the system is ok:
    ########
    Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
    avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    md0
    0.00     0.00   68.04   68.56     4.44     1.46
    88.39     0.00    0.00    0.00    0.00   0.00   0.00
    ########


    I looked at vmstat output, but nothing is jumping out at me as being
    dramatically different when pg_basebackup is running.  swap in and
    swap out are zero 100% of the time for the good & bad perf cases.  I
    can post example output if someone is interested, or if there's
    something specific that I should be looking at as a potential problem,
    let me know.
    Did you set synchronous_standby_names to '*'? If so, the problem you
    encountered can happen.

    When synchronous_standby_names is '*', you cannot control which
    standbys take a role of synchronous standby. The standby which you
    expect to run as asynchronous one might be synchronous one. So
    my guess is that at first one of your three standbys was running as
    synchronous standby, and all queries were executed normally. But
    when you started pg_basebackup, pg_basebackup unexpectedly
    got the role of synchronous standby from another standby. Since
    pg_basebackup doesn't send the information about replication
    progress back to the master, all queries (more precisely, transaction
    commit) got stuck, and kept waiting for the reply from synchronous
    standby.

    You can avoid this problem by setting synchronous_standby_names
    to the names of your standbys instead of '*'.
    I don't have synchronous_standby_names set at all. I'm only doing
    asynchronous replication.
  • Fujii Masao at Jun 12, 2012 at 5:49 pm

    On Tue, Jun 12, 2012 at 2:37 AM, Lonni J Friedman wrote:
    On Fri, Jun 8, 2012 at 7:29 PM, Fujii Masao wrote:
    On Sat, Jun 9, 2012 at 4:30 AM, Lonni J Friedman wrote:
    On Thu, Jun 7, 2012 at 11:04 PM, Craig Ringer wrote:
    On 06/08/2012 09:01 AM, Lonni J Friedman wrote:

    On Thu, Jun 7, 2012 at 5:07 PM, Jerry Sievers<gsievers19@comcast.net>
    wrote:
    You might try stopping pg_basebackup in place with SIGSTOP and check

    if problem goes away.  SIGCONT and you should  start having
    sluggishness again.

    If verified, then any sort of throttling mechanism should work.

    I'm certain that the problem is triggered only when pg_basebackup is
    running.  Its very predictable, and goes away as soon as pg_basebackup
    finishes running.  What do you mean by a throttling mechanism?

    Sure, it only happens when pg_basebackup is running. But if you *pause*
    pg_basebackup, so it's still running but not currently doing work, does the
    problem go away? Does it come back when you unpause pg_basebackup? That's
    what Jerry was telling you to try.

    If the problem goes away when you pause pg_basebackup and comes back when
    you unpause it, it's probably a system load problem.

    If it doesn't go away, it's more likely to be a locking issue or something
    _other_ than simple load.

    SIGSTOP ("kill -STOP") pauses a process, and SIGCONT ("kill -CONT") resumes
    it, so on Linux you can use these to try and find out. When you SIGSTOP
    pg_basebackup then the postgres backend associated with it should block
    shortly afterwards as its buffers fill up and it can't send more data, so
    the load should come off the server.

    A "throttling mechanism" refers to anything that limits the rate or speed of
    a thing. In this case, what you want to do if your problem is system
    overload is to limit the speed at which pg_basebackup does its work so other
    things can still get work done. In other words you want to throttle it.
    Typical throttling mechanisms include the "ionice" and "renice" commands to
    change I/O and CPU priority, respectively.

    Note that you may need to change the priority of the *backend* that
    pg_basebackup is using, not necessarily the pg_basebackup command its self.
    I haven't done enough with Pg's replication to know how that works, so
    someone else will have to fill that bit in.
    Thanks for your reply.  I've confirmed that issuing a SIGSTOP does
    eliminate the thrashing, and issuing a SIGCONT resumes the thrash.

    I've looked at iostat output both before & during pg_basebackup runs,
    and I'm not seeing any indication that the problem is due to disk IO
    bottlenecks.  The numbers don't vary very much at all between the good
    & bad times.  This is typical when pg_basebackup is running:
    ########
    Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
    avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    md0
    0.00     0.00   67.76   68.62     4.42     1.46
    88.34     0.00    0.00    0.00    0.00   0.00   0.0
    ########

    and this is when the system is ok:
    ########
    Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
    avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    md0
    0.00     0.00   68.04   68.56     4.44     1.46
    88.39     0.00    0.00    0.00    0.00   0.00   0.0
    ########


    I looked at vmstat output, but nothing is jumping out at me as being
    dramatically different when pg_basebackup is running.  swap in and
    swap out are zero 100% of the time for the good & bad perf cases.  I
    can post example output if someone is interested, or if there's
    something specific that I should be looking at as a potential problem,
    let me know.
    Did you set synchronous_standby_names to '*'? If so, the problem you
    encountered can happen.

    When synchronous_standby_names is '*', you cannot control which
    standbys take a role of synchronous standby. The standby which you
    expect to run as asynchronous one might be synchronous one. So
    my guess is that at first one of your three standbys was running as
    synchronous standby, and all queries were executed normally. But
    when you started pg_basebackup, pg_basebackup unexpectedly
    got the role of synchronous standby from another standby. Since
    pg_basebackup doesn't send the information about replication
    progress back to the master, all queries (more precisely, transaction
    commit) got stuck, and kept waiting for the reply from synchronous
    standby.

    You can avoid this problem by setting synchronous_standby_names
    to the names of your standbys instead of '*'.
    I don't have synchronous_standby_names set at all.  I'm only doing
    asynchronous replication.
    Hmm... I have no idea what happened in your environment, for now.
    Could you show me a self-contained test case?

    Regards,

    --
    Fujii Masao
  • Lonni J Friedman at Jun 12, 2012 at 6:37 pm

    On Tue, Jun 12, 2012 at 10:49 AM, Fujii Masao wrote:
    Hmm... I have no idea what happened in your environment, for now.
    Could you show me a self-contained test case?
    I'm running the following, which gets piped over ssh to a remote
    server (at gigabit ethernet speed):
    pg_basebackup -v -D - -x -Ft -U postgres

    One thing that I've discovered is that if I throttle back the speed of
    what is getting piped to the remote server, that directly correlates
    to the load on the server.
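    (If throttling the pipe is the way to go, one hedged sketch is to
    rate-limit the stream with pv; the 20 MB/s cap, the host name and the
    remote path are made-up placeholders, and pv has to be installed
    separately:)

    # cap the backup stream at roughly 20 MB/s before it hits ssh
    pg_basebackup -v -D - -x -Ft -U postgres \
      | pv -q -L 20m \
      | ssh backuphost 'cat > /path/on/remote/base.tar'

    Rate-limiting the pipe is cruder than reprioritising the backend
    doing the work, but it is easy to tune until the master stops
    thrashing.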
  • Magnus Hagander at Jun 12, 2012 at 6:39 pm

    On Tue, Jun 12, 2012 at 8:37 PM, Lonni J Friedman wrote:
    On Tue, Jun 12, 2012 at 10:49 AM, Fujii Masao wrote:
    I'm running the following, which gets piped over ssh to a remote
    server (at gigabit ethernet speed):
    pg_basebackup -v -D - -x -Ft -U postgres

    One thing that I've discovered is that if I throttle back the speed of
    what is getting piped to the remote server, that directly correlates
    to the load on the server.
    That seems to indicate that you're overloading the I/O system... Or
    the CPU, but more likely I/O.
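    (One quick way to tell I/O saturation from CPU saturation while the
    backup is streaming, assuming nothing more exotic than the stock
    procps tools:)

    # 5-second samples while pg_basebackup runs: a high "wa" column points
    # at the disks, high "us"/"sy" with a low "wa" points at the CPU
    vmstat 5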
  • Simon Riggs at Jun 20, 2012 at 12:03 pm

    On 13 June 2012 02:39, Magnus Hagander wrote:

    I'm running the following, which gets piped over ssh to a remote
    server (at gigabit ethernet speed):
    pg_basebackup -v -D - -x -Ft -U postgres

    One thing that I've discovered is that if I throttle back the speed of
    what is getting piped to the remote server, that directly correlates
    to the load on the server.
    That seems to indicate that you're overloading the I/O system... Or
    the CPU, but more likely I/O.
    CPU utilisation of SSL connections is high. If network bandwidth is
    good, perhaps running the WAL sender at full speed with encryption
    can tank the server.

    An effect related to caching of WAL files? Perhaps we need to mark
    them as FADV_DONTNEED at some point.

    Hard to say without detailed analysis.

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Lonni J Friedman at Jun 20, 2012 at 1:27 pm

    On Wed, Jun 20, 2012 at 5:02 AM, Simon Riggs wrote:
    On 13 June 2012 02:39, Magnus Hagander wrote:

    I'm running the following, which gets piped over ssh to a remote
    server (at gigabit ethernet speed):
    pg_basebackup -v -D - -x -Ft -U postgres

    One thing that I've discovered is that if I throttle back the speed of
    what is getting piped to the remote server, that directly correlates
    to the load on the server.
    That seems to indicate that you're overloading the I/O system... Or
    the CPU, but more likely I/O.
    CPU utilisation of SSL connections is high. If network bandwidth is
    good, perhaps running the WAL sender at full speed with encryption
    can tank the server.
    I'm not using SSL.
  • Welty, Richard at Jun 20, 2012 at 1:58 pm
    Lonni J Friedman writes:
    I'm not using SSL.
    ummm, ssh uses ssl.

    richard
  • Lonni J Friedman at Jun 20, 2012 at 2:06 pm

    On Wed, Jun 20, 2012 at 6:56 AM, Welty, Richard wrote:
    Lonni J Friedman writes:
    I'm not using SSL.
    ummm, ssh uses ssl.
    Sure, although I thought that Simon was referring to the database
    itself. However, I don't think ssh is the problem, as I can scp a
    file from the server and the load doesn't go crazy, nor does the
    database grind to a halt.
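    (A hedged way to separate the ssh/encryption cost from everything
    else is to stream the same backup with and without ssh in the pipe;
    the host, port and file name are placeholders, and the netcat listen
    syntax varies between netcat flavours:)

    # 1) no network, no ssh: does the load still spike when the stream
    #    is simply discarded?
    pg_basebackup -v -D - -x -Ft -U postgres > /dev/null

    # 2) over the network but unencrypted (on the receiver, first run
    #    something like: nc -l 3333 > base.tar)
    pg_basebackup -v -D - -x -Ft -U postgres | nc backuphost 3333

    If the load only takes off with ssh in the pipe, the cipher overhead
    is worth chasing; if it is there even with the stream discarded, the
    bottleneck is on the master itself.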
  • Scott Marlowe at Jun 9, 2012 at 6:53 am

    On Fri, Jun 8, 2012 at 1:30 PM, Lonni J Friedman wrote:
    I've looked at iostat output both before & during pg_basebackup runs,
    and I'm not seeing any indication that the problem is due to disk IO
    bottlenecks.  The numbers don't vary very much at all between the good
    & bad times.  This is typical when pg_basebackup is running:
    ########
    Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
    avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    md0
    0.00     0.00   67.76   68.62     4.42     1.46
    88.34     0.00    0.00    0.00    0.00   0.00   0.00
    ########

    and this is when the system is ok:
    ########
    Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
    avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    md0
    0.00     0.00   68.04   68.56     4.44     1.46
    88.39     0.00    0.00    0.00    0.00   0.00   0.00
    ########
    Two points.  1: md0 doesn't show things like %util; only the physical
    drives will have that output, and that is what you want to look at to
    see whether it's hopping up to 100%.  2: you need to run iostat with
    an interval and look at the reports AFTER the first one, since the
    first report is just the average since the machine was booted.
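    (For reference, a minimal invocation along those lines; the device
    names are placeholders for the physical members of the md array:)

    # extended stats for the underlying disks, sampled every 5 seconds;
    # skip the first report, it is the since-boot average
    iostat -dx sda sdb 5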
