
[PostgreSQL] Hope for a new PostgreSQL era?

Rodrigo E. De León Plicet
Dec 8, 2011 at 3:11 am
http://www.dbms2.com/2011/11/23/hope-for-a-new-postgresql-era/

Some of the points mentioned:
- MySQL is narrowing the gap, but PostgreSQL is still ahead of MySQL
in some ways. (Database extensibility if nothing else.)
- Neither EnterpriseDB (which now calls itself “The enterprise
PostgreSQL company”) nor the PostgreSQL community leadership have
covered themselves with stewardship glory.
- PostgreSQL advancement is not dead.

Comments?

58 responses

  • Scott Marlowe at Dec 8, 2011 at 4:09 am

    On Wed, Dec 7, 2011 at 7:52 PM, Rodrigo E. De León Plicet wrote:
    http://www.dbms2.com/2011/11/23/hope-for-a-new-postgresql-era/

    Some of the points mentioned:
    - MySQL is narrowing the gap, but PostgreSQL is still ahead of MySQL
    in some ways.  (Database extensibility if nothing else.)
    - Neither EnterpriseDB (which now calls itself “The enterprise
    PostgreSQL company”) nor the PostgreSQL community leadership have
    covered themselves with stewardship glory.
    - PostgreSQL advancement is not dead.
    I'd like to see the author's thoughts filled out on these points.
    They seem rather vague and overly simplistic, and I wonder what
    specific points he might have to make rather than this vague "hand
    wavy" list he has so far.
  • Chris Travers at Dec 8, 2011 at 5:26 am

    On Wed, Dec 7, 2011 at 8:08 PM, Scott Marlowe wrote:
    I'd like to see the author's thoughts filled out on these points.
    They seem rather vague and overly simplistic, and I wonder what
    specific points he might have to make rather than this vague "hand
    wavy" list he has so far.
    Additionally I am not entirely sure what he means by the last point.
    If you look at the work that NTT along with EDB has put into
    Postgres-XC, for example, it looks to me like the Postgres ecosystem
    is growing by leaps and bounds and we are approaching an era where
    Oracle is no longer ahead in any significant use case.

    The thing I am personally worried about is the ability of one company
    to dominate the framing of PostgreSQL service offerings. For example
    while in the US it hasn't caught on, a lot of people at MYGOSSCON
    accepted EnterpriseDB's framing of the official PostgreSQL release as
    the "community edition." If you have a single vendor which dominates
    the dialogue that's a bad thing. To be clear this isn't a criticism
    of EDB. I greatly appreciate the substantial effort they have put
    into building Pg awareness here in SE Asia. However, it is a caution
    about the recommendation that we need a corporate steward. I argue
    corporate stewardship would be a strong net negative because it would
    be first and foremost a way to crowd everyone else out. We have
    stewardship. It's the core committee, and it's the best kind of
    stewardship we can have.

    Here's a useful post that I was forwarded by another LSMB developer.
    http://openlife.cc/blogs/2010/november/how-grow-your-open-source-project-10x-and-revenues-5x

    Additionally, I would suggest that PostgreSQL has a lot of users
    because we have a great---and open---community. I think a new
    PostgreSQL era is coming but I don't think it will happen the way that
    blog poster implies. There is a tremendous need for Pg skills in SE
    Asia right now, and I expect this to continue to grow exponentially.

    PostgreSQL advancement, by my view, is also not merely "not dead"
    but in fact accelerating.

    Best Wishes,
    Chris Travers
  • Craig Ringer at Dec 8, 2011 at 11:24 am

    On Dec 8, 2011 1:27 PM, "Chris Travers" wrote:
    Additionally I am not entirely sure what he means by the last point.
    If you look at the work that NTT along with EDB has put into
    Postgres-XC, for example, it looks to me like the Postgres ecosystem
    is growing by leaps and bounds and we are approaching an era where
    Oracle is no longer ahead in any significant use case.
    While Pg is impressively capable now, I don't agree that Oracle (or DB2,
    MS-SQL etc) isn't ahead for any significant use case. Not on a purely
    technical basis anyway - once cost is considered there may be a stronger
    argument.

    Multi-tenant hosting is a weak point for Pg for quite a few reasons, some
    of which appear below. It's not the only role Pg isn't a great fit for,
    but probably one of the more obvious.

    Areas in which Pg seems significantly less capable include:

    - multi-tenant hosting and row level security

    - admission control, queuing and resource limiting to optimally load a
    machine. Some limited level is possible with external pooling, but only by
    limiting concurrent workers.

    - performance monitoring and diagnostics. It's way harder to find out
    what's causing load on a busy Pg server or report on frequent/expensive
    queries etc. Tooling is limited and fairly primitive. It's fine, but
    nowhere near as powerful and easy as some of the other DBs.

    - prioritisation of queries or users. It's hard to say "prefer this query
    over this one, give it more resources" or "user A's work always preempts
    user B's" in Pg.

    - transparent failover and recovery back to the original master.

    - shared-storage clustering. Dunno if anyone still cares about this one
    though.
  • Simon Riggs at Dec 8, 2011 at 12:27 pm

    On Thu, Dec 8, 2011 at 11:24 AM, Craig Ringer wrote:

    Areas in which Pg seems significantly less capable include:
    Please can you explain the features Oracle has in these areas, I'm not
    clear. Thanks.

    - admission control, queuing and resource limiting to optimally load a
    machine. Some limited level is possible with external pooling, but only by
    limiting concurrent workers.
    - prioritisation of queries or users. It's hard to say "prefer this query
    over this one, give it more resources" or "user A's work always preempts
    user B's" in Pg.

    --
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
  • Marc Cousin at Dec 8, 2011 at 3:12 pm
    On Thu, 8 Dec 2011 12:27:22 +0000,
    Simon Riggs <simon@2ndQuadrant.com> wrote:
    On Thu, Dec 8, 2011 at 11:24 AM, Craig Ringer wrote:

    Areas in which Pg seems significantly less capable include:
    Please can you explain the features Oracle has in these areas, I'm not
    clear. Thanks.
    Maybe I can answer from my own Oracle experience. I hope it will be what
    Craig had in mind :)
    - admission control, queuing and resource limiting to optimally
    load a machine. Some limited level is possible with external
    pooling, but only by limiting concurrent workers.
    Oracle has natively two ways of handling inbound connections:
    - Dedicated, which is very similar to the PostgreSQL way of accepting
    connections: accept(), fork() and so on
    - Shared, which is based on processes listening and handling the
    connections (called dispatchers) and processes doing the real work
    (called workers, obviously). All of this works internally with
    some sort of queuing and storing results in shared memory (I don't
    remember the details of it)

    The advantage of this second architecture being of course that you
    can't have more than N workers hitting your database simultaneously. So
    it's easier to keep the load on the server to a reasonable value.
    - prioritisation of queries or users. It's hard to say "prefer this
    query over this one, give it more resources" or "user A's work
    always preempts user B's" in Pg.
    It's called the resource manager in Oracle. You define 'resource plans',
    'consumer groups', etc… and you get some sort of QoS for your queries.
    It's mostly about CPU resource allocation if I remember correctly (I
    never used it, except during training :) )
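
    For a flavor of the syntax, here is a minimal sketch (the
    DBMS_RESOURCE_MANAGER package is real, but the plan and group names
    are invented and the exact parameters vary by Oracle version):

    BEGIN
      DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
      DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
        consumer_group => 'BATCH_GROUP',
        comment        => 'low-priority batch work');
      DBMS_RESOURCE_MANAGER.CREATE_PLAN(
        plan    => 'DAYTIME_PLAN',
        comment => 'favor interactive work over batch');
      -- every plan must also cover OTHER_GROUPS to validate
      DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
        plan             => 'DAYTIME_PLAN',
        group_or_subplan => 'OTHER_GROUPS',
        comment          => 'everyone else gets first claim on CPU',
        cpu_p1           => 100);
      DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
        plan             => 'DAYTIME_PLAN',
        group_or_subplan => 'BATCH_GROUP',
        comment          => 'batch gets leftover CPU',
        cpu_p2           => 100);
      DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
      DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
    END;
    /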

    Being able to change the backend's nice level may do something
    similar, I guess. I don't think Oracle's resource manager solves
    the priority inversion due to locking in the database, but I'm not sure
    of it.
  • Merlin Moncure at Dec 8, 2011 at 3:29 pm

    On Thu, Dec 8, 2011 at 9:11 AM, Marc Cousin wrote:
    On Thu, 8 Dec 2011 12:27:22 +0000,
    Simon Riggs <simon@2ndQuadrant.com> wrote:
    On Thu, Dec 8, 2011 at 11:24 AM, Craig Ringer <ringerc@ringerc.id.au>
    wrote:
    Areas in which Pg seems significantly less capable include:
    Please can you explain the features Oracle has in these areas, I'm not
    clear. Thanks.
    Maybe I can answer from my own Oracle experience. I hope it will be what
    Craig had in mind :)
    - admission control, queuing and resource limiting to optimally
    load a machine. Some limited level is possible with external
    pooling, but only by limiting concurrent workers.
    Oracle has natively two ways of handling inbound connections:
    - Dedicated, which is very similar to the PostgreSQL way of accepting
     connections: accept(), fork() and so on
    - Shared, which is based on processes listening and handling the
     connections (called dispatchers) and processes doing the real work
     (called workers, obviously). All of this works internally with
     some sort of queuing and storing results in shared memory (I don't
     remember the details of it)

    The advantage of this second architecture being of course that you
    can't have more than N workers hitting your database simultaneously. So
    it's easier to keep the load on the server to a reasonable value.
    you have a couple of very good options to achieve the same in postgres
    -- pgbouncer, pgpool.
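
    For anyone wanting to poke at what the pool is doing, pgbouncer also
    ships an admin console reachable with plain psql (a sketch; the port
    and user below are just the usual defaults):

    -- psql -p 6432 -U pgbouncer pgbouncer
    SHOW POOLS;    -- active/waiting clients and server connections per pool
    SHOW CLIENTS;  -- every client connection and its current state
    SHOW STATS;    -- per-database request and traffic counters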

    merlin
  • Andrew Sullivan at Dec 8, 2011 at 4:04 pm

    On Thu, Dec 08, 2011 at 09:29:28AM -0600, Merlin Moncure wrote:

    you have a couple of very good options to achieve the same in postgres
    -- pgbouncer, pgpool.
    One of the central issues that Postgres has in the "enterprise" land
    is exactly this sort of answer: "you have good options" but they're
    "not part of the core release".

    Now, we all know that this is a stupid and wrong way of thinking about
    it. But one cannot complain about being held to those sorts of
    enterprisey standards when one is having an enterprisey discussion.
    The original analysis (on the blog) seemed to be primarily aimed at
    exactly that sort of discussion, and I suspect that this is the kind
    of thing that was meant by the "community leadership" not having
    covered itself in stewardship glory. One of the "stewardship" tests,
    from a business-analysis point of view, is whether you're going to be
    able to find a reliable supply of experienced admins at all levels for
    your systems.

    Having a bunch of different, indifferently-documented projects that
    are all doing similar but slightly different things is, to someone
    looking from that point of view, a liability and not a strength. I
    happen to disagree, but it always seemed to me that something the
    Postgres community did poorly (and I count myself in that number,
    though less now than in the past) was understanding the hardships of
    the integrator and coming up with reasonably simple answers for those
    kinds of questions. It is not unreasonable to say that there are no
    simple answers here; but as unhappy as it makes me, those reasonably
    simple answers are necessary for some classes of users.

    And let's face it: companies like Oracle (and products like MySQL) are
    in a position to treat those sorts of answers as part of the cost of
    doing business, because they have revenue associated with their
    licenses so they can pay for coming up with those answers that way.
    In Postgres-land, everyone needs to charge money for those answers
    (i.e. be consultants), because that's the only real place to make a
    living. Alternatively, you can put together those answers as part of
    your own package; but in that case, it's not "the core PostgreSQL
    product", but something else.

    In this respect, the decision of the core team a number of years ago
    to say, "We're going to have 'integrated' replication that does x, y,
    and z," was the right one, despite the fact that it undermined the
    momentum of other interesting projects (and ones better suited to some
    environments). Sometimes, it's better to cut off options.

    Best,

    A

    --
    Andrew Sullivan
    ajs@crankycanuck.ca
  • Marc Cousin at Dec 8, 2011 at 4:11 pm
    On Thu, 8 Dec 2011 09:29:28 -0600,
    Merlin Moncure <mmoncure@gmail.com> wrote:
    On Thu, Dec 8, 2011 at 9:11 AM, Marc Cousin wrote:
    On Thu, 8 Dec 2011 12:27:22 +0000,
    Simon Riggs <simon@2ndQuadrant.com> wrote:
    On Thu, Dec 8, 2011 at 11:24 AM, Craig Ringer
    wrote:
    Areas in which Pg seems significantly less capable include:
    Please can you explain the features Oracle has in these areas, I'm
    not clear. Thanks.
    Maybe I can answer from my own Oracle experience. I hope it will be
    what Craig had in mind :)
    - admission control, queuing and resource limiting to optimally
    load a machine. Some limited level is possible with external
    pooling, but only by limiting concurrent workers.
    Oracle has natively two ways of handling inbound connections:
    - Dedicated, which is very similar to the PostgreSQL way of
    accepting connections: accept(), fork() and so on
    - Shared, which is based on processes listening and handling the
    connections (called dispatchers) and processes doing the real work
    (called workers, obviously). All of this works internally with
    some sort of queuing and storing results in shared memory (I don't
    remember the details of it)

    The advantage of this second architecture being of course that you
    can't have more than N workers hitting your database
    simultaneously. So it's easier to keep the load on the server to a
    reasonable value.
    you have a couple of very good options to achieve the same in postgres
    -- pgbouncer, pgpool.
    I wish it was the same (I use and like both pgbouncer and pgpool too,
    and they do a good job, I'm not arguing on that). But unfortunately it
    isn't: you still have the notion of session for each connected client
    in Oracle when using the shared servers model.

    It means you keep your session variables, your prepared statements,
    your running transaction, etc… in each individual session while having
    the multiplexing equivalent of a 'statement level' from pgbouncer.
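
    A tiny illustration of the difference: in pgbouncer's transaction or
    statement pooling modes, two round-trips from one client may land on
    different server backends, so backend-local state quietly vanishes
    (the table name here is hypothetical):

    PREPARE q (int) AS SELECT balance FROM accounts WHERE id = $1;
    EXECUTE q(42);  -- may fail: prepared statement "q" does not exist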
  • Nicholson, Brad (Toronto, ON, CA) at Dec 8, 2011 at 4:29 pm
    On Thu, Dec 8, 2011 at 9:11 AM, Marc Cousin <cousinmarc@gmail.com>
    wrote:
    I wish it was the same (I use and like both pgbouncer and pgpool too,
    and they do a good job, I'm not arguing on that). But unfortunately it
    isn't: you still have the notion of session for each connected client
    in Oracle when using the shared servers model.

    It means you keep your session variables, your prepared statements,
    your running transaction, etc… in each individual session while having
    the multiplexing equivalent of a 'statement level' from pgbouncer.
    In Oracle - can the pool share connections between DB users and/or databases on the instance? If the answer is yes to either, that is a fair bit better than what we can achieve today.

    Brad.
  • Marc Cousin at Dec 8, 2011 at 5:01 pm
    On Thu, 8 Dec 2011 16:27:56 +0000,
    "Nicholson, Brad (Toronto, ON, CA)" <bnicholson@hp.com> wrote:
    On Thu, Dec 8, 2011 at 9:11 AM, Marc Cousin <cousinmarc@gmail.com>
    wrote:
    I wish it was the same (I use and like both pgbouncer and pgpool
    too, and they do a good job, I'm not arguing on that). But
    unfortunately it isn't: you still have the notion of session for
    each connected client in Oracle when using the shared servers model.

    It means you keep your session variables, your prepared statements,
    your running transaction, etc… in each individual session while
    having the multiplexing equivalent of a 'statement level' from
    pgbouncer.
    In Oracle - can the pool share connections between DB users and/or
    databases on the instance? If the answer is yes to either, that is a
    fair bit better than what we can achieve today.
    Between users yes. But there is only one DB per instance in Oracle :)
  • Tomas Vondra at Dec 8, 2011 at 5:11 pm

    On 8 December 2011, 18:00, Marc Cousin wrote:
    On Thu, 8 Dec 2011 16:27:56 +0000,
    "Nicholson, Brad (Toronto, ON, CA)" <bnicholson@hp.com> wrote:
    On Thu, Dec 8, 2011 at 9:11 AM, Marc Cousin <cousinmarc@gmail.com>
    wrote:
    I wish it was the same (I use and like both pgbouncer and pgpool
    too, and they do a good job, I'm not arguing on that). But
    unfortunately it isn't: you still have the notion of session for
    each connected client in Oracle when using the shared servers model.

    It means you keep your session variables, your prepared statements,
    your running transaction, etc… in each individual session while
    having the multiplexing equivalent of a 'statement level' from
    pgbouncer.
    In Oracle - can the pool share connections between DB users and/or
    databases on the instance? If the answer is yes to either, that is a
    fair bit better than what we can achieve today.
    Between users yes. But there is only one DB per instance in Oracle :)
    Because Oracle uses schemas instead of databases. One schema = one user =
    one database.
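
    Roughly, what Oracle folds into a single CREATE USER gets spelled out
    in separate pieces on the Postgres side (names are illustrative):

    -- Oracle: CREATE USER app IDENTIFIED BY secret;  (user == schema)
    -- Postgres: a role plus a schema inside one of many databases:
    CREATE ROLE app LOGIN PASSWORD 'secret';
    CREATE SCHEMA app AUTHORIZATION app;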

    Tomas
  • Tomas Vondra at Dec 8, 2011 at 5:08 pm

    On 8 December 2011, 17:27, Nicholson, Brad (Toronto, ON, CA) wrote:
    On Thu, Dec 8, 2011 at 9:11 AM, Marc Cousin <cousinmarc@gmail.com>
    wrote:
    I wish it was the same (I use and like both pgbouncer and pgpool too,
    and they do a good job, I'm not arguing on that). But unfortunately it
    isn't: you still have the notion of session for each connected client
    in Oracle when using the shared servers model.

    It means you keep your session variables, your prepared statements,
    your running transaction, etc… in each individual session while having
    the multiplexing equivalent of a 'statement level' from pgbouncer.
    In Oracle - can the pool share connections between DB users and/or
    databases on the instance? If the answer is yes to either, that is a fair
    bit better than what we can achieve today.
    Yes, each session has a UGA (User Global Area) memory, and this is placed
    either in the PGA (Process Global Area) in the case of a "dedicated
    server" or in the SGA (Shared Global Area) in the case of a "shared
    server." If you use a shared server, then each request might be handled
    by a different backend process.

    PostgreSQL uses a dedicated-backend architecture, which means a simpler
    code base, configuration, etc. If you need something like a "shared
    server" then you can use a connection pool, but you have to handle the
    session state on your own (in the application).

    Tomas
  • Merlin Moncure at Dec 8, 2011 at 4:56 pm

    On Thu, Dec 8, 2011 at 10:11 AM, Marc Cousin wrote:
    On Thu, 8 Dec 2011 09:29:28 -0600,
    Merlin Moncure <mmoncure@gmail.com> wrote:
    On Thu, Dec 8, 2011 at 9:11 AM, Marc Cousin <cousinmarc@gmail.com>
    wrote:
    On Thu, 8 Dec 2011 12:27:22 +0000,
    Simon Riggs <simon@2ndQuadrant.com> wrote:
    On Thu, Dec 8, 2011 at 11:24 AM, Craig Ringer
    wrote:
    Areas in which Pg seems significantly less capable include:
    Please can you explain the features Oracle has in these areas, I'm
    not clear. Thanks.
    Maybe I can answer from my own Oracle experience. I hope it will be
    what Craig had in mind :)
    - admission control, queuing and resource limiting to optimally
    load a machine. Some limited level is possible with external
    pooling, but only by limiting concurrent workers.
    Oracle has natively two ways of handling inbound connections:
    - Dedicated, which is very similar to the PostgreSQL way of
    accepting connections: accept(), fork() and so on
    - Shared, which is based on processes listening and handling the
    connections (called dispatchers) and processes doing the real work
    (called workers, obviously). All of this works internally with
    some sort of queuing and storing results in shared memory (I don't
    remember the details of it)

    The advantage of this second architecture being of course that you
    can't have more than N workers hitting your database
    simultaneously. So it's easier to keep the load on the server to a
    reasonable value.
    you have a couple of very good options to achieve the same in postgres
    -- pgbouncer, pgpool.
    I wish it was the same (I use and like both pgbouncer and pgpool too,
    and they do a good job, I'm not arguing on that). But unfortunately it
    isn't: you still have the notion of session for each connected client
    in Oracle when using the shared servers model.

    It means you keep your session variables, your prepared statements,
    your running transaction, etc… in each individual session while having
    the multiplexing equivalent of a 'statement level' from pgbouncer.
    yeah -- maybe we could use a server side feature that could allow you
    to save a session state and load it up later to make life easier for
    connection pooled applications. however, it's not really that much
    work to organize most of the things you'd use for this in an
    application managed session instead of a database managed one.

    regarding the "enterprises won't use community supplied postgresql add
    ons" point, this is completely true in many cases. I do think pgbouncer
    should be seriously considered for advancement as a core feature. That
    said, this should be done on its own merits, not to satisfy the
    capricious whims of enterprises.

    merlin
  • Marc Cousin at Dec 8, 2011 at 5:17 pm
    On Thu, 8 Dec 2011 10:56:39 -0600,
    Merlin Moncure <mmoncure@gmail.com> wrote:
    On Thu, Dec 8, 2011 at 10:11 AM, Marc Cousin wrote:
    On Thu, 8 Dec 2011 09:29:28 -0600,
    Merlin Moncure <mmoncure@gmail.com> wrote:
    On Thu, Dec 8, 2011 at 9:11 AM, Marc Cousin <cousinmarc@gmail.com>
    wrote:
    On Thu, 8 Dec 2011 12:27:22 +0000,
    Simon Riggs <simon@2ndQuadrant.com> wrote:
    On Thu, Dec 8, 2011 at 11:24 AM, Craig Ringer
    wrote:
    Areas in which Pg seems significantly less capable include:
    Please can you explain the features Oracle has in these areas,
    I'm not clear. Thanks.
    Maybe I can answer from my own Oracle experience. I hope it will
    be what Craig had in mind :)
    - admission control, queuing and resource limiting to
    optimally load a machine. Some limited level is possible with
    external pooling, but only by limiting concurrent workers.
    Oracle has natively two ways of handling inbound connections:
    - Dedicated, which is very similar to the PostgreSQL way of
    accepting connections: accept(), fork() and so on
    - Shared, which is based on processes listening and handling the
    connections (called dispatchers) and processes doing the real
    work (called workers, obviously). All of this works internally
    with some sort of queuing and storing results in shared memory
    (I don't remember the details of it)

    The advantage of this second architecture being of course that
    you can't have more than N workers hitting your database
    simultaneously. So it's easier to keep the load on the server to
    a reasonable value.
    you have a couple of very good options to achieve the same in
    postgres -- pgbouncer, pgpool.
    I wish it was the same (I use and like both pgbouncer and pgpool
    too, and they do a good job, I'm not arguing on that). But
    unfortunately it isn't: you still have the notion of session for
    each connected client in Oracle when using the shared servers model.

    It means you keep your session variables, your prepared statements,
    your running transaction, etc… in each individual session while
    having the multiplexing equivalent of a 'statement level' from
    pgbouncer.
    yeah -- maybe we could use a server side feature that could allow you
    to save a session state and load it up later to make life easier for
    connection pooled applications. however, it's not really that much
    work to organize most of the things you'd use for this in an
    application managed session instead of a database managed one.
    For us who can change our application code, of course. But some people
    can't.
    regarding the "enterprises won't use community supplied postgresql add
    ons" point, this is completely true in many cases. I do think pgbouncer
    should be seriously considered for advancement as a core feature. That
    said, this should be done on its own merits, not to satisfy the
    capricious whims of enterprises.
    Sure. I'm not advocating this. Neither am I advocating using Oracle by
    the way. I'm just as pro-postgresql as anyone else. I just wanted to
    weigh the pros and cons of Oracle's way of doing things versus
    pgbouncer. And the shared server approach has its merits.
  • Tomas Vondra at Dec 8, 2011 at 5:23 pm

    On 8 December 2011, 17:56, Merlin Moncure wrote:
    regarding the "enterprises won't use community supplied postgresql add
    ons" point, this is completely true in many cases. I do think pgbouncer
    should be seriously considered for advancement as a core feature. That
    said, this should be done on its own merits, not to satisfy the
    capricious whims of enterprises.
    I don't think so. In my experience, the enterprises are not using
    PostgreSQL (or other OSS software in general) because they think there's
    no guarantee or support available. If there's a third party (might be a
    local consulting company) providing acceptable guarantees and support for
    PostgreSQL, it may as well provide guarantees for pgbouncer and the
    enterprise customer is fine.

    They simply want a package with guarantees, it does not matter whether
    it's in core or not.

    Tomas
  • Tomas Vondra at Dec 8, 2011 at 5:03 pm

    On 8 December 2011, 17:11, Marc Cousin wrote:
    On Thu, 8 Dec 2011 09:29:28 -0600,
    - admission control, queuing and resource limiting to optimally
    load a machine. Some limited level is possible with external
    pooling, but only by limiting concurrent workers.
    Oracle has natively two ways of handling inbound connections:
    - Dedicated, which is very similar to the PostgreSQL way of
    accepting connections: accept(), fork() and so on
    - Shared, which is based on processes listening and handling the
    connections (called dispatchers) and processes doing the real work
    (called workers, obviously). All of this works internally with
    some sort of queuing and storing results in shared memory (I don't
    remember the details of it)

    The advantage of this second architecture being of course that you
    can't have more than N workers hitting your database
    simultaneously. So it's easier to keep the load on the server to a
    reasonable value.
    you have a couple of very good options to achieve the same in postgres
    -- pgbouncer, pgpool.
    I wish it was the same (I use and like both pgbouncer and pgpool too,
    and they do a good job, I'm not arguing on that). But unfortunately it
    isn't: you still have the notion of session for each connected client
    in Oracle when using the shared servers model.
    True, it is not exactly the same, it's similar. I don't think PostgreSQL
    will ever provide 'shared backends' the way Oracle does - it would require
    significant code change. Most of the benefits can be achieved by using a
    connection pool without the added complexity.
    It means you keep your session variables, your prepared statements,
    your running transaction, etc… in each individual session while having
    the multiplexing equivalent of a 'statement level' from pgbouncer.
    Yes. But if you expect that PostgreSQL will mimic Oracle architecture,
    then you're mistaken. It's simply a different solution with different
    architecture, and that means you may need to use different application
    design sometimes.

    Tomas
  • Marc Cousin at Dec 8, 2011 at 5:14 pm
    On Thu, 8 Dec 2011 18:02:51 +0100,
    "Tomas Vondra" <tv@fuzzy.cz> wrote:
    On 8 December 2011, 17:11, Marc Cousin wrote:
    On Thu, 8 Dec 2011 09:29:28 -0600,
    - admission control, queuing and resource limiting to
    optimally load a machine. Some limited level is possible with
    external pooling, but only by limiting concurrent workers.
    Oracle has natively two ways of handling inbound connections:
    - Dedicated, which is very similar to the PostgreSQL way of
    accepting connections: accept(), fork() and so on
    - Shared, which is based on processes listening and handling the
    connections (called dispatchers) and processes doing the real
    work (called workers, obviously). All of this works internally
    with some sort of queuing and storing results in shared memory
    (I don't remember the details of it)

    The advantage of this second architecture being of course that
    you can't have more than N workers hitting your database
    simultaneously. So it's easier to keep the load on the server to
    a reasonable value.
    you have a couple of very good options to achieve the same in
    postgres -- pgbouncer, pgpool.
    I wish it was the same (I use and like both pgbouncer and pgpool
    too, and they do a good job, I'm not arguing on that). But
    unfortunately it isn't: you still have the notion of session for
    each connected client in Oracle when using the shared servers model.
    True, it is not exactly the same, it's similar. I don't think
    PostgreSQL will ever provide 'shared backends' the way Oracle does -
    it would require significant code change. Most of the benefits can be
    achieved by using a connection pool without the added complexity.
    I didn't ask for it. It's just not the same, which is what I was
    responding to.
    It means you keep your session variables, your prepared statements,
    your running transaction, etc… in each individual session while
    having the multiplexing equivalent of a 'statement level' from
    pgbouncer.
    Yes. But if you expect that PostgreSQL will mimic Oracle architecture,
    then you're mistaken. It's simply a different solution with different
    architecture, and that means you may need to use different application
    design sometimes.
    I just don't understand what you're getting all heated up for. I don't
    want PostgreSQL to mimic Oracle, and I'm only answering Simon's
    question at the beginning.

    I'm just saying that there are differences between the pgbouncer approach
    and the shared server approach, and benefits to the latter (and to the
    former too by the way, mainly simplicity and thus fewer bugs, as the
    shared server architecture has suffered from a lot of bugs).
  • Tomas Vondra at Dec 8, 2011 at 5:35 pm

    On 8 December 2011, 18:14, Marc Cousin wrote:
    It means you keep your session variables, your prepared statements,
    your running transaction, etc… in each individual session while
    having the multiplexing equivalent of a 'statement level' from
    pgbouncer.
    Yes. But if you expect that PostgreSQL will mimic Oracle architecture,
    then you're mistaken. It's simply a different solution with different
    architecture, and that means you may need to use different application
    design sometimes.
    I just don't understand what you're getting all heated up for. I don't
    want PostgreSQL to mimic Oracle, and I'm only answering Simon's
    question at the beginning.
    Sorry, I was not arguing with you nor heated up. It's rather a translation
    mistake - in my native language we often use 'you' when we actually mean
    'someone.' So it should be something like "If someone expects ... he's
    mistaken," etc.

    Tomas
  • Marc Cousin at Dec 8, 2011 at 5:46 pm

    On Thursday 08 December 2011 18:34:35 Tomas Vondra wrote:
    On 8 December 2011, 18:14, Marc Cousin wrote:
    It means you keep your session variables, your prepared
    statements,
    your running transaction, etc… in each individual session while
    having the multiplexing equivalent of a 'statement level' from
    pgbouncer.
    Yes. But if you expect that PostgreSQL will mimic Oracle architecture,
    then you're mistaken. It's simply a different solution with different
    architecture, and that means you may need to use different application
    design sometimes.
    I just don't understand what you're getting all heated up for. I don't
    want PostgreSQL to mimic Oracle, and I'm only answering Simon's
    question at the beginning.
    Sorry, I was not arguing with you nor heated up. It's rather a translation
    mistake - in my native language we often use 'you' when we actually mean
    'someone.' So it should be something like "If someone expects ... he's
    mistaken," etc.
    Ok, no problem, that just felt weird :)
  • Craig Ringer at Dec 9, 2011 at 3:14 am

    On 12/09/2011 01:02 AM, Tomas Vondra wrote:
    On 8 December 2011, 17:11, Marc Cousin wrote:
    On Thu, 8 Dec 2011 09:29:28 -0600,
    - admission control, queuing and resource limiting to optimally
    load a machine. Some limited level is possible with external
    pooling, but only by limiting concurrent workers.
    Oracle has natively two ways of handling inbound connections:
    - Dedicated, which is very similar to the PostgreSQL way of
    accepting connections: accept(), fork() and so on
    - Shared, which is based on processes listening and handling the
    connections (called dispatchers) and processes doing the real work
    (called workers, obviously). All of this works internally with
    some sort of queuing and storing results in shared memory (I don't
    remember the details of it)

    The advantage of this second architecture being of course that you
    can't have more than N workers hitting your database
    simultaneously. So it's easier to keep the load on the server to a
    reasonable value.
    you have a couple of very good options to achieve the same in postgres
    -- pgbouncer, pgpool.
    I wish it was the same (I use and like both pgbouncer and pgpool too,
    and they do a good job, I'm not arguing on that). But unfortunately it
    isn't: you still have the notion of session for each connected client
    in Oracle when using the shared servers model.
    True, it is not exactly the same, it's similar. I don't think PostgreSQL
    will ever provide 'shared backends' the way Oracle does - it would require
    significant code change. Most of the benefits can be achieved by using a
    connection pool without the added complexity.
    Yep - a connection pool that could save and restore session state,
    separating "executor/backend" from "connection/session", would produce
    much the same result with a lot less complexity. It's one of the reasons
    I'd love to see in-core pooling, because I don't see how an out-of-core
    solution can maintain session state like advisory locks, HOLD cursors, etc.
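
    Some concrete examples of the backend-local state in question, all of
    which live and die with a single server connection (a sketch):

    SELECT pg_advisory_lock(42);       -- advisory lock held by this backend
    BEGIN;
    DECLARE c CURSOR WITH HOLD FOR SELECT generate_series(1, 10);
    COMMIT;                            -- c survives the commit, in this session only
    FETCH 3 FROM c;
    CREATE TEMP TABLE scratch (i int); -- temp table, visible here only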

    --
    Craig Ringer
  • Tomas Vondra at Dec 8, 2011 at 4:54 pm

    On 8 December 2011, 16:11, Marc Cousin wrote:
    - admission control, queuing and resource limiting to optimally
    load a machine. Some limited level is possible with external
    pooling, but only by limiting concurrent workers.
    Oracle has natively two ways of handling inbound connections:
    - Dedicated, which is very similar to the PostgreSQL way of accepting
    connections: accept(), fork() and so on
    - Shared, which is based on processes listening and handling the
    connections (called dispatchers) and processes doing the real work
    (called workers, obviously). All of this works internally with
    some sort of queuing and storing results in shared memory (I don't
    remember the details of it)

    The advantage of this second architecture being of course that you
    can't have more than N workers hitting your database simultaneously. So
    it's easier to keep the load on the server to a reasonable value.
    Which is exactly what pgbouncer and other connection pools are for ...
    - prioritisation of queries or users. It's hard to say "prefer this
    query over this one, give it more resources" or "user A's work
    always preempts user B's" in Pg.
    It's called the resource manager in Oracle. You define 'resource plans',
    'consumer groups', etc… and you get some sort of QoS for your queries.
    It's mostly about CPU resource allocation if I remember correctly (I
    never used it, except during training :) )
    And it's damn difficult to get it working properly ... the simpler the
    better here.

    Tomas
  • Marc Cousin at Dec 8, 2011 at 5:07 pm
    On Thu, 8 Dec 2011 17:54:20 +0100,
    "Tomas Vondra" <tv@fuzzy.cz> wrote:
    On 8 December 2011, 16:11, Marc Cousin wrote:
    - admission control, queuing and resource limiting to optimally
    load a machine. Some limited level is possible with external
    pooling, but only by limiting concurrent workers.
    Oracle has natively two ways of handling inbound connections:
    - Dedicated, which is very similar to the PostgreSQL way of
    accepting connections: accept(), fork() and so on
    - Shared, which is based on processes listening and handling the
    connections (called dispatchers) and processes doing the real work
    (called workers, obviously). All of this works internally with
    some sort of queuing and storing results in shared memory (I don't
    remember the details of it)

    The advantage of this second architecture being of course that you
    can't have more than N workers hitting your database
    simultaneously. So it's easier to keep the load on the server to a
    reasonable value.
    Which is exactly what pgbouncer and other connection pools are for ...
    Yep. But with some limitations (not that important, but they exist) as
    detailed in another message.

    I like the pgbouncer approach as it is much simpler, but it has the
    limitation that the real sessions aren't in the database anymore, so
    context is lost, etc…
    - prioritisation of queries or users. It's hard to say "prefer
    this query over this one, give it more resources" or "user A's
    work always preempts user B's" in Pg.
    It's called the resource manager in Oracle. You define 'resource
    plans', 'consumer groups', etc… and you get some sort of QoS for
    your queries. It's mostly about CPU resource allocation if I
    remember correctly (I never used it, except during training :) )
    And it's damn difficult to get it working properly ... the simpler the
    better here.
    Yep, it's very hard and ugly to use. That is, by the way, the reason I
    used it only in training, not in production situations (in production,
    when it doesn't work, you have to debug the damn thing, not just throw
    it away :) )
  • Simon Riggs at Dec 9, 2011 at 10:51 am

    On Thu, Dec 8, 2011 at 3:11 PM, Marc Cousin wrote:
    On Thu, 8 Dec 2011 12:27:22 +0000,
    Simon Riggs <simon@2ndQuadrant.com> wrote:
    On Thu, Dec 8, 2011 at 11:24 AM, Craig Ringer <ringerc@ringerc.id.au>
    wrote:
    Areas in which Pg seems significantly less capable include:
    Please can you explain the features Oracle has in these areas, I'm not
    clear. Thanks.
    Maybe I can answer from my own Oracle experience. I hope it will be what
    Craig had in mind :)
    - admission control, queuing and resource limiting to optimally
    load a machine. Some limited level is possible with external
    pooling, but only by limiting concurrent workers.
    Oracle has natively two ways of handling inbound connections:
    - Dedicated, which is very similar to the PostgreSQL way of accepting
    connections: accept(), fork() and so on
    - Shared, which is based on processes listening and handling the
    connections (called dispatchers) and processes doing the real work
    (called workers, obviously). All of this works internally with
    some sort of queuing and storing results in shared memory (I don't
    remember the details of it)
    The advantage of this second architecture being of course that you
    can't have more than N workers hitting your database simultaneously. So
    it's easier to keep the load on the server to a reasonable value.
    - prioritisation of queries or users. It's hard to say "prefer this
    query over this one, give it more resources" or "user A's work
    always preempts user B's" in Pg.
    It's called the resource manager in Oracle. You define 'resource plans',
    'consumer groups', etc… and you get some sort of QoS for your queries.
    It's mostly about CPU resource allocation if I remember correctly (I
    never used it, except during training :) )

    Being able to change the backend's nice level may do something
    similar, I guess. I don't think Oracle's resource manager solves
    the priority inversion due to locking in the database, but I'm not sure
    of it.
    Thanks, sounds interesting.

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Craig Ringer at Dec 9, 2011 at 3:11 am

    On 12/08/2011 08:27 PM, Simon Riggs wrote:
    On Thu, Dec 8, 2011 at 11:24 AM, Craig Ringer wrote:
    Areas in which Pg seems significantly less capable include:
    Please can you explain the features Oracle has in these areas, I'm not
    clear. Thanks.
    Marc has, as I was hoping, done so, much better than I could. Most of
    what I know is 2nd hand from Oracle users - I'm not one myself.

    It's interesting to see the view that the resource manager for query and
    user prioritisation is hard to use in practice. That's not something I'd
    heard before, but I can't say I'm entirely surprised given how
    complicated problems around lock management and priority inversion are
    to get right even in a system where there *aren't* free-form dynamic
    user-defined queries running.

    --
    Craig Ringer
  • Marc Cousin at Dec 9, 2011 at 10:26 am
    On Fri, 09 Dec 2011 11:11:12 +0800,
    Craig Ringer <ringerc@ringerc.id.au> wrote:
    On 12/08/2011 08:27 PM, Simon Riggs wrote:
    On Thu, Dec 8, 2011 at 11:24 AM, Craig Ringer wrote:
    Areas in which Pg seems significantly less capable include:
    Please can you explain the features Oracle has in these areas, I'm
    not clear. Thanks.
    Marc has, as I was hoping, done so, much better than I could. Most of
    what I know is 2nd hand from Oracle users - I'm not one myself.

    It's interesting to see the view that the resource manager for query
    and user prioritisation is hard to use in practice. That's not
    something I'd heard before, but I can't say I'm entirely surprised
    given how complicated problems around lock management and priority
    inversion are to get right even in a system where there *aren't*
    free-form dynamic user-defined queries running.
    The complexity, at least for me, came from the user interface (at
    least a dozen stored procedures with a complex syntax) to set up and
    monitor the resource manager.

    I don't think it manages the priority inversion problems, just CPU
    priorities. I asked the Oracle trainer, who wasn't sure either :)
  • Tomas Vondra at Dec 8, 2011 at 12:54 pm

    On 8 December 2011, 12:24, Craig Ringer wrote:
    - admission control, queuing and resource limiting to optimally load a
    machine. Some limited level is possible with external pooling, but only by
    limiting concurrent workers.
    The first thing I'd like to see is "user profiles" - being able to set
    things like work_mem, synchronous_commit, etc. on a per-user basis
    separately.
    - performance monitoring and diagnostics. It's way harder to find out
    what's causing load on a busy Pg server or report on frequent/expensive
    queries etc. Tooling is limited and fairly primitive. It's fine, but
    nowhere near as powerful and easy as some of the other DBs.
    True. Greg Smith actually mentioned this as one of the frequently asked
    features in his post about two weeks ago
    (http://blog.2ndquadrant.com/en/2011/11/global-trends-in-deploying-pos.html).
    I've started to build my own tool and got it more or less working for my
    needs, and there are other tools available, but none of them is really a
    complete solution. Would be nice to form a dev group that would work on this.
    - prioritisation of queries or users. It's hard to say "prefer this query
    over this one, give it more resources" or "user A's work always preempts
    user B's" in Pg.
    I wonder if the prioritisation could be done using nice - each backend is
    a separate process, so why not do 'nice(10)' for low priority processes
    or something like that.
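
    A back-of-the-envelope version of that idea (assumes shell access as
    the postgres OS user; note the locking caveat raised in the replies):

    SELECT pg_backend_pid();  -- run in the low-priority session; say it returns 12345
    -- then, from a shell:  renice +10 -p 12345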
  • Craig Ringer at Dec 8, 2011 at 1:17 pm

    On 12/08/2011 08:53 PM, Tomas Vondra wrote:
    On 8 Prosinec 2011, 12:24, Craig Ringer wrote:
    - admission control, queuing and resource limiting to optimally load a
    machine. Some limited level is possible with external pooling, but only by
    limiting concurrent workers.
    The first thing I'd like to see is "user profiles" - being able to set
    things like work_mem, synchronous_commit, etc. on a per-user basis
    separately.
    You can.

    ALTER USER username SET work_mem = '100MB';

    It's not a hard cap - the user can raise/lower it however they like. The
    initial value can be set per-user, per-database, or globally.
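
    For instance (role and database names are illustrative):

    ALTER ROLE analyst SET synchronous_commit = off;  -- per-user default
    ALTER DATABASE reporting SET work_mem = '64MB';   -- per-database default
    -- globally: set work_mem in postgresql.conf and reload
    SET work_mem = '16MB';                            -- per-session override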
    I wonder if the prioritisation could be done using nice - each backend
    is a separate process, so why not do 'nice(10)' for low priority
    processes or something like that.
    Yes, to a limited degree you can prioritise queries using nice and
    ionice, but it's awkward because:

    - All queries run as `postgres' so you can't do per-user limiting very
    easily

    - The postmaster doesn't have a way to set the nice level and ionice
    level when it forks a backend, nor does the backend have any way to do
    it later. You can use your own user-defined C functions for this, though.

    - Most importantly, even if you nice and ionice using C functions or
    manually with the cmdline utilities, you can't affect the bgwriter, nor
    can you affect how much data a low-priority query pushes out of cache.

    --
    Craig Ringer
  • Tomas Vondra at Dec 8, 2011 at 4:50 pm

    On 8 December 2011, 14:17, Craig Ringer wrote:
    You can.

    ALTER USER username SET work_mem = '100MB';

    It's not a hard cap - the user can raise/lower it however they like. The
    initial value can be set per-user, per-database, or globally.
    Oh, shame on me! I wasn't aware of this. Too bad it's not possible to
    restrict this (changing certain config values). A simple 'before SET' hook
    might do the trick.
    I wonder if the prioritisation could be done using nice - each backend
    is a separate process, so why not do 'nice(10)' for low priority
    processes or something like that.
    Yes, to a limited degree you can prioritise queries using nice and
    ionice, but it's awkward because:

    - All queries run as `postgres' so you can't do per-user limiting very
    easily

    - The postmaster doesn't have a way to set the nice level and ionice
    level when it forks a backend, nor does the backend have any way to do
    it later. You can use your own user-defined C functions for this, though.
    Yes, that's what I meant.
    - Most importantly, even if you nice and ionice using C functions or
    manually with the cmdline utilities, you can't affect the bgwriter, nor
    can you affect how much data a low-priority query pushes out of cache.
    IMHO bgwriter may be reasonably tuned by bgwriter_* GUC variables. The
    user backends are probably more interesting here.

    Tomas
  • Eduardo Morras at Dec 12, 2011 at 12:10 am
    Sorry for the delay,

    I wonder if the prioritisation could be done using nice - each backend
    is a separate process, so why not do 'nice(10)' for low priority
    processes or something like that.
    Yes, to a limited degree you can prioritise queries using nice and
    ionice, but it's awkward because:

    - All queries run as `postgres' so you can't do per-user limiting very easily

    - The postmaster doesn't have a way to set the nice level and ionice
    level when it forks a backend, nor does the backend have any way to do
    it later. You can use your own user-defined C functions for this, though.
    The postmaster has no way to modify the system nice or ionice levels,
    but it can send signals to its child processes. Sending a user signal
    to put the child completely to sleep (I/O and CPU), or to pause it
    before any I/O call, could do the trick. Perhaps it's time to add a
    query scheduler, or a more complete/complex one, to Postgres.
    ----------------------------------------------
    If life turns its back on you, grab its ass!
  • Andrew Sullivan at Dec 8, 2011 at 1:21 pm

    On Thu, Dec 08, 2011 at 01:53:45PM +0100, Tomas Vondra wrote:
    I wonder if the prioritisation could be done using nice - each backend is
    a separate process, so why not do 'nice(10)' for low priority processes
    or something like that.
    This won't work because if you are holding a lock on something someone
    else needs, your low nice score is going to cause them problems. It
    could make things worse rather than better. (This suggestion comes up
    a lot, by the way, so there's been a lot of discussion of it
    historically.)

    --
    Andrew Sullivan
    ajs@crankycanuck.ca
  • Tomas Vondra at Dec 8, 2011 at 5:48 pm

    On 8 December 2011, 14:20, Andrew Sullivan wrote:
    On Thu, Dec 08, 2011 at 01:53:45PM +0100, Tomas Vondra wrote:
    I wonder if the prioritisation could be done using nice - each backend
    is a separate process, so why not do 'nice(10)' for low priority
    processes or something like that.
    This won't work because if you are holding a lock on something someone
    else needs, your low nice score is going to cause them problems. It
    could make things worse rather than better. (This suggestion comes up
    a lot, by the way, so there's been a lot of discussion of it
    historically.)
    I'm aware of that, but there are cases when this may actually work.

    For example we do have an OLTP system, but we need to build exports to
    other systems regularly. The export may need to read a lot of data, but I
    don't want to annoy the people who are using the system. So I could lower
    the priority for the backend generating the report.

    Yes, there are cases where this "priority inversion" makes it unusable.

    Tomas
  • Satoshi Nagayasu at Dec 8, 2011 at 2:48 pm

    2011/12/08 21:53, Tomas Vondra wrote:
    - performance monitoring and diagnostics. It's way harder to find out
    what's causing load on a busy Pg server or report on frequent/expensive
    queries etc. Tooling is limited and fairly primitive. It's fine, but
    nowhere near as powerful and easy as some of the other DBs.
    True. Greg Smith actually mentioned this as one of the frequently asked
    features in his post about two weeks ago
    (http://blog.2ndquadrant.com/en/2011/11/global-trends-in-deploying-pos.html).
    I've started to build my own tool and got it more or less working for my
    needs, and there are other tools available, but none of them is really a
    complete solution. Would be nice to form a dev group that would work on this.
    Seems a good point. I'm trying to build "a complete solution". :)

    Anyway, one of the reasons a "complete solution" is so difficult to build
    is that it needs support from the *entire* core code. Without that core
    support, a complete solution can never be built. Obtaining LWLock
    statistics or write I/O statistics is actually pretty tough work for a
    "non-experienced" PostgreSQL DBA like me. :)

    For example, I've been investigating PostgreSQL LWLock behavior
    precisely for a few weeks, and it could not be observed within PostgreSQL
    itself; therefore, I picked up SystemTap. However, SystemTap could not be
    used in a production system, because it often kills the target processes. :(
    How can I observe LWLocks in the production system?

    There are several tools to monitor system behavior at the operating
    system level, but that is far from understanding PostgreSQL behavior.
    And DBAs coming from other RDBMSes, in particular proprietary RDBMSes,
    need it, because they've already been using such facilities (or tools)
    in those RDBMSes. That's the reason we need more facilities to observe
    the inside of PostgreSQL.

    In addition, one more reason for the difficulty is that experienced
    PostgreSQL DBAs (or hackers) generally do not need such facilities,
    because they can imagine how PostgreSQL works in a particular situation.

    I still think we can implement (or enhance) those facilities if we
    focus on it, but I sometimes feel it's like "a chicken and egg" situation.

    Regards,
    --
    NAGAYASU Satoshi <satoshi.nagayasu@gmail.com>
  • Greg Smith at Dec 10, 2011 at 1:54 am

    On 12/08/2011 09:48 AM, Satoshi Nagayasu wrote:
    For example, I've been investigating PostgreSQL LWLock behavior
    precisely for a few weeks, and it could not be observed within
    PostgreSQL itself; therefore, I picked up SystemTap. However, SystemTap
    could not be used in a production system, because it often kills the
    target processes. :(
    How can I observe LWLocks in the production system?
    I decided about a year ago that further work on using SystemTap was a
    black hole: time goes in, nothing really usable on any production
    server seems to come out. It can be useful for collecting data in a
    developer context. But the sort of problems people are more interested
    in all involve "why is the production server doing this?", and as you've
    also discovered the only reasonable answer so far doesn't involve
    SystemTap; it involves DTrace and either Solaris or FreeBSD (or Mac OS,
    for smaller server hardware deployments). Since those platforms are
    problematic to run database servers on in many cases, that doesn't help
    very much.

    I'm planning to put that instrumentation into the database directly,
    which is what people with Oracle background are asking for. There are
    two underlying low-level problems to solve before even starting that:

    -How can the overhead of collecting the timing data be kept down? It's
    really high in some places. This is being worked out right now on
    pgsql-hackers, see "Timing overhead and Linux clock sources"

    -How do you log the potentially large amount of data collected without
    killing server performance? Initial discussions also happening right
    now, see "logging in high performance systems".

    I feel this will increasingly be the top blocker for performance
    sensitive deployments in the coming year; people used to having these
    tools in Oracle cannot imagine how they would operate without them. One
    of my big-picture goals is to have this available as a compile-time
    option starting in PostgreSQL 9.3 in 2013, piggybacked off the existing
    DTrace support. And the earlier the better--since many migrations have a
    long lead time, just knowing it's coming in the next version would be
    good enough for some people who are blocked right now to start working
    on theirs.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
  • Craig Ringer at Dec 11, 2011 at 2:28 am

    On 12/10/2011 09:54 AM, Greg Smith wrote:
    I'm planning to put that instrumentation into the database directly,
    which is what people with Oracle background are asking for.
    FWIW, even for folks like me who've come from a general OSS DB
    background with a smattering of old Sybase and other primitive stuff,
    it's still a concern. I know enough about Linux's guts, I/O, and memory
    behaviour to be able to track down many issues, but it takes *time*,
    time I can't spend on the numerous other things I also have to be
    doing. Right now Pg performance fine-tuning is more of an expert skill
    (as you know!), and for those with a weaker background in Linux/UNIX
    systems and tuning it's a bit of a barrier.

    Thankfully my workload is so light I don't need to care; manual EXPLAIN
    ANALYSE and auto_explain, along with vmstat/iotop, are enough for me.
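
    For reference, a minimal auto_explain setup in postgresql.conf looks
    something like this (the threshold is illustrative):

        shared_preload_libraries = 'auto_explain'
        auto_explain.log_min_duration = '500ms'   # log plans of statements slower than this
        auto_explain.log_analyze = off            # 'on' adds per-node timings, at some cost
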
    -How can the overhead of collecting the timing data be kept down?
    It's really high in some places. This is being worked out right now
    on pgsql-hackers, see "Timing overhead and Linux clock sources"
    One thing I think would be interesting for this would be to identify
    slow queries (without doing detailed plan timing) and flag them for more
    detailed timing if they're run again within <x> time. I suspect this
    would only be practical with parameterised prepared statements where the
    query string remained the same, but that'd still be interesting -
    essentially automatically upgrading the log level for problem queries
    from slow query logging to auto_explain with explain analyse.

    The main issue would be exempting queries that are expected to take
    longer than the slow query threshold, like reporting queries, where you
    wouldn't want to pay that overhead. That could be handled by expiring
    the flag on slow queries that aren't run again soon: a query gets
    flagged for EXPLAIN ANALYZE on its next run, but if that run doesn't
    come quickly the flag is dropped first.

    I don't actually need this myself; it's just something I've been
    thinking about as a way to reduce the admin load of identifying and
    tuning problem queries.
    I feel this will increasingly be the top blocker for performance-sensitive
    deployments in the coming year; people used to having these tools in
    Oracle cannot imagine how they would operate without them.
    Yep, especially since there's nothing in Pg to manage user/query
    priorities for I/O or CPU, so the ability to actively manage performance
    problems from the outside is particularly crucial. You'll always want to
    do that of course, and it's probably _better_ than relying on work
    priorities, especially since it sounds from recent comments like even on
    Oracle those priority features aren't what you'd call friendly.

    Personally I'd choose good performance monitoring over user/query
    priorities any day. With good perf monitoring I can script from the
    outside, I have a lot more control, I can send alerts, and so on.

    --
    Craig Ringer
  • Scott Marlowe at Dec 11, 2011 at 3:29 am

    On Sat, Dec 10, 2011 at 7:28 PM, Craig Ringer wrote:
    The main issue would be exempting queries that are expected to take longer
    than the slow query threshold, like reporting queries, where you wouldn't
    want to pay that overhead.
    One trick you can use for this is to assign the reporting application
    a different user and then use ALTER USER ... SET to turn off slow-query
    logging and the like for that user. Since it's often useful to increase
    work_mem and/or change random_page_cost and so on for a reporting user,
    it's pretty common to do this anyway.
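
    A minimal sketch of that setup, with an illustrative role name and
    values:

        -- per-role settings take effect at the role's next login
        ALTER ROLE reporting SET log_min_duration_statement = -1;  -- no slow-query logging (superuser only)
        ALTER ROLE reporting SET work_mem = '256MB';               -- roomier sorts and hashes
        ALTER ROLE reporting SET random_page_cost = 2.0;           -- planner costs tuned for reporting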
  • Greg Smith at Dec 11, 2011 at 6:19 am

    On 12/10/2011 09:28 PM, Craig Ringer wrote:
    One thing I think would be interesting for this would be to identify
    slow queries (without doing detailed plan timing) and flag them for
    more detailed timing if they're run again within <x> time. I suspect
    this would only be practical with parameterised prepared statements
    where the query string remained the same, but that'd still be interesting
    There are actually two patches sitting in the current PostgreSQL
    CommitFest that allow normalizing query strings, as part of
    pg_stat_statements, so that queries could be handled like this even if
    not prepared. What you're asking for is basically a hybrid of that and
    auto_explain, with something smarter deciding when the explain is
    triggered. Interesting idea--I hadn't thought of that heuristic before.
    It won't be hard to do if the query normalization stuff commits.
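
    For reference, a sketch of the pg_stat_statements interface those
    patches extend (assumes the module is loaded via
    shared_preload_libraries):

        CREATE EXTENSION pg_stat_statements;
        -- top statements by cumulative execution time
        SELECT query, calls, total_time
          FROM pg_stat_statements
         ORDER BY total_time DESC
         LIMIT 5;
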
    Personally I'd choose good performance monitoring over user/query
    priorities any day. With good perf monitoring I can script from the
    outside, I have a lot more control, I can send alerts, and so on.
    Luckily for you, it's hard to do it in any other order. When I think
    about how we'd have to validate whether query prioritization code was
    operating as expected or not, I imagine some extra monitoring tools
    really need to get built first. Might as well expose those for people
    like yourself too, once they're built for that purpose.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
  • Chris Curvey at Dec 12, 2011 at 12:11 pm

    On Sat, Dec 10, 2011 at 9:28 PM, Craig Ringer wrote:
    One thing I think would be interesting for this would be to identify slow
    queries (without doing detailed plan timing) and flag them for more
    detailed timing if they're run again within <x> time. I suspect this would
    only be practical with parameterised prepared statements where the query
    string remained the same, but that'd still be interesting - essentially
    automatically upgrading the log level for problem queries from slow query
    logging to auto_explain with explain analyse.

    I'll suggest a different take. How about adding a feature where the system
    flags queries that are taking longer than the optimizer expects? The
    optimizer must be coming up with some kind of cost number that it uses to
    rank query plans. If Postgres is using significantly (1) more than the
    expected cost when executing the query, then that's a sign that something
    is wrong (statistics wrong, cost ratios out of whack, etc.).

    I could see a future where Postgres could either alert a DBA to the issue,
    or try to take corrective action on its own (queue up a table or index for
    a statistics update). Maybe the next time a table is probed, let Postgres
    collect the distribution statistics as a side effect of the query. I'm not
    aware of any database engine that does that today.
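
    You can approximate the check by hand today; a sketch, with an
    illustrative table (cost units aren't comparable to milliseconds, so
    row estimates are the usual proxy):

        -- "Plan Rows" is the optimizer's estimate; "Actual Rows" is measured
        EXPLAIN (ANALYZE, FORMAT JSON)
        SELECT * FROM orders WHERE customer_id = 42;
        -- a large ratio between the two at any plan node is the kind of
        -- estimate-vs-reality mismatch that would raise the alert above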

    I wish I had the programming chops to take a swing at this....

    (1) the definition of "significantly" is left as an exercise for the
    configuration file :)


    --
    e-Mail is the equivalent of a postcard written in pencil. This message may
    not have been sent by me, or intended for you. It may have been read or
    even modified while in transit. e-Mail disclaimers have the same force in
    law as a note passed in study hall. If your corporate attorney says that
    you need a disclaimer in your signature, you need a new corporate attorney.
  • Satoshi Nagayasu at Dec 12, 2011 at 12:09 am

    2011/12/10 10:54, Greg Smith wrote:
    On 12/08/2011 09:48 AM, Satoshi Nagayasu wrote:
    For example, I've been investigating PostgreSQL LWLock behavior in
    detail for a few weeks. That information could not be obtained within
    PostgreSQL itself, so I turned to SystemTap. However, SystemTap could
    not be used on a production system, because it often kills the target
    processes. :(
    How can I observe LWLocks in the production system?
    I decided about a year ago that further work on using SystemTap was a
    black hole: time goes in, nothing really usable on any production
    server seems to come out. It can be useful for collecting data in a
    developer context. But the sort of problems people are more interested
    in all involve "why is the production server doing this?", and as
    you've also discovered, the only reasonable answer so far doesn't
    involve SystemTap; it involves DTrace and either Solaris or FreeBSD (or
    Mac OS, for smaller server hardware deployments). Since those platforms
    are problematic to run database servers on in many cases, that doesn't
    help very much.
    Absolutely. SystemTap would be useful if I were able to reproduce the
    situation outside the production system. However, in most cases, that
    is actually difficult.
    I'm planning to put that instrumentation into the database directly,
    which is what people with Oracle backgrounds are asking for. There are
    two underlying low-level problems to solve before even starting that:

    -How can the overhead of collecting the timing data be kept down? It's
    really high in some places. This is being worked out right now on
    pgsql-hackers, see "Timing overhead and Linux clock sources"

    -How do you log the potentially large amount of data collected without
    killing server performance? Initial discussions also happening right
    now, see "logging in high performance systems".

    I feel this will increasingly be the top blocker for performance-sensitive
    deployments in the coming year; people used to having these tools in
    Oracle cannot imagine how they would operate without them. One of my
    big-picture goals is to have this available as a compile-time option
    starting in PostgreSQL 9.3 in 2013, piggybacked off the existing DTrace
    support. And the earlier the better--since many migrations have a long
    lead time, just knowing it's coming in the next version would be good
    enough for some people who are blocked right now to start working on
    their migrations.
    I'm glad to hear that. I'm very interested in focusing on it,
    and will follow the threads. Thanks.

    --
    NAGAYASU Satoshi <satoshi.nagayasu@gmail.com>
  • Jayadevan M at Dec 12, 2011 at 4:42 am
    Hello,
    Absolutely. SystemTap would be useful if I were able to reproduce the
    situation outside the production system. However, in most cases, that
    is actually difficult.
    At the db level, Oracle provides a "Database Replay" feature that lets
    you replay production server events in a development/test environment.
    http://docs.oracle.com/cd/B28359_01/server.111/e12253/dbr_intro.htm
    Won't something like this be useful in PostgreSQL? It would let us
    mimic the production environment load and analyze it better.
    Regards,
    Jayadevan
  • Ondrej Ivanič at Dec 12, 2011 at 4:47 am
    Hi,
    On 12 December 2011 15:39, Jayadevan M wrote:
    At the db level, Oracle provides a "Database Replay" feature that lets
    you replay production server events in a development/test environment.
    http://docs.oracle.com/cd/B28359_01/server.111/e12253/dbr_intro.htm
    Won't something like this be useful in PostgreSQL? It would let us
    mimic the production environment load and analyze it better.
    There is a project called pgreplay (http://pgreplay.projects.postgresql.org/):
    pgreplay reads a PostgreSQL log file (not a WAL file), extracts the
    SQL statements and executes them in the same order and relative time
    against a PostgreSQL database cluster. The idea is to replay a
    real-world database workload as exactly as possible.
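
    A sketch of how a run fits together (file names illustrative; see the
    pgreplay documentation for the full list of required log settings):

        # on the production server, capture the workload in CSV log format:
        #   log_destination = 'csvlog', logging_collector = on,
        #   log_statement = 'all', log_connections = on, log_disconnections = on
        # then parse and replay the captured log against a test server:
        pgreplay -c -h testhost postgresql.csv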

    --
    Ondrej Ivanic
    (ondrej.ivanic@gmail.com)
  • Greg Smith at Dec 12, 2011 at 11:43 am

    On 12/11/2011 11:39 PM, Jayadevan M wrote:
    At the db level, Oracle provides a "Database Replay" feature that lets
    you replay production server events in a development/test
    environment.
    http://docs.oracle.com/cd/B28359_01/server.111/e12253/dbr_intro.htm
    Won't something like this be useful in PostgreSQL? It would let us
    mimic the production environment load and analyze it better.
    There are several projects aiming at this goal in various ways:
    http://wiki.postgresql.org/wiki/Statement_Playback

    Some of the features under development right now will make
    this sort of thing easier to build into the core database. For example,
    the recent "Command Triggers" feature submission will make it easier to
    catch DDL changes as well as queries for this sort of thing.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
  • Jayadevan M at Dec 13, 2011 at 3:36 am
    Hello,
    Some of the features under development right now will make
    this sort of thing easier to build into the core database. For
    example, the recent "Command Triggers" feature submission will make
    it easier to catch DDL changes as well as queries for this sort of
    thing.
    Thank you for all the responses. I am a long-time Oracle guy who loves
    trying PostgreSQL because it looks like a great database. I try it out
    on my PC, and have no production experience with PostgreSQL. We
    recently tried moving our product (which uses Oracle) with more than
    100 GB of data to PostgreSQL and conducted performance testing. It did
    an outstanding job! But I miss all those user-friendly trouble-shooting
    utilities (like Automatic Workload Repository, Active Session History,
    etc.) in PostgreSQL. Yes - some of them are there, but one has to
    search, download, configure, etc. I hope many of these features will
    become part of the 'core' soon.
    Regards,
    Jayadevan
  • Greg Smith at Dec 13, 2011 at 11:22 pm

    On 12/12/2011 10:33 PM, Jayadevan M wrote:
    But I miss all those user-friendly trouble-shooting utilities (like
    Automatic Workload Repository, Active Session History, etc.) in
    PostgreSQL. Yes - some of them are there, but one has to search,
    download, configure, etc. I hope many of these features will become
    part of the 'core' soon.
    That's unlikely, simply because the definition of "core" in PostgreSQL
    doesn't quite include the full experience of user tools like this. For
    example, you might see the core collect the data needed for something
    that acts like an AWR baseline. But you're unlikely to ever get, in
    core, the sort of easy baseline management and graphing tools that
    Oracle's Enterprise Manager layers on top of them. There's no place to
    put a GUI/web tool like that there, and I wouldn't expect that to ever
    change. You might see it provided as a feature of the pgAdmin tool
    though, or as a separate web application.

    The fact that there are multiple pieces of software involved doesn't
    have to make this hard. Ultimately the problem you're identifying is a
    packaging one. Something doesn't have to be in the PostgreSQL core to
    be packaged nicely so that you can easily install and use it. It's
    probably easy for you to get pgAdmin installed and working, for
    example, and that's not a part of core. There's just been a lot more work put
    into packaging it than most tools have gotten so far.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
  • Greg Smith at Dec 12, 2011 at 12:10 pm

    On 12/09/2011 08:54 PM, Greg Smith wrote:
    I decided about a year ago that further work on using SystemTap was a
    black hole: time goes in, nothing really usable on any production
    server seems to come out.
    My off-list e-mail this weekend has, quite rightly, pointed out that
    this cheap shot was unfair, bordering on libel toward the hard-working
    SystemTap developers. I'd like to publicly apologize for that and
    clarify my frustrated statement here (I'd *really* like this sort of
    tool to be more widely available).

    The main problem I've had with SystemTap is its reputation; I don't
    actually have any real, informed gripes about its current state. But
    the sort of customers I have are very risk-averse. PostgreSQL does a
    good job attracting that sort of user; I'm sure we have a
    disproportionate number of them relative to your average open-source
    program. Accordingly, unless a piece of software is very trusted, it's
    hard for me to convince anyone to use it. (See "why Greg hates the
    disclaimers around the PostgreSQL contrib modules".) That makes it hard
    for me to give SystemTap a serious spin on most of the production
    servers I see my hardest problems on. That's the reason behind the
    statement I made badly here--regardless of how much I know about it, I
    can't seem to get SystemTap deployed in the places where I spend the
    most time working.

    True or false, the "SystemTap is more likely to crash your kernel than
    DTrace" meme is out there. I think some of that is an unexpectedly bad
    side effect of its open-source development. DTrace had the luxury of
    being hidden from the world at large until it was well formed, whereas
    a lot of people saw SystemTap in a really early state and formed
    opinions several years ago, and the oldest of those writeups are some
    of the highest-ranking pages when you search for information. I just
    searched again today, and what comes back is mounds of stuff from 2006
    and 2007 that surely doesn't reflect the current state of things.
    SystemTap didn't get a 1.0 release until September 2009.

    As someone who has spent a lot of time at the wrong end of the
    "PostgreSQL is slower than MySQL" meme, I shouldn't have just thrown
    this sort of criticism out there without explaining the basis for my
    statement. I hope this clears up what I meant. Ultimately I think we
    need both more tools like SystemTap and DTrace, as well as more
    instrumentation inside PostgreSQL, to cover all of the things people
    would like visibility into.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
  • Joshua D. Drake at Dec 8, 2011 at 6:15 pm

    On 12/08/2011 03:24 AM, Craig Ringer wrote:

    - shared-storage clustering. Dunno if anyone still cares about this one
    though.
    This one seems to be moving into the legacy category over the next 3-5
    years.

    JD


    --
    Command Prompt, Inc. - http://www.commandprompt.com/
    PostgreSQL Support, Training, Professional Services and Development
    The PostgreSQL Conference - http://www.postgresqlconference.org/
    @cmdpromptinc - @postgresconf - 509-416-6579
  • John R Pierce at Dec 8, 2011 at 6:54 pm

    On 12/08/11 10:14 AM, Joshua D. Drake wrote:
    - shared-storage clustering. Dunno if anyone still cares about this one
    though.
    This one seems to be moving into the legacy category over the next 3-5
    years.
    um, I believe this is referring to Oracle RAC clustering, not HA
    active/standby. I seriously doubt Oracle is dropping RAC.



    --
    john r pierce N 37, W 122
    santa cruz ca mid-left coast
  • Joshua D. Drake at Dec 8, 2011 at 7:16 pm

    On 12/08/2011 10:54 AM, John R Pierce wrote:
    On 12/08/11 10:14 AM, Joshua D. Drake wrote:

    - shared-storage clustering. Dunno if anyone still cares about this one
    though.
    This one seems to be moving into the legacy category over the next 3-5
    years.
    um, I believe this is referring to Oracle RAC clustering, not HA
    active/standby. I seriously doubt Oracle is dropping RAC.
    I meant worrying about it for Pg.

    JD

    --
    Command Prompt, Inc. - http://www.commandprompt.com/
    PostgreSQL Support, Training, Professional Services and Development
    The PostgreSQL Conference - http://www.postgresqlconference.org/
    @cmdpromptinc - @postgresconf - 509-416-6579
  • John R Pierce at Dec 8, 2011 at 7:24 pm

    On 12/08/11 11:16 AM, Joshua D. Drake wrote:

    um, I believe this is referring to Oracle RAC clustering, not HA
    active/standby. I seriously doubt Oracle is dropping RAC.
    I meant worrying about it for Pg.
    the odds of Postgres developing something as complex and intricate as
    RAC are probably between zilch and none. RAC was for many years
    completely unusable, and even now it's complicated, fragile, and expensive.

    --
    john r pierce N 37, W 122
    santa cruz ca mid-left coast
  • Joshua D. Drake at Dec 8, 2011 at 7:39 pm

    On 12/08/2011 11:24 AM, John R Pierce wrote:
    On 12/08/11 11:16 AM, Joshua D. Drake wrote:

    um, I believe this is referring to Oracle RAC clustering, not HA
    active/standby. I seriously doubt Oracle is dropping RAC.
    I meant worrying about it for Pg.
    the odds of Postgres developing something as complex and intricate as
    RAC are probably between zilch and none. RAC was for many years
    completely unusable, and even now it's complicated, fragile, and expensive.
    Exactly.

    --
    Command Prompt, Inc. - http://www.commandprompt.com/
    PostgreSQL Support, Training, Professional Services and Development
    The PostgreSQL Conference - http://www.postgresqlconference.org/
    @cmdpromptinc - @postgresconf - 509-416-6579
  • Rodger Donaldson at Dec 9, 2011 at 12:09 am

    On Thu, 08 Dec 2011 11:24:12 -0800, John R Pierce wrote:
    On 12/08/11 11:16 AM, Joshua D. Drake wrote:

    um, I believe this is referring to Oracle RAC clustering, not HA
    active/standby. I seriously doubt Oracle is dropping RAC.
    I meant worrying about it for Pg.
    the odds of Postgres developing something as complex and intricate as
    RAC are probably between zilch and none. RAC was for many years
    completely unusable, and even now it's complicated, fragile, and
    expensive.

    Happily, the complications and fragility are now utilised by Oracle to
    help sell Exadata units, on the basis that if you give Oracle even more
    money, they'll sell you a RAC that actually works!
