Hi,

I note that the implementation of tab completion for SET TRANSACTION in psql
could benefit from the implementation of autonomous transactions (also a
TODO item).

Regards,

Colin


  • Robert Haas at Sep 15, 2010 at 5:30 pm

    On Wed, Sep 15, 2010 at 3:37 AM, Colin 't Hart wrote:
    > I note that the implementation of tab completion for SET TRANSACTION in psql
    > could benefit from the implementation of autonomous transactions (also a
    > TODO item).
    I think it's safe to say that if we ever manage to get autonomous
    transactions working, there are a GREAT MANY things which will benefit
    from that. There's probably an easier way to get at that Todo item,
    though, if someone feels like beating on it.

    One problem with autonomous transactions is that you have to figure
    out where to store all the state associated with the autonomous
    transaction and its subtransactions. Another is that you have to
    avoid an unacceptable slowdown in the tuple-visibility checks in the
    process.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Darren Duncan at Sep 15, 2010 at 6:39 pm

    Robert Haas wrote:
    > On Wed, Sep 15, 2010 at 3:37 AM, Colin 't Hart wrote:
    >> I note that the implementation of tab completion for SET TRANSACTION in psql
    >> could benefit from the implementation of autonomous transactions (also a
    >> TODO item).
    > I think it's safe to say that if we ever manage to get autonomous
    > transactions working, there are a GREAT MANY things which will benefit
    > from that. There's probably an easier way to get at that Todo item,
    > though, if someone feels like beating on it.
    >
    > One problem with autonomous transactions is that you have to figure
    > out where to store all the state associated with the autonomous
    > transaction and its subtransactions. Another is that you have to
    > avoid an unacceptable slowdown in the tuple-visibility checks in the
    > process.
    As I understand it, autonomous transactions are in many ways like distinct
    database client sessions, except that the client in this case is another
    database session. This is especially true if an autonomous transaction can
    make a commit that persists even if the initiating session later rolls back.

    Similarly, using autonomous transactions is akin to multi-processing. Normal
    distinct database client sessions are like distinct processes that are usually
    started externally to the DBMS, whereas autonomous transactions are like
    processes started from within the DBMS.

    Also, under the assumption that everything in a DBMS session should be subject
    to transactions, so that both data-manipulation and data-definition can be
    rolled back, autonomous transactions are like a generalization of supporting
    sequence generators that remember their incremented state even when the action
    that incremented it is rolled back; the sequence generator update is effectively
    an autonomous transaction, in that case.
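    The sequence-generator analogy can be sketched as a toy model in plain C
    (illustration only, not PostgreSQL source; all names here are invented): the
    sequence counter lives outside the transaction's private state, so a rollback
    discards the staged table changes but not the consumed sequence values.

    ```c
    #include <assert.h>

    /* Toy model: a transaction buffers its table changes and discards
     * them on rollback, while the sequence counter lives outside the
     * transaction and keeps any increments regardless. */
    typedef struct {
        int pending_rows;       /* table changes staged inside the transaction */
    } Transaction;

    static long sequence_state = 0; /* persists independently of any transaction */

    static long nextval(void)
    {
        return ++sequence_state;    /* effectively its own autonomous commit */
    }

    static void insert_row(Transaction *tx)
    {
        (void) nextval();           /* consume a sequence value for the row */
        tx->pending_rows++;
    }

    static void rollback(Transaction *tx)
    {
        tx->pending_rows = 0;       /* the table changes vanish... */
        /* ...but sequence_state is deliberately left as-is. */
    }
    ```

    After two inserts and a rollback, `pending_rows` is back to 0 but
    `sequence_state` remains at 2, which is exactly the
    commit-that-survives-rollback behavior being described.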

    The point being, the answer to how to implement autonomous transactions could be
    as simple as, do the same thing as how you manage multiple concurrent client
    sessions, more or less. If each client gets its own Postgres OS process, then
    an autonomous transaction just farms out to another one of those which does the
    work. Or maybe there could be a lighter weight version of this.

    Does this design principle seem reasonable?

    If autonomous transactions are used heavily, then maybe the other process
    could be kept connected and fed subsequent autonomous actions, for example
    when it is being used to implement an activity log; some kind of IPC would
    be involved.

    -- Darren Duncan
  • Robert Haas at Sep 15, 2010 at 6:57 pm

    On Wed, Sep 15, 2010 at 2:32 PM, Darren Duncan wrote:
    > The point being, the answer to how to implement autonomous transactions
    > could be as simple as, do the same thing as how you manage multiple
    > concurrent client sessions, more or less. If each client gets its own
    > Postgres OS process, then an autonomous transaction just farms out to
    > another one of those which does the work. Or maybe there could be a lighter
    > weight version of this.
    >
    > Does this design principle seem reasonable?
    I guess so, but the devil is in the details. I suspect that we don't
    actually want to fork a new backend for every autonomous transaction.
    That would be pretty expensive, and we already have an expensive way
    of emulating this functionality using dblink. Finding all of the bits
    that think there's only one top-level transaction per backend and
    generalizing them to support multiple top-level transactions per
    backend doesn't sound easy, though, especially since you must do it
    without losing performance.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Darren Duncan at Sep 15, 2010 at 7:22 pm

    Robert Haas wrote:
    > On Wed, Sep 15, 2010 at 2:32 PM, Darren Duncan wrote:
    >> The point being, the answer to how to implement autonomous transactions
    >> could be as simple as, do the same thing as how you manage multiple
    >> concurrent client sessions, more or less. If each client gets its own
    >> Postgres OS process, then an autonomous transaction just farms out to
    >> another one of those which does the work. Or maybe there could be a lighter
    >> weight version of this.
    >>
    >> Does this design principle seem reasonable?
    > I guess so, but the devil is in the details. I suspect that we don't
    > actually want to fork a new backend for every autonomous transaction.
    > That would be pretty expensive, and we already have an expensive way
    > of emulating this functionality using dblink. Finding all of the bits
    > that think there's only one top-level transaction per backend and
    > generalizing them to support multiple top-level transactions per
    > backend doesn't sound easy, though, especially since you must do it
    > without losing performance.
    As you say, the devil is in the details, but I see this as mainly an
    implementation issue, where essentially the same task could be abstracted
    over different possible implementations, some lighter-weight and some
    heavier-weight.

    This is loosely how I look at the issue conceptually, that is, the illusion
    that the DBMS presents to the user:

    The DBMS is a multi-process virtual machine, the database being worked on is the
    file system or disk, and uncommitted transactions are data structures in memory
    that may have multiple versions. Each autonomous transaction is associated with
    a single process. A process can either be started by the user (client
    connection) or by another process (autonomous transaction). Regardless of how a
    process is started, the way to manage multiple autonomous tasks is that each has
    its own process. Tasks that are not mutually autonomous would be within the
    same process. Child transactions or savepoints have the same process as their
    parent when the parent can roll back their commits.

    Whether the DBMS uses multiple OS threads or multiple OS processes or uses
    coroutines or whatever is an implementation detail.

    A point here being that over time Postgres can evolve to use either multiple OS
    processes or multiple threads or a coroutine system within a single
    thread/process, to provide the illusion of each autonomous transaction being an
    independent process, and the data structures and algorithms for managing
    autonomous transactions can be similar to or the same as multiple client
    connections, since conceptually they are alike.

    -- Darren Duncan
  • Alvaro Herrera at Sep 15, 2010 at 10:33 pm

    Excerpts from Robert Haas's message of mié sep 15 14:57:29 -0400 2010:
    > I guess so, but the devil is in the details. I suspect that we don't
    > actually want to fork a new backend for every autonomous transaction.
    > That would be pretty expensive, and we already have an expensive way
    > of emulating this functionality using dblink. Finding all of the bits
    > that think there's only one top-level transaction per backend and
    > generalizing them to support multiple top-level transactions per
    > backend doesn't sound easy, though,
    Yeah, and the transaction handling code is already pretty complex.
    > especially since you must do it without losing performance.
    Presumably we'd have fast paths for the main transaction, and
    any autonomous transactions beside that one would incur some
    slowdown.

    I think the complex parts are, first, figuring out what to do with
    global variables that currently represent a transaction (they are
    sprinkled all over the place); and second, how to represent the
    autonomous transactions in shared memory without requiring the PGPROC
    array to be arbitrarily resizable.

    The other alternative would be to bolt the autonomous transaction
    somehow onto the current subtransaction stack and mark it in some
    different way so that we can reuse the games we play with "push/pop"
    there. That still leaves us with the PGPROC problem.

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Robert Haas at Sep 16, 2010 at 12:43 am

    On Wed, Sep 15, 2010 at 6:21 PM, Alvaro Herrera wrote:
    > Excerpts from Robert Haas's message of mié sep 15 14:57:29 -0400 2010:
    >> I guess so, but the devil is in the details. I suspect that we don't
    >> actually want to fork a new backend for every autonomous transaction.
    >> That would be pretty expensive, and we already have an expensive way
    >> of emulating this functionality using dblink. Finding all of the bits
    >> that think there's only one top-level transaction per backend and
    >> generalizing them to support multiple top-level transactions per
    >> backend doesn't sound easy, though,
    > Yeah, and the transaction handling code is already pretty complex.
    Yep.
    >> especially since you must do it without losing performance.
    > Presumably we'd have fast paths for the main transaction, and
    > any autonomous transactions beside that one would incur some
    > slowdown.
    >
    > I think the complex parts are, first, figuring out what to do with
    > global variables that currently represent a transaction (they are
    > sprinkled all over the place); and second, how to represent the
    > autonomous transactions in shared memory without requiring the PGPROC
    > array to be arbitrarily resizable.
    >
    > The other alternative would be to bolt the autonomous transaction
    > somehow onto the current subtransaction stack and mark it in some
    > different way so that we can reuse the games we play with "push/pop"
    > there. That still leaves us with the PGPROC problem.
    I wonder if we could use/generalize pg_subtrans in some way to handle
    the PGPROC problem. I haven't thought about it much, though.

    One thing that strikes me (maybe this is obvious) is that the
    execution of the main transaction and the autonomous transaction are
    not interleaved: it's a stack. So in terms of globals and stuff,
    assuming you knew which things needed to be updated, you could push
    all that stuff off to the side, do whatever with the new transaction,
    and then restore all the context afterwards. That doesn't help in
    terms of PGPROC, of course, but for backend-local state it seems
    workable.
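    The push-aside-and-restore idea for a stack of non-interleaved transactions
    can be sketched roughly like this (a hypothetical illustration in plain C;
    the struct fields and function names are invented, not actual PostgreSQL
    globals):

    ```c
    #include <assert.h>
    #include <string.h>

    /* The backend-local globals describing the current top-level
     * transaction, gathered into one struct so they can be saved and
     * restored wholesale when an autonomous transaction starts. */
    typedef struct {
        unsigned int xid;        /* current transaction id */
        int          nest_level; /* subtransaction nesting depth */
        int          read_only;  /* read-only flag */
    } XactContext;

    static XactContext current_xact;   /* the "live" globals */
    static XactContext saved_stack[8]; /* suspended parent transactions */
    static int         saved_depth = 0;

    /* Suspend the current transaction and start an autonomous one. */
    static void begin_autonomous(unsigned int new_xid)
    {
        assert(saved_depth < 8);
        saved_stack[saved_depth++] = current_xact;  /* push parent aside */
        memset(&current_xact, 0, sizeof(current_xact));
        current_xact.xid = new_xid;
    }

    /* Finish the autonomous transaction and resume the suspended parent. */
    static void end_autonomous(void)
    {
        assert(saved_depth > 0);
        current_xact = saved_stack[--saved_depth];  /* pop parent back */
    }
    ```

    Because execution is a stack rather than interleaved, a plain LIFO
    save/restore like this suffices for backend-local state; the shared-memory
    (PGPROC) side of the problem is untouched by it.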

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Dimitri Fontaine at Sep 16, 2010 at 9:19 am

    Robert Haas writes:
    > One thing that strikes me (maybe this is obvious) is that the
    > execution of the main transaction and the autonomous transaction are
    > not interleaved: it's a stack. So in terms of globals and stuff,
    > assuming you knew which things needed to be updated, you could push
    > all that stuff off to the side, do whatever with the new transaction,
    > and then restore all the context afterwards.
    I think they call that dynamic scope in advanced programming
    languages. I guess that's calling for a quote of Greenspun's Tenth Rule:

    Any sufficiently complicated C or Fortran program contains an ad hoc
    informally-specified bug-ridden slow implementation of half of Common
    Lisp.

    So the name of the game could be to find a way to implement (a
    limited form of) dynamic scoping in PostgreSQL, in C, then find any
    backend-local variable that needs it to support autonomous
    transactions, then make it happen… Right?
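    A limited form of dynamic scoping can indeed be approximated in C with a
    save/override/restore pattern (an illustrative sketch, not PostgreSQL code;
    the macro and variable names are made up, and `__typeof__` assumes GCC or
    Clang):

    ```c
    #include <assert.h>

    /* Temporarily rebind a global for the duration of a block, restoring
     * the outer binding on exit, Lisp special-variable style. */
    #define DYNAMIC_LET(var, value, body)       \
        do {                                    \
            __typeof__(var) saved_ = (var);     \
            (var) = (value);                    \
            body;                               \
            (var) = saved_;                     \
        } while (0)

    static int current_xact_id = 1; /* stand-in for a backend-local global */
    static int seen_inside;         /* records the binding visible in the body */

    static void run_autonomous_work(void)
    {
        seen_inside = current_xact_id;  /* sees whichever binding is current */
    }
    ```

    A real backend would also have to restore the outer binding on error paths
    (elog/longjmp), which this naive macro does not handle; that is where much of
    the actual difficulty would lie.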

    Regards,
    --
    dim
  • Robert Haas at Sep 16, 2010 at 2:20 pm

    On Thu, Sep 16, 2010 at 5:19 AM, Dimitri Fontaine wrote:
    > Robert Haas <robertmhaas@gmail.com> writes:
    >> One thing that strikes me (maybe this is obvious) is that the
    >> execution of the main transaction and the autonomous transaction are
    >> not interleaved: it's a stack. So in terms of globals and stuff,
    >> assuming you knew which things needed to be updated, you could push
    >> all that stuff off to the side, do whatever with the new transaction,
    >> and then restore all the context afterwards.
    > I think they call that dynamic scope in advanced programming
    > languages. I guess that's calling for a quote of Greenspun's Tenth Rule:
    >
    > Any sufficiently complicated C or Fortran program contains an ad hoc
    > informally-specified bug-ridden slow implementation of half of Common
    > Lisp.
    >
    > So the name of the game could be to find a way to implement (a
    > limited form of) dynamic scoping in PostgreSQL, in C, then find any
    > backend-local variable that needs it to support autonomous
    > transactions, then make it happen… Right?
    Interestingly, PostgreSQL was originally written in LISP, and there
    are remnants of that in the code today; for example, our heavy use of
    List nodes. But I don't think that has much to do with this project.
    I plan to reserve judgment on the best way of managing the relevant
    state until such time as someone has gone to the trouble of
    identifying what state that is.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Tom Lane at Sep 16, 2010 at 2:42 pm

    Robert Haas writes:
    > I plan to reserve judgment on the best way of managing the relevant
    > state until such time as someone has gone to the trouble of
    > identifying what state that is.
    The really fundamental problem here is that you never will be able to
    identify all such state. Even assuming that you successfully completed
    the herculean task of fixing the core backend, what of add-on code?

    (This is also why I'm quite unimpressed with the idea of trying to
    get backends to switch to a different database after startup.)

    regards, tom lane
  • Darren Duncan at Sep 17, 2010 at 3:28 am

    Robert Haas wrote:
    > On Thu, Sep 16, 2010 at 5:19 AM, Dimitri Fontaine wrote:
    >> I think they call that dynamic scope in advanced programming
    >> languages. I guess that's calling for a quote of Greenspun's Tenth Rule:
    >>
    >> Any sufficiently complicated C or Fortran program contains an ad hoc
    >> informally-specified bug-ridden slow implementation of half of Common
    >> Lisp.
    >>
    >> So the name of the game could be to find a way to implement (a
    >> limited form of) dynamic scoping in PostgreSQL, in C, then find any
    >> backend-local variable that needs it to support autonomous
    >> transactions, then make it happen… Right?
    > Interestingly, PostgreSQL was originally written in LISP, and there
    > are remnants of that in the code today; for example, our heavy use of
    > List nodes. But I don't think that has much to do with this project.
    > I plan to reserve judgment on the best way of managing the relevant
    > state until such time as someone has gone to the trouble of
    > identifying what state that is.
    It would probably do Pg some good to try to recapture its
    functional-language roots where reasonably possible. I believe that,
    design-wise, functional languages really are the best way to do
    object-relational databases, given that pure functions and immutable data
    structures are typically the best way to express anything one would do with
    them.

    -- Darren Duncan
  • Markus Wanner at Sep 16, 2010 at 9:02 am
    Hi,
    On 09/15/2010 07:30 PM, Robert Haas wrote:
    > One problem with autonomous transactions is that you have to figure
    > out where to store all the state associated with the autonomous
    > transaction and its subtransactions. Another is that you have to
    > avoid an unacceptable slowdown in the tuple-visibility checks in the
    > process.
    It just occurs to me that this is the other potential use case for
    bgworkers: autonomous transactions. Simply store any kind of state in
    the bgworker and use one per autonomous transaction.

    What's left to be done: implement communication between the controlling
    backend (with the client connection) and the bgworker (imessages), drop
    the bgworker's session to user privileges (and re-raise to superuser
    after the job) and implement better error handling, as those would have
    to be propagated back to the controlling backend.
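    The controlling-backend/worker split can be sketched very roughly with a
    plain pipe and fork (illustration only; Postgres bgworkers and the proposed
    imessages would use shared memory rather than pipes, and all names here are
    invented):

    ```c
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Hand one command to a freshly forked worker and wait for its reply,
     * modeling "controlling backend sends work, worker reports outcome". */
    static int run_in_worker(const char *command, char *reply, size_t replylen)
    {
        int to_worker[2], from_worker[2];
        if (pipe(to_worker) != 0 || pipe(from_worker) != 0)
            return -1;

        pid_t pid = fork();
        if (pid == 0) {            /* worker: receive command, send result */
            char buf[256];
            ssize_t n = read(to_worker[0], buf, sizeof(buf) - 1);
            buf[n > 0 ? n : 0] = '\0';
            /* A real worker would run the command in its own transaction
             * under the caller's privileges; here we just acknowledge it. */
            (void) write(from_worker[1], "committed", 9);
            _exit(0);
        }

        /* controlling process: hand the work over, wait for the outcome */
        (void) write(to_worker[1], command, strlen(command));
        close(to_worker[1]);
        ssize_t n = read(from_worker[0], reply, replylen - 1);
        reply[n > 0 ? n : 0] = '\0';
        waitpid(pid, NULL, 0);
        return 0;
    }
    ```

    A persistent bgworker would keep this channel open and loop over messages
    instead of forking per request; error propagation back to the controlling
    backend is the part this sketch glosses over entirely.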

    Regards

    Markus Wanner

Discussion Overview
group: pgsql-hackers
category: postgresql
posted: Sep 15, '10 at 7:37a
active: Sep 17, '10 at 3:28a
posts: 12
users: 7
website: postgresql.org...
irc: #postgresql
