On Mon, Feb 22, 2010 at 1:18 PM, Heikki Linnakangas wrote:
> Jaime Casanova wrote:
>> so, is this idea (having some user processes be "tied" to postmaster
>> start/stop) going somewhere?
>
> I've added this to the TODO list. Now we just need someone to write it.

if we can do this, how should it work?
Simon said:
"""
Yes, I think so. Rough design...

integrated_user_processes = 'x, y, z'

would run x(), y() and z() in their own processes. These would execute
after startup, or at a consistent point in recovery. The code for these
would come from preload_libraries etc.

They would not block smart shutdown, though their shutdown sequence might
delay it. User code would be executed last at startup and first thing at
shutdown.

API would be user_process_startup(), user_process_shutdown().
"""

so it should be a GUC that is settable only at server start.
do we need those integrated processes at all on a standby server?
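
A minimal sketch of the rough design quoted above, written in Python for brevity rather than in PostgreSQL's C. Only `integrated_user_processes`, `user_process_startup()` and `user_process_shutdown()` come from the thread; `UserProcess`, `REGISTRY`, `parse_integrated_user_processes`, `start_all` and `stop_all` are hypothetical names. The sketch calls the hooks in-process instead of forking, and stopping in reverse start order is an assumption, not something the proposal states.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class UserProcess:
    """Hypothetical pair of hooks that a preloaded library would register."""
    name: str
    user_process_startup: Callable[[], None]
    user_process_shutdown: Callable[[], None]


# Hypothetical registry. In the proposal the code would come from
# preload_libraries, not from a Python dict.
REGISTRY: Dict[str, UserProcess] = {}


def parse_integrated_user_processes(value: str) -> List[str]:
    """Split a GUC-style list such as 'x, y, z' into names, skipping blanks."""
    return [item.strip() for item in value.split(",") if item.strip()]


def start_all(value: str) -> List[UserProcess]:
    """Run startup hooks in configured order (the last step of server startup)."""
    started = []
    for name in parse_integrated_user_processes(value):
        proc = REGISTRY[name]  # KeyError if no library registered this name
        proc.user_process_startup()
        started.append(proc)
    return started


def stop_all(started: List[UserProcess]) -> None:
    """Run shutdown hooks first thing at shutdown, newest first (assumed order)."""
    for proc in reversed(started):
        proc.user_process_shutdown()
```

Because the value is only read once in `start_all`, the sketch matches a setting that can change only at server start.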

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

  • Heikki Linnakangas at Feb 22, 2010 at 6:50 pm

    Jaime Casanova wrote:
    > if we can do this, how should it work?
    > Simon said:
    > """
    > Yes, I think so. Rough design...
    >
    > integrated_user_processes = 'x, y, z'
    >
    > would run x(), y() and z() in their own processes. These would execute
    > after startup, or at a consistent point in recovery. The code for these
    > would come from preload_libraries etc.
    >
    > They would not block smart shutdown, though their shutdown sequence might
    > delay it. User code would be executed last at startup and first thing at
    > shutdown.
    >
    > API would be user_process_startup(), user_process_shutdown().
    > """
    >
    > so it should be a GUC, that is settable only at start time.

    A GUC like that was my first thought too. We've already come up with
    many uses for it, so whatever the interface is, we will need to make sure
    it's flexible enough to cater for all the use cases.

    > we need those integrated processes at all when in a standby server?

    Yes. You might want to run e.g. scheduled reports from a standby
    reporting server, launched by a scheduler process. Or backups.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Jaime Casanova at Feb 22, 2010 at 7:00 pm

    On Mon, Feb 22, 2010 at 1:50 PM, Heikki Linnakangas wrote:
    >> we need those integrated processes at all when in a standby server?
    >
    > Yes. You might want to run e.g. scheduled reports from a standby
    > reporting server, launched by a scheduler process. Or backups.

    ah! fair enough!

  • Tom Lane at Feb 22, 2010 at 7:53 pm

    Jaime Casanova writes:
    > wrote:
    >> API would be user_process_startup(), user_process_shutdown().
    >
    > so it should be a GUC, that is settable only at start time.
    > we need those integrated processes at all when in a standby server?

    This seems like a solution in search of a problem to me. The most
    salient aspect of such processes is that they would necessarily run
    as the postgres user, which means that you could never run any untrusted
    code in them. That cuts the space of "user problems" they could solve
    way down.

    I still haven't seen a good reason for not using cron or Task Scheduler
    or other standard tools.

    regards, tom lane
  • Jaime Casanova at Feb 22, 2010 at 8:08 pm

    On Mon, Feb 22, 2010 at 2:53 PM, Tom Lane wrote:
    > I still haven't seen a good reason for not using cron or Task Scheduler
    > or other standard tools.

    - marketing? don't you hate when people say: Oracle has it!?
    - user dumbness: they forget to start daemons they need (yes, i have
    seen that) or they simply don't know about them...
    it's amazing the number of people who ask me, just after i tell them to
    use cron or the windows task scheduler: and how do i use that? Yes, we in
    Latin America are still very primitive... we use only those things
    that are very very easy ;)

    the ability to have processes that start when postmaster starts and
    stop when postmaster stops is just one more way to be extensible
    without integrating every piece of code into core

  • Jaime Casanova at Feb 22, 2010 at 8:10 pm

    On Mon, Feb 22, 2010 at 3:08 PM, Jaime Casanova wrote:
    > On Mon, Feb 22, 2010 at 2:53 PM, Tom Lane wrote:
    >> I still haven't seen a good reason for not using cron or Task Scheduler
    >> or other standard tools.
    >
    > - marketing? don't you hate when people say: Oracle has it!?

    just before someone insults me... this comment was about the
    in-core scheduler, something we can live without if we have this, and
    still not have to hear that

  • Dimitri Fontaine at Feb 22, 2010 at 9:18 pm

    Tom Lane writes:
    > This seems like a solution in search of a problem to me. The most
    > salient aspect of such processes is that they would necessarily run
    > as the postgres user

    I happen to run my PGQ tickers and londiste daemons as "londiste" user
    and make it a superuser (at least while installing, as they need to
    install some PL/C stuff). Then there's pgbouncer too, which I always run
    as postgres system user, if only to be able to open a socket in the same
    directory where postgres opens them (/var/run/postgresql on my system).

    The precedents are the archive and restore commands. They do run as the
    postgres user too, don't they? I think we could have made walreceiver and
    walsender generic out-of-core facilities too, within this model.

    The other common use case is to schedule maintenance (vacuum, cluster
    some table, maintain a materialized view, backup), all of which can be
    run as postgres user too, only adaptation could be to have a security
    definer function.

    So, beyond the scheduler use case alone, if you want to see some C code
    that I'd like to be able to run as a postmaster's child, have a look at
    pgqd, the ticker daemon of the next skytools version, here:

    http://github.com/markokr/skytools-dev/blob/master/sql/ticker/pgqd.c
    http://github.com/markokr/skytools-dev/blob/master/sql/ticker/ticker.c

    You'll see mainly a C daemon which connects to some database and calls
    stored procedures there. There could be separate schedules in fact, the
    main loop for ticking the snapshots, another one for managing the retry
    event queue, and yet another one for managing the maintenance
    operations.


    What I think I'd like to have is a user process supervisor as a
    postmaster child, its job would be to start and stop the user processes
    at the right time frames, and notice their death. A restart policy
    should be attached to each child, which is either permanent, transient
    or temporary. To avoid infinitely restarting a process, the supervisor
    has 2 GUCs: at most supervisor_max_restarts restarts are allowed within
    supervisor_max_time. Being unable to manage a "user" permanent child
    process (worker) would trigger a postmaster stop.

    All of this is heavily inspired by the Erlang approach, which I've found
    simple and effective:
    http://erlang.org/doc/man/supervisor.html
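
A rough Python sketch of the restart rules described above, under stated assumptions. The three policies follow the Erlang meanings (permanent: always restart, transient: restart only after an abnormal exit, temporary: never restart). `RestartPolicy`, `should_restart` and `RestartLimiter` are hypothetical names, and treating a nonzero exit code as "abnormal" is an assumption made for illustration. The limiter stands in for the supervisor_max_restarts / supervisor_max_time pair.

```python
from collections import deque
from enum import Enum


class RestartPolicy(Enum):
    PERMANENT = "permanent"   # always restart
    TRANSIENT = "transient"   # restart only after an abnormal exit
    TEMPORARY = "temporary"   # never restart


def should_restart(policy: RestartPolicy, exit_code: int) -> bool:
    """Decide whether a dead child should be started again."""
    if policy is RestartPolicy.PERMANENT:
        return True
    if policy is RestartPolicy.TRANSIENT:
        return exit_code != 0  # assumption: nonzero exit means abnormal
    return False


class RestartLimiter:
    """Allow at most max_restarts restarts within any max_time-second window."""

    def __init__(self, max_restarts: int, max_time: float) -> None:
        self.max_restarts = max_restarts
        self.max_time = max_time
        self._stamps = deque()

    def allow(self, now: float) -> bool:
        # Forget restarts that have fallen out of the time window.
        while self._stamps and now - self._stamps[0] > self.max_time:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_restarts:
            return False  # budget exhausted: the supervisor should give up
        self._stamps.append(now)
        return True
```

With a budget of 2 restarts in 10 seconds, a third crash inside the window makes `allow()` return False, which is the point where the design above would stop the postmaster for a permanent worker.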

    The supervised processes will have to offer a main entry point, which
    will get called once the supervisor has forked, in the child process,
    and must be prepared to receive SIGHUP, SIGINT and SIGTERM signals.
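
As an illustration only, a supervised worker's entry point might wire up the three signals like this in Python. `DaemonState`, `install_handlers` and `run` are hypothetical names, and the meanings given to the signals (SIGHUP requests a reload, SIGINT and SIGTERM request a stop) are assumptions; the thread only says the process must be prepared to receive them.

```python
import signal
import time


class DaemonState:
    """Flags that the worker's main loop polls between units of work."""

    def __init__(self) -> None:
        self.reload_requested = False
        self.stop_requested = False


def install_handlers(state: DaemonState) -> None:
    """Register the handlers; Python requires this to run in the main thread."""

    def on_sighup(signum, frame):
        state.reload_requested = True   # assumed meaning: re-read configuration

    def on_stop(signum, frame):
        state.stop_requested = True     # assumed meaning: finish up and exit

    signal.signal(signal.SIGHUP, on_sighup)
    signal.signal(signal.SIGINT, on_stop)
    signal.signal(signal.SIGTERM, on_stop)


def run(state: DaemonState, do_work, poll_interval: float = 0.1) -> None:
    """Main entry point: loop until a stop is requested."""
    while not state.stop_requested:
        if state.reload_requested:
            state.reload_requested = False  # a real worker would reload here
        do_work()
        time.sleep(poll_interval)
```

The handlers only set flags, so all real work stays in the main loop and nothing complicated runs in signal context.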

    The setup will get done with the existing custom_variable_classes, and
    more generally I guess we're reusing the PGXS and custom .so
    infrastructure (shared_preload_libraries).

    The main good reason to have this is to allow extension authors to
    develop autonomous daemons in a portable way, benefiting from all the
    effort PostgreSQL has made to have a fork() model available on Windows
    et al.

    I guess we need a way to start the same supervised daemon extension code
    more than once too, for example several pgbouncer setups on different
    ports in different pooling modes.

    > I still haven't seen a good reason for not using cron or Task Scheduler
    > or other standard tools.

    We're past the scheduler alone. You won't turn archive_command,
    restore_command, walsender, walreceiver, pgbouncer or PGQ into a cron job,
    but you could have them managed by the postmaster, as plugins.

    Your good reason would be less code to keep an eye on :)

    Back to the scheduling: you can back up the maintenance schedule with the
    database itself. If all the tasks do is call some function (in my case
    the only exception is pg_dump), then you don't need to re-validate them
    when you upgrade your OS, or migrate from CentOS to Debian, or from a
    developer station running Windows to a production server running some
    Unix variant.

    Once more, nothing you couldn't implement already. Maybe PostgreSQL is
    growing fast enough that now is the time to look at how to enable non
    core things to be easily shipped with the core product?

    Do we need a PostgreSQL distribution? I know David Wheeler's opinion on
    that, and think PGAN + pg_restore friendly extensions + supervised
    helper daemons will be huge enablers.

    Regards,
    --
    dim
  • Tom Lane at Feb 22, 2010 at 9:37 pm

    Dimitri Fontaine writes:
    > Tom Lane <tgl@sss.pgh.pa.us> writes:
    >> This seems like a solution in search of a problem to me. The most
    >> salient aspect of such processes is that they would necessarily run
    >> as the postgres user
    >
    > The precedent are archive and restore command. They do run as postgres
    > user too, don't they?

    Well, yeah, but you *must* trust those commands because every last bit
    of your database content passes through their hands. That is not an
    argument why you need to trust a scheduling facility --- much less the
    tasks it schedules.

    I still say that every use case so far presented here would be equally
    if not better served outside the database. Putting it inside just
    creates more failure scenarios and security risks.

    regards, tom lane
  • Dimitri Fontaine at Feb 22, 2010 at 9:53 pm

    Tom Lane writes:
    > Well, yeah, but you *must* trust those commands because every last bit
    > of your database content passes through their hands. That is not an
    > argument why you need to trust a scheduling facility --- much less the
    > tasks it schedules.

    It seems to me that CREATE FUNCTION maintenance.foo() ... SECURITY
    DEFINER means that I can schedule tasks that will not run as a
    superuser. On the reliability, see above.

    > I still say that every use case so far presented here would be equally
    > if not better served outside the database. Putting it inside just
    > creates more failure scenarios and security risks.

    I can understand why you say that, but I'll have to disagree.

    The fact that the database server is still available when pgbouncer
    crashes, for example, still means that none of my applications are
    able to connect.

    When the current PGQ (or slony) code crashes, it's already C loaded code
    that crashes, and it already takes PostgreSQL down with it.

    I'm not the security oriented paranoid^W guy, so I won't ever try to
    argue about that world, and not with you.

    All in all, when the daemons I'm considering running as user processes
    do crash, the fact that PostgreSQL is still alive means nothing to
    me. Having their supervisor trigger a fast shutdown and restart sounds
    way more reliable from here. The alternative is that some alerting system
    wakes me up and I get to restart the failed services while my
    application is not available, but PostgreSQL is (but for no one).

    Regards,
    --
    dim
  • Jaime Casanova at Feb 22, 2010 at 11:23 pm

    On Mon, Feb 22, 2010 at 4:37 PM, Tom Lane wrote:
    > Dimitri Fontaine <dfontaine@hi-media.com> writes:
    >> Tom Lane <tgl@sss.pgh.pa.us> writes:
    >>> This seems like a solution in search of a problem to me.  The most
    >>> salient aspect of such processes is that they would necessarily run
    >>> as the postgres user
    >>
    >> The precedent are archive and restore command. They do run as postgres
    >> user too, don't they?
    >
    > Well, yeah, but you *must* trust those commands because every last bit
    > of your database content passes through their hands.  That is not an
    > argument why you need to trust a scheduling facility --- much less the
    > tasks it schedules.

    Ok, let's forget the scheduler for a minute... this is not about that
    anymore; it is about having the ability to launch user processes when the
    postmaster is ready to accept connections. this could be used for
    launching a scheduler but also for launching other tools (e.g.
    pgbouncer, slon daemons, etc.)

  • David Christensen at Feb 22, 2010 at 11:41 pm

    On Feb 22, 2010, at 5:22 PM, Jaime Casanova wrote:
    > On Mon, Feb 22, 2010 at 4:37 PM, Tom Lane wrote:
    >> Dimitri Fontaine <dfontaine@hi-media.com> writes:
    >>> Tom Lane <tgl@sss.pgh.pa.us> writes:
    >>>> This seems like a solution in search of a problem to me. The most
    >>>> salient aspect of such processes is that they would necessarily run
    >>>> as the postgres user
    >>>
    >>> The precedent are archive and restore command. They do run as
    >>> postgres user too, don't they?
    >>
    >> Well, yeah, but you *must* trust those commands because every last
    >> bit of your database content passes through their hands. That is not an
    >> argument why you need to trust a scheduling facility --- much less
    >> the tasks it schedules.
    >
    > Ok, let's forget the scheduler for a minute... this is not about that
    > anymore, is about having the ability to launch user processes when the
    > postmaster is ready to accept connections, this could be used for
    > launching an scheduler but also for launching other tools (ie:
    > pgbouncer, slon daemons, etc)

    Just a few questions off the top of my head:

    What are the semantics? If you launch a process and it crashes, is
    the postmaster responsible for relaunching it? Is there any
    additional monitoring of that process it would be expected to do?
    What defined hooks/events would you want to launch these processes
    from? If you have to kill a backend postmaster, do the auxiliary
    processes get killed as well, and with what signal? Are they killed
    when you stop the postmaster, and are they guaranteed to have stopped
    at this point? Can failing to stop prevent/delay the shutdown/restart
    of the server? Etc.

    Regards,

    David
    --
    David Christensen
    End Point Corporation
    david@endpoint.com
  • Alvaro Herrera at Feb 23, 2010 at 12:48 am

    David Christensen wrote:

    > What are the semantics? If you launch a process and it crashes, is
    > the postmaster responsible for relaunching it? Is there any
    > additional monitoring of that process it would be expected to do?
    > What defined hooks/events would you want to launch these processes
    > from? If you have to kill a backend postmaster, do the auxiliary
    > processes get killed as well, and with what signal? Are they killed
    > when you stop the postmaster, and are they guaranteed to have
    > stopped at this point? Can failing to stop prevent/delay the
    > shutdown/restart of the server? Etc.

    I think most of these should be defined by the called process, i.e.
    there needs to be a way to pass flags to postmaster. For example, some
    processes will need to cause a full postmaster restart, while most will
    not. For those that do, we need some robustness check; for example we
    could require that they participate in the PMChildSlot mechanism.

    Regarding hooks or events, I think postmaster should be kept simple:
    launch at start, reset at crash recovery, kill at stop. Salt and pepper
    allowed but that's about it -- more complex ingredients are out of the
    question due to added code to postmaster, which we want to be as robust
    as possible and thus not able to cook much of anything else.

    Now, if you run a postmaster with such a thing attached, you get no
    support here on crash reports unless you can prove the crash can be
    reproduced with it turned off (i.e. taint mode).

    --
    Alvaro Herrera http://www.CommandPrompt.com/
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Tom Lane at Feb 23, 2010 at 5:02 am

    Alvaro Herrera writes:
    > Regarding hooks or events, I think postmaster should be kept simple:
    > launch at start, reset at crash recovery, kill at stop. Salt and pepper
    > allowed but that's about it -- more complex ingredients are out of the
    > question due to added code to postmaster, which we want to be as robust
    > as possible and thus not able to cook much of anything else.

    This is exactly why I think the whole proposal is a nonstarter. It is
    necessarily pushing more complexity into the postmaster, which means
    an overall reduction in system reliability. There are some things
    I'm willing to accept extra postmaster complexity for, but I say again
    that not one single one of the arguments made in this thread are
    convincing reasons to take that risk.

    regards, tom lane
  • Dimitri Fontaine at Feb 23, 2010 at 9:41 am

    Tom Lane writes:
    > Alvaro Herrera <alvherre@commandprompt.com> writes:
    >> Regarding hooks or events, I think postmaster should be kept simple:
    >> launch at start, reset at crash recovery, kill at stop.
    >
    > This is exactly why I think the whole proposal is a nonstarter. It is
    > necessarily pushing more complexity into the postmaster, which means
    > an overall reduction in system reliability.

    I was under the illusion that having a separate "supervisor" process,
    a child of postmaster that takes care of the user daemons, would protect
    postmaster itself. At least the only thing it'd have to do is start a
    new child. Then let it care.

    How much would that give us as far as postmaster reliability is concerned?
    --
    dim
  • Alvaro Herrera at Feb 23, 2010 at 2:34 pm

    Dimitri Fontaine wrote:
    > Tom Lane <tgl@sss.pgh.pa.us> writes:
    >> Alvaro Herrera <alvherre@commandprompt.com> writes:
    >>> Regarding hooks or events, I think postmaster should be kept simple:
    >>> launch at start, reset at crash recovery, kill at stop.
    >>
    >> This is exactly why I think the whole proposal is a nonstarter. It is
    >> necessarily pushing more complexity into the postmaster, which means
    >> an overall reduction in system reliability.
    >
    > I was under the illusion that having a separate "supervisor" process
    > child of postmaster to care about the user daemons would protect
    > postmaster itself. At least the only thing it'd have to do is start a
    > new child. Then let it care.

    The problem I have with this design is that those processes are then not
    direct children of postmaster itself, which is a problem when it wants
    them to stop and such. (This is why autovacuum workers are started by
    postmaster and not by the launcher directly. If I knew of a way to make
    it work reliably, I wouldn't have bothered with that signalling
    mechanism, which is quite fragile and gets its fair share of bug
    reports.)

    (Hmm, but then, autovacuum workers are backends and so they need to be
    more closely linked to postmaster. These other processes needn't be.)

    --
    Alvaro Herrera http://www.CommandPrompt.com/
    The PostgreSQL Company - Command Prompt, Inc.
  • Simon Riggs at Feb 23, 2010 at 2:54 pm

    On Tue, 2010-02-23 at 00:02 -0500, Tom Lane wrote:
    > Alvaro Herrera <alvherre@commandprompt.com> writes:
    >> Regarding hooks or events, I think postmaster should be kept simple:
    >> launch at start, reset at crash recovery, kill at stop. Salt and pepper
    >> allowed but that's about it -- more complex ingredients are out of the
    >> question due to added code to postmaster, which we want to be as robust
    >> as possible and thus not able to cook much of anything else.
    >
    > This is exactly why I think the whole proposal is a nonstarter. It is
    > necessarily pushing more complexity into the postmaster, which means
    > an overall reduction in system reliability. There are some things
    > I'm willing to accept extra postmaster complexity for, but I say again
    > that not one single one of the arguments made in this thread are
    > convincing reasons to take that risk.

    Nobody wants to weigh down and sink the postmaster.

    What is wanted is a means to integrate parts of a solution that are
    already intimately tied to Postgres. Non-integration makes the whole
    Postgres-based solution less reliable and harder to operate. Postgres
    should not assume that it is the only aspect of the server: in almost
    all other DBMSs, such features are built into the database: session
    pools, trigger-based replication, scheduling, etc.

    --
    Simon Riggs www.2ndQuadrant.com
  • Alvaro Herrera at Feb 23, 2010 at 3:00 pm

    Simon Riggs wrote:

    > What is wanted is a means to integrate parts of a solution that are
    > already intimately tied to Postgres. Non-integration makes the whole
    > Postgres-based solution less reliable and harder to operate. Postgres
    > should not assume that it is the only aspect of the server: in almost
    > all other DBMS features are built into the database: session pools,
    > trigger-based replication, scheduling, etc..

    Yeah, back when autovac wasn't integrated, it was a pain to work with --
    the need to start and stop it separately from postmaster was a hard task
    to manage. The Debian init script used to have some very ugly hacks to
    work with it. Having it now integrated makes things *so* much easier.
    Giving postmaster the ability to manage other processes (whether
    directly or through a supervisor) would make people's lives simpler as
    well.

    I think it was Dimitri who said that even if postmaster is running but
    the connection pooler is down, the system is effectively down for some
    users, and thus you really want postmaster to be able to do something
    about it. I cannot agree more. (You can set up monitoring and such,
    but this is merely working around the fact that it doesn't work in the
    first place.)

    --
    Alvaro Herrera http://www.CommandPrompt.com/
    The PostgreSQL Company - Command Prompt, Inc.
  • Steve Atkins at Feb 23, 2010 at 4:02 pm

    On Feb 22, 2010, at 9:02 PM, Tom Lane wrote:

    > Alvaro Herrera <alvherre@commandprompt.com> writes:
    >> Regarding hooks or events, I think postmaster should be kept simple:
    >> launch at start, reset at crash recovery, kill at stop. Salt and pepper
    >> allowed but that's about it -- more complex ingredients are out of the
    >> question due to added code to postmaster, which we want to be as robust
    >> as possible and thus not able to cook much of anything else.
    >
    > This is exactly why I think the whole proposal is a nonstarter. It is
    > necessarily pushing more complexity into the postmaster, which means
    > an overall reduction in system reliability. There are some things
    > I'm willing to accept extra postmaster complexity for, but I say again
    > that not one single one of the arguments made in this thread are
    > convincing reasons to take that risk.

    Would having a higher level process manager be adequate - one
    that spawns the postmaster and a list of associated processes
    (queue manager, job scheduler, random user daemons that are
    used for database application maintenance)? It sounds like
    something like that would be able to start up and shut down
    an entire family of daemons, of which the postmaster is the major
    one, gracefully.

    It could also be developed almost independently of core code,
    at most it might benefit from a way for the postmaster to tell it
    when it's started up successfully.
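
To make the idea concrete, here is a small Python sketch of such an outside process manager, not a definitive design. `ProcessFamily` and `PLACEHOLDER` are hypothetical names; the placeholder command is a sleeping Python child standing in for the postmaster and its companion daemons, and stopping in reverse start order (so the postmaster, started first, goes down last) is an assumption.

```python
import subprocess
import sys
from typing import List, Sequence

# Stand-in for a real daemon command line (hypothetical, for illustration).
PLACEHOLDER = [sys.executable, "-c", "import time; time.sleep(60)"]


class ProcessFamily:
    """Start a list of commands in order and stop them in reverse order."""

    def __init__(self, commands: Sequence[Sequence[str]]) -> None:
        self.commands = list(commands)
        self.children: List[subprocess.Popen] = []

    def start(self) -> None:
        # A real manager would wait for the postmaster to be ready before
        # launching the daemons that connect to it.
        for argv in self.commands:
            self.children.append(subprocess.Popen(list(argv)))

    def stop(self, timeout: float = 10.0) -> List[int]:
        """Ask each child to terminate, newest first; kill it if it overstays."""
        codes = []
        for child in reversed(self.children):
            child.terminate()
            try:
                codes.append(child.wait(timeout=timeout))
            except subprocess.TimeoutExpired:
                child.kill()
                codes.append(child.wait())
        self.children.clear()
        return codes
```

Usage would be along the lines of `ProcessFamily([PLACEHOLDER, PLACEHOLDER]).start()`, with the first entry replaced by the postmaster command and the rest by the associated daemons.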

    Cheers,
    Steve
  • Alvaro Herrera at Feb 23, 2010 at 4:08 pm

    Steve Atkins wrote:

    > Would having a higher level process manager be adequate - one
    > that spawns the postmaster and a list of associated processes
    > (queue manager, job scheduler, random user daemons that are
    > used for database application maintenance). It sounds like
    > something like that would be able to start up and shut down
    > an entire family of daemons, of which the postmaster is the major
    > one, gracefully.

    Sort of a super-pg_ctl, eh? Hmm, that sounds like it could work ...

    > It could also be developed almost independently of core code,
    > at most it might benefit from a way for the postmaster to tell it
    > when it's started up successfully.

    Right -- pg_ping pops up again ...

    I think it'd also want to be signalled when postmaster undergoes a
    restart cycle, so that it can handle the other daemons appropriately.
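
Absent a notification from the postmaster, an outside manager could fall back to polling. The following Python sketch is only an approximation of what a pg_ping-style check would do: `wait_until_accepting` is a hypothetical helper, and a successful TCP connect shows that something is listening on the port, not that the server has finished startup or recovery.

```python
import socket
import time


def wait_until_accepting(host: str, port: int, timeout: float = 30.0,
                         interval: float = 0.5) -> bool:
    """Return True once a TCP connect succeeds, False if the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # The connection is closed again as soon as it is established.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # not listening yet (or unreachable); retry until the deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The same loop, run continuously, would also notice a postmaster restart cycle as a period during which connects fail, though an explicit signal as suggested above would be more reliable.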

    --
    Alvaro Herrera http://www.CommandPrompt.com/
    The PostgreSQL Company - Command Prompt, Inc.
  • Jaime Casanova at Feb 25, 2010 at 5:53 pm

    On Tue, Feb 23, 2010 at 11:08 AM, Alvaro Herrera wrote:
    > Steve Atkins wrote:
    >> Would having a higher level process manager be adequate - one
    >> that spawns the postmaster and a list of associated processes
    >> (queue manager, job scheduler, random user daemons that are
    >> used for database application maintenance). It sounds like
    >> something like that would be able to start up and shut down
    >> an entire family of daemons, of which the postmaster is the major
    >> one, gracefully.
    >
    > Sort of a super-pg_ctl, eh?  Hmm, that sounds like it could work ...

    Summarizing:

    so we want some kind of super postmaster that starts some processes
    (including postgres' postmaster itself), and tracks their
    availability.
    - processes that don't need to connect to shared memory should start
    here (e.g. pgagent, slony daemons, pgbouncer, LISTEN applications, etc.)
    - processes that need to connect to shared memory should be children of
    postgres' postmaster

    is this so different from what postgres' postmaster itself does? i
    mean, can we reuse that code?
    this project has of course grown beyond my known abilities, so while i
    will try it (if anyone feels he can tackle it, please go for it)...
    maybe we can add this to the TODO if it seems acceptable? especially, i'd
    love to hear Tom's opinion on this one...

  • Merlin Moncure at Feb 22, 2010 at 9:35 pm

    On Mon, Feb 22, 2010 at 2:53 PM, Tom Lane wrote:
    > I still haven't seen a good reason for not using cron or Task Scheduler
    > or other standard tools.

    *) provided and popular feature in higher end databases

    *) the audience you cater to expects it

    *) IMO, it should simply not be necessary to incorporate a secondary
    scripting environment to do things like vacuum and backups

    *) portable. for example, you can dump a database on linux and restore
    to windows without re-implementing your scheduler/scripts

    as a consequence,
    *) off the shelf utilities/pgfoundry projects, etc. can rely on and
    utilize scheduling behavior

    merlin
  • Takahiro Itagaki at Feb 23, 2010 at 5:47 am

    Jaime Casanova wrote:

    > integrated_user_processes = 'x, y, z'
    > API would be user_process_startup(), user_process_shutdown().

    FYI, pg_statsinfo version 2 emulates the same behavior with
    shared_preload_libraries and spawns a user process in _PG_init().
    But it's still ugly and not so reliable. Official APIs would be better.

    http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgstatsinfo/pg_statsinfo/lib/libstatsinfo.c

    The request came from end users, who said that an extension should
    behave as a postgres internal daemon rather than a wrapper around
    postgres.

    Regards,
    ---
    Takahiro Itagaki
    NTT Open Source Software Center

Discussion Overview
group: pgsql-hackers
categories: postgresql
posted: Feb 22, '10 at 6:34p
active: Feb 25, '10 at 5:53p
posts: 22
users: 10
website: postgresql.org...
irc: #postgresql
