
On 21 May 2012 20:40, Stephen Frost wrote:

This is important. I like the idea of breaking down the barriers
between databases to allow it to be an option for one backend to
access tables in multiple databases. The current mechanism doesn't
actually prevent looking at data from other databases using internal
APIs, so full security doesn't exist. It's a very common user
requirement to wish to join tables stored in different databases,
which ought to be possible more cleanly with correct privileges.
That's really a whole different ball of wax and I don't believe what
Robert was proposing would actually allow that to happen due to the
other database-level things which are needed to keep everything
consistent...  That's my understanding, anyway.  I'd be as happy as anyone
if we could actually make it work, but isn't the SysCache stuff per
database?  Also, cross-database queries would actually make it more
difficult to have per-database roles, which is one thing that I was
hoping we might be able to work into this, though perhaps we could have
a shared roles table and a per-database roles table and only 'global'
roles would be able to issue cross-database queries..
So collecting a few requirements from various places:

* Ability to have a Role that can only access one Database

* Allow user info to be dumped with a database, to make a db
completely self-consistent

* Allow databases to be transportable

* Allow users to access tables in >1 database easily, with appropriate rights.


I don't see any reason why these things would work against each other.

The main objectives are to make a Database a more easily used
administrative grouping. At present, people who use multiple Databases
face many problems - they aren't as separate as you'd like, but
neither can they be ignored when required.

The idea of "one main database per session" is fine, but wiring it so
closely into the backend has a few disadvantages, many of them weird
internal things.

Are there arguments against those requirements before we spend time on
design/thinking?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


  • José Luis Tallón at May 22, 2012 at 11:06 am

    On 22/05/12 11:46, Simon Riggs wrote:
    On 21 May 2012 20:40, Stephen Frost wrote:
    This is important. I like the idea of breaking down the barriers
    between databases to allow it to be an option for one backend to
    access tables in multiple databases. The current mechanism doesn't
    actually prevent looking at data from other databases using internal
    APIs, so full security doesn't exist. It's a very common user
    requirement to wish to join tables stored in different databases,
    which ought to be possible more cleanly with correct privileges.
    That's really a whole different ball of wax and I don't believe what
    Robert was proposing would actually allow that to happen due to the
    other database-level things which are needed to keep everything
    consistent... That's my understanding, anyway. I'd be as happy as anyone
    if we could actually make it work, but isn't the SysCache stuff per
    database? Also, cross-database queries would actually make it more
    difficult to have per-database roles, which is one thing that I was
    hoping we might be able to work into this, though perhaps we could have
    a shared roles table and a per-database roles table and only 'global'
    roles would be able to issue cross-database queries..
    IMVHO: s/database/schema/g does resolve many of the problems that you
    were referring to... and 'dblink' should solve the rest, right?
    Please, feel free to point out what I am (most probably) not considering
    -- not experienced enough yet :)

    On the other hand, the separation of databases allows what otherwise
    would only be possible by using multiple instances of the database
    server (à la Oracle, AFAIK) -- save for resource management, but that
    is another question altogether.
    So collecting a few requirements from various places:

    * Ability to have a Role that can only access one Database
    Yes, please
    * Allow user info to be dumped with a database, to make a db
    completely self-consistent
    +1
    * Allow databases to be transportable
    +1. Ideally, the binary format could be made platform-independent, so
    that a snapshot/rsync of the cluster can span architectures easily.
    AFAIK, endianness-change is relatively cheap on current processors [1
    ASM instruction?] and it's not like we are memory-mapping tuples anyway
    (TOASTed values can certainly not be mapped), so it shouldn't be
    noticeable performance-wise.
    * Allow users to access tables in >1 database easily, with appropriate rights.
    See above, but I am probably wrong ...

    I don't see any reason why these things would work against each other.
    They look quite orthogonal to me.
    The main objectives are to make a Database a more easily used
    administrative grouping. At present, people who use multiple Databases
    face many problems - they aren't as separate as you'd like, but
    neither can they be ignored when required.

    The idea of "one main database per session" is fine, but wiring it so
    closely into the backend has a few disadvantages, many of them weird
    internal things.

    Are there arguments against those requirements before we spend time on
    design/thinking?
    OTOH, the postmaster/cluster - session/database coupling looks to me
    clean and simple... and seems to make the code simpler. This can only
    be good (but again, I don't know enough yet to be sure)

    Regards,

    Jose Luis Tallon
  • Simon Riggs at May 22, 2012 at 11:24 am

    On 22 May 2012 12:05, José Luis Tallón wrote:

    IMVHO:  s/database/schema/g does resolve many of the problems that you were
    referring to... and 'dblink' should solve the rest, right?
    Please, feel free to point out what I am (most probably) not considering --
    not experienced enough yet :)
    The choice of schema/database is an important one. If you get it
    wrong, you are in major difficulty. In many cases schemas would be a
    better choice, but not in all cases. So I'm interested in solving the
    problems for people who have multiple databases on the same server.

    dblink is the only solution, but it's a very poor way to do this when we
    have 2 databases on the same server.

    My thinking is that reaching out to multiple databases is actually
    mostly easy, except in a few places where dbid is hardwired into the
    backend.
    On the other hand, the separation of databases allows what otherwise would
    only be possible by using multiple instances of the database server (à la
    Oracle, AFAIK) -- save for resource management, but that is another
    question altogether.
    Separation of databases is fine. I have no intention to change that,
    as long as the user wishes that.

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • José Luis Tallón at May 22, 2012 at 12:05 pm

    On 22/05/12 13:24, Simon Riggs wrote:
    On 22 May 2012 12:05, José Luis Tallón wrote:
    IMVHO: s/database/schema/g does resolve many of the problems that you were
    referring to... and 'dblink' should solve the rest, right?
    Please, feel free to point out what I am (most probably) not considering --
    not experienced enough yet :)
    The choice of schema/database is an important one. If you get it
    wrong, you are in major difficulty. In many cases schemas would be a
    better choice, but not in all cases. So I'm interested in solving the
    problems for people who have multiple databases on the same server.
    Ok. Understood.
    Thank you for the clarification
    dblink is the only solution, but it's a very poor way to do this when we
    have 2 databases on the same server.

    My thinking is that reaching out to multiple databases is actually
    mostly easy, except in a few places where dbid is hardwired into the
    backend.
    The only drawback I see is that it might weaken the separation.

    Even though arguably a kludge, dblink could have a "shortcut" added,
    whereby connections to another database within the same cluster would be
    serviced directly within the backend, as opposed to opening a new db
    connection. This is effectively a fastpath within dblink, which
    optimizes a relatively common case while at the same time not losing
    generality.
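
    As a point of reference, the ordinary dblink path that such a fastpath
    would short-circuit looks roughly like this (the database name and
    table names below are invented for illustration):

    ```sql
    CREATE EXTENSION IF NOT EXISTS dblink;

    -- Join a local table against one in a sibling database of the same
    -- cluster. Today this opens a full libpq connection; the proposed
    -- fastpath would service it directly within the backend.
    SELECT l.id, r.name
    FROM local_accounts AS l
    JOIN dblink('dbname=otherdb', 'SELECT id, name FROM users')
           AS r(id int, name text)
      ON r.id = l.id;
    ```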
    On the other hand, the separation of databases allows what otherwise would
    only be possible by using multiple instances of the database server (à la
    Oracle, AFAIK) -- save for resource management, but that is another
    question altogether.
    Separation of databases is fine. I have no intention to change that,
    as long as the user wishes that.
    Perfect.

    Thanks,

    Jose Luis Tallon
  • Florian Pflug at May 22, 2012 at 11:35 am

    On May 22, 2012, at 11:46, Simon Riggs wrote:
    * Ability to have a Role that can only access one Database

    * Allow user info to be dumped with a database, to make a db
    completely self-consistent
    These two could be achieved by having database-local roles I think.
    * Allow databases to be transportable
    That would be very useful, but extremely hard to do unless we switch to
    per-database XID spaces. Or unless we're content with having databases
    only be transportable after some special "MAKE TRANSPORTABLE" operation,
    which would freeze all tuples and make the database read-only.
    * Allow users to access tables in >1 database easily, with appropriate rights.
    That one I'm very sceptical about. In the long run, I think we want better
    separation of databases, not less, and this requirement carries a huge risk
    of standing in the way of that. Also, I think that once we integrate the postgres
    FDW into core (that's the plan in the long run, right?), we're going to get
    a good approximation of that essentially for free.

    best regards,
    Florian Pflug
  • Simon Riggs at May 22, 2012 at 11:47 am

    On 22 May 2012 12:35, Florian Pflug wrote:

    * Allow users to access tables in >1 database easily, with appropriate rights.
    That one I'm very sceptical about. In the long run, I think we want better
    separation of databases, not less, and this requirement carries a huge risk
    of standing in the way of that. Also, I think that once we integrate the postgres
    FDW into core (that's the plan in the long run, right?), we're going to get
    a good approximation of that essentially for free.
    It's a poor approximation of it, free or not.

    If it actually worked well, I'd be happy. It doesn't. No proper
    transaction support, no session pool, poor planning etc etc. At best
    it's a band-aid, not something sufficiently good to recommend for
    general production use.

    FDWs are very good but aren't good enough for inter-database usage.

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • José Luis Tallón at May 22, 2012 at 12:14 pm

    On 22/05/12 13:47, Simon Riggs wrote:
    On 22 May 2012 12:35, Florian Pflugwrote:
    * Allow users to access tables in>1 database easily, with appropriate rights.
    That one I'm very sceptical about. In the long run, I think we want better
    separation of databases, not less, and this requirement carries a huge risk
    of standing in the way of that. Also, I think that once we integrate the postgres
    FDW into core (that's the plan in the long run, right?), we're going to get
    a good approximation of that essentially for free.
    It's a poor approximation of it, free or not.

    If it actually worked well, I'd be happy. It doesn't. No proper
    transaction support, no session pool, poor planning etc etc. At best
    it's a band-aid, not something sufficiently good to recommend for
    general production use.
    - Transaction support: should be fixed, I guess.

    - Session pool: is this really needed? I would handle it externally -- for
    example, by specifying a connection string to a pgPool/pgBouncer as
    opposed to directly to the db server.

    - Planning: add a tunable specifying a higher cost (with an exception for
    cluster-local dblinks, which would have a lower cost), and the rest
    should be straightforward. Of course, planning wouldn't be as accurate
    ---we can't access the other db's statistics in order to potentially
    rewrite conditions---, but I don't think that would be a proper approach
    (separation of concerns, again)
    FDWs are very good but aren't good enough for inter-database usage.
    The optimization I just proposed (plus the required fixes to FDW) might
    very well solve this, while providing useful enhancements for all users,
    whether they are accessing cluster-local databases or not (or even truly
    foreign datasources)


    Regards,

    Jose Luis Tallon
  • Florian Pflug at May 22, 2012 at 12:23 pm

    On May 22, 2012, at 13:47, Simon Riggs wrote:
    On 22 May 2012 12:35, Florian Pflug wrote:
    * Allow users to access tables in >1 database easily, with appropriate rights.
    That one I'm very sceptical about. In the long run, I think we want better
    separation of databases, not less, and this requirement carries a huge risk
    of standing in the way of that. Also, I think that once we integrate the postgres
    FDW into core (that's the plan in the long run, right?), we're going to get
    a good approximation of that essentially for free.
    It's a poor approximation of it, free or not.

    If it actually worked well, I'd be happy. It doesn't. No proper
    transaction support, no session pool, poor planning etc etc. At best
    it's a band-aid, not something sufficiently good to recommend for
    general production use.
    These all sound fixable, though.
    FDWs are very good but aren't good enough for inter-database usage.
    OTOH, cross-database queries would by design be limited to databases
    within one cluster, whereas an FDW-based solution would not. I don't really see
    the difference between telling people "put all your tables into one database
    if you need to access them from within one session" and "put all your databases
    on one server if you need to access them from within one session".

    Plus, the more tightly different databases in the same cluster are coupled,
    the more people will setup one cluster per database for performance reasons.
    Then, when they discover they need inter-database queries after all, we'll
    again have to tell them "well, then don't set things up the way you have".

    If we want to make it easier for people to migrate from multiple databases
    to a single database with multiple schemas, maybe we should look into allowing
    nested schemas? AFAIK, the main reason not to do that is the ambiguity in
    the meaning of identifiers, which cross-database queries would have to
    deal with as well.
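
    To make the ambiguity concrete (the names below are invented): once
    schemas nest, a qualified name gains more than one plausible reading:

    ```sql
    -- Today, a.b.c has exactly one meaning: database a, schema b, table c
    -- (and a must be the current database). With nested schemas, the same
    -- name could also mean: schema a, nested schema b, table c.
    SELECT * FROM a.b.c;
    ```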

    best regards,
    Florian Pflug
  • Robert Haas at May 22, 2012 at 11:56 am

    On Tue, May 22, 2012 at 7:35 AM, Florian Pflug wrote:
    * Allow users to access tables in >1 database easily, with appropriate rights.
    That one I'm very sceptical about. In the long run, I think we want better
    separation of databases, not less, and this requirement carries a huge risk
    of standing in the way of that. Also, I think that once we integrate the postgres
    FDW into core (that's the plan in the long run, right?), we're going to get
    a good approximation of that essentially for free.
    +1.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Andrew Dunstan at May 22, 2012 at 12:40 pm

    On 05/22/2012 07:56 AM, Robert Haas wrote:
    On Tue, May 22, 2012 at 7:35 AM, Florian Pflug wrote:
    * Allow users to access tables in >1 database easily, with appropriate rights.
    That one I'm very sceptical about. In the long run, I think we want better
    separation of databases, not less, and this requirement carries a huge risk
    of standing in the way of that. Also, I think that once we integrate the postgres
    FDW into core (that's the plan in the long run, right?), we're going to get
    a good approximation of that essentially for free.
    +1.


    That seems to be leaving aside the fact that we don't currently have any
    notion of how to allow FDWs to write to foreign tables.

    What is more, isn't the postgres FDW about talking to any postgres
    source? If so, does it have special smarts for when we are talking to
    ourselves? And if it doesn't then it seems unlikely to be an acceptable
    substitute for allowing talking direct to a sibling database.

    I'm not at all yet sold on Simon's plan, but I'm skeptical that an FDW
    would adequately meet the case if we wanted to go that way.

    cheers

    andrew
  • Robert Haas at May 22, 2012 at 1:06 pm

    On Tue, May 22, 2012 at 8:40 AM, Andrew Dunstan wrote:
    That seems to be leaving aside the fact that we don't currently have any
    notion of how to allow FDWs to write to foreign tables.

    What is more, isn't the postgres FDW about talking to any postgres source?
    If so, does it have special smarts for when we are talking to ourselves? And
    if it doesn't then it seems unlikely to be an acceptable substitute for
    allowing talking direct to a sibling database.

    I'm not at all yet sold on Simon's plan, but I'm skeptical that an FDW would
    adequately meet the case if we wanted to go that way.
    Well, I don't think anyone is claiming that FDWs as they exist today
    solve all of the problems in this area. But I think that improving
    FDWs is a more promising line of attack than trying to make backends
    talk to multiple databases. Doing the latter will require massive
    surgery on the relcache, the catcache, most of the secondary catalog
    caches, the ProcArray, and every portion of the backend that thinks an
    OID uniquely identifies an SQL object. Basically, they'd all need
    database OID as an additional key field, which is undesirable for
    performance reasons even if there were no issue of code churn.

    So I'd rather see us put the effort into pgsql_fdw, which, as Florian
    says, will also let us talk to a completely separate server. If
    you've got multiple databases in the same cluster and really need to
    be doing queries across all of them, that's what schemas are
    supposedly for. Now I know that people feel that doesn't work as well
    as it needs to, but there again I think it would be easier to fix
    schemas than to make cross-database queries work.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Simon Riggs at May 22, 2012 at 5:28 pm

    On 22 May 2012 14:05, Robert Haas wrote:
    On Tue, May 22, 2012 at 8:40 AM, Andrew Dunstan wrote:
    That seems to be leaving aside the fact that we don't currently have any
    notion of how to allow FDWs to write to foreign tables.

    What is more, isn't the postgres FDW about talking to any postgres source?
    If so, does it have special smarts for when we are talking to ourselves? And
    if it doesn't then it seems unlikely to be an acceptable substitute for
    allowing talking direct to a sibling database.

    I'm not at all yet sold on Simon's plan, but I'm skeptical that an FDW would
    adequately meet the case if we wanted to go that way.
    Well, I don't think anyone is claiming that FDWs as they exist today
    solve all of the problems in this area.  But I think that improving
    FDWs is a more promising line of attack than trying to make backends
    talk to multiple databases.  Doing the latter will require massive
    surgery on the relcache, the catcache, most of the secondary catalog
    caches, the ProcArray, and every portion of the backend that thinks an
    OID uniquely identifies an SQL object.  Basically, they'd all need
    database OID as an additional key field, which is undesirable for
    performance reasons even if there were no issue of code churn.
    Ack, apart from the bit about OIDs no longer being unique. That might
    be an upgrade issue but it's obviously something we wouldn't allow if
    we did that.

    I'm not sure I see changing the caches as being massive surgery.
    Perhaps we could just bypass them altogether.
    So I'd rather see us put the effort into pgsql_fdw, which, as Florian
    says, will also let us talk to a completely separate server.  If
    you've got multiple databases in the same cluster and really need to
    be doing queries across all of them, that's what schemas are
    supposedly for.  Now I know that people feel that doesn't work as well
    as it needs to, but there again I think it would be easier to fix
    schemas than to make cross-database queries work.
    We're a very long way from making that work well. IMHO easily much
    further than direct access.

    If I have a customer with 1 database per user, how do they run a query
    against 100 user tables? It would require 100 connections to the
    database. Doing that would require roughly x100 the planning time and
    x100 the connection delay. Larger SQL statements pass their results
    between executor steps using libpq rather than direct calls.

    I find it hard to believe FDWs will ever work sufficiently well to fix
    those problems.

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Josh Berkus at May 22, 2012 at 5:35 pm

    If I have a customer with 1 database per user, how do they run a query
    against 100 user tables? It would require 100 connections to the
    database. Doing that would require roughly x100 the planning time and
    x100 the connection delay. Larger SQL statements pass their results
    between executor steps using libpq rather than direct calls.
    Why is this hypothetical customer using separate databases? This really
    seems like a case of "doctor, it hurts when I do this".

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Simon Riggs at May 22, 2012 at 5:43 pm

    On 22 May 2012 18:35, Josh Berkus wrote:
    If I have a customer with 1 database per user, how do they run a query
    against 100 user tables? It would require 100 connections to the
    database. Doing that would require roughly x100 the planning time and
    x100 the connection delay. Larger SQL statements pass their results
    between executor steps using libpq rather than direct calls.
    Why is this hypothetical customer using separate databases?  This really
    seems like a case of "doctor, it hurts when I do this".
    Databases are great for separating things, but sometimes you want to
    un-separate them in a practical way.

    I'm surprised that you're so negative about an ease of use feature. I
    had understood you cared about fixing problems experienced by our
    developers.

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Daniel Farina at May 22, 2012 at 5:52 pm

    On Tue, May 22, 2012 at 10:43 AM, Simon Riggs wrote:
    On 22 May 2012 18:35, Josh Berkus wrote:

    If I have a customer with 1 database per user, how do they run a query
    against 100 user tables? It would require 100 connections to the
    database. Doing that would require roughly x100 the planning time and
    x100 the connection delay. Larger SQL statements pass their results
    between executor steps using libpq rather than direct calls.
    Why is this hypothetical customer using separate databases?  This really
    seems like a case of "doctor, it hurts when I do this".
    Databases are great for separating things, but sometimes you want to
    un-separate them in a practical way.
    In my experience, these un-separations are (thankfully) relieved of
    the requirement of consistency between databases, and so the contract
    is much more favorable.

    The planning time problem is quite hard.

    However, I think the connection-delay problem is not as hard to
    answer: I think multiplexed protocols are going to become the norm in
    the near future (they have been a pretty uncontested part of the SPDY
    protocol, for example, after flow control was added) and have a number
    of useful properties, and it may be time to consider how we're going
    to divorce the notion that one socket implies exactly one backend.

    --
    fdr
  • Robert Haas at May 22, 2012 at 5:51 pm

    On Tue, May 22, 2012 at 1:27 PM, Simon Riggs wrote:
    Ack, apart from the bit about OIDs no longer being unique. That might
    be an upgrade issue but it's obviously something we wouldn't allow if
    we did that.
    And how exactly are you going to disallow that? We currently enforce
    the uniqueness of OIDs within a database by putting a unique index on
    the relevant system catalog columns, but that clearly won't work as a
    means of guaranteeing cross-database uniqueness. Unless of course you
    put all the catalog entries from all the databases into a single set
    of catalogs; but then they're pretty much the same as the schemas we
    already have.
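
    For concreteness, the unique indexes Robert refers to are ordinary
    catalog relations, one per catalog on its oid column, and therefore
    scoped to a single database:

    ```sql
    -- A few of the unique indexes that enforce per-database OID uniqueness:
    SELECT c.relname
    FROM pg_class AS c
    WHERE c.relname IN ('pg_class_oid_index',
                        'pg_proc_oid_index',
                        'pg_type_oid_index');
    ```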
    I'm not sure I see changing the caches as being massive surgery.
    Perhaps we could just bypass them altogether.
    You're entitled to your opinion on this one, but we have those caches
    for a very good reason: the system runs about 20,000 times slower
    without them.
    So I'd rather see us put the effort into pgsql_fdw, which, as Florian
    says, will also let us talk to a completely separate server.  If
    you've got multiple databases in the same cluster and really need to
    be doing queries across all of them, that's what schemas are
    supposedly for.  Now I know that people feel that doesn't work as well
    as it needs to, but there again I think it would be easier to fix
    schemas than to make cross-database queries work.
    We're a very long way from making that work well. IMHO easily much
    further than direct access.

    If I have a customer with 1 database per user, how do they run a query
    against 100 user tables? It would require 100 connections to the
    database. Doing that would require roughly x100 the planning time and
    x100 the connection delay. Larger SQL statements pass their results
    between executor steps using libpq rather than direct calls.

    I find it hard to believe FDWs will ever work sufficiently well to fix
    those problems.
    If you have got a customer with one database per user, and yet you
    want to run queries across all those tables, then you're using the
    database system for something for which it is not designed, and it's
    probably not going to work very well. That sounds like a use case for
    schemas, or maybe better, some kind of row-level security we don't
    have yet - but hopefully will in 9.3, since KaiGai intends to work on
    it. Databases are designed to act as a firewall, so that somebody
    using one database isn't affected by what happens in some other
    database. Unfortunately, because of shared catalogs, that's not
    completely true, but that's the idea, and if anything we need to
    isolate them more, not less.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Stephen Frost at May 22, 2012 at 1:04 pm

    * Simon Riggs (simon@2ndQuadrant.com) wrote:
    * Ability to have a Role that can only access one Database
    Alright, I'd like to think about this one specifically and solicit
    feedback on the idea that we keep the existing shared role tables but
    add on additional tables for per-database roles.

    In the past, I feel like we've been focused on the idea of moving all
    roles to be per-database instead of per-cluster, which certainly has a
    lot of problems associated with it, but in the end, I think people would
    be really happy with some shared roles and some per-DB roles.

    What would the semantics of that look like though? Which is "preferred"
    when you do a 'grant select' or 'grant role'? Or do we just disallow
    overlaps between per-DB roles and global roles? If we don't allow
    duplicates, I suspect a lot of the other questions suddenly become a lot
    easier to deal with, but would that be too much of a restriction? How
    would you handle migrating an existing global role to a per-database
    role?
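
    One way to sketch that split (every catalog name and column below is
    invented, not a naming proposal): pg_authid stays shared, and each
    database gains a local catalog whose entries must not collide with the
    shared one:

    ```sql
    -- Hypothetical per-database catalog for local roles; shared roles would
    -- remain in pg_authid. Disallowing overlap means rolname (and ideally
    -- the OID space) must be checked against the shared catalog at CREATE
    -- time.
    CREATE TABLE pg_db_role (
        oid         oid     PRIMARY KEY,  -- must not collide with pg_authid OIDs
        rolname     name    UNIQUE,       -- nor with shared role names
        rolsuper    boolean,
        rolcanlogin boolean
    );
    ```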

    Thanks,

    Stephen
  • Thom Brown at May 22, 2012 at 1:38 pm

    On 22 May 2012 14:04, Stephen Frost wrote:

    What would the semantics of that look like though?  Which is "preferred"
    when you do a 'grant select' or 'grant role'?  Or do we just disallow
    overlaps between per-DB roles and global roles?  If we don't allow
    duplicates, I suspect a lot of the other questions suddenly become a lot
    easier to deal with, but would that be too much of a restriction?  How
    would you handle migrating an existing global role to a per-database
    role?
    Perhaps:

    CREATE [ GLOBAL | LOCAL ] ROLE name [ LIKE role_name ] [ [ WITH ]
    option [ ... ] ]

    Then:

    CREATE LOCAL ROLE localrolename LIKE globalrolename;

    REASSIGN OWNED BY globalrolename TO localrolename;

    Conflicts would occur where localrolename matches an existing local
    role name within the same database, or a global role name, but not a
    local role name within another database. The problem with this,
    however, is that creating global roles would need conflict checks
    against local roles in every database, unless a manifest of all local
    roles were registered globally.

    --
    Thom
  • Tom Lane at May 22, 2012 at 2:09 pm

    Thom Brown writes:
    Conflicts would occur where localrolename matches an existing local
    role name within the same database, or a global role name, but not a
    local role name within another database. The problem with this,
    however, is that creating global roles would need conflict checks
    against local roles in every database, unless a manifest of all local
    roles were registered globally.
    Yeah. The same type of issue arises for the roles' OIDs. You'd really
    want local and global roles to have nonconflicting OIDs, else it's
    necessary to carry around an indication of which type each role is;
    which would be more or less a show-stopper in terms of the number of
    catalogs and internal APIs affected. But I don't currently see any
    nice way to guarantee that if each database has a private table of
    local roles.

    You could possibly make it work if all roles, local and global, are
    stored in a single shared catalog. But that seems pretty ugly.

    BTW, I wonder whether this type of problem isn't also pretty fatal for
    the sorts of hierarchical catalog structures we were speculating about
    at PGCon. When we were talking about that I was sort of assuming that
    the more-closely-nested levels could just hide conflicting objects at
    outer levels, but on reflection that seems a bit scary.

    regards, tom lane
  • Florian Pflug at May 22, 2012 at 3:58 pm

    On May 22, 2012, at 16:09, Tom Lane wrote:
    Thom Brown <thom@linux.com> writes:
    Conflicts would occur where localrolename matches an existing local
    role name within the same database, or a global role name, but not a
    local role name within another database. The problem with this,
    however, is that creating global roles would need conflict checks
    against local roles in every database, unless a manifest of all local
    roles were registered globally.
    Yeah. The same type of issue arises for the roles' OIDs. You'd really
    want local and global roles to have nonconflicting OIDs, else it's
    necessary to carry around an indication of which type each role is;
    which would be more or less a show-stopper in terms of the number of
    catalogs and internal APIs affected. But I don't currently see any
    nice way to guarantee that if each database has a private table of
    local roles.
    Maybe we could simply make all global roles' OIDs even, and all local ones
    odd, or something like that.
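
    Under such a parity scheme, a role's scope could be read off its OID
    without any extra catalog column; a purely hypothetical illustration in
    SQL (no such scheme exists today):

    ```sql
    -- hypothetical: global role OIDs even, local role OIDs odd,
    -- so scope would be derivable from the OID alone
    SELECT rolname,
           CASE WHEN oid::int % 2 = 0 THEN 'global' ELSE 'local' END AS scope
    FROM pg_roles;
    ```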

    best regards,
    Florian Pflug
  • Thom Brown at May 22, 2012 at 4:03 pm

    On 22 May 2012 16:57, Florian Pflug wrote:
    On May22, 2012, at 16:09 , Tom Lane wrote:
    Thom Brown <thom@linux.com> writes:
    Conflicts would occur where localrolename matches an existing local
    role name within the same database, or a global role name, but not a
    local role name within another database.  The problem with this,
    however, is that creating global roles would need conflict checks
    against local roles in every database, unless a manifest of all local
    roles were registered globally.
    Yeah.  The same type of issue arises for the roles' OIDs.  You'd really
    want local and global roles to have nonconflicting OIDs, else it's
    necessary to carry around an indication of which type each role is;
    which would be more or less a show-stopper in terms of the number of
    catalogs and internal APIs affected.  But I don't currently see any
    nice way to guarantee that if each database has a private table of
    local roles.
    Maybe we could simply make all global roles' OIDs even, and all local ones
    odd, or something like that.
    Wouldn't that instantly make all previous versions of database
    clusters un-upgradable?

    --
    Thom
  • Tom Lane at May 22, 2012 at 4:19 pm

    Thom Brown writes:
    On 22 May 2012 16:57, Florian Pflug wrote:
    Maybe we could simply make all global role's OIDs even, and all local ones
    odd, or something like that.
    Wouldn't that instantly make all previous versions of database
    clusters un-upgradable?
    IIRC, pg_upgrade doesn't need to force role OIDs to be the same in the
    new cluster, so we could get away with this trick for the specific case
    of roles. It wouldn't work for introducing local/global versions of
    some other types of objects though.

    Another objection is that it wouldn't scale up nicely to multiple levels
    of catalog hierarchy. But maybe local/global is enough.

    regards, tom lane
  • Stephen Frost at May 22, 2012 at 5:28 pm

    On May 22, 2012, at 12:18, Tom Lane wrote:
    Another objection is that it wouldn't scale up nicely to multiple levels
    of catalog hierarchy. But maybe local/global is enough.
    That would be a huge improvement and this wouldn't get in the way of
    any solution to the global oid conflict issue coming along in the
    future..

    We still have the issue of name conflict between the global roles and
    the local roles, right? Unless we allow those to overlap and pick
    whatever is "closest"? Then there is the question of role membership
    and if we would allow that to go between local and global or what the
    semantics of that would look like..

    Thanks,

    Stephen
  • Florian Pflug at May 22, 2012 at 4:24 pm

    On May22, 2012, at 18:03 , Thom Brown wrote:
    On 22 May 2012 16:57, Florian Pflug wrote:
    On May22, 2012, at 16:09 , Tom Lane wrote:
    Thom Brown <thom@linux.com> writes:
    Conflicts would occur where localrolename matches an existing local
    role name within the same database, or a global role name, but not a
    local role name within another database. The problem with this,
    however, is that creating global roles would need conflict checks
    against local roles in every database, unless a manifest of all local
    roles were registered globally.
    Yeah. The same type of issue arises for the roles' OIDs. You'd really
    want local and global roles to have nonconflicting OIDs, else it's
    necessary to carry around an indication of which type each role is;
    which would be more or less a show-stopper in terms of the number of
    catalogs and internal APIs affected. But I don't currently see any
    nice way to guarantee that if each database has a private table of
    local roles.
    Maybe we could simply make all global roles' OIDs even, and all local ones
    odd, or something like that.
    Wouldn't that instantly make all previous versions of database
    clusters un-upgradable?
    Only if pg_upgrade needs to preserve the OIDs of roles. I kinda hoped it
    wouldn't, because role OIDs aren't usually stored in non-system tables.

    Hm… thinking about this further, it'd actually be sufficient for all newly
    allocated role OIDs to follow the odd/even rule, if we additionally check
    for conflicts with existing global role OIDs when allocating the OID of a new
    local role. Which is much, much easier than checking for conflicts when
    allocating a global OID, because for that you'd have to check against the
    local role OIDs within *all* databases, not just against one shared table.

    best regards,
    Florian Pflug
  • Robert Haas at May 22, 2012 at 2:19 pm

    On Tue, May 22, 2012 at 9:37 AM, Thom Brown wrote:
    On 22 May 2012 14:04, Stephen Frost wrote:
    What would the semantics of that look like though?  Which is "preferred"
    when you do a 'grant select' or 'grant role'?  Or do we just disallow
    overlaps between per-DB roles and global roles?  If we don't allow
    duplicates, I suspect a lot of the other questions suddenly become a lot
    easier to deal with, but would that be too much of a restriction?  How
    would you handle migrating an existing global role to a per-database
    role?
    Perhaps:

    CREATE [ GLOBAL | LOCAL ] ROLE name [ LIKE role_name ] [ [ WITH ]
    option [ ... ] ]

    Then:

    CREATE LOCAL ROLE localrolename LIKE globalrolename;

    REASSIGN OWNED BY globalrolename TO localrolename;

    Conflicts would occur where localrolename matches an existing local
    role name within the same database, or a global role name, but not a
    local role name within another database.  The problem with this,
    however, is that creating global roles would need conflict checks
    against local roles in every database, unless a manifest of all local
    roles were registered globally.
    There are race conditions to worry about, too. In most cases, we rely
    on the btree index machinery as a final backstop against duplicate
    catalog entries. But that doesn't work unless everything's in one
    catalog, nor for anything more complicated than "this set of columns
    taken together should be unique over every role". Even if we were OK
    with incurring the ugliness of storing per-database roles in a shared
    catalog, the uniqueness constraint you'd want is something like "no
    two roles can share the same name unless they have unequal database
    OIDs neither of which is zero", which I don't believe we can enforce
    via the btree machinery, at least not without an expression index that
    won't work in a system catalog anyway.
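
    Spelled out as ordinary SQL (which the system catalogs can't actually
    use, and with an invented table name), part of that constraint is
    expressible with partial unique indexes, but the cross-scope part is
    not, which is exactly the problem:

    ```sql
    -- hypothetical shared catalog holding both global (datoid = 0)
    -- and per-database roles
    CREATE TABLE roles_all (
        rolname name NOT NULL,
        datoid  oid  NOT NULL DEFAULT 0   -- 0 means "global"
    );

    -- uniqueness among global roles:
    CREATE UNIQUE INDEX roles_global_name ON roles_all (rolname)
        WHERE datoid = 0;

    -- uniqueness among local roles within one database:
    CREATE UNIQUE INDEX roles_local_name ON roles_all (rolname, datoid)
        WHERE datoid <> 0;

    -- ...but "a global name must not match any local name" cannot be
    -- enforced by any btree index, partial or otherwise.
    ```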

    In retrospect, I think the idea of shared catalogs was probably a bad
    idea. I think we should have made roles and tablespaces database
    objects rather than shared objects, and come up with some ad-hoc
    method of representing the set of available databases. But that
    decision seems to have been made sometime pre-1996, so the thought of
    changing it now is pretty painful, but I can dream...

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Bruce Momjian at May 24, 2012 at 10:22 pm

    On Tue, May 22, 2012 at 10:19:12AM -0400, Robert Haas wrote:
    In retrospect, I think the idea of shared catalogs was probably a bad
    idea. I think we should have made roles and tablespaces database
    objects rather than shared objects, and come up with some ad-hoc
    method of representing the set of available databases. But that
    decision seems to have been made sometime pre-1996, so the thought of
    changing it now is pretty painful, but I can dream...
    Yes, pre-1996. I think the fact that authentication/user names appear
    in pg_hba.conf really locked the user name idea into global objects, and
    we have never really been able to make a dent in that.

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Robert Haas at May 26, 2012 at 2:23 am

    On Thu, May 24, 2012 at 6:21 PM, Bruce Momjian wrote:
    On Tue, May 22, 2012 at 10:19:12AM -0400, Robert Haas wrote:
    In retrospect, I think the idea of shared catalogs was probably a bad
    idea.  I think we should have made roles and tablespaces database
    objects rather than shared objects, and come up with some ad-hoc
    method of representing the set of available databases.  But that
    decision seems to have been made sometime pre-1996, so the thought of
    changing it now is pretty painful, but I can dream...
    Yes, pre-1996.  I think the fact that authentication/user names appear
    in pg_hba.conf really locked the user name idea into global objects, and
    we have never really been able to make a dent in that.
    Eh? Why would the presence of usernames in pg_hba.conf mean that they
    have to be global objects?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Stephen Frost at May 26, 2012 at 2:35 am

    * Robert Haas (robertmhaas@gmail.com) wrote:
    On Thu, May 24, 2012 at 6:21 PM, Bruce Momjian wrote:
    Yes, pre-1996.  I think the fact that authentication/user names appear
    in pg_hba.conf really locked the user name idea into global objects, and
    we have never really been able to make a dent in that.
    Eh? Why would the presence of usernames in pg_hba.conf mean that they
    have to be global objects?
    I haven't had a chance (yet) to look, but perhaps the current code
    attempts to validate the role before figuring out what database is being
    requested? We'd have to essentially invert that, of course, for this..
    One thing I was wondering about is if we're going to have an issue
    supporting things like "tell me what databases exist" (psql -l), which
    connect to the 'postgres' database by default, for local-only roles. I'm not
    sure that I actually care, to be honest, but it's something to consider.
    I don't think we should require users to create every local role also in
    postgres, nor do I feel that we should allow connections to postgres by
    any role, nor do I want to break tools which use 'postgres' to basically
    get access to shared catalogs- but I don't see an immediate or easy
    solution..

    Thanks,

    Stephen
  • Bruce Momjian at May 26, 2012 at 3:30 am

    On Fri, May 25, 2012 at 10:34:54PM -0400, Stephen Frost wrote:
    * Robert Haas (robertmhaas@gmail.com) wrote:
    On Thu, May 24, 2012 at 6:21 PM, Bruce Momjian wrote:
    Yes, pre-1996.  I think the fact that authentication/user names appear
    in pg_hba.conf really locked the user name idea into global objects, and
    we have never really been able to make a dent in that.
    Eh? Why would the presence of usernames in pg_hba.conf mean that they
    have to be global objects?
    I haven't had a chance (yet) to look, but perhaps the current code
    attempts to validate the role before figuring out what database is being
    requested? We'd have to essentially invert that, of course, for this..
    One thing I was wondering about is if we're going to have an issue
    supporting things like "tell me what databases exist" (psql -l), which
    connect to the 'postgres' database by default, for local-only roles. I'm not
    sure that I actually care, to be honest, but it's something to consider.
    I don't think we should require users to create every local role also in
    postgres, nor do I feel that we should allow connections to postgres by
    any role, nor do I want to break tools which use 'postgres' to basically
    get access to shared catalogs- but I don't see an immediate or easy
    solution..
    Yes. In a simple case, you have a username, you want to validate it
    against LDAP or kerberos --- how do you partition the external
    authentication tool based on database name? Seems like an obvious
    problem to me.

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Robert Haas at May 28, 2012 at 7:59 pm

    On Fri, May 25, 2012 at 11:12 PM, Bruce Momjian wrote:
    On Fri, May 25, 2012 at 10:34:54PM -0400, Stephen Frost wrote:
    * Robert Haas (robertmhaas@gmail.com) wrote:
    On Thu, May 24, 2012 at 6:21 PM, Bruce Momjian wrote:
    Yes, pre-1996.  I think the fact that authentication/user names appear
    in pg_hba.conf really locked the user name idea into global objects, and
    we have never really been able to make a dent in that.
    Eh?  Why would the presence of usernames in pg_hba.conf mean that they
    have to be global objects?
    I haven't had a chance (yet) to look, but perhaps the current code
    attempts to validate the role before figuring out what database is being
    requested?  We'd have to essentially invert that, of course, for this..
    One thing I was wondering about is if we're going to have an issue
    supporting things like "tell me what databases exist" (psql -l), which
    connect to the 'postgres' database by default, for local-only roles.  I'm not
    sure that I actually care, to be honest, but it's something to consider.
    I don't think we should require users to create every local role also in
    postgres, nor do I feel that we should allow connections to postgres by
    any role, nor do I want to break tools which use 'postgres' to basically
    get access to shared catalogs- but I don't see an immediate or easy
    solution..
    Yes.  In a simple case, you have a username, you want to validate it
    against LDAP or kerberos --- how do you partition the external
    authentication tool based on database name?  Seems like an obvious
    problem to me.
    Well, when people try to connect to database "it", you set up
    pg_hba.conf to point them at the Kerberos server, but when they try to
    connect to database "sales", you just use MD5 for that. Or whatever
    your site policy happens to be. I'm not seeing the problem;
    pg_hba.conf already allows different authentication methods for
    different databases.
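
    For illustration, pg_hba.conf entries along these lines already achieve
    that (database names taken from the example above, the address range
    invented):

    ```
    # TYPE  DATABASE  USER  ADDRESS      METHOD
    host    it        all   10.0.0.0/8   gss
    host    sales     all   10.0.0.0/8   md5
    ```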

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at May 26, 2012 at 4:19 am

    Stephen Frost writes:
    * Robert Haas (robertmhaas@gmail.com) wrote:
    Eh? Why would the presence of usernames in pg_hba.conf mean that they
    have to be global objects?
    I haven't had a chance (yet) to look, but perhaps the current code
    attempts to validate the role before figuring out what database is being
    requested? We'd have to essentially invert that, of course, for this..
    Even more to the point, what do you do when the "database" column is
    "all", or a list of more than one database name?

    It's possible that we could define this away by saying that only
    globally known usernames can be listed in pg_hba.conf, but I think
    we'll still have implementation problems with doing authentication
    for per-database usernames.

    regards, tom lane
  • Peter Eisentraut at May 27, 2012 at 6:54 pm

    On tis, 2012-05-22 at 10:19 -0400, Robert Haas wrote:
    I think we should have made roles and tablespaces database
    objects rather than shared objects,
    User names are global objects in the SQL standard, which is part of the
    reason that the current setup was never seriously challenged.
  • Robert Haas at May 28, 2012 at 7:56 pm

    On Sun, May 27, 2012 at 2:53 PM, Peter Eisentraut wrote:
    On tis, 2012-05-22 at 10:19 -0400, Robert Haas wrote:
    I think we should have made roles and tablespaces database
    objects rather than shared objects,
    User names are global objects in the SQL standard, which is part of the
    reason that the current setup was never seriously challenged.
    Does the SQL standard really discriminate between the database and the
    cluster? Wow.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Stephen Frost at May 22, 2012 at 2:19 pm

    * Thom Brown (thom@linux.com) wrote:
    Conflicts would occur where localrolename matches an existing local
    role name within the same database, or a global role name, but not a
    local role name within another database. The problem with this,
    however, is that creating global roles would need conflict checks
    against local roles in every database, unless a manifest of all local
    roles were registered globally.
    Hmm, right, that's a bit of a sticky point. Wrt your suggestion: it
    works great if we don't allow duplicates, but then people have to accept
    that their role name gets changed. That said, perhaps that's not that
    big of a deal, since you could rename it afterwards.

    The issue with the constraints on other databases might actually be
    cause enough to allow duplicates, just to avoid that issue.. We could
    mirror the per-database roles into a shared space that isn't externally
    visible, but at that point, maybe we should try to get RLS for the
    catalogs instead, or just modify the views to only show roles which can
    connect to this database. That's not going to make them completely
    transparent, but it might be enough for some use cases.

    Thanks,

    Stephen
  • Josh Berkus at May 22, 2012 at 5:33 pm
    Stephen,
    Which is "preferred"
    when you do a 'grant select' or 'grant role'?
    The local role is preferred, the same way we allow objects in the local
    schema to overshadow objects in the global schema.
    Or do we just disallow
    overlaps between per-DB roles and global roles? If we don't allow
    duplicates, I suspect a lot of the other questions suddenly become a lot
    easier to deal with, but would that be too much of a restriction?
    The feature wouldn't be useful if we didn't allow conflicts between two
    local role names. However, we could prohibit conflicts between a local
    role name and a global role name if it made the feature considerably
    easier. Users would find workarounds which weren't too arduous.
    How
    would you handle migrating an existing global role to a per-database
    role?
    Again, I think it would be OK not handling it. i.e., the user needs to
    do the following:

    1. create a new local role
    2. reassign all the objects belonging to the global role to the local role
    3. drop the global role
    4. rename the local role
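
    As SQL, assuming the CREATE LOCAL ROLE syntax proposed upthread existed
    (role names invented; only step 1 is hypothetical, the rest are existing
    commands):

    ```sql
    CREATE LOCAL ROLE app_owner_new;                 -- 1. hypothetical syntax
    REASSIGN OWNED BY app_owner TO app_owner_new;    -- 2. move object ownership
    DROP ROLE app_owner;                             -- 3. drop the global role
    ALTER ROLE app_owner_new RENAME TO app_owner;    -- 4. take over the old name
    ```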

    It'd be somewhat of a PITA, but I suspect that most people using the
    "local roles" feature would recreate their databases from scratch
    anyway. And we could offer some sample scripts for the above on the
    wiki and elsewhere. Obviously, a more elegant migration command would
    be ideal, but that could wait for the following PG release; we usually
    follow the "make things possible first, and easy later" plan anyway.

    Given that I'd love to have this feature, I'm trying to pare down its
    requirements to a manageable size. Trying to do everything at once will
    only result in the feature stalling until 10.5.

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Stephen Frost at May 22, 2012 at 8:35 pm

    * Josh Berkus (josh@agliodbs.com) wrote:
    The local role is preferred, the same way we allow objects in the local
    schema to overshadow objects in the global schema.
    I would think we'd want the exact opposite. I don't want my global
    'postgres' user to be overwritten by some local one that the admin of
    this particular DB created..
    The feature wouldn't be useful if we didn't allow conflicts between two
    local role names. However, we could prohibit conflicts between a local
    role name and a global role name if it made the feature considerably
    easier. Users would find workarounds which weren't too arduous.
    Sorry, I meant between the global space and the local space. Clearly we
    must allow and handle cleanly overlaps between local spaces.

    The issue with not allowing global spaces to overlap local ones is that
    we'd have to check every local list when creating a global account;
    that doesn't seem very easy to do. On the flip side, allowing
    duplicates between global and local would remove the need to check local
    lists when creating global accounts, but would add complexity and could
    lead to odd semantics when there is a duplicate.
    1. create a new local role
    2. reassign all the objects belonging to the global role to the local role
    3. drop the global role
    4. rename the local role
    Right, that seems like it would work fine.
    It'd be somewhat of a PITA, but I suspect that most people using the
    "local roles" feature would recreate their databases from scratch
    anyway. And we could offer some sample scripts for the above on the
    wiki and elsewhere. Obviously, a more elegant migration command would
    be ideal, but that could wait for the following PG release; we usually
    follow the "make things possible first, and easy later" plan anyway.

    Sure.
    Given that I'd love to have this feature, I'm trying to pare down its
    requirements to a manageable size. Trying to do everything at once will
    only result in the feature stalling until 10.5.
    If you could help me work out the semantics and the high-level issues,
    I'd love to spend time on this for 9.3...

    Thanks,

    Stephen
  • Florian Pflug at May 22, 2012 at 9:19 pm

    On May22, 2012, at 22:35 , Stephen Frost wrote:
    * Josh Berkus (josh@agliodbs.com) wrote:
    The local role is preferred, the same way we allow objects in the local
    schema to overshadow objects in the global schema.
    I would think we'd want the exact opposite. I don't want my global
    'postgres' user to be overwritten by some local one that the admin of
    this particular DB created..
    From a security POV, yup, you'd want global roles to take precedence.
    But OTOH, you wouldn't want your application to suddenly break because
    someone created a global role which shadows the local role you've created
    a year ago.

    Hm… thinking about this… No matter which takes precedence, we'd need some
    way to explicitly specify global or local scope anyway. And we'd have to
    prevent roles from being named in a way that conflicts with whatever
    explicit specification we come up with, even if that causes pain for some
    unlucky existing users. Avoiding these conflicts entirely is going to be
    impossible, I fear, since we don't currently restrict role names in any way
    AFAIK, and we store them in GUCs without any quoting.

    So maybe we should just pick some qualification like prefixing local roles
    with 'local.', forbid global roles from starting with 'local.', and be done
    with it? Not the most elegant solution maybe, but it avoids surprises...
    The issue with not allowing global spaces to overlap local ones is that
    we'd have to check every local list when creating a global account;
    that doesn't seem very easy to do.
    "Not very easy" is quite an understatement, I fear. "Very nearly impossible"
    is more like it IMHO.

    best regards,
    Florian Pflug
  • Josh Berkus at May 22, 2012 at 10:22 pm

    The issue with not allowing global spaces to overlap local ones is that
    we'd have to check every local list when creating a global account;
    that doesn't seem very easy to do. On the flip side, allowing
    duplicates between global and local would remove the need to check local
    lists when creating global accounts, but would add complexity and could
    lead to odd semantics when there is a duplicate.
    On the other hand, keep in mind that creating a global account can be
    slow. For anyone who has a huge multi-tenant setup with 200 databases,
    each with their own local users, creating a new global account will be
    an event which occurs once or twice a year. We just shouldn't make
    people who don't use local accounts pay the same check cost.
    If you could help me work out the semantics and the high-level issues,
    I'd love to spend time on this for 9.3...
    Syntax seems simple: CREATE LOCAL ROLE ...

    For that matter, let's keep other things simple:

    1. local roles can inherit only from other local roles
    2. global roles can inherit only from other global roles
    3. only a global role can be a database owner


    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Florian Pflug at May 23, 2012 at 9:46 am

    On May23, 2012, at 00:21 , Josh Berkus wrote:
    If you could help me work out the semantics and the high-level issues,
    I'd love to spend time on this for 9.3...
    Syntax seems simple: CREATE LOCAL ROLE ...

    For that matter, let's keep other things simple:

    1. local roles can inherit only from other local roles
    2. global roles can inherit only from other global roles
    3. only a global role can be a database owner
    +1 to (2) and (3).

    If we do (1), we need to allow granting privileges on global objects
    (think tablespaces) to local roles. Otherwise, how would you make a
    tablespace usable to a certain local role? That, however, seems
    difficult, because we wouldn't want local role OIDs to appear in
    pg_tablespace's spacl column.

    If we, OTOH, allowed local roles to inherit from global roles, we
    could still forbid such GRANTs. One would then simply have to create a
    global proxy role which has the necessary privileges, and which local roles
    would inherit from instead of being granted the privileges directly.

    Allowing (1) doesn't seem that hard, because it just requires us to
    be able to store dependencies of local roles on global roles, which
    pg_shdepend already (almost, we'd probably need to add deptype
    SHARED_DEPENDENCY_MEMBER) supports.

    best regards,
    Florian Pflug
  • Christopher Browne at May 22, 2012 at 11:12 pm

    On Tue, May 22, 2012 at 4:35 PM, Stephen Frost wrote:
    * Josh Berkus (josh@agliodbs.com) wrote:
    The local role is preferred, the same way we allow objects in the local
    schema to overshadow objects in the global schema.
    I would think we'd want the exact opposite.  I don't want my global
    'postgres' user to be overwritten by some local one that the admin of
    this particular DB created..
    In object-orientedness, the usual behaviour is for more specific
    methods to override the more generic ones, which is reasonable. I'm
    not certain which direction is more to be preferred, whether:
    a) To consider the global user as a default, to be overridden if possible, or
    b) To consider the local user as the default, to be overridden if possible.
    They're both tenable positions.

    But I think I agree with Stephen, that what's desirable, with global
    users, is to use them as the override. They're gonna be expensive;
    you should get something for the price :-).
    --
    When confronted by a difficult problem, solve it by reducing it to the
    question, "How would the Lone Ranger handle this?"
  • Albe Laurenz at May 22, 2012 at 1:28 pm

    Simon Riggs wrote:
    On 21 May 2012 20:40, Stephen Frost wrote:

    This is important. I like the idea of breaking down the barriers
    between databases to allow it to be an option for one backend to
    access tables in multiple databases.
    So collecting a few requirements from various places: [...]
    * Allow users to access tables in >1 database easily, with appropriate rights.
    The main objectives are to make a Database a more easily used
    administrative grouping. At present, people who use multiple Databases
    face many problems - they aren't as separate as you'd like, but
    neither can they be ignored when required.

    The idea of "one main database per session" is fine, but wiring it so
    closely into the backend has a few disadvantages, many of them weird
    internal things.

    Are there arguments against those requirements before we spend time on
    design/thinking?
    From my perspective it is a great advantage that a user connected
    to one database cannot access objects from a different database
    without additional software, no matter what permissions he or she has
    (short of superuser, who can do anything).

    This enables us to have many different databases in one cluster
    without having to worry that they can affect each other.

    If you need different applications to routinely access each other's
    tables, why not assign them to different schemas in one database?

    For those cases where you absolutely need access to a different
    database, you can use dblink or a foreign data wrapper (hopefully
    in 9.3).

    So -1 on that particular suggestion.

    Yours,
    Laurenz Albe
  • Susanne Ebrecht at May 22, 2012 at 3:37 pm

    Am 22.05.2012 15:27, schrieb Albe Laurenz:
    If you need different applications to routinely access each other's
    tables, why not assign them to different schemas in one database?
    The use case in my mind for accessing more databases is when you want to
    access stuff in different languages.

    You can only set encoding / LC_COLLATE per database, not per schema.

    So for different languages you might need different databases to do
    correct sorting / indexing.

    Susanne

    --
    Dipl. Inf. Susanne Ebrecht - 2ndQuadrant
    PostgreSQL Development, 24x7 Support, Training and Services
    www.2ndQuadrant.com
  • Tom Lane at May 22, 2012 at 3:43 pm

    Susanne Ebrecht writes:
    The use case in my mind for accessing more databases is when you want to
    access stuff in different languages.
    You can only set encoding / LC_COLLATE per database, not per schema.
    So for different languages you might need different databases to do
    correct sorting / indexing.
    Encoding yes, but since 9.1 we have pretty fine-grained control of
    collation. So I think this argument is a lot weaker than it used
    to be. It would only really apply if you have one of the corner
    cases where utf8 doesn't work for you.
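
    For instance, since 9.1 a collation can be attached per column or per
    expression in plain SQL (collation names are platform-dependent; these
    are examples):

    ```sql
    -- per-column collation at table definition time
    CREATE TABLE names_de (
        name text COLLATE "de_DE"
    );

    -- or chosen ad hoc in a query
    SELECT name FROM names_de ORDER BY name COLLATE "C";
    ```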

    regards, tom lane
  • Susanne Ebrecht at May 22, 2012 at 4:01 pm

    Am 22.05.2012 17:42, schrieb Tom Lane:
    Encoding yes, but since 9.1 we have pretty fine-grained control of
    collation. So I think this argument is a lot weaker than it used
    to be. It would only really apply if you have one of the corner
    cases where utf8 doesn't work for you.
    Yeah it got better - but it isn't perfect yet.

    Maybe I am blind or 9.1 documentation has a bug - but according to the
    documentation you can't change default collation per schema or per table.
    You can set collation per column - but do you really want to set
    collation for every single column of every single supported language
    in your 200+ table web tool?

    That is a huge effort and a huge maintenance effort.

    Usually you want to set the collation once per language schema. E.g. schema
    russian gets Russian collation and schema British gets British collation
    and so on.

    CREATE SCHEMA foo LC_COLLATE bar isn't supported, so you go up a level and
    do it by creating a database.

    I would like to get default collation per schema / table in 9.2 or 9.3,
    but that is my personal wish.

    Susanne


    --
    Dipl. Inf. Susanne Ebrecht - 2ndQuadrant
    PostgreSQL Development, 24x7 Support, Training and Services
    www.2ndQuadrant.com
  • Robert Haas at May 22, 2012 at 4:03 pm

    On Tue, May 22, 2012 at 12:00 PM, Susanne Ebrecht wrote:
    Usually you want to set the collation once per language schema. E.g. schema
    russian gets Russian collation and schema British gets British collation and
    so on.

    CREATE SCHEMA foo LC_COLLATE bar isn't supported, so you go up a level and
    do it by creating a database.

    I would like to get default collation per schema / table in 9.2 or 9.3,
    but that is my personal wish.
    Interesting idea. Sort of like ALTER DEFAULT PRIVILEGES.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Florian Pflug at May 22, 2012 at 4:14 pm

    On May22, 2012, at 18:00 , Susanne Ebrecht wrote:
    On 22.05.2012 17:42, Tom Lane wrote:
    Encoding yes, but since 9.1 we have pretty fine-grained control of
    collation. So I think this argument is a lot weaker than it used
    to be. It would only really apply if you have one of the corner
    cases where utf8 doesn't work for you.
    Yeah it got better - but it isn't perfect yet.
    Still, the whole reason that the encoding is a per-database property is
    that we don't tag each string with its encoding. To support cross-database
    queries, we'd have to do that, and could then just as well make the encoding
    a per-column property, or at least so I think.
    Maybe I am blind or 9.1 documentation has a bug - but according to the
    documentation you can't change default collation per schema or per table.
    You can set collation per column - but do you really want to set collation for
    every single column of every single supported language in your 200+ tables
    web tool?

    That is a huge effort and a huge maintenance effort.
    You could always write a pl/pgsql function which iterates over all columns
    of type text or varchar within a schema and sets the desired collation, but
    I would like to get default collation per schema / table in 9.2 or 9.3 but
    that is my personal wish,
    yeah, that'd definitely be nice.

    best regards,
    Florian Pflug
  • Peter Eisentraut at May 23, 2012 at 7:24 pm

    On Tue, 2012-05-22 at 18:00 +0200, Susanne Ebrecht wrote:
    CREATE SCHEMA foo LC_COLLATE bar isn't supported, so you go up a
    level and do it by creating a database.

    I would like to get default collation per schema / table in 9.2 or 9.3
    but that is my personal wish,
    Another way I've been thinking about is to have a session collation.
    Then you don't have to decide about it when the tables are created, but
    the application or user interface can decide it at run time without
    having to hardcode it into all the queries.

    This would effectively take the place of being able to say SET
    lc_collate at run time.
  • Susanne Ebrecht at May 24, 2012 at 9:39 am

    On 22.05.2012 15:27, Albe Laurenz wrote:
    If you need different applications to routinely access each other's
    tables, why not assign them to different schemas in one database?
    I just saw another use case here.

    There are lots of offices / departments creating maps. Topography maps,
    pipeline maps, nature conservancy (e.g. where are the nests of endangered
    birds?), mineral resources, wire maps, street maps, bicycle / jogging maps,
    tourist maps, tree maps, cadastral land register, and so on.

    All these departments have their own databases for their own maps.
    They only map their own stuff.

    Towns / states / regions have a department where all these maps get
    collected.

    You can go to your town and ask for weird maps today - e.g. a map with
    all jogging
    routes and waste water pipes but without autobahns.

    You could say that you have one database per layer.

    As I said - I saw this construction in the real world. I am pretty
    sure that other states may have other solutions, but the described
    solution exists.

    Susanne

    --
    Dipl. Inf. Susanne Ebrecht - 2ndQuadrant
    PostgreSQL Development, 24x7 Support, Training and Services
    www.2ndQuadrant.com
  • Florian Pflug at May 24, 2012 at 10:29 am

    On May24, 2012, at 11:39 , Susanne Ebrecht wrote:
    There are lots of offices / departments creating maps. Topography maps,
    pipeline maps, nature conservancy (e.g. where are the nests of endangered
    birds?), mineral resources, wire maps, street maps, bicycle / jogging maps,
    tourist maps, tree maps, cadastral land register, and so on.

    All these departments have their own databases for their own maps.
    They only map their own stuff.

    Towns / states / regions have a department where all these maps get collected.
    The question is, how do they get collected? If they use some home-grown
    replication, they might just as well collect them into schemas instead of
    databases. The same is possible with Slony, I think. And if they use
    WAL-based replication, they have no choice but to collect them in different
    clusters, so cross-database queries within a cluster wouldn't help.

    I think you're right that reporting would be one of the main use-cases for
    cross-database queries. But reporting is also, I think, one of the main
    use-cases for WAL-based replication. So having cross-database queries that
    don't allow queries across multiple replicas will leave quite a few people
    out in the cold.

    best regards,
    Florian Pflug
  • Josh Berkus at May 22, 2012 at 5:18 pm

    1. Ability to have a Role that can only access one Database

    2. Allow user info to be dumped with a database, to make a db
    completely self-consistent

    3. Allow databases to be transportable

    4. Allow users to access tables in >1 database easily, with appropriate rights.
    The last requirement seems completely contradictory to the other three.
    Either we're trying to make databases even more isolated as
    multi-tenant Catalogs, or we're not. Trying to do both at the same time
    is failure-by-design.

    Given that we offer schema as an alternative to multiple databases, and
    users are starting to get used to them, I think that requirement (4) is
    just a bad idea, and not worth pursuing, except in the context of pgsql_fdw.

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Simon Riggs at May 22, 2012 at 5:34 pm

    On 22 May 2012 18:18, Josh Berkus wrote:
    1. Ability to have a Role that can only access one Database

    2. Allow user info to be dumped with a database, to make a db
    completely self-consistent

    3. Allow databases to be transportable

    4. Allow users to access tables in >1 database easily, with appropriate rights.
    The last requirement seems completely contradictory to the other three.
    Either we're trying to make databases even more isolated as
    multi-tenant Catalogs, or we're not.  Trying to do both at the same time
    is failure-by-design.
    Why is it OK to allow somebody to access multiple schema in one query,
    but not multiple databases? Are you arguing that schemas are also
    broken?

    I see no failure by design. I see an idea for greater ease of use
    being discussed.
    Given that we offer schema as an alternative to multiple databases, and
    users are starting to get used to them, I think that requirement (4) is
    just a bad idea, and not worth pursuing,
    Personally, I have long recommended that people use schemas. But
    people do use databases and when they do they are pretty much screwed.
    I brought this up as a way of improving our ease of use.
    except in the context of pgsql_fdw.
    That is not a realistic option.

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
