Folks,

For the past couple of years, I've been hearing from the PostGIS
people among others that our type system just isn't flexible enough
for their needs. It's really starting to show its age, or possibly
design compromises that seemed reasonable a decade or more ago, but
are less so now.

To that end, I've put up a page on the wiki that includes a list of
issues to be addressed. It's intended to be changed, possibly
completely.

http://wiki.postgresql.org/wiki/Refactor_Type_System

What might the next version of the type system look like?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


  • Andrew Dunstan at Nov 12, 2010 at 5:03 pm

    On 11/12/2010 11:34 AM, David Fetter wrote:
    Folks,

    For the past couple of years, I've been hearing from the PostGIS
    people among others that our type system just isn't flexible enough
    for their needs. It's really starting to show its age, or possibly
    design compromises that seemed reasonable a decade or more ago, but
    are less so now.
    This is so general as to be quite meaningless to me. What is it that is
    wanted that we don't have? (And don't say "flexibility", that's way too
    general - say something much more concrete and specific. If you want
    flexibility we can store everything as text, but I doubt you'll like the
    result.)

    cheers

    andrew
  • Dimitri Fontaine at Nov 12, 2010 at 8:28 pm

    Andrew Dunstan writes:
    This is so general as to be quite meaningless to me. What is it that is
    wanted that we don't have? (And don't say "flexibility", that's way too
    general - say something much more concrete and specific. If you want
    flexibility we can store everything as text, but I doubt you'll like the
    result.)
    The way I understand it is (unsurprisingly) related to user data in
    extensions. PostGIS maintains a table of user attributes related to
    their types, if I've understood correctly. Things that won't fit in the
    typmod, that will be different depending on the columns or some other
    environment meta-data, and that will have consequences on the meaning
    and running of user queries.

    OK, that's not much more precise, but that's a typmod which does not fit
    in 32 bits so is saved away in some PostGIS table and referred to from
    the main storage.
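
    To make that concrete, here is a hypothetical sketch (the table and
    column names are invented, loosely modeled on the kind of side
    catalog PostGIS keeps) of how an extension might store per-column
    type metadata that won't fit in the 32-bit typmod:

    ```sql
    -- Hypothetical side table: extended "typmod" data kept out of band,
    -- keyed by the schema/table/column it describes.
    CREATE TABLE my_extension_columns (
        f_table_schema  name NOT NULL,
        f_table_name    name NOT NULL,
        f_column_name   name NOT NULL,
        srid            integer,   -- e.g. a spatial reference id
        coord_dimension integer,   -- more metadata than 32 bits can hold
        PRIMARY KEY (f_table_schema, f_table_name, f_column_name)
    );
    ```

    The extension then has to keep such a table in sync with DDL on the
    described columns by hand, which is exactly the pain point.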

    About all the other problems with the type system flexibility that I've
    read on, I think they are in the "type inference" category: we like the
    strong typing of the database system but would like it to get forgotten
    about. The paramount example of this, I think, was the proposal of
    LAMBDA expressions at the time the DO utility statement appeared.

    I don't know how far in the type inference system we want to go (but I
    think we already have parts of that in our implicit cast rules). Maybe
    we want to think about having (user) functions be types, too.

    Also, tables are already types, and JOINs and result sets are
    relations too, so they are types as well. I don't know how far the
    current type system goes in treating tables, joins, and relations as
    the same thing as types, but there's something there with NULL handling
    and the ROW and record facilities that we see surprise people on
    -bugs and elsewhere.

    Well, just my very unorganised 2¢,
    --
    Dimitri Fontaine
    http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
  • Jeff Davis at Nov 12, 2010 at 11:12 pm

    On Fri, 2010-11-12 at 12:03 -0500, Andrew Dunstan wrote:
    On 11/12/2010 11:34 AM, David Fetter wrote:
    Folks,

    For the past couple of years, I've been hearing from the PostGIS
    people among others that our type system just isn't flexible enough
    for their needs. It's really starting to show its age, or possibly
    design compromises that seemed reasonable a decade or more ago, but
    are less so now.
    This is so general as to be quite meaningless to me. What is it that is
    wanted that we don't have?
    Some kind of generics, type generators, or type interfaces (like an
    interface in Java or type class in haskell). A real subtyping system
    might also be nice.

    That being said, a few details are left to be decided (an
    understatement).

    Regards,
    Jeff Davis
  • Robert Haas at Nov 12, 2010 at 11:35 pm

    On Fri, Nov 12, 2010 at 6:12 PM, Jeff Davis wrote:
    That being said, a few details are left to be decided (an
    understatement).
    Best... comment... ever.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Jeff Davis at Nov 13, 2010 at 12:07 am

    On Fri, 2010-11-12 at 08:34 -0800, David Fetter wrote:
    Folks,

    For the past couple of years, I've been hearing from the PostGIS
    people among others that our type system just isn't flexible enough
    for their needs. It's really starting to show its age, or possibly
    design compromises that seemed reasonable a decade or more ago, but
    are less so now.

    To that end, I've put up a page on the wiki that includes a list of
    issues to be addressed. It's intended to be changed, possibly
    completely.

    http://wiki.postgresql.org/wiki/Refactor_Type_System

    What might the next version of the type system look like?
    This problem (as stated) strikes me as pretty overwhelming.

    1. As far as I can tell, we have the best type system of any SQL DBMS.
    2. Getting a type system right is a hard problem by itself, and there
    isn't any obvious consensus (although I think there is some agreement on
    some issues).
    3. Type systems are more challenging for a DBMS because you need to
    account for things like storage, indexing, and optimization in ways that
    programming languages don't (consider: when comparing an int4 and an
    int8, you may want to coerce based on what indexes you have available).
    4. SQL standard issues. In particular, I think that any modern type
    system will run into pretty severe problems with NULLs in one way or
    another. I think we'd have to pay very close attention to the standard
    when designing a new type system, because I suspect that retrofitting
    the standard onto a system we invent independently would be a disaster.
    5. Backwards compatibility issues.

    I think the best we'll do is be able to hack on some of the things that
    we actively want and have clear use cases for, such as type interfaces.
    We might have to give up on some of the more ambitious ideas that
    involve propagating interesting information through the type inference
    system; or having any real type that wasn't declared with CREATE TYPE.
    Consider that right now we bundle the element type information along
    with the array _value_.

    Ideas welcome. Particularly if there are a few clear use cases.

    Regards,
    Jeff Davis
  • Andrew Dunstan at Nov 13, 2010 at 1:29 am

    On 11/12/2010 07:07 PM, Jeff Davis wrote:
    On Fri, 2010-11-12 at 08:34 -0800, David Fetter wrote:
    Folks,

    For the past couple of years, I've been hearing from the PostGIS
    people among others that our type system just isn't flexible enough
    for their needs. It's really starting to show its age, or possibly
    design compromises that seemed reasonable a decade or more ago, but
    are less so now.

    To that end, I've put up a page on the wiki that includes a list of
    issues to be addressed. It's intended to be changed, possibly
    completely.

    http://wiki.postgresql.org/wiki/Refactor_Type_System

    What might the next version of the type system look like?
    This problem (as stated) strikes me as pretty overwhelming.

    1. As far as I can tell, we have the best type system of any SQL DBMS.
    2. Getting a type system right is a hard problem by itself, and there
    isn't any obvious consensus (although I think there is some agreement on
    some issues).
    3. Type systems are more challenging for a DBMS because you need to
    account for things like storage, indexing, and optimization in ways that
    programming languages don't (consider: when comparing an int4 and an
    int8, you may want to coerce based on what indexes you have available).
    4. SQL standard issues. In particular, I think that any modern type
    system will run into pretty severe problems with NULLs in one way or
    another. I think we'd have to pay very close attention to the standard
    when designing a new type system, because I suspect that retrofitting
    the standard onto a system we invent independently would be a disaster.
    5. Backwards compatibility issues.

    I think the best we'll do is be able to hack on some of the things that
    we actively want and have clear use cases for, such as type interfaces.
    We might have to give up on some of the more ambitious ideas that
    involve propagating interesting information through the type inference
    system; or having any real type that wasn't declared with CREATE TYPE.
    Consider that right now we bundle the element type information along
    with the array _value_.
    Yeah, composites too, IIRC. It's a bit sad. But those are really just
    warts that show the difficulties we face in implementing types. I'm still
    waiting for some seriously different yet possible thing we could do.
    (And I agree we do have about the best type system around).

    cheers

    andrew
  • Daniel Farina at Nov 14, 2010 at 11:15 am

    On Fri, Nov 12, 2010 at 4:07 PM, Jeff Davis wrote:
    I think the best we'll do is be able to hack on some of the things that
    we actively want and have clear use cases for, such as type interfaces.
    We might have to give up on some of the more ambitious ideas that
    involve propagating interesting information through the type inference
    system; or having any real type that wasn't declared with CREATE TYPE.
    Consider that right now we bundle the element type information along
    with the array _value_.
    Here are some weaknesses in the SUM aggregate that run up against the
    type system. Maybe they'll help crystallize some discussion:

    SUM(int2) => int4
    SUM(int4) => int8
    SUM(int8) => numeric

    Some weaknesses:

    SUM, of any precision, assumes that the precision being accumulated
    into (which is also the return-precision) is enough to avoid overflow.
    This is generally the case, but there's no reason why it *must* be
    true. I'm especially looking at the int2 to int4 conversion. One could
    imagine a more interesting scenario where overflow behavior could
    occur much more easily.

    SUM always promotes types upwards in precision, and is unable to keep
    types of the smallest possible precision should SUM expressions be
    composed.

    SUM is unable to maintain any supplementary information about
    precision, i.e. say anything interesting about the typmod, which
    defeats or makes impossible many interesting optimizations.
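
    What the promotions actually are on a given server is easy to check
    with pg_typeof (available in reasonably recent releases):

    ```sql
    -- Inspect the declared result type of SUM for each input width.
    SELECT pg_typeof(SUM(a)) AS sum_of_int2,
           pg_typeof(SUM(b)) AS sum_of_int4,
           pg_typeof(SUM(c)) AS sum_of_int8
    FROM (VALUES (1::int2, 1::int4, 1::int8)) AS t(a, b, c);
    ```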

    I think that a type-interface system moves towards solving the first
    couple of problems, since SUM can return some abstract type such as
    "Integer" and use that to promote more aggressively (avoiding
    overflow) or keep representations small (avoiding unnecessary
    promotion) at run-time. It might require Integer to be an abstract,
    non-storable data type, though, so the current CREATE TYPE is not
    going to make life easy.

    The third problem is slightly different...it might require some
    user-pluggable code to be called as part of semantic analysis. I have
    had the idea roll around in my head of making a Postgres Type a more
    robust first-class data type and somehow being able to attach a
    function to another function/aggregate that gets called during
    semantic analysis and returns the proper signature, but it might be
    the whispers of Cthulhu, too.

    fdr
  • Tom Lane at Nov 14, 2010 at 3:47 pm

    Daniel Farina writes:
    Here are some weaknesses in the SUM aggregate that run up against the
    type system. Maybe they'll help crystallize some discussion:
    SUM(int2) => int4
    SUM(int4) => int8
    SUM(int8) => numeric
    Some weaknesses:
    SUM, of any precision, assumes that the precision being accumulated
    into (which is also the return-precision) is enough to avoid overflow.
    This is not a flaw of the type system, it's just an implementation
    choice in the SUM() aggregates. We could easily have chosen wider
    accumulation and/or result types.

    regards, tom lane
  • Daniel Farina at Nov 14, 2010 at 7:27 pm

    On Sun, Nov 14, 2010 at 7:47 AM, Tom Lane wrote:
    Daniel Farina <drfarina@acm.org> writes:
    Here are some weaknesses in the SUM aggregate that run up against the
    type system. Maybe they'll help crystallize some discussion:
    SUM(int2) => int4
    SUM(int4) => int8
    SUM(int8) => numeric
    Some weaknesses:
    SUM, of any precision, assumes that the precision being accumulated
    into (which is also the return-precision) is enough to avoid overflow.
    This is not a flaw of the type system, it's just an implementation
    choice in the SUM() aggregates.  We could easily have chosen wider
    accumulation and/or result types.
    That's true, but there are downsides to escalating the precision so
    aggressively.

    The case I was thinking about in particular involves composition of
    SUM. If one can assume that a relation has int4s and that will never
    overflow an int8 (as is done now), I don't see a great way to optimize
    the following case without special exceptions in the optimizer for
    particular aggregates known a-priori. Here's what would happen now:

    SELECT SUM(x::int8)::numeric
    FROM (SELECT SUM(x::int4)::int8 AS x
          FROM rel
          GROUP BY y) some_name;

    Could be rendered, by this assumption, as:

    SELECT SUM(x::int8)::int8
    ....(same FROM clause)

    (Why would anyone write a query like this? Views. Possibly inlined SQL
    UDFs, too.)

    This can be measurably faster. It also more properly constrains the
    result type, as numeric can also handle non-integer quantities.

    I should have underscored that a positive aspect of having a
    type-class-like facility is that it allows declaring things like this
    hypothetical Integer, backed by concrete types that might support
    a superset of its functionality.

    fdr
  • Robert Haas at Nov 14, 2010 at 9:25 pm

    On Sun, Nov 14, 2010 at 2:27 PM, Daniel Farina wrote:
    On Sun, Nov 14, 2010 at 7:47 AM, Tom Lane wrote:
    Daniel Farina <drfarina@acm.org> writes:
    Here are some weaknesses in the SUM aggregate that run up against the
    type system. Maybe they'll help crystallize some discussion:
    SUM(int2) => int4
    SUM(int4) => int8
    SUM(int8) => numeric
    Some weaknesses:
    SUM, of any precision, assumes that the precision being accumulated
    into (which is also the return-precision) is enough to avoid overflow.
    This is not a flaw of the type system, it's just an implementation
    choice in the SUM() aggregates.  We could easily have chosen wider
    accumulation and/or result types.
    That's true, but there are downsides to escalating the precision so
    aggressively.

    The case I was thinking about in particular involves composition of
    SUM. If one can assume that a relation has int4s and that will never
    overflow an int8 (as is done now), I don't see a great way to optimize
    the following case without special exceptions in the optimizer for
    particular aggregates known a-priori. Here's what would happen now:

    SELECT SUM(x::int8)::numeric
    FROM (SELECT SUM(x::int4)::int8 AS x
          FROM rel
          GROUP BY y) some_name;

    Could be rendered, by this assumption, as:

    SELECT SUM(x::int8)::int8
    ....(same FROM clause)

    (Why would anyone write a query like this? Views. Possibly inlined SQL
    UDFs, too.)

    This can be measurably faster. It also more properly constrains the
    result type, as numeric can also handle non-integer quantities.

    I should have underscored that a positive aspect of having a
    type-class-like facility is that it allows declaring things like this
    hypothetical Integer, backed by concrete types that might support
    a superset of its functionality.
    Like Tom, I'm not sure this is really a type-system problem. This
    sounds like a complaint that operations on "numeric" are much slower
    than operations on "int4" and "int8", even for values that could be
    represented by either type. I think that's a valid complaint, but I
    don't see how changing the type system would help. I think what you'd
    need to do is optimize the existing numeric type, or provide a new
    numeric-ish type with optimizations for dealing with
    small-to-medium-sized integers.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Daniel Farina at Nov 14, 2010 at 10:21 pm

    On Sun, Nov 14, 2010 at 1:25 PM, Robert Haas wrote:
    Like Tom, I'm not sure this is really a type-system problem.  This
    sounds like a complaint that operations on "numeric" are much slower
    than operations on "int4" and "int8", even for values that could be
    represented by either type.  I think that's a valid complaint, but I
    don't see how changing the type system would help.  I think what you'd
    need to do is optimize the existing numeric type, or provide a new
    numeric-ish type with optimizations for dealing with
    small-to-medium-sized integers.
    There are other ways one might be able to attack the performance part
    of the problem, but consider the loss of information about the type
    from int(2|4|8) to numeric when composing a series of sums: we know
    the value produced fits the abstract notion of an Integer, but we lose
    that information. The same could be said of SUM(x::numeric(1000,0))
    yielding an unrestricted numeric, rather than one of scale 0. Not only
    is there more information available to the user, but the optimizer
    should be able to benefit from that information as well. However, for
    an arbitrary user-defined operator to take advantage it would seem to
    me that there needs to be a hook where some reasoning can take place
    over its input types and subsequently determine the return prototype
    at that call site.

    I am not trying to suggest that there is no way that one could attack
    the details around SUM I have mentioned without changing the type
    system, only that they are limitations I encountered and worked around
    where a more powerful type system came to mind as an avenue for a
    simpler/more proper solution.

    fdr
  • Tom Lane at Nov 14, 2010 at 10:28 pm

    Daniel Farina writes:
    There are other ways one might be able to attack the performance part
    of the problem, but consider the loss of information about the type
    from int(2|4|8) to numeric when composing a series of sums: we know
    the value produced fits the abstract notion of an Integer, but we lose
    that information. The same could be said of SUM(x::numeric(1000,0))
    yielding an unrestricted numeric, rather than one of scale 0. Not only
    is there more information available to the user, but the optimizer
    should be able to benefit from that information as well. However, for
    an arbitrary user-defined operator to take advantage it would seem to
    me that there needs to be a hook where some reasoning can take place
    over its input types and subsequently determine the return prototype
    at that call site.
    Yeah, this has been discussed before. The problem is that it's too much
    work for too little reward. There are too many different possible
    behaviors for functions, and nobody is going to go through and write
    a subtype-inference helper function for every function in the system,
    or even any usefully large fraction of the functions in the system.
    What's more, the subtype definitions have largely been predetermined
    for us by SQL, and predetermined in such a way that knowing the subtype
    isn't really all that exciting. Is anybody willing to put in
    months of work to teach the system that sum(numeric(7,0)) yields
    numeric(something,0) rather than numeric(something,something)?
    I doubt it. The return on that knowledge would be too small, and
    there are too many cases where you couldn't deduce anything anyway.

    regards, tom lane
  • Greg Stark at Nov 15, 2010 at 2:17 am

    On Sun, Nov 14, 2010 at 11:15 AM, Daniel Farina wrote:
    SUM(int2) => int4
    SUM(int4) => int8
    SUM(int8) => numeric

    Some weaknesses:

    SUM, of any precision, assumes that the precision being accumulated
    into (which is also the return-precision) is enough to avoid overflow.
    This is generally the case, but there's no reason why it *must* be
    true. I'm especially looking at the int2 to int4 conversion. One could
    imagine a more interesting scenario where overflow behavior could
    occur much more easily.
    Fwiw I think he's right that sum(int2) should perhaps be redefined to
    return int8. As it stands, all it would take is 64k rows to
    potentially overflow. It's not super likely but it is plausible and
    the performance penalty to use int8 wouldn't be super big either.
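
    Back-of-the-envelope: 32767 * 65538 = 2147483646, one short of the
    int4 maximum of 2147483647, so 65539 rows of int2 maxima are enough
    to overflow an int4 accumulator:

    ```sql
    -- Pure arithmetic, no table needed: the sum of 65538 int2 maxima
    -- lands one below the int4 ceiling; one more row overflows it.
    SELECT 32767::int8 * 65538 AS near_ceiling,
           2147483647::int8    AS int4_max;
    ```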

    int4 doesn't seem like as realistic a problem since it would take 4
    billion rows and the performance penalty for using numeric would be
    much higher.

    --
    greg
  • Robert Haas at Nov 15, 2010 at 2:37 am

    On Sun, Nov 14, 2010 at 9:16 PM, Greg Stark wrote:
    On Sun, Nov 14, 2010 at 11:15 AM, Daniel Farina wrote:
    SUM(int2) => int4
    SUM(int4) => int8
    SUM(int8) => numeric

    Some weaknesses:

    SUM, of any precision, assumes that the precision being accumulated
    into (which is also the return-precision) is enough to avoid overflow.
    This is generally the case, but there's no reason why it *must* be
    true. I'm especially looking at the int2 to int4 conversion. One could
    imagine a more interesting scenario where overflow behavior could
    occur much more easily.
    Fwiw I think he's right that sum(int2) should perhaps be redefined to
    return int8. As it stands, all it would take is 64k rows to
    potentially overflow. It's not super likely but it is plausible and
    the performance penalty to use int8 wouldn't be super big either.

    int4 doesn't seem like as realistic a problem since it would take 4
    billion rows and the performance penalty for using numeric would be
    much higher.
    I think you could get the same effect by removing sum(int2)
    altogether, which I wouldn't lose any sleep over.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Nov 15, 2010 at 5:27 am

    Greg Stark writes:
    Fwiw I think he's right that sum(int2) should perhaps be redefined to
    return int8. As it stands, all it would take is 64k rows to
    potentially overflow. It's not super likely but it is plausible and
    the performance penalty to use int8 wouldn't be super big either.
    It's not unreasonable. I think the performance penalty for int8 was
    higher when that choice was made than it is now --- and in particular,
    on a 64-bit machine it's now pretty much negligible.

    On the other hand, you can always execute sum(foo::int4) if you need a
    wider sum, just like the escape hatch if any of the other datatype
    choices aren't working for you. It's not clear that we should force a
    performance loss on people who don't need it (and I can't offhand recall
    *ever* hearing a complaint about sum(int2) overflowing...)

    I believe what the OP was arguing for was not so much this sort of
    marginal tinkering as inventing a more general notion of "integer type"
    that would avoid the whole issue. The problem with that is that we have
    to work within the set of types specified by the SQL standard.

    regards, tom lane
  • Darren Duncan at Nov 14, 2010 at 3:54 am

    David Fetter wrote:
    For the past couple of years, I've been hearing from the PostGIS
    people among others that our type system just isn't flexible enough
    for their needs. It's really starting to show its age, or possibly
    design compromises that seemed reasonable a decade or more ago, but
    are less so now.

    To that end, I've put up a page on the wiki that includes a list of
    issues to be addressed. It's intended to be changed, possibly
    completely.

    http://wiki.postgresql.org/wiki/Refactor_Type_System

    What might the next version of the type system look like?
    Are you talking about changes to the type system as users see it, or just
    changes to how the existing behavior is implemented internally? I'll assume
    the former, which the other replies to this thread seem to be addressing,
    though not necessarily the URL you pointed to, which looks more
    internals-oriented ...

    As a statement which may surprise no one who's heard me talk about it before ...

    I've mostly completed a type system specification that would be useable by
    Postgres, as the most fundamental part of my Muldis D language.

    The type system is arguably the most central piece of any DBMS, around which
    everything else is defined and built.

    You have data, which is structured in some way, and has operators for it.

    If you look at a DBMS from the perspective of being a programming language
    implementation, you find that a database is just a variable that holds a value
    of a structured type. In the case of a relational database, said database is a
    tuple whose attribute values are relations; or in the case of
    namespaces/schemas, the database tuple has tuple-valued attributes whose
    attributes in turn are relations.

    If a database is a variable, then all database constraints are type constraints
    on the declared type of that variable, and you can make said constraints
    arbitrarily complicated.

    From basic structures like nestable tuples and relations, plus a complement of
    basic types like numbers and strings, and arbitrary constraints, you can define
    data types of any shape or form.

    A key component of a good type system is that users can define data types, and
    moreover where possible, system-defined types are defined in the same ways as
    users define types. For example, stuff like temporal types or geospatial types
    are prime candidates for being defined like user-defined types.

    If you define all structures using tuples and relations, you can easily flatten
    this out on the implementation end and basically do everything as associated
    flat relation variables as you do now.

    So what I propose is both very flexible and easy to implement, scale, and
    optimize, relatively speaking.

    You don't have to kludge things by implementing arrays as blobs for example; you
    can implement them as relations instead. Geospatial types can just be tuples.
    Arrays of structured types can just be relations with an attribute per type
    attribute. Arrays of simple types can just be unary relations.

    You can also emulate all of the existing Pg features and syntax that you have
    now over the type system I've defined, maintaining compatibility too.

    I also want to emphasize that, while I drew inspiration from many sources when
    defining Muldis D, and there was/is a lot I still didn't/don't know about
    Postgres, I have found that as I use and learn Postgres, I'm finding frequently
    that how Postgres does things is similar and compatible to how I independently
    came up with Muldis D's design; I'm finding more similarities all the time.

    -- Darren Duncan
  • Daniel Farina at Nov 14, 2010 at 10:38 am

    On Sat, Nov 13, 2010 at 7:54 PM, Darren Duncan wrote:
    A key component of a good type system is that users can define data types,
    and moreover where possible, system-defined types are defined in the same
    ways as users define types.  For example, stuff like temporal types or
    geospatial types are prime candidates for being defined like user-defined
    types.
    They are user defined types, and even separately compiled,
    distributed, and dynamically linked.
    You don't have to kludge things by implementing arrays as blobs for example;
    you can implement them as relations instead.  Geospatial types can just be
    tuples. Arrays of structured types can just be relations with an attribute
    per type attribute.  Arrays of simple types can just be unary relations.
    In practice, my guess is the performance of these approaches would be
    terrible for a number of workloads. I don't agree that arrays having
    their own storage implementation is a kludge: there are even operators
    like unnest that can be used to turn them back into relations.
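
    For instance, going between the two representations is already cheap
    to express with stock functions:

    ```sql
    -- Relation from an array value:
    SELECT unnest(ARRAY[1, 2, 3]) AS elem;

    -- Array value from a relation:
    SELECT array_agg(g) FROM generate_series(1, 3) AS g;
    ```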

    I have long thought (but never really gave voice to) there being value
    having first-class relation values, but I would also say that's
    another kettle of fish. I also want closures, and don't think that's
    completely nuts.
    I also want to emphasize that, while I drew inspiration from many sources
    when defining Muldis D, and there was/is a lot I still didn't/don't know
    about Postgres, I have found that as I use and learn Postgres, I'm finding
    frequently that how Postgres does things is similar and compatible to how I
    independently came up with Muldis D's design; I'm finding more similarities
    all the time.
    You may want to learn more about this, first. Postgres's type system,
    while relatively simple, is not as ill-conceived or primitive as you
    seem to assume it is. There are also a lot of important system
    details, like expanding/replacing the typmod facility that only allows
    Postgres 32 bits of type-augmenting data (such as the (1) in the type
    numeric(1)).

    fdr
  • Darren Duncan at Nov 14, 2010 at 8:55 pm

    Daniel Farina wrote:
    On Sat, Nov 13, 2010 at 7:54 PM, Darren Duncan wrote:
    You don't have to kludge things by implementing arrays as blobs for example;
    you can implement them as relations instead. Geospatial types can just be
    tuples. Arrays of structured types can just be relations with an attribute
    per type attribute. Arrays of simple types can just be unary relations.
    In practice, my guess is the performance of these approaches would be
    terrible for a number of workloads. I don't agree that arrays having
    their own storage implementation is a kludge: there are even operators
    like unnest that can be used to turn them back into relations.
    I'm not discounting this at all. The data model can formally define types in
    one way and a DBMS can implement various well known cases in specialized ways
    that are more efficient. Arrays are a good example. But with other cases they
    still get a default implementation. If arrays are flexible enough, then
    different arrays could be implemented differently, eg some as blobs and some as
    relations; the former could be better for small or simple arrays and the latter
    for large or complex arrays.
    I have long thought (but never really gave voice to) there being value
    having first-class relation values, but I would also say that's
    another kettle of fish. I also want closures, and don't think that's
    completely nuts.
    I think that first-class functions are important at least, if not full-blown
    closures. You can define generic relational restrictions or summaries or
    whatever with such; eg, the general case of a WHERE is a function that takes a
    relation and a function as input, and results in a relation; more restricted
    cases of WHERE can be defined with simpler functions that take 2 relation inputs
    instead.
    You may want to learn more about this, first. Postgres's type system,
    while relatively simple, is not as ill-conceived or primitive as you
    seem to assume it is. There are also a lot of important system
    details, like expanding/replacing the typmod facility that only allows
    Postgres 32 bits of type-augmenting data (such as the (1) in the type
    numeric(1)).
    I'm not saying that Pg's system is primitive or ill-conceived. The thread
    starter said it needed an overhaul, indicating there are deficiencies, and
    I'm suggesting some effective ways to address them.

    -- Darren Duncan

Discussion Overview
group: pgsql-hackers @
categories: postgresql
posted: Nov 12, '10 at 4:43p
active: Nov 15, '10 at 5:27a
posts: 19
users: 9
website: postgresql.org...
irc: #postgresql