In bug #5717, Richard Huxton complains that a domain declared like
CREATE DOMAIN mynums numeric(4,2)[1];
doesn't work properly, ie, the typmod isn't enforced in places where
it reasonably ought to be. I dug into this a bit, and found that there
are more worms in this can than I first expected.

In the first place, plpgsql seems a few bricks shy of a load, in that
its code to handle assignment to an array element doesn't appear to
be considering arrays with typmod at all: it just passes typmod -1 to
exec_simple_cast_value() in the PLPGSQL_DTYPE_ARRAYELEM case in
exec_assign_value(). Nonetheless, if you try a case like

declare
x numeric(4,2)[1];
begin
x[1] := $1;

you'll find the typmod *is* enforced. It turns out that the reason that
it works is that after constructing the modified array value,
exec_assign_value() calls itself recursively to do the actual assignment
to the array variable --- and that call sees that the array variable has
a typmod, so it applies the cast *at the array level*, ie it
disassembles the array, re-coerces each element, and builds a new array.
So it's horridly inefficient but it works.

That could and should be improved, but it seems to be just a matter
of local changes in pl_exec.c, and it's only an efficiency hack anyway.
The big problem is that none of this happens when the variable is
declared as a domain, and I believe that that is a significantly more
wide-ranging issue.

The real issue as I see it is that it's possible to subscript an array
without realizing that the array's type is really a domain that
incorporates a typmod specification. This happens because of a hack
we did to simplify implementation of domains over arrays: we expose the
array element type as typelem of the domain as well. For example, given
Richard's declaration we have

regression=# select typname,oid,typelem,typbasetype,typtypmod from pg_type where typname = 'mynums';
typname | oid | typelem | typbasetype | typtypmod
---------+-------+---------+-------------+-----------
mynums | 92960 | 1700 | 1231 | 262150
(1 row)

If we were to wrap another domain around that, we'd have

regression=# create domain mynums2 as mynums;
CREATE DOMAIN
regression=# select typname,oid,typelem,typbasetype,typtypmod from pg_type where typname = 'mynums2';
typname | oid | typelem | typbasetype | typtypmod
---------+-------+---------+-------------+-----------
mynums2 | 92976 | 1700 | 92960 | -1
(1 row)

and at this point it's completely unobvious from inspection of the
pg_type entry that any typmod needs to be applied when subscripting.

So I'm suspicious that there are a boatload of bugs related to this kind
of domain declaration, not just the one case in plpgsql.

I think that what we ought to do about it is to stop exposing typelem
in domains' pg_type rows. If you want to subscript a domain value, you
should have to drill down to its base type with getBaseTypeAndTypmod,
which would also give you the correct typmod to apply. If we set
typelem to zero in domain pg_type rows, it shouldn't take too long to
find any places that are missing this consideration --- the breakage
will be obvious rather than subtle.

This catalog definitional change obviously shouldn't be back-patched,
but possibly the ensuing code changes could be, since the typelem change
is just to expose where things are wrong and wouldn't be a prerequisite
for corrected code to behave correctly. Or we could decide that this is
a corner case we're not going to try to fix in back branches. It's been
wrong since day 0, so there's certainly an argument that it's not
important enough to risk back-patching.

Comments?

regards, tom lane

Search Discussions

  • Robert Haas at Oct 20, 2010 at 12:47 am

    On Tue, Oct 19, 2010 at 6:14 PM, Tom Lane wrote:
    In bug #5717, Richard Huxton complains that a domain declared like
    CREATE DOMAIN mynums numeric(4,2)[1];
    doesn't work properly, ie, the typmod isn't enforced in places where
    it reasonably ought to be.  I dug into this a bit, and found that there
    are more worms in this can than I first expected.

    In the first place, plpgsql seems a few bricks shy of a load, in that
    its code to handle assignment to an array element doesn't appear to
    be considering arrays with typmod at all: it just passes typmod -1 to
    exec_simple_cast_value() in the PLPGSQL_DTYPE_ARRAYELEM case in
    exec_assign_value().  Nonetheless, if you try a case like

    declare
    x numeric(4,2)[1];
    begin
    x[1] := $1;

    you'll find the typmod *is* enforced.  It turns out that the reason that
    it works is that after constructing the modified array value,
    exec_assign_value() calls itself recursively to do the actual assignment
    to the array variable --- and that call sees that the array variable has
    a typmod, so it applies the cast *at the array level*, ie it
    disassembles the array, re-coerces each element, and builds a new array.
    So it's horridly inefficient but it works.

    That could and should be improved, but it seems to be just a matter
    of local changes in pl_exec.c, and it's only an efficiency hack anyway.
    The big problem is that none of this happens when the variable is
    declared as a domain, and I believe that that is a significantly more
    wide-ranging issue.

    The real issue as I see it is that it's possible to subscript an array
    without realizing that the array's type is really a domain that
    incorporates a typmod specification.  This happens because of a hack
    we did to simplify implementation of domains over arrays: we expose the
    array element type as typelem of the domain as well.  For example, given
    Richard's declaration we have

    regression=# select typname,oid,typelem,typbasetype,typtypmod from pg_type where typname = 'mynums';
    typname |  oid  | typelem | typbasetype | typtypmod
    ---------+-------+---------+-------------+-----------
    mynums  | 92960 |    1700 |        1231 |    262150
    (1 row)

    If we were to wrap another domain around that, we'd have

    regression=# create domain mynums2 as mynums;
    CREATE DOMAIN
    regression=# select typname,oid,typelem,typbasetype,typtypmod from pg_type where typname = 'mynums2';
    typname |  oid  | typelem | typbasetype | typtypmod
    ---------+-------+---------+-------------+-----------
    mynums2 | 92976 |    1700 |       92960 |        -1
    (1 row)

    and at this point it's completely unobvious from inspection of the
    pg_type entry that any typmod needs to be applied when subscripting.

    So I'm suspicious that there are a boatload of bugs related to this kind
    of domain declaration, not just the one case in plpgsql.

    I think that what we ought to do about it is to stop exposing typelem
    in domains' pg_type rows.  If you want to subscript a domain value, you
    should have to drill down to its base type with getBaseTypeAndTypmod,
    which would also give you the correct typmod to apply.  If we set
    typelem to zero in domain pg_type rows, it shouldn't take too long to
    find any places that are missing this consideration --- the breakage
    will be obvious rather than subtle.
    I fear that this is going to degrade the performance of common cases
    as a way of debugging rare cases.
    This catalog definitional change obviously shouldn't be back-patched,
    but possibly the ensuing code changes could be, since the typelem change
    is just to expose where things are wrong and wouldn't be a prerequisite
    for corrected code to behave correctly.  Or we could decide that this is
    a corner case we're not going to try to fix in back branches.  It's been
    wrong since day 0, so there's certainly an argument that it's not
    important enough to risk back-patching.

    Comments?
    It might be reasonable to back-patch whatever we decide on into 9.0,
    because it is so new, but I would be reluctant to go back further
    unless we have some evidence that it's bothering people. It seems to
    me that this can could have a lot of worms in it, and I fear that
    there could be several rounds of fixes, which I would rather not
    inflict on users of supposedly-stable branches.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Oct 20, 2010 at 1:17 am

    Robert Haas writes:
    On Tue, Oct 19, 2010 at 6:14 PM, Tom Lane wrote:
    I think that what we ought to do about it is to stop exposing typelem
    in domains' pg_type rows.  If you want to subscript a domain value, you
    should have to drill down to its base type with getBaseTypeAndTypmod,
    which would also give you the correct typmod to apply.  If we set
    typelem to zero in domain pg_type rows, it shouldn't take too long to
    find any places that are missing this consideration --- the breakage
    will be obvious rather than subtle.
    I fear that this is going to degrade the performance of common cases
    as a way of debugging rare cases.
    We've already accepted the cost of doing getBaseTypeAndTypmod() in a
    whole lot of performance-critical parsing paths, on the off chance that
    the target datatype might be a domain. It's not apparent to me that
    array subscripting is so important as to deserve an exemption from that.
    Especially when not doing so doesn't work.
    Comments?
    It might be reasonable to back-patch whatever we decide on into 9.0,
    because it is so new, but I would be reluctant to go back further
    unless we have some evidence that it's bothering people. It seems to
    me that this can could have a lot of worms in it, and I fear that
    there could be several rounds of fixes, which I would rather not
    inflict on users of supposedly-stable branches.
    Well, we have bug #5717 as evidence that it's bothering people ;-).
    But I agree that the case for back-patching is a bit thin, especially
    if it might result in any user-visible behavioral changes.

    One case that I've realized is a problem is domain constraints at the
    array level:

    regression=# create domain orderedpair as int[2] check (value[1] < value[2]);
    CREATE DOMAIN
    regression=# select array[2,1]::orderedpair; -- expect failure
    ERROR: value for domain orderedpair violates check constraint "orderedpair_check"
    regression=# create table op (f1 orderedpair);
    CREATE TABLE
    regression=# insert into op values (array[1,2]);
    INSERT 0 1
    regression=# insert into op values (array[2,1]); -- expect failure
    ERROR: value for domain orderedpair violates check constraint "orderedpair_check"
    regression=# update op set f1[2] = 0; -- expect failure ... oops
    UPDATE 1
    regression=# select * from op;
    f1
    -------
    {1,0}
    (1 row)

    The reason this fails is that the result of the ArrayRef "f1[2] := 0"
    is considered to be of the domain type, so we don't recheck the
    constraint. As this example demonstrates, that assumption is simply
    broken. The correct implementation, I believe, is that the result
    ought to be considered to be of the base type (ie, just int[]), which
    would cause an implicit re-coercion to the domain type to occur during
    the assignment, offering a chance to re-verify the constraint.

    Right offhand I don't see a way that that sort of change can safely be
    back-patched. The incorrect assumption about the ArrayRef's result
    type is already baked into stored rules in existing databases.

    regards, tom lane
  • Robert Haas at Oct 20, 2010 at 2:06 am

    On Tue, Oct 19, 2010 at 9:17 PM, Tom Lane wrote:
    Robert Haas <robertmhaas@gmail.com> writes:
    On Tue, Oct 19, 2010 at 6:14 PM, Tom Lane wrote:
    I think that what we ought to do about it is to stop exposing typelem
    in domains' pg_type rows.  If you want to subscript a domain value, you
    should have to drill down to its base type with getBaseTypeAndTypmod,
    which would also give you the correct typmod to apply.  If we set
    typelem to zero in domain pg_type rows, it shouldn't take too long to
    find any places that are missing this consideration --- the breakage
    will be obvious rather than subtle.
    I fear that this is going to degrade the performance of common cases
    as a way of debugging rare cases.
    We've already accepted the cost of doing getBaseTypeAndTypmod() in a
    whole lot of performance-critical parsing paths, on the off chance that
    the target datatype might be a domain.  It's not apparent to me that
    array subscripting is so important as to deserve an exemption from that.
    Especially when not doing so doesn't work.
    Hmm... so are there no cases where zeroing out the typelem will cost
    us an otherwise-unnecessary syscache lookup?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Oct 20, 2010 at 2:03 pm

    Robert Haas writes:
    On Tue, Oct 19, 2010 at 9:17 PM, Tom Lane wrote:
    We've already accepted the cost of doing getBaseTypeAndTypmod() in a
    whole lot of performance-critical parsing paths, on the off chance that
    the target datatype might be a domain.  It's not apparent to me that
    array subscripting is so important as to deserve an exemption from that.
    Especially when not doing so doesn't work.
    Hmm... so are there no cases where zeroing out the typelem will cost
    us an otherwise-unnecessary syscache lookup?
    My point is that anyplace that is relying on the surface typelem,
    without drilling down to see what the base type is, is wrong.
    So yeah, those lookups are (will be) necessary.

    regards, tom lane
  • Robert Haas at Oct 20, 2010 at 2:47 pm

    On Wed, Oct 20, 2010 at 10:03 AM, Tom Lane wrote:
    Robert Haas <robertmhaas@gmail.com> writes:
    On Tue, Oct 19, 2010 at 9:17 PM, Tom Lane wrote:
    We've already accepted the cost of doing getBaseTypeAndTypmod() in a
    whole lot of performance-critical parsing paths, on the off chance that
    the target datatype might be a domain.  It's not apparent to me that
    array subscripting is so important as to deserve an exemption from that.
    Especially when not doing so doesn't work.
    Hmm... so are there no cases where zeroing out the typelem will cost
    us an otherwise-unnecessary syscache lookup?
    My point is that anyplace that is relying on the surface typelem,
    without drilling down to see what the base type is, is wrong.
    So yeah, those lookups are (will be) necessary.
    OK. In that case, +1 from me.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Oct 21, 2010 at 4:46 pm

    Robert Haas writes:
    On Wed, Oct 20, 2010 at 10:03 AM, Tom Lane wrote:
    My point is that anyplace that is relying on the surface typelem,
    without drilling down to see what the base type is, is wrong.
    So yeah, those lookups are (will be) necessary.
    OK. In that case, +1 from me.
    I've come across another interesting definitional issue, which is what
    properties should domains have with respect to matching to polymorphic
    arguments. Currently, the polymorphic matching functions take domains
    at face value (ie, without noticing their relationships to their base
    types), with one exception: because they use get_element_type() to
    decide if a type matches ANYARRAY, domains over arrays will be
    considered to match ANYARRAY. This leads to some weird inconsistencies
    and at least one genuine bug. Observe (this is with 9.0.x HEAD):

    regression=# create domain myi as int;
    CREATE DOMAIN
    regression=# select array[1,2] || 3;
    ?column?
    ----------
    {1,2,3}
    (1 row)

    regression=# select array[1,2] || 3::myi;
    ERROR: operator does not exist: integer[] || myi

    In this case, one might expect myi to be automatically downcast to int
    so that it could be matched up with the int array, but that's not
    happening. However:

    regression=# create domain myia as int[];
    CREATE DOMAIN
    regression=# select array[1,2]::myia || 3;
    ?column?
    ----------
    {1,2,3}
    (1 row)

    So we will downcast myia to int[], or at least one might assume that's
    what's happening. But actually it's worse than that: the result of this
    operation is thought to be myia not int[], because myia itself is taken
    as matching ANYARRAY, and the operator result is the same ANYARRAY type.
    Thus, this case goes off the rails completely:

    regression=# create domain myia2 as int[] check(array_length(value,1) = 2);
    CREATE DOMAIN
    regression=# select array[1,2]::myia2;
    array
    -------
    {1,2}
    (1 row)

    regression=# select array[1,2,3]::myia2;
    ERROR: value for domain myia2 violates check constraint "myia2_check"
    regression=# select array[1,2]::myia2 || 3;
    ?column?
    ----------
    {1,2,3}
    (1 row)

    The result of the || is considered to be myia2, as can be seen for
    example this way:

    regression=# create view vvv as select array[1,2]::myia2 || 3 as x;
    CREATE VIEW
    regression=# \d vvv
    View "public.vvv"
    Column | Type | Modifiers
    --------+-------+-----------
    x | myia2 |

    So we have a value that's claimed to belong to the domain, but it
    doesn't meet the domain's constraints.

    What I am intending to do about this in the short run is to leave the
    anyarray-ness tests in the polymorphic-compatibility-checking functions
    as-is. That will mean (with the change in typelem for domains) that a
    domain over array doesn't match ANYARRAY unless you explicitly downcast
    it. I argue that this is consistent with the current behavior of not
    auto-downcasting domains to match the element type of an array. We
    could go back and change it later, but if we do, we should try to make
    both cases provide auto-downcast-when-needed behavior. I have not dug
    into just what code changes would be needed for that. Auto-downcast
    wouldn't be exactly compatible with the current behavior anyway, since
    it would result in a different claimed type for the operator result.

    Comments?

    regards, tom lane
  • Kevin Grittner at Oct 21, 2010 at 5:11 pm

    Tom Lane wrote:

    regression=# select array[1,2] || 3::myi;
    ERROR: operator does not exist: integer[] || myi

    In this case, one might expect myi to be automatically downcast to
    < int so that it could be matched up with the int array, but that's
    not happening.
    I guess it should allow that, although for my uses of domains it's
    hard to see a reasonable use case for it, so that part doesn't
    bother me too much.
    regression=# create domain myia as int[];
    CREATE DOMAIN
    regression=# select array[1,2]::myia || 3;
    ?column?
    ----------
    {1,2,3}
    (1 row)

    So we will downcast myia to int[], or at least one might assume
    that's what's happening. But actually it's worse than that: the
    result of this operation is thought to be myia not int[], because
    myia itself is taken as matching ANYARRAY, and the operator result
    is the same ANYARRAY type.
    That is actually what I would want and expect. Let's say I have an
    array of attorney bar numbers, and I add one more as a literal.
    While an argument could be made that the integer should be cast to
    a bar number before being added to the array, we don't require that
    for an assignment to a simple variable in the domain, so it would be
    surprising to require a cast here, and even more surprising for the
    concatenation to result in an array of primitive integers rather
    than a array of attorney bar numbers.
    regression=# create domain myia2 as int[]
    check(array_length(value,1) = 2);
    CREATE DOMAIN
    regression=# select array[1,2]::myia2 || 3;
    ?column?
    ----------
    {1,2,3}
    (1 row)
    So we have a value that's claimed to belong to the domain, but it
    doesn't meet the domain's constraints.
    Yeah, that's obviously wrong.

    -Kevin
  • Tom Lane at Oct 21, 2010 at 5:36 pm

    "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
    Tom Lane wrote:
    So we will downcast myia to int[], or at least one might assume
    that's what's happening. But actually it's worse than that: the
    result of this operation is thought to be myia not int[], because
    myia itself is taken as matching ANYARRAY, and the operator result
    is the same ANYARRAY type.
    That is actually what I would want and expect. Let's say I have an
    array of attorney bar numbers, and I add one more as a literal.
    While an argument could be made that the integer should be cast to
    a bar number before being added to the array, we don't require that
    for an assignment to a simple variable in the domain, so it would be
    surprising to require a cast here, and even more surprising for the
    concatenation to result in an array of primitive integers rather
    than a array of attorney bar numbers.
    I disagree with that argument: you are confusing an array over a domain
    type with a domain over an array type. In the latter case, the domain
    could have additional constraints (such as the length constraint in my
    other example), and there's no reason to assume that || or other array
    operators would preserve those constraints.

    A perhaps comparable example is

    create domain verysmallint as int check (value < 10);

    select 9::verysmallint + 1;

    The result of the addition is int, not verysmallint, which is why you
    don't get an error.

    From an abstract-data-type point of view, the fact that any of these
    operations are even allowed without an explicit downcast is a bit
    uncool: it exposes the implementation of the domain type, which one
    could argue shouldn't be allowed, at least not without some notational
    marker showing you know what you're doing. But the SQL committee
    seems to have decided to ignore that tradition and allow auto-downcasts.
    Nonetheless, when a domain type is fed to an operator that works on its
    base type, it has to be clearly understood that there *is* an implied
    downcast, and whatever special properties the domain may have had will
    be lost. As far as the operator and its result are concerned, the
    domain is just its base type.

    I'm not against fixing these cases so that auto downcasts happen, I'm
    just saying that it's outside the scope of what I'm going to do in
    response to the current bug.

    regards, tom lane
  • Kevin Grittner at Oct 21, 2010 at 5:51 pm

    Tom Lane wrote:

    you are confusing an array over a domain type with a domain over
    an array type.
    Yes I was. Sorry. Argument withdrawn.

    -Kevin
  • Richard Huxton at Oct 21, 2010 at 6:51 am

    On 20/10/10 01:47, Robert Haas wrote:
    On Tue, Oct 19, 2010 at 6:14 PM, Tom Lanewrote:
    Comments?
    It might be reasonable to back-patch whatever we decide on into 9.0,
    because it is so new, but I would be reluctant to go back further
    unless we have some evidence that it's bothering people. It seems to
    me that this can could have a lot of worms in it, and I fear that
    there could be several rounds of fixes, which I would rather not
    inflict on users of supposedly-stable branches.
    The work-around I applied when I stumbled across this was just to apply
    an explicit cast before my function's RETURN. That neatly solves my
    particular problem (which I at first thought was a formatting issue
    somewhere in my app).

    The real danger with this is the opportunity to end up with occasional
    bad data in tables, quite possibly unnoticed. If I'd come across this in
    an existing system rather than a new app I'm pretty sure it would have
    confused me for a lot longer than it did.
    --
    Richard Huxton
    Archonet Ltd

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedOct 19, '10 at 10:14p
activeOct 21, '10 at 5:51p
posts11
users4
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase