Here is a WIP patch for enhancements to json generation.

First, there is the much-requested json_agg, which will aggregate rows
directly to json. So the following will now work:

      select json_agg(my_table) from my_table;
      select json_agg(q) from (<myquery here>) q;

One open question regarding this feature is whether this should return
NULL or '[]' for 0 rows. Currently it returns NULL but I could be
convinced to return '[]', and the change would be very small.
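
A caller who wants '[]' regardless can of course wrap the call in
COALESCE, e.g.:

      select coalesce(json_agg(q), '[]'::json) from (select 1 where false) q;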

Next is to_json(), which will turn any value into json, so we're no
longer restricted to rows and arrays.
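
For example (scalars get the obvious treatment - strings are quoted
and escaped, numbers pass through as-is):

      select to_json('foo"bar'::text), to_json(123.45);
      --  "foo\"bar"  |  123.45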

Non-builtin types are now searched for a cast to json, and if it exists
it is used instead of the type's text representation. I didn't add a
special type to look for a cast to, as was discussed before, as it
seemed a bit funky and unnecessary. It can easily be added, but I'm
still not convinced it's a good idea. Note that this is only done for
types that aren't builtin - we know how to turn all of those into json
without needing to look for a cast.
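
For example, a minimal sketch for a hypothetical type mytype (names
illustrative only - the cast function body is just a placeholder):

      CREATE FUNCTION mytype_to_json(mytype) RETURNS json
          LANGUAGE sql AS $$ SELECT to_json($1::text) $$;

      CREATE CAST (mytype AS json) WITH FUNCTION mytype_to_json(mytype);

Once the cast exists, to_json() will use it for mytype values instead
of quoting the type's text output.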

Along with this there is an hstore_to_json() function added to the
hstore module, and a cast from hstore to json that uses it. This
function treats every value in the hstore as a string. There is also a
function with the working title of hstore_to_json_loose() that does a
heuristic conversion that treats values of 't' and 'f' as booleans, and
strings that look like numbers as numbers unless they start with a
leading 0 followed by another digit (could be zip codes, phone numbers
etc.) The difference between these is illustrated here (notice that
quoted '"t"' becomes unquoted 'true' and quoted '"1"' becomes '1'):

     andrew=# select json_agg(q) from foo q;
                                  json_agg
     -----------------------------------------------------------------
       [{"a":"a","b":1,"h":{"c": "t", "d": null, "q": "1", "x": "y"}}]
     (1 row)

     andrew=# select json_agg(q) from (select a, b, hstore_to_json_loose(h) as h from foo) q;
                                  json_agg
     ----------------------------------------------------------------
       [{"a":"a","b":1,"h":{"c": true, "d": null, "q": 1, "x": "y"}}]
     (1 row)
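
To illustrate the leading-zero rule, a hand-worked (not executed)
example - per the heuristic above, the zip code should stay a string:

     select hstore_to_json_loose('n=>42, zip=>02134'::hstore);
     -- expected: {"n": 42, "zip": "02134"}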

Note: this patch will need a change in the oids used for the new
functions if applied against git tip, as they have been overtaken by
time.


Comments welcome.


cheers

andrew

  • Andrew Dunstan at Nov 22, 2012 at 4:24 am

    On 11/21/2012 03:16 PM, Andrew Dunstan wrote:
    Here is a WIP patch for enhancements to json generation.
    [...]

    Updated patch that works with git tip and has regression tests.

    cheers

    andrew
  • Dimitri Fontaine at Nov 22, 2012 at 10:55 am

    Andrew Dunstan writes:
    Here is a WIP patch for enhancements to json generation.

    First, there is the much-requested json_agg, which will aggregate rows
    directly to json. So the following will now work:

    select json_agg(my_table) from my_table;
    select json_agg(q) from (<myquery here>) q;
    Awesome, thanks!

    How do you handle the nesting of the source elements? I would expect a
    variant of the aggregate that takes two input parameters, the datum and
    the current nesting level.

    Consider a tree table using parent_id and a recursive query to display
    the tree. You typically handle the nesting with an accumulator and a
    call to repeat() to prepend some spaces before the value columns. What
    about passing that nesting level (integer) to the json_agg()?

    Here's a worked out example:

         CREATE TABLE parent_child (
             parent_id integer NOT NULL,
             this_node_id integer NULL
         );

         INSERT INTO parent_child (parent_id, this_node_id) VALUES (0, 1);
         INSERT INTO parent_child (parent_id, this_node_id) VALUES (1, 2);
         INSERT INTO parent_child (parent_id, this_node_id) VALUES (1, 3);
         INSERT INTO parent_child (parent_id, this_node_id) VALUES (1, 4);
         INSERT INTO parent_child (parent_id, this_node_id) VALUES (2, 5);
         INSERT INTO parent_child (parent_id, this_node_id) VALUES (2, 6);
         INSERT INTO parent_child (parent_id, this_node_id) VALUES (4, 7);
         INSERT INTO parent_child (parent_id, this_node_id) VALUES (4, 8);
         INSERT INTO parent_child (parent_id, this_node_id) VALUES (4, 9);
         INSERT INTO parent_child (parent_id, this_node_id) VALUES (9, 10);


         WITH RECURSIVE tree(id, level, parents) AS (
             SELECT this_node_id as id, 0 as level, '{}'::int[] as parents
               FROM parent_child
              WHERE parent_id = 0

             UNION ALL

             SELECT this_node_id as id, t.level + 1, t.parents || c.parent_id
               FROM parent_child c
               JOIN tree t ON t.id = c.parent_id
         )
         SELECT json_agg(id, level)
           FROM tree;

    I've left the parents column in the query above as a debug facility, but
    it's not needed in that case.

    Regards,
    --
    Dimitri Fontaine
    http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
  • Andrew Dunstan at Nov 22, 2012 at 12:37 pm

    On 11/22/2012 05:54 AM, Dimitri Fontaine wrote:
    Andrew Dunstan <andrew@dunslane.net> writes:
    Here is a WIP patch for enhancements to json generation.
    [...]
    How do you handle the nesting of the source elements? I would expect a
    variant of the aggregate that takes two input parameters, the datum and
    the current nesting level.
    [...]
    The function only takes a single argument and aggregates all the values
    into a json array. If the arguments are composites they will produce
    json objects.

    People complained that to get a resultset as json you have to do in 9.2

          select array_to_json(array_agg(q)) ...

    which is both a bit cumbersome and fairly inefficient. json_agg(q) is
    equivalent to the above expression but is both simpler and much more
    efficient.

    If you want a tree structured object you'll need to construct it
    yourself - this function won't do the nesting for you. That's beyond its
    remit.
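
    For a single level of nesting you can get there today with a
    correlated subquery - a sketch against your parent_child table
    (untested):

         select json_agg(p)
           from (select t.this_node_id as id,
                        (select json_agg(c.this_node_id)
                           from parent_child c
                          where c.parent_id = t.this_node_id) as children
                   from parent_child t
                  where t.parent_id = 0) p;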

    cheers

    andrew
  • Merlin Moncure at Nov 26, 2012 at 6:05 pm

    On Thu, Nov 22, 2012 at 4:54 AM, Dimitri Fontaine wrote:
    Andrew Dunstan <andrew@dunslane.net> writes:
    Here is a WIP patch for enhancements to json generation.
    [...]
    How do you handle the nesting of the source elements? I would expect a
    variant of the aggregate that takes two input parameters, the datum and
    the current nesting level.
    [...]
    I don't think there is any way a json_agg() function could reasonably
    do this. The only possible dataset it could work on would be for
    homogeneously typed array-like data which is not very interesting for
    the broader case of nested json productions. I think the right way to
    do it is to work out the precise structure in sql and do the
    transformation directly on the type using the already existing
    transformation functions.

    That said, recursive structures are a pain presently because postgres
    composite types are not allowed to be recursive. So ISTM getting that
    restriction relaxed is the way to go; then you build it in sql and
    just fire whatever xxx_to_json happens to be the most
    appropriate...then your example would be a snap.

    merlin
  • Hannu Krosing at Nov 26, 2012 at 8:09 pm

    On 11/22/2012 11:54 AM, Dimitri Fontaine wrote:
    Andrew Dunstan <andrew@dunslane.net> writes:
    Here is a WIP patch for enhancements to json generation.
    [...]

    How do you handle the nesting of the source elements? I would expect a
    variant of the aggregate that takes two input parameters, the datum and
    the current nesting level.

    Consider a tree table using parent_id and a recursive query to display
    the tree. You typically handle the nesting with an accumulator and a
    call to repeat() to prepend some spaces before the value columns. What
    about passing that nesting level (integer) to the json_agg()?
    It still would not produce nesting, just a nicer format.

    If you want real nesting, you may want a version of my pl/python
    function row-with-all-dependents-by-foreign-key-to-json(), which
    outputs a table row and then, recursively, all rows from other (or
    the same) tables that have a foreign key relationship to this row.

    I use it to back up removed objects.

    I would love to have something similar as a built-in function, though
    the current version has some limitations and lacks some checks, such
    as a check for FK loops.


    Function follows:
    -------------------------------------------------------
    CREATE OR REPLACE FUNCTION record_to_json_with_detail(table_name text,
        pk_value int) RETURNS text AS $$

    import json, re

    def fk_info(table_name):
        fkplan = plpy.prepare("""
            SELECT conrelid::regclass as reftable,
                   pg_get_constraintdef(c.oid) as condef
              FROM pg_constraint c
             WHERE c.confrelid::regclass = $1::regclass
               AND c.contype = 'f'
            """, ("text",))
        cdefrx = re.compile('FOREIGN KEY [(](.*)[)] REFERENCES .*[(](.*)[)].*')
        fkres = plpy.execute(fkplan, (table_name,))
        for row in fkres:
            # group 1: FK columns in the referencing table,
            # group 2: referenced columns in this table
            reffields, thisfields = cdefrx.match(row['condef']).groups()
            yield thisfields, row['reftable'], reffields

    def select_from_table_by_col(table_name, col_name, col_value):
        qplan = plpy.prepare('select * from %s where %s = $1' %
                             (table_name, col_name), ('int',))
        return plpy.execute(qplan, (col_value,))

    def recursive_rowdict(table_name, row_dict):
        # skip NULLs
        rd = dict([(a, b) for (a, b) in row_dict.items() if b is not None])
        rd['__row_class__'] = table_name
        for id_col, ref_tab, ref_col in fk_info(table_name):
            q2res = select_from_table_by_col(ref_tab, ref_col, row_dict[id_col])
            if q2res:
                try:
                    rd['__refs__::' + id_col] += [recursive_rowdict(ref_tab, row)
                                                  for row in q2res]
                except KeyError:
                    rd['__refs__::' + id_col] = [recursive_rowdict(ref_tab, row)
                                                 for row in q2res]
        return rd

    q1res = select_from_table_by_col(table_name, 'id', pk_value)
    return json.dumps(recursive_rowdict(table_name, q1res[0]), indent=3)
    $$ LANGUAGE plpythonu;

    create table test1(id serial primary key, selfkey int references test1,
                       data text);
    create table test2(id serial primary key, test1key int references test1,
                       data text);

    insert into test1 values(1,null,'top');
    insert into test1 values(2,1,'lvl1');
    insert into test2 values(1,1,'lvl1-2');
    insert into test2 values(2,2,'lvl2-2');

    select record_to_json_with_detail('test1',1);
              record_to_json_with_detail
    -------------------------------------------
       {
          "__row_class__": "test1",
          "data": "top",
          "id": 1,
          "__refs__::id": [
             {
                "__row_class__": "test1",
                "selfkey": 1,
                "data": "lvl1",
                "id": 2,
                "__refs__::id": [
                   {
                      "__row_class__": "test2",
                      "test1key": 2,
                      "data": "lvl2-2",
                      "id": 2
                   }
                ]
             },
             {
                "__row_class__": "test2",
                "test1key": 1,
                "data": "lvl1-2",
                "id": 1
             }
          ]
       }
    (1 row)

    Time: 6.576 ms

    ---------------------------------------
    Hannu Krosing
  • Robert Haas at Nov 26, 2012 at 3:55 pm

    On Wed, Nov 21, 2012 at 3:16 PM, Andrew Dunstan wrote:
    Non-builtin types are now searched for a cast to json, and if it exists it
    is used instead of the type's text representation. I didn't add a special
    type to look for a cast to, as was discussed before, as it seemed a bit
    funky and unnecessary. It can easily be added, but I'm still not convinced
    it's a good idea. Note that this is only done for types that aren't builtin
    - we know how to turn all of those into json without needing to look for a
    cast.
    The place where I fear this will cause problems is with non-core
    text-like datatypes, such as citext. For example, I wonder if
    creating a cast from citext to json - which seems like a sensible
    thing to want to do for other reasons - changes the semantics of this
    function when applied to citext objects.
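
    For instance, since citext and json share the same on-disk
    representation, a superuser could conceivably create a binary-coercion
    cast (purely hypothetical):

         CREATE CAST (citext AS json) WITHOUT FUNCTION;

    after which to_json() would emit citext values verbatim rather than
    quoting and escaping them.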

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Andrew Dunstan at Nov 26, 2012 at 4:44 pm

    On 11/26/2012 10:55 AM, Robert Haas wrote:
    On Wed, Nov 21, 2012 at 3:16 PM, Andrew Dunstan wrote:
    [...]
    The place where I fear this will cause problems is with non-core
    text-like datatypes, such as citext. For example, I wonder if
    creating a cast from citext to json - which seems like a sensible
    thing to want to do for other reasons - changes the semantics of this
    function when applied to citext objects.

    I don't understand why you would want to create such a cast. If the cast
    doesn't exist it will do exactly what it does now, i.e. use the type's
    output function and then json quote and escape it, which in the case of
    citext is the Right Thing (tm):

         andrew=# select to_json('foo"bar'::citext);
            to_json
         ------------
           "foo\"bar"


    cheers

    andrew
  • Robert Haas at Nov 26, 2012 at 5:31 pm

    On Mon, Nov 26, 2012 at 11:43 AM, Andrew Dunstan wrote:
    I don't understand why you would want to create such a cast. If the cast
    doesn't exist it will do exactly what it does now, i.e. use the type's
    output function and then json quote and escape it, which in the case of
    citext is the Right Thing (tm):

    andrew=# select to_json('foo"bar'::citext);
    to_json
    ------------
    "foo\"bar"
    I'm not sure either, but setting up a system where seemingly innocuous
    actions can in fact have surprising and not-easily-fixable
    consequences in other parts of the system doesn't seem like good
    design to me. Of course, maybe I'm misunderstanding what will happen;
    I haven't actually tested it myself.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Andrew Dunstan at Nov 27, 2012 at 7:38 pm

    On 11/26/2012 12:31 PM, Robert Haas wrote:
    On Mon, Nov 26, 2012 at 11:43 AM, Andrew Dunstan wrote:
    [...]
    I'm not sure either, but setting up a system where seemingly innocuous
    actions can in fact have surprising and not-easily-fixable
    consequences in other parts of the system doesn't seem like good
    design to me. Of course, maybe I'm misunderstanding what will happen;
    I haven't actually tested it myself.

    I'm all for caution, but the argument here seems a bit nebulous. We
    could create a sort of auxiliary type, as has been previously suggested,
    that would be in all respects the same as the json type except that it
    would be the target of the casts that would be used in to_json() and
    friends. But, that's a darned ugly thing to have to do, so I'd want a
    good concrete reason for doing it. Right now I'm having a hard time
    envisioning a problem that could be caused by just using the
    straightforward solution that's in my latest patch.

    cheers

    andrew
  • Peter Eisentraut at Nov 26, 2012 at 7:12 pm

    On 11/21/12 3:16 PM, Andrew Dunstan wrote:
    One open question regarding this feature is whether this should return
    NULL or '[]' for 0 rows. Currently it returns NULL but I could be
    convinced to return '[]', and the change would be very small.
    Although my intuition would be [], the existing concatenation-like
    aggregates return null for no input rows, so this probably ought to be
    consistent with those.
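
    A quick illustration of the existing behavior:

         select string_agg(x::text, ','), array_agg(x)
           from generate_series(1, 0) as g(x);
         -- both results are NULL for zero input rows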
  • Hannu Krosing at Nov 26, 2012 at 7:46 pm

    On 11/26/2012 08:12 PM, Peter Eisentraut wrote:
    On 11/21/12 3:16 PM, Andrew Dunstan wrote:
    One open question regarding this feature is whether this should return
    NULL or '[]' for 0 rows. Currently it returns NULL but I could be
    convinced to return '[]', and the change would be very small.
    Although my intuition would be [], the existing concatenation-like
    aggregates return null for no input rows, so this probably ought to be
    consistent with those.
    In some previous mail Tom Lane claimed that by the SQL standard
    either an array of all NULLs or a record with all fields NULL (I
    don't remember which) is also considered NULL. If this is true,
    then an empty array - which can be said to consist of nothing
    but NULLs - should itself be NULL.

    If this is so, then the existing behaviour of returning NULL in such
    cases is what the standard requires.

    Hannu Krosing
  • Andrew Dunstan at Nov 26, 2012 at 7:54 pm

    On 11/26/2012 02:46 PM, Hannu Krosing wrote:
    [...]
    If this is so, then the existing behaviour of returning NULL in such
    cases is what the standard requires.

    That would be more relevant if we were talking about postgres arrays,
    but the '[]' here would not be a postgres array - it would be a piece of
    json text.

    But in any case, the consensus seems to be to return null, and on the
    principle of doing the least work required I'm happy to comply.

    cheers

    andrew
  • Tom Lane at Nov 26, 2012 at 8:11 pm

    Hannu Krosing writes:
    In some previous mail Tom Lane claimed that by the SQL standard
    either an array of all NULLs or a record with all fields NULL (I
    don't remember which) is also considered NULL. If this is true,
    then an empty array - which can be said to consist of nothing
    but NULLs - should itself be NULL.
    What I think you're referring to is that the spec says that "foo IS
    NULL" should return true if foo is a record containing only null fields.
    That's a fairly narrow statement. It does NOT say that NULL and
    (NULL,NULL,...) are indistinguishable for all purposes; only that
    this particular test doesn't distinguish them. Also I don't think they
    have the same statement for arrays.

    The analogy to other aggregates is probably a better thing to argue
    from. On the other hand, I don't know anyone outside the SQL standards
    committee who thinks it's actually a good idea that SUM() across no rows
    returns null rather than zero.
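
    For example:

         select sum(x) from generate_series(1, 0) as g(x);  -- null, not 0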

        regards, tom lane
  • Hannu Krosing at Nov 26, 2012 at 8:29 pm

    On 11/26/2012 09:05 PM, Tom Lane wrote:
    Hannu Krosing <hannu@2ndquadrant.com> writes:
    [...]
    What I think you're referring to is that the spec says that "foo IS
    NULL" should return true if foo is a record containing only null fields.
    Is this requirement recursive? That is, should

    ROW(NULL, NULL, ROW(NULL, ROW(NULL, NULL))) IS NULL

    also be true?

    Currently PostgreSQL does this kind of IS NULL for "simple" rows

    hannu=# SELECT ROW(NULL, NULL) IS NULL;
       ?column?
    ----------
       t
    (1 row)

    and also for first level row types

    hannu=# SELECT ROW(NULL, ROW(NULL, NULL)) IS NULL;
       ?column?
    ----------
       t
    (1 row)

    but then mysteriously stops working at third level

    hannu=# SELECT ROW(NULL, NULL, ROW(NULL, ROW(NULL, NULL))) IS NULL;
       ?column?
    ----------
       f
    (1 row)

    That's a fairly narrow statement. It does NOT say that NULL and
    (NULL,NULL,...) are indistinguishable for all purposes; only that
    this particular test doesn't distinguish them. Also I don't think they
    have the same statement for arrays.

    The analogy to other aggregates is probably a better thing to argue
    from. On the other hand, I don't know anyone outside the SQL standards
    committee who thinks it's actually a good idea that SUM() across no rows
    returns null rather than zero.
    It might have been done in order to be in sync with other aggregates -
    for example, the "return NULL for no rows" behaviour makes perfect
    sense for MIN(), AVG(), etc.

    ------------------------
    Hannu
  • Tom Lane at Nov 26, 2012 at 8:42 pm

    Hannu Krosing writes:
    On 11/26/2012 09:05 PM, Tom Lane wrote:
    The analogy to other aggregates is probably a better thing to argue
    from. On the other hand, I don't know anyone outside the SQL standards
    committee who thinks it's actually a good idea that SUM() across no rows
    returns null rather than zero.
    Might be done in order to be in sync with other aggregates - for
    example the "return NULL for no rows" behaviour makes perfect
    sense for MIN(), AVG(), etc.
    Well, if they'd made COUNT() of no rows return null, then I'd agree that
    they were pursuing consistency. As it stands, it's neither consistent
    nor very sensible.

        regards, tom lane
  • Bruce Momjian at Jul 4, 2013 at 4:26 pm

    On Mon, Nov 26, 2012 at 09:29:19PM +0100, Hannu Krosing wrote:
    [...]
    Is this requirement recursive? That is, should

    ROW(NULL, NULL, ROW(NULL, ROW(NULL, NULL))) IS NULL

    also be true?

    Currently PostgreSQL does this kind of IS NULL for "simple" rows

    hannu=# SELECT ROW(NULL, NULL) IS NULL;
    ?column?
    ----------
    t
    (1 row)

    and also for first level row types

    hannu=# SELECT ROW(NULL, ROW(NULL, NULL)) IS NULL;
    ?column?
    ----------
    t
    (1 row)

    but then mysteriously stops working at third level

    hannu=# SELECT ROW(NULL, NULL, ROW(NULL, ROW(NULL, NULL))) IS NULL;
    ?column?
    ----------
    f
    (1 row)
    I finally had time to look at this, and it is surprising. I used
    EXPLAIN VERBOSE to see what the optimizer was outputting:

      EXPLAIN VERBOSE SELECT ROW(null) IS NULL;
                      QUERY PLAN
      ------------------------------------------
       Result (cost=0.00..0.01 rows=1 width=0)
    --> Output: true

      EXPLAIN VERBOSE SELECT ROW(ROW(null)) IS NULL;
                      QUERY PLAN
      ------------------------------------------
       Result (cost=0.00..0.01 rows=1 width=0)
    --> Output: (ROW(NULL::unknown) IS NULL)

      EXPLAIN VERBOSE SELECT ROW(ROW(ROW(null))) IS NULL;
                       QUERY PLAN
      ---------------------------------------------
       Result (cost=0.00..0.01 rows=1 width=0)
    --> Output: (ROW(ROW(NULL::unknown)) IS NULL)

    The first test outputs a constant, 'true'. The second test is
    ROW(NULL::unknown) because the inner ROW(NULL) was converted to
    NULL::unknown. The third one, which returns false (wrong), happens
    because you have ROW embedded in ROW, which the optimizer can't process,
    and the executor can't either.

    I developed the attached patch which properly recurses into ROW()
    records checking for NULLs; you can see it returns the right answer in
    all cases (and constant folds too):

      test=> EXPLAIN VERBOSE SELECT ROW(null) IS NULL;
                      QUERY PLAN
      ------------------------------------------
       Result (cost=0.00..0.01 rows=1 width=0)
    --> Output: true
      (2 rows)

      test=> EXPLAIN VERBOSE SELECT ROW(ROW(null)) IS NULL;
                      QUERY PLAN
      ------------------------------------------
       Result (cost=0.00..0.01 rows=1 width=0)
    --> Output: true
      (2 rows)

      test=> EXPLAIN VERBOSE SELECT ROW(ROW(ROW(null))) IS NULL;
                      QUERY PLAN
      ------------------------------------------
       Result (cost=0.00..0.01 rows=1 width=0)
    --> Output: true
      (2 rows)

    You might think the problem is only with constants, but it extends to
    column values too (non-patched server):

      CREATE TABLE test (x INT);
      CREATE TABLE

      INSERT INTO test VALUES (1), (NULL);
      INSERT 0 2

      SELECT ROW(x) IS NULL FROM test;
       ?column?
      ----------
       f
       t

      SELECT ROW(ROW(x)) IS NULL FROM test;
       ?column?
      ----------
       f
       t

      SELECT ROW(ROW(ROW(x))) IS NULL FROM test;
       ?column?
      ----------
    --> f
    --> f

    With the patch, that works too:

      SELECT ROW(ROW(ROW(x))) IS NULL FROM test;
       ?column?
      ----------
       f
       t

    The optimizer seems like the right place to fix this, per my patch. It
    already flattens IS NULL tests into a series of AND clauses, and now by
    recursing, it handles nested ROW values properly too.

    This fix would be for head only.

    --
       Bruce Momjian <bruce@momjian.us> http://momjian.us
       EnterpriseDB http://enterprisedb.com

       + It's impossible for everything to be true. +
  • Tom Lane at Jul 4, 2013 at 8:29 pm

    Bruce Momjian writes:
    I developed the attached patch which properly recurses into ROW()
    records checking for NULLs; you can see it returns the right answer in
    all cases (and constant folds too):
    My recollection of the previous discussion is that we didn't have
    consensus on what the "right" behavior is, so I'm not sure you can just
    assert that this patch is right. In any case this is only touching the
    tip of the iceberg. If we intend that rows of nulls should be null,
    then we have got issues with, for example, NOT NULL column constraint
    checks, which don't have any such recursion built into them. I think
    the same is true for plpgsql variable NOT NULL restrictions, and there
    are probably some other places.
    The optimizer seems like the right place to fix this, per my patch.
    No, it isn't, or at least it's far from the only place. If we're going
    to change this, we would also want to change the behavior of tests on
    RECORD values, which is something that would have to happen at runtime.

        regards, tom lane
  • Alvaro Herrera at Jul 4, 2013 at 9:06 pm

    Tom Lane wrote:

    My recollection of the previous discussion is that we didn't have
    consensus on what the "right" behavior is, so I'm not sure you can just
    assert that this patch is right. In any case this is only touching the
    tip of the iceberg. If we intend that rows of nulls should be null,
    then we have got issues with, for example, NOT NULL column constraint
    checks, which don't have any such recursion built into them.
    FWIW if changing the behavior of NOT NULL constraints is desired, I
    still have the patch to catalogue them lying around, if anyone wants
    to play with it. I haven't gotten around to finishing it up, yet :-(

    --
    Álvaro Herrera http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Peter Eisentraut at Jul 5, 2013 at 1:39 pm

    On 7/4/13 5:06 PM, Alvaro Herrera wrote:
    FWIW if changing the behavior of NOT NULL constraints is desired, I
    still have the patch to catalogue them lying around, if anyone wants
    to play with it. I haven't gotten around to finishing it up, yet :-(
    If your latest patch isn't publicly available, I'd like to see it. I
    might work on that later this year.
  • Alvaro Herrera at Aug 2, 2013 at 4:03 am

    Peter Eisentraut wrote:
    On 7/4/13 5:06 PM, Alvaro Herrera wrote:
    FWIW if changing the behavior of NOT NULL constraints is desired, I
    still have the patch to catalogue them lying around, if anyone wants
    to play with it. I haven't gotten around to finishing it up, yet :-(
    If your latest patch isn't publicly available, I'd like to see it. I
    might work on that later this year.
    Here it is. Note that it is quite incomplete: there are parts that are
    not even close to compiling. I think I was trying to figure out how to
    determine whether a column was of a composite type.

    I might get back to it eventually, so if you start working on it, do let
    me know.

    --
    Álvaro Herrera http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Bruce Momjian at Jul 5, 2013 at 2:21 pm

    On Thu, Jul 4, 2013 at 04:29:20PM -0400, Tom Lane wrote:
    Bruce Momjian <bruce@momjian.us> writes:
    I developed the attached patch which properly recurses into ROW()
    records checking for NULLs; you can see it returns the right answer in
    all cases (and constant folds too):
    My recollection of the previous discussion is that we didn't have
    consensus on what the "right" behavior is, so I'm not sure you can just
    assert that this patch is right. In any case this is only touching the
    tip of the iceberg. If we intend that rows of nulls should be null,
    then we have got issues with, for example, NOT NULL column constraint
    checks, which don't have any such recursion built into them. I think
    the same is true for plpgsql variable NOT NULL restrictions, and there
    are probably some other places.
    Well, we have three cases:

      1 SELECT ROW(NULL) IS NULL;
      2 SELECT ROW(ROW(NULL)) IS NULL;
      3 SELECT ROW(ROW(ROW(NULL))) IS NULL;

    I think we could have them all return false, or all true, or the first
    one true, and the rest false. What I don't think we can justify is 1
    and 2 as true, and 3 false.

    Can someone show how those others behave? I don't know enough to test
    it.
    The optimizer seems like the right place to fix this, per my patch.
    No, it isn't, or at least it's far from the only place. If we're going
    to change this, we would also want to change the behavior of tests on
    RECORD values, which is something that would have to happen at runtime.
    I checked RECORD and that behaves with recursion:

      SELECT RECORD(NULL) IS NULL;
       ?column?
      ----------
       t

      SELECT RECORD(RECORD(NULL)) IS NULL;
       ?column?
      ----------
       t

      SELECT RECORD(RECORD(RECORD(NULL))) IS NULL;
       ?column?
      ----------
       t

    Am I missing something?

    --
       Bruce Momjian <bruce@momjian.us> http://momjian.us
       EnterpriseDB http://enterprisedb.com

       + It's impossible for everything to be true. +
  • Tom Lane at Jul 5, 2013 at 3:04 pm

    Bruce Momjian writes:
    On Thu, Jul 4, 2013 at 04:29:20PM -0400, Tom Lane wrote:
    No, it isn't, or at least it's far from the only place. If we're going
    to change this, we would also want to change the behavior of tests on
    RECORD values, which is something that would have to happen at runtime.
    I checked RECORD and that behaves with recursion:
    Apparently you don't even understand the problem. All of these examples
    you're showing are constants. Try something like

      declare r record;
      ...
      select ... into r ...
      if (r is null) ...

        regards, tom lane
  • Bruce Momjian at Jul 5, 2013 at 3:48 pm

    On Fri, Jul 5, 2013 at 11:03:56AM -0400, Tom Lane wrote:
    Bruce Momjian <bruce@momjian.us> writes:
    On Thu, Jul 4, 2013 at 04:29:20PM -0400, Tom Lane wrote:
    No, it isn't, or at least it's far from the only place. If we're going
    to change this, we would also want to change the behavior of tests on
    RECORD values, which is something that would have to happen at runtime.
    I checked RECORD and that behaves with recursion:
    Apparently you don't even understand the problem. All of these examples
    you're showing are constants. Try something like

    declare r record;
    ...
    select ... into r ...
    if (r is null) ...
    Not apparently --- I already said I didn't understand the problem.
    Should I just mark this as a TODO?

    --
       Bruce Momjian <bruce@momjian.us> http://momjian.us
       EnterpriseDB http://enterprisedb.com

       + It's impossible for everything to be true. +
  • Tom Lane at Jul 5, 2013 at 4:44 pm

    Bruce Momjian writes:
    Should I just mark this as a TODO?
    I thought it was on the list already.

        regards, tom lane
  • Bruce Momjian at Jul 5, 2013 at 5:04 pm

    On Fri, Jul 5, 2013 at 12:43:57PM -0400, Tom Lane wrote:
    Bruce Momjian <bruce@momjian.us> writes:
    Should I just mark this as a TODO?
    I thought it was on the list already.
    We only have:

      Improve handling of NULLs in arrays

    and some pl/pgsql items regarding nulls.

    --
       Bruce Momjian <bruce@momjian.us> http://momjian.us
       EnterpriseDB http://enterprisedb.com

       + It's impossible for everything to be true. +
  • Bruce Momjian at Jul 7, 2013 at 5:04 pm

    On Fri, Jul 5, 2013 at 11:03:56AM -0400, Tom Lane wrote:
    Bruce Momjian <bruce@momjian.us> writes:
    On Thu, Jul 4, 2013 at 04:29:20PM -0400, Tom Lane wrote:
    No, it isn't, or at least it's far from the only place. If we're going
    to change this, we would also want to change the behavior of tests on
    RECORD values, which is something that would have to happen at runtime.
    I checked RECORD and that behaves with recursion:
    Apparently you don't even understand the problem. All of these examples
    you're showing are constants. Try something like

    declare r record;
    ...
    select ... into r ...
    if (r is null) ...
    OK, I created the following test on git head (without my patch), and the
    results look correct:

      DO LANGUAGE plpgsql $$
      DECLARE
              r RECORD;
      BEGIN

      DROP TABLE IF EXISTS test;
      CREATE TABLE test (x INT, y INT);

      INSERT INTO test VALUES (1, NULL), (NULL, 1), (NULL, NULL);
      FOR r IN SELECT * FROM test
      LOOP
              IF (r IS NULL)
              THEN RAISE NOTICE 'true';
              ELSE RAISE NOTICE 'false';
              END IF;
      END LOOP;
      END;
      $$;

      NOTICE: false
      NOTICE: false
      NOTICE: true

    Am I missing something?

    Is this an example of NOT NULL constraints not testing NULLs?

      CREATE TABLE test3(x INT, y INT);
      CREATE TABLE test5(z test3 NOT NULL);

      INSERT INTO test5 VALUES (ROW(NULL, NULL));

      SELECT * FROM test5;
        z
      -----
       (,)

    Looks like I have to modify ExecEvalNullTest(). If I fix this, is it
    going to cause problems with pg_upgraded databases now having values
    that are no longer validated by the NOT NULL constraint?

    --
       Bruce Momjian <bruce@momjian.us> http://momjian.us
       EnterpriseDB http://enterprisedb.com

       + It's impossible for everything to be true. +
  • Bruce Momjian at Jul 7, 2013 at 5:32 pm

    On Sun, Jul 7, 2013 at 01:04:05PM -0400, Bruce Momjian wrote:
    Looks like I have to modify ExecEvalNullTest().
    Oops, I mean ExecConstraints(). This could be tricky.

    --
       Bruce Momjian <bruce@momjian.us> http://momjian.us
       EnterpriseDB http://enterprisedb.com

       + It's impossible for everything to be true. +
  • Robert Haas at Nov 26, 2012 at 9:04 pm

    On Mon, Nov 26, 2012 at 3:05 PM, Tom Lane wrote:
    The analogy to other aggregates is probably a better thing to argue
    from. On the other hand, I don't know anyone outside the SQL standards
    committee who thinks it's actually a good idea that SUM() across no rows
    returns null rather than zero.
    Me neither.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • David E. Wheeler at Dec 5, 2012 at 5:01 pm

    On Nov 26, 2012, at 11:12 AM, Peter Eisentraut wrote:

    Although my intuition would be [], the existing concatenation-like
    aggregates return null for no input rows, so this probably ought to be
    consistent with those.
    This annoys me at times, but I wrap such calls in COALESCE() and
    forget about it. So I agree to keep it consistent with other
    array-returning aggregate functions.

    Best,

    David
