The aggregate docs say:

The first form of aggregate expression invokes the aggregate across all input rows for which the given expression(s) yield non-null values. (Actually, it is up to the aggregate function whether to ignore null values or not — but all the standard ones do.)
-- http://developer.postgresql.org/pgdocs/postgres/sql-expressions.html#SYNTAX-AGGREGATES

That, however, is not true of array_agg():

try=# CREATE TABLE foo(id int);
CREATE TABLE
try=# INSERT INTO foo values(1), (2), (NULL), (3);
INSERT 0 4
try=# select array_agg(id) from foo;
array_agg
──────────────
{1,2,NULL,3}
(1 row)

So are the docs right, or is array_agg() right?

Best,

David

Search Discussions

  • Thom Brown at Sep 1, 2010 at 6:58 am

    On 1 September 2010 06:45, David E. Wheeler wrote:
    The aggregate docs say:
    The first form of aggregate expression invokes the aggregate across all input rows for which the given expression(s) yield non-null values. (Actually, it is up to the aggregate function whether to ignore null values or not — but all the standard ones do.)
    -- http://developer.postgresql.org/pgdocs/postgres/sql-expressions.html#SYNTAX-AGGREGATES

    That, however, is not true of array_agg():

    try=# CREATE TABLE foo(id int);
    CREATE TABLE
    try=# INSERT INTO foo values(1), (2), (NULL), (3);
    INSERT 0 4
    try=# select array_agg(id) from foo;
    array_agg
    ──────────────
    {1,2,NULL,3}
    (1 row)

    So are the docs right, or is array_agg() right?
    I think it might be both. array_agg doesn't return NULL, it returns
    an array which contains NULL.

    --
    Thom Brown
    Twitter: @darkixion
    IRC (freenode): dark_ixion
    Registered Linux user: #516935
  • David E. Wheeler at Sep 1, 2010 at 7:03 am

    On Aug 31, 2010, at 11:56 PM, Thom Brown wrote:

    The first form of aggregate expression invokes the aggregate across all input rows for which the given expression(s) yield non-null values. (Actually, it is up to the aggregate function whether to ignore null values or not — but all the standard ones do.)
    -- http://developer.postgresql.org/pgdocs/postgres/sql-expressions.html#SYNTAX-AGGREGATES

    That, however, is not true of array_agg():

    try=# CREATE TABLE foo(id int);
    CREATE TABLE
    try=# INSERT INTO foo values(1), (2), (NULL), (3);
    INSERT 0 4
    try=# select array_agg(id) from foo;
    array_agg
    ──────────────
    {1,2,NULL,3}
    (1 row)

    So are the docs right, or is array_agg() right?
    I think it might be both. array_agg doesn't return NULL, it returns
    an array which contains NULL.
    No, string_agg() doesn't work this way, for example:

    select string_agg(id::text, ',') from foo;
    string_agg
    ────────────
    1,2,3
    (1 row)

    Note that it's not:

    select string_agg(id::text, ',') from foo;
    string_agg
    ────────────
    1,2,,3
    (1 row)

    Best,

    David
  • Thom Brown at Sep 1, 2010 at 8:08 am

    On 1 September 2010 07:56, Thom Brown wrote:
    On 1 September 2010 06:45, David E. Wheeler wrote:
    The aggregate docs say:
    The first form of aggregate expression invokes the aggregate across all input rows for which the given expression(s) yield non-null values. (Actually, it is up to the aggregate function whether to ignore null values or not — but all the standard ones do.)
    -- http://developer.postgresql.org/pgdocs/postgres/sql-expressions.html#SYNTAX-AGGREGATES

    That, however, is not true of array_agg():

    try=# CREATE TABLE foo(id int);
    CREATE TABLE
    try=# INSERT INTO foo values(1), (2), (NULL), (3);
    INSERT 0 4
    try=# select array_agg(id) from foo;
    array_agg
    ──────────────
    {1,2,NULL,3}
    (1 row)

    So are the docs right, or is array_agg() right?
    I think it might be both.  array_agg doesn't return NULL, it returns
    an array which contains NULL.
    The second I wrote that, I realised it was b*ll%$ks, as I was still in
    the process of waking up.

    --
    Thom Brown
    Twitter: @darkixion
    IRC (freenode): dark_ixion
    Registered Linux user: #516935
  • David E. Wheeler at Sep 1, 2010 at 4:02 pm

    On Sep 1, 2010, at 1:06 AM, Thom Brown wrote:

    I think it might be both. array_agg doesn't return NULL, it returns
    an array which contains NULL.
    The second I wrote that, I realised it was b*ll%$ks, as I was still in
    the process of waking up.
    I know that feeling.

    /me sips his coffee

    Best,

    David
  • Pavel Stehule at Sep 1, 2010 at 7:30 am

    2010/9/1 David E. Wheeler <[email protected]>:
    The aggregate docs say:
    The first form of aggregate expression invokes the aggregate across all input rows for which the given expression(s) yield non-null values. (Actually, it is up to the aggregate function whether to ignore null values or not — but all the standard ones do.)
    -- http://developer.postgresql.org/pgdocs/postgres/sql-expressions.html#SYNTAX-AGGREGATES

    That, however, is not true of array_agg():

    try=# CREATE TABLE foo(id int);
    CREATE TABLE
    try=# INSERT INTO foo values(1), (2), (NULL), (3);
    INSERT 0 4
    try=# select array_agg(id) from foo;
    array_agg
    ──────────────
    {1,2,NULL,3}
    (1 row)

    So are the docs right, or is array_agg() right?
    Docs is wrong :) I like current implementation. You can remove a NULLs
    from aggregation very simply, but different direction isn't possible

    Regards
    Pavel Stehule
    Best,

    David


    --
    Sent via pgsql-hackers mailing list ([email protected])
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers
  • David E. Wheeler at Sep 1, 2010 at 3:16 pm

    On Sep 1, 2010, at 12:30 AM, Pavel Stehule wrote:

    Docs is wrong :) I like current implementation. You can remove a NULLs
    from aggregation very simply, but different direction isn't possible
    Would appreciate the recipe for removing the NULLs.

    Best,

    David
  • David Fetter at Sep 1, 2010 at 5:47 pm

    On Wed, Sep 01, 2010 at 08:16:41AM -0700, David Wheeler wrote:
    On Sep 1, 2010, at 12:30 AM, Pavel Stehule wrote:

    Docs is wrong :) I like current implementation. You can remove a
    NULLs from aggregation very simply, but different direction isn't
    possible
    Would appreciate the recipe for removing the NULLs.
    WHERE clause :P

    Cheers,
    David.
    --
    David Fetter <[email protected]> http://fetter.org/
    Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
    Skype: davidfetter XMPP: [email protected]
    iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

    Remember to vote!
    Consider donating to Postgres: http://www.postgresql.org/about/donate
  • Thom Brown at Sep 1, 2010 at 5:53 pm

    On 1 September 2010 18:47, David Fetter wrote:
    On Wed, Sep 01, 2010 at 08:16:41AM -0700, David Wheeler wrote:
    On Sep 1, 2010, at 12:30 AM, Pavel Stehule wrote:

    Docs is wrong :) I like current implementation.  You can remove a
    NULLs from aggregation very simply, but different direction isn't
    possible
    Would appreciate the recipe for removing the NULLs.
    WHERE clause :P
    There may be cases where that's undesirable, such as there being more
    than one aggregate in the SELECT list, or the column being grouped on
    needing to return rows regardless as to whether there's NULLs in the
    column being targeted by array_agg() or not.
    --
    Thom Brown
    Twitter: @darkixion
    IRC (freenode): dark_ixion
    Registered Linux user: #516935
  • David E. Wheeler at Sep 1, 2010 at 5:59 pm

    On Sep 1, 2010, at 10:52 AM, Thom Brown wrote:

    ould appreciate the recipe for removing the NULLs.
    WHERE clause :P
    There may be cases where that's undesirable, such as there being more
    than one aggregate in the SELECT list, or the column being grouped on
    needing to return rows regardless as to whether there's NULLs in the
    column being targeted by array_agg() or not.
    Exactly the issue I ran into:

    SELECT name AS distribution,
    array_agg(
    CASE relstatus WHEN 'stable'
    THEN version
    ELSE NULL
    END ORDER BY version) AS stable,
    array_agg(
    CASE relstatus
    WHEN 'testing'
    THEN version
    ELSE NULL
    END ORDER BY version) AS testing
    FROM distributions
    GROUP BY name;

    distribution │ stable │ testing
    ──────────────┼───────────────────┼────────────────────
    pair │ {NULL,1.0.0,NULL} │ {0.0.1,NULL,1.2.0}
    pgtap │ {NULL} │ {0.0.1}
    (2 rows)

    Annoying.

    Best,

    David
  • Dimitri Fontaine at Sep 2, 2010 at 9:47 am

    "David E. Wheeler" <[email protected]> writes:
    On Sep 1, 2010, at 10:52 AM, Thom Brown wrote:

    ould appreciate the recipe for removing the NULLs.
    WHERE clause :P
    There may be cases where that's undesirable, such as there being more
    than one aggregate in the SELECT list, or the column being grouped on
    needing to return rows regardless as to whether there's NULLs in the
    column being targeted by array_agg() or not.
    Exactly the issue I ran into:

    SELECT name AS distribution,
    array_agg(
    CASE relstatus WHEN 'stable'
    THEN version
    ELSE NULL
    END ORDER BY version) AS stable,
    array_agg(
    CASE relstatus
    WHEN 'testing'
    THEN version
    ELSE NULL
    END ORDER BY version) AS testing
    FROM distributions
    GROUP BY name;
    What about adding WHERE support to aggregates, adding to the ORDER BY
    capability they already have?

    SELECT array_agg(version WHERE relstatus = 'stable' ORDER BY version)

    The current way to do that is using a subquery and unnest() and where
    clause there, but that's not a good way to avoid to process stored data
    in the aggregate / in the query.

    Regards,
    --
    dim
  • Pavel Stehule at Sep 1, 2010 at 6:10 pm

    2010/9/1 Thom Brown <[email protected]>:
    On 1 September 2010 18:47, David Fetter wrote:
    On Wed, Sep 01, 2010 at 08:16:41AM -0700, David Wheeler wrote:
    On Sep 1, 2010, at 12:30 AM, Pavel Stehule wrote:

    Docs is wrong :) I like current implementation.  You can remove a
    NULLs from aggregation very simply, but different direction isn't
    possible
    Would appreciate the recipe for removing the NULLs.
    WHERE clause :P
    There may be cases where that's undesirable, such as there being more
    than one aggregate in the SELECT list, or the column being grouped on
    needing to return rows regardless as to whether there's NULLs in the
    column being targeted by array_agg() or not.
    Then you can eliminate NULLs with simple function

    CREATE OR REPLACE FUNCTION remove_null(anyarray)
    RETURNS anyarray AS $$
    SELECT ARRAY(SELECT x FROM unnest($1) g(x) WHERE x IS NOT NULL)
    $$ LANGUAGE sql;
    --
    Thom Brown
    Twitter: @darkixion
    IRC (freenode): dark_ixion
    Registered Linux user: #516935

    --
    Sent via pgsql-hackers mailing list ([email protected])
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers
  • David E. Wheeler at Sep 1, 2010 at 6:12 pm

    On Sep 1, 2010, at 11:09 AM, Pavel Stehule wrote:

    Then you can eliminate NULLs with simple function

    CREATE OR REPLACE FUNCTION remove_null(anyarray)
    RETURNS anyarray AS $$
    SELECT ARRAY(SELECT x FROM unnest($1) g(x) WHERE x IS NOT NULL)
    $$ LANGUAGE sql;
    Kind of defeats the purpose of the efficiency of the aggregate.

    Best,

    David
  • Tom Lane at Sep 1, 2010 at 6:26 pm

    "David E. Wheeler" <[email protected]> writes:
    On Sep 1, 2010, at 11:09 AM, Pavel Stehule wrote:
    Then you can eliminate NULLs with simple function
    Kind of defeats the purpose of the efficiency of the aggregate.
    Well, you can build your own version of array_agg with the same
    implementation, except you mark the transition function as strict ...

    regards, tom lane
  • Pavel Stehule at Sep 1, 2010 at 6:28 pm

    2010/9/1 Tom Lane <[email protected]>:
    "David E. Wheeler" <[email protected]> writes:
    On Sep 1, 2010, at 11:09 AM, Pavel Stehule wrote:
    Then you can eliminate NULLs with simple function
    Kind of defeats the purpose of the efficiency of the aggregate.
    Well, you can build your own version of array_agg with the same
    implementation, except you mark the transition function as strict ...
    I am checking this now, and it is not possible - it needs a some
    initial value and there isn't possible to set a "internal" value.
    probably some C coding is necessary.

    Regards

    Pavel
    regards, tom lane
  • Tom Lane at Sep 1, 2010 at 6:46 pm

    Pavel Stehule writes:
    2010/9/1 Tom Lane <[email protected]>:
    Well, you can build your own version of array_agg with the same
    implementation, except you mark the transition function as strict ...
    I am checking this now, and it is not possible - it needs a some
    initial value and there isn't possible to set a "internal" value.
    Well, you can cheat a bit ...

    regression=# create or replace function array_agg_transfn_strict(internal, anyelement) returns internal as 'array_agg_transfn' language internal immutable;
    CREATE FUNCTION
    regression=# create aggregate array_agg_strict(anyelement) (stype = internal,
    sfunc = array_agg_transfn_strict, finalfunc = array_agg_finalfn);
    CREATE AGGREGATE
    regression=# create or replace function array_agg_transfn_strict(internal, anyelement) returns internal as 'array_agg_transfn' language internal strict immutable;
    CREATE FUNCTION

    regards, tom lane
  • Pavel Stehule at Sep 1, 2010 at 6:53 pm

    2010/9/1 Tom Lane <[email protected]>:
    Pavel Stehule <[email protected]> writes:
    2010/9/1 Tom Lane <[email protected]>:
    Well, you can build your own version of array_agg with the same
    implementation, except you mark the transition function as strict ...
    I am checking this now, and it is not possible - it needs a some
    initial value and there isn't possible to set a "internal" value.
    Well, you can cheat a bit ...

    regression=# create or replace function array_agg_transfn_strict(internal, anyelement) returns internal as 'array_agg_transfn' language internal immutable;
    CREATE FUNCTION
    regression=# create aggregate array_agg_strict(anyelement) (stype = internal,
    sfunc = array_agg_transfn_strict, finalfunc = array_agg_finalfn);
    CREATE AGGREGATE
    regression=# create or replace function array_agg_transfn_strict(internal, anyelement) returns internal as 'array_agg_transfn' language internal strict immutable;
    CREATE FUNCTION
    nice dark trick :) - but it doesn't work

    ERROR: aggregate 16395 needs to have compatible input type and transition type
    postgres=#

    Pavel


    regards, tom lane
  • David E. Wheeler at Oct 2, 2010 at 2:58 am

    On Sep 1, 2010, at 11:52 AM, Pavel Stehule wrote:

    regression=# create or replace function array_agg_transfn_strict(internal, anyelement) returns internal as 'array_agg_transfn' language internal immutable;
    CREATE FUNCTION
    regression=# create aggregate array_agg_strict(anyelement) (stype = internal,
    sfunc = array_agg_transfn_strict, finalfunc = array_agg_finalfn);
    CREATE AGGREGATE
    regression=# create or replace function array_agg_transfn_strict(internal, anyelement) returns internal as 'array_agg_transfn' language internal strict immutable;
    CREATE FUNCTION
    nice dark trick :) - but it doesn't work

    ERROR: aggregate 16395 needs to have compatible input type and transition type
    postgres=#
    I could use this trick now. Anyone got any bright ideas how to fix it?

    Thanks,

    David
  • David E. Wheeler at Sep 1, 2010 at 4:08 pm

    On Sep 1, 2010, at 12:30 AM, Pavel Stehule wrote:

    So are the docs right, or is array_agg() right?
    Docs is wrong :) I like current implementation. You can remove a NULLs
    from aggregation very simply, but different direction isn't possible
    Patch:

    diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
    index 9f91939..e301019 100644
    *** a/doc/src/sgml/syntax.sgml
    --- b/doc/src/sgml/syntax.sgml
    *************** sqrt(2)
    *** 1543,1549 ****
    The first form of aggregate expression invokes the aggregate
    across all input rows for which the given expression(s) yield
    non-null values. (Actually, it is up to the aggregate function
    ! whether to ignore null values or not &mdash; but all the standard ones do.)
    The second form is the same as the first, since
    <literal>ALL</literal> is the default. The third form invokes the
    aggregate for all distinct values of the expressions found
    --- 1543,1550 ----
    The first form of aggregate expression invokes the aggregate
    across all input rows for which the given expression(s) yield
    non-null values. (Actually, it is up to the aggregate function
    ! whether to ignore null values or not &mdash; but all the standard
    ! ones except <function>array_agg</> do.)
    The second form is the same as the first, since
    <literal>ALL</literal> is the default. The third form invokes the
    aggregate for all distinct values of the expressions found

    Best,

    David
  • Tom Lane at Sep 1, 2010 at 5:12 pm

    "David E. Wheeler" <[email protected]> writes:
    *** 1543,1549 ****
    The first form of aggregate expression invokes the aggregate
    across all input rows for which the given expression(s) yield
    non-null values. (Actually, it is up to the aggregate function
    ! whether to ignore null values or not &mdash; but all the standard ones do.)
    The second form is the same as the first, since
    <literal>ALL</literal> is the default. The third form invokes the
    aggregate for all distinct values of the expressions found
    --- 1543,1550 ----
    The first form of aggregate expression invokes the aggregate
    across all input rows for which the given expression(s) yield
    non-null values. (Actually, it is up to the aggregate function
    ! whether to ignore null values or not &mdash; but all the standard
    ! ones except <function>array_agg</> do.)
    The second form is the same as the first, since
    <literal>ALL</literal> is the default. The third form invokes the
    aggregate for all distinct values of the expressions found
    I think when that text was written, it was meant to imply "all the
    aggregates defined in SQL92". There seems to be a lot of confusion
    in this thread about whether "standard" means "defined by SQL spec"
    or "built-in in Postgres". Should we try to refine the wording to
    clarify that?

    Even more to the point, should we deliberately make this vaguer so that
    we aren't finding ourselves with obsolete text again and again? You can
    bet that people adding new aggregates in the future aren't going to
    think to update this sentence, any more than happened with array_agg.

    regards, tom lane
  • David E. Wheeler at Sep 1, 2010 at 5:16 pm

    On Sep 1, 2010, at 10:12 AM, Tom Lane wrote:

    I think when that text was written, it was meant to imply "all the
    aggregates defined in SQL92". There seems to be a lot of confusion
    in this thread about whether "standard" means "defined by SQL spec"
    or "built-in in Postgres". Should we try to refine the wording to
    clarify that?
    Yes please.
    Even more to the point, should we deliberately make this vaguer so that
    we aren't finding ourselves with obsolete text again and again? You can
    bet that people adding new aggregates in the future aren't going to
    think to update this sentence, any more than happened with array_agg.
    Perhaps “consult the docs for each aggregate to determine how it handles NULLs.”

    Best,

    David
  • Tom Lane at Sep 1, 2010 at 5:30 pm

    "David E. Wheeler" <[email protected]> writes:
    On Sep 1, 2010, at 10:12 AM, Tom Lane wrote:
    Even more to the point, should we deliberately make this vaguer so that
    we aren't finding ourselves with obsolete text again and again? You can
    bet that people adding new aggregates in the future aren't going to
    think to update this sentence, any more than happened with array_agg.
    Perhaps “consult the docs for each aggregate to determine how it handles NULLs.”
    Hm, actually the whole para needs work. It was designed at a time when
    DISTINCT automatically discarded nulls, which isn't true anymore, and
    that fact was patched-in in a very awkward way too. Perhaps something
    like

    The first form of aggregate expression invokes the aggregate
    once for each input row.
    The second form is the same as the first, since
    <literal>ALL</literal> is the default.
    The third form invokes the aggregate once for each distinct value,
    or set of values, of the expression(s) found in the input rows.
    The last form invokes the aggregate once for each input row; since no
    particular input value is specified, it is generally only useful
    for the <function>count(*)</function> aggregate function.

    Most aggregate functions ignore null inputs, so that rows in which
    one or more of the expression(s) yield null are discarded. (This
    can be assumed to be true, unless otherwise specified, for all
    built-in aggregates.)

    Then we have to make sure array_agg is properly documented, but we
    don't have to insert something into the description of every single
    aggregate, which is what your proposal would require.

    regards, tom lane
  • David E. Wheeler at Sep 1, 2010 at 6:07 pm

    On Sep 1, 2010, at 10:30 AM, Tom Lane wrote:

    Hm, actually the whole para needs work. It was designed at a time when
    DISTINCT automatically discarded nulls, which isn't true anymore, and
    that fact was patched-in in a very awkward way too. Perhaps something
    like

    The first form of aggregate expression invokes the aggregate
    once for each input row.
    The second form is the same as the first, since
    <literal>ALL</literal> is the default.
    The third form invokes the aggregate once for each distinct value,
    or set of values, of the expression(s) found in the input rows.
    The last form invokes the aggregate once for each input row; since no
    particular input value is specified, it is generally only useful
    for the <function>count(*)</function> aggregate function.

    Most aggregate functions ignore null inputs, so that rows in which
    one or more of the expression(s) yield null are discarded. (This
    can be assumed to be true, unless otherwise specified, for all
    built-in aggregates.)
    I don't think you need the parentheses, though without them, "This" might be better written as "The ignoring of NULLs".

    Just my $0.02.

    Best,

    David
  • Tom Lane at Sep 1, 2010 at 6:23 pm

    "David E. Wheeler" <[email protected]> writes:
    On Sep 1, 2010, at 10:30 AM, Tom Lane wrote:
    Most aggregate functions ignore null inputs, so that rows in which
    one or more of the expression(s) yield null are discarded. (This
    can be assumed to be true, unless otherwise specified, for all
    built-in aggregates.)
    I don't think you need the parentheses, though without them, "This" might be better written as "The ignoring of NULLs".
    Done, without the parentheses. I didn't add "The ignoring of NULLs",
    it seemed a bit too verbose.

    regards, tom lane

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedSep 1, '10 at 5:45a
activeOct 2, '10 at 2:58a
posts24
users6
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2023 Grokbase