Hello

this patch implement a new iteration construct - iteration over an
array. The sense of this new iteration is:
* a simple and cleaner syntax
* a faster execution - this bring down a number of detoast operations

create or replace function subscripts(anyarray, int)
returns int[] as $$
select array(select generate_subscripts($1,$2));
$$ language sql;

create or replace function fora_test()
returns int as $$
declare x int; s int = 0;
a int[] := array[1,2,3,4,5,6,7,8,9,10];
begin
for x in array subscripts(a, 1)
loop
s := s + a[x];
end loop;
return s;
end;
$$ language plpgsql;

create or replace function fora_test()
returns int as $$
declare x int; s int = 0;
begin
for x in array array[1,2,3,4,5,6,7,8,9,10]
loop
s := s + x;
end loop;
return s;
end;
$$ language plpgsql;

create or replace function fora_test()
returns int as $$
declare x int; y int;
a fora_point[] := array[(1,2),(3,4),(5,6)];
begin
for x, y in array a
loop
raise notice 'point=%,%', x, y;
end loop;
return 0;
end;
$$ language plpgsql;

Regards

Pavel Stehule

Search Discussions

  • Jaime Casanova at Nov 18, 2010 at 12:09 am

    On Thu, Sep 30, 2010 at 7:46 AM, Pavel Stehule wrote:
    Hello

    this patch implement a new iteration construct - iteration over an
    array. The sense of this new iteration is:
    * a simple and cleaner syntax
    i will start the review of this one... but before that sorry for
    suggesting this a bit later but about using UNNEST as part of the
    sintax?

    FOR var IN UNNEST array_expr LOOP
    END LOOP

    i like this because:
    1) is cleaner when array_expr is ARRAY[1,2,3]
    2) is not legal now to use the unnest() function without a SELECT in
    the context of a FOR loop (or am i missing something?)
    3) the unnest() function does the same so seems intuitive what a
    FOR-IN-UNNEST do

    what i don't know if is this syntax could co-exist with the unnest() function?

    --
    Jaime Casanova         www.2ndQuadrant.com
    Professional PostgreSQL: Soporte y capacitación de PostgreSQL
  • Merlin Moncure at Nov 18, 2010 at 5:28 am

    On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova wrote:
    On Thu, Sep 30, 2010 at 7:46 AM, Pavel Stehule wrote:
    Hello

    this patch implement a new iteration construct - iteration over an
    array. The sense of this new iteration is:
    * a simple and cleaner syntax
    i will start the review of this one... but before that sorry for
    suggesting this a bit later but about using UNNEST as part of the
    sintax?
    Does for-in-array do what unnset does? unnest() flattens the whole
    array into scalars irregardless of dimensions:
    select unnest(array[array[1,2],array[3,4]]);
    unnest
    --------
    1
    2
    3
    4


    If yes, then +1 (unless there is some other problem) otherwise -1.

    merlin
  • Tom Lane at Nov 18, 2010 at 5:47 am

    Merlin Moncure writes:
    On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova wrote:
    i will start the review of this one... but before that sorry for
    suggesting this a bit later but about using UNNEST as part of the
    sintax?
    Does for-in-array do what unnset does?
    Yes, which begs the question of why bother at all. AFAICS this patch
    simply allows you to replace

    for x in select unnest(array_value) loop

    with

    for x in unnest array_value loop

    (plus or minus a parenthesis or so). I do not think we need to add a
    bunch of code and create even more syntactic ambiguity (FOR loops are
    already on the hairy edge of unparsability) to save people from writing
    "select".

    regards, tom lane
  • Pavel Stehule at Nov 18, 2010 at 6:05 am

    2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:
    Merlin Moncure <mmoncure@gmail.com> writes:
    On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova wrote:
    i will start the review of this one... but before that sorry for
    suggesting this a bit later but about using UNNEST as part of the
    sintax?
    Does for-in-array do what unnset does?
    Yes, which begs the question of why bother at all.  AFAICS this patch
    simply allows you to replace

    for x in select unnest(array_value) loop

    with

    for x in unnest array_value loop

    (plus or minus a parenthesis or so).  I do not think we need to add a
    bunch of code and create even more syntactic ambiguity (FOR loops are
    already on the hairy edge of unparsability) to save people from writing
    "select".
    this patch is semantically equal to SELECT unnest(..), but it is
    evaluated as simple expression and does directly array unpacking and
    iteration, - so it means this fragment is significantly >>faster<<.

    Regards

    Pavel Stehule
    regards, tom lane
  • Cédric Villemain at Nov 18, 2010 at 10:00 am

    2010/11/18 Pavel Stehule <pavel.stehule@gmail.com>:
    2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:
    Merlin Moncure <mmoncure@gmail.com> writes:
    On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova wrote:
    i will start the review of this one... but before that sorry for
    suggesting this a bit later but about using UNNEST as part of the
    sintax?
    Does for-in-array do what unnset does?
    Yes, which begs the question of why bother at all.  AFAICS this patch
    simply allows you to replace

    for x in select unnest(array_value) loop

    with

    for x in unnest array_value loop

    (plus or minus a parenthesis or so).  I do not think we need to add a
    bunch of code and create even more syntactic ambiguity (FOR loops are
    already on the hairy edge of unparsability) to save people from writing
    "select".
    this patch is semantically equal to SELECT unnest(..), but it is
    evaluated as simple expression and does directly array unpacking and
    iteration, - so it means this fragment is significantly >>faster<<.
    Did you implement a method to be able to walk the array and detoast
    only the current needed data ?

    (I wonder because I have something like that in that garage : select
    array_filter(foo,'like','%bar%',10); where 10 is the limit and can be
    avoided, foo is the array, like is callback function, '%bar%' the
    parameter for the callback function for filtering results.)

    It will make my toy in the garage a fast race car (and probably doable
    in (plpg)SQL instead of C) ...

    --
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
  • Pavel Stehule at Nov 18, 2010 at 1:44 pm

    2010/11/18 Cédric Villemain <cedric.villemain.debian@gmail.com>:
    2010/11/18 Pavel Stehule <pavel.stehule@gmail.com>:
    2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:
    Merlin Moncure <mmoncure@gmail.com> writes:
    On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova wrote:
    i will start the review of this one... but before that sorry for
    suggesting this a bit later but about using UNNEST as part of the
    sintax?
    Does for-in-array do what unnset does?
    Yes, which begs the question of why bother at all.  AFAICS this patch
    simply allows you to replace

    for x in select unnest(array_value) loop

    with

    for x in unnest array_value loop

    (plus or minus a parenthesis or so).  I do not think we need to add a
    bunch of code and create even more syntactic ambiguity (FOR loops are
    already on the hairy edge of unparsability) to save people from writing
    "select".
    this patch is semantically equal to SELECT unnest(..), but it is
    evaluated as simple expression and does directly array unpacking and
    iteration, - so it means this fragment is significantly >>faster<<.
    Did you implement a method to be able to walk the array and detoast
    only the current needed data ?
    not only - iteration over array can help with readability but a
    general work with SRF (set returning functions is more harder and
    slower) - so special loop statement can to safe a some toast op / when
    you use a large array and access via index, or can to safe a some work
    with memory, because there isn't necessary convert array to set of
    tuples. Please, recheck these tests.

    test:

    CREATE OR REPLACE FUNCTION rndstr() RETURNS text AS $$select
    array_to_string(array(select substring('ABCDEFGHIJKLMNOPQ' FROM
    (random()*16)::int FOR 1) from generate_series(1,10)),'')$$ LANGUAGE
    sql;

    create or replace function rndarray(int) returns text[] as $$select
    array(select rndstr() from generate_series(1,$1)) $$ language sql;

    create table t10(x text[]);
    insert into t10 select rndarray(10) from generate_series(1,10000);
    create table t100(x text[]);
    insert into t100 select rndarray(100) from generate_series(1,10000);
    create table t1000(x text[]);
    insert into t1000 select rndarray(1000) from generate_series(1,10000);

    CREATE OR REPLACE FUNCTION public.filter(text[], text, integer)
    RETURNS text[]
    LANGUAGE plpgsql
    AS $function$
    DECLARE
    s text[] := '{}';
    l int := 0;
    v text;
    BEGIN
    FOR v IN ARRAY $1
    LOOP
    EXIT WHEN l = $3;
    IF v LIKE $2 THEN
    s := s || v;
    l := l + 1;
    END IF;
    END LOOP;
    RETURN s;
    END;$function$;

    postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t10;
    avg
    --------------------
    1.1596079803990200
    (1 row)

    Time: 393.649 ms

    postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t100;
    avg
    --------------------
    3.4976777789245536
    (1 row)

    Time: 2804.502 ms

    postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t1000;
    avg
    ---------------------
    10.0000000000000000
    (1 row)

    Time: 9729.994 ms

    CREATE OR REPLACE FUNCTION public.filter01(text[], text, integer)
    RETURNS text[]
    LANGUAGE plpgsql
    AS $function$
    DECLARE
    s text[] := '{}';
    l int := 0;
    v text;
    BEGIN
    FOR v IN SELECT UNNEST($1)
    LOOP
    EXIT WHEN l = $3;
    IF v LIKE $2 THEN
    s := s || v;
    l := l + 1;
    END IF;
    END LOOP;
    RETURN s;
    END;$function$;

    postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t10;
    avg
    --------------------
    1.1596079803990200
    (1 row)

    Time: 795.383 ms

    postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t100;
    avg
    --------------------
    3.4976777789245536
    (1 row)

    Time: 3848.258 ms

    postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t1000;
    avg
    ---------------------
    10.0000000000000000
    (1 row)

    Time: 12366.093 ms

    The iteration via specialized FOR IN ARRAY is about 25-30% faster than
    FOR IN SELECT UNNEST

    postgres=# CREATE OR REPLACE FUNCTION public.filter02(text[], text, integer)
    RETURNS text[]
    LANGUAGE plpgsql
    AS $function$
    DECLARE
    s text[] := '{}';
    l int := 0; i int;
    v text;
    BEGIN
    FOR i IN array_lower($1,1)..array_upper($1,1)
    LOOP
    EXIT WHEN l = $3;
    IF $1[i] LIKE $2 THEN
    s := s || $1[i];
    l := l + 1;
    END IF;
    END LOOP;
    RETURN s;
    END;$function$
    ;

    postgres=# select avg(array_upper(filter02(x,'%AA%', 10),1)) from t10;
    avg
    --------------------
    1.1596079803990200
    (1 row)

    Time: 414.960 ms

    postgres=# select avg(array_upper(filter02(x,'%AA%', 10),1)) from t100;
    avg
    --------------------
    3.4976777789245536
    (1 row)

    Time: 3460.970 ms

    there FOR IN ARRAY is faster about 30% then access per index

    for T1000 I had to cancel over 1 minute!!!!

    (I wonder because I have something like that in that garage : select
    array_filter(foo,'like','%bar%',10); where 10 is the limit and can be
    avoided, foo is the array, like is callback function, '%bar%' the
    parameter for the callback function for filtering results.)

    It will make my toy in the garage a fast race car (and probably doable
    in (plpg)SQL instead of C) ...
    it can help with reading of array. But it doesn't help with array
    updating :(. For large arrays it can be slow too.

    Regards

    Pavel Stehule
    --
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
  • Cédric Villemain at Nov 18, 2010 at 3:37 pm

    2010/11/18 Pavel Stehule <pavel.stehule@gmail.com>:
    2010/11/18 Cédric Villemain <cedric.villemain.debian@gmail.com>:
    2010/11/18 Pavel Stehule <pavel.stehule@gmail.com>:
    2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:
    Merlin Moncure <mmoncure@gmail.com> writes:
    On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova wrote:
    i will start the review of this one... but before that sorry for
    suggesting this a bit later but about using UNNEST as part of the
    sintax?
    Does for-in-array do what unnset does?
    Yes, which begs the question of why bother at all.  AFAICS this patch
    simply allows you to replace

    for x in select unnest(array_value) loop

    with

    for x in unnest array_value loop

    (plus or minus a parenthesis or so).  I do not think we need to add a
    bunch of code and create even more syntactic ambiguity (FOR loops are
    already on the hairy edge of unparsability) to save people from writing
    "select".
    this patch is semantically equal to SELECT unnest(..), but it is
    evaluated as simple expression and does directly array unpacking and
    iteration, - so it means this fragment is significantly >>faster<<.
    Did you implement a method to be able to walk the array and detoast
    only the current needed data ?
    not only - iteration over array can help with readability but a
    general work with SRF (set returning functions is more harder and
    slower) - so special loop statement can to safe a some toast op / when
    you use a large array and access via index, or can to safe a some work
    with memory, because there isn't necessary convert array to set of
    tuples. Please, recheck these tests.

    test:

    CREATE OR REPLACE FUNCTION rndstr() RETURNS text AS $$select
    array_to_string(array(select substring('ABCDEFGHIJKLMNOPQ' FROM
    (random()*16)::int FOR 1) from generate_series(1,10)),'')$$ LANGUAGE
    sql;

    create or replace function rndarray(int) returns text[] as $$select
    array(select rndstr() from generate_series(1,$1)) $$ language sql;

    create table t10(x text[]);
    insert into t10 select rndarray(10) from generate_series(1,10000);
    create table t100(x text[]);
    insert into t100 select rndarray(100) from generate_series(1,10000);
    create table t1000(x text[]);
    insert into t1000 select rndarray(1000) from generate_series(1,10000);

    CREATE OR REPLACE FUNCTION public.filter(text[], text, integer)
    RETURNS text[]
    LANGUAGE plpgsql
    AS $function$
    DECLARE
    s text[] := '{}';
    l int := 0;
    v text;
    BEGIN
    FOR v IN ARRAY $1
    LOOP
    EXIT WHEN l = $3;
    IF v LIKE $2 THEN
    s := s || v;
    l := l + 1;
    END IF;
    END LOOP;
    RETURN s;
    END;$function$;

    postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t10;
    avg
    --------------------
    1.1596079803990200
    (1 row)

    Time: 393.649 ms

    postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t100;
    avg
    --------------------
    3.4976777789245536
    (1 row)

    Time: 2804.502 ms

    postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t1000;
    avg
    ---------------------
    10.0000000000000000
    (1 row)

    Time: 9729.994 ms

    CREATE OR REPLACE FUNCTION public.filter01(text[], text, integer)
    RETURNS text[]
    LANGUAGE plpgsql
    AS $function$
    DECLARE
    s text[] := '{}';
    l int := 0;
    v text;
    BEGIN
    FOR v IN SELECT UNNEST($1)
    LOOP
    EXIT WHEN l = $3;
    IF v LIKE $2 THEN
    s := s || v;
    l := l + 1;
    END IF;
    END LOOP;
    RETURN s;
    END;$function$;

    postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t10;
    avg
    --------------------
    1.1596079803990200
    (1 row)

    Time: 795.383 ms

    postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t100;
    avg
    --------------------
    3.4976777789245536
    (1 row)

    Time: 3848.258 ms

    postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t1000;
    avg
    ---------------------
    10.0000000000000000
    (1 row)

    Time: 12366.093 ms

    The iteration via specialized FOR IN ARRAY is about 25-30% faster than
    FOR IN SELECT UNNEST

    postgres=# CREATE OR REPLACE FUNCTION public.filter02(text[], text, integer)
    RETURNS text[]
    LANGUAGE plpgsql
    AS $function$
    DECLARE
    s text[] := '{}';
    l int := 0; i int;
    v text;
    BEGIN
    FOR i IN array_lower($1,1)..array_upper($1,1)
    LOOP
    EXIT WHEN l = $3;
    IF $1[i] LIKE $2 THEN
    s := s || $1[i];
    l := l + 1;
    END IF;
    END LOOP;
    RETURN s;
    END;$function$
    ;

    postgres=# select avg(array_upper(filter02(x,'%AA%', 10),1)) from t10;
    avg
    --------------------
    1.1596079803990200
    (1 row)

    Time: 414.960 ms

    postgres=# select avg(array_upper(filter02(x,'%AA%', 10),1)) from t100;
    avg
    --------------------
    3.4976777789245536
    (1 row)

    Time: 3460.970 ms

    there FOR IN ARRAY is faster about 30% then access per index

    for T1000 I had to cancel over 1 minute!!!!
    I can't test until this week-end. But I will.
    (I wonder because I have something like that in that garage : select
    array_filter(foo,'like','%bar%',10); where 10 is the limit and can be
    avoided, foo is the array, like is callback function, '%bar%' the
    parameter for the callback function for filtering results.)

    It will make my toy in the garage a fast race car (and probably doable
    in (plpg)SQL instead of C) ...
    it can help with reading of array. But it doesn't help with array
    updating :(. For large arrays it can be slow too.
    select fast is already a good job, thank you.


    --
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
  • Merlin Moncure at Nov 18, 2010 at 3:24 pm

    On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane wrote:
    Merlin Moncure <mmoncure@gmail.com> writes:
    On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova wrote:
    i will start the review of this one... but before that sorry for
    suggesting this a bit later but about using UNNEST as part of the
    sintax?
    Does for-in-array do what unnset does?
    Yes, which begs the question of why bother at all.  AFAICS this patch
    simply allows you to replace

    for x in select unnest(array_value) loop

    with

    for x in unnest array_value loop

    (plus or minus a parenthesis or so).  I do not think we need to add a
    bunch of code and create even more syntactic ambiguity (FOR loops are
    already on the hairy edge of unparsability) to save people from writing
    "select".
    Pavel's performance argument is imnsho valid. arrays at present are
    the best way to pass data around functions and any optimizations here
    are very welcome. Given that, is there any way to mitigate your
    concerns on the syntax side?

    merlin
  • Robert Haas at Nov 18, 2010 at 3:40 pm

    On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure wrote:
    On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane wrote:
    Merlin Moncure <mmoncure@gmail.com> writes:
    On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova wrote:
    i will start the review of this one... but before that sorry for
    suggesting this a bit later but about using UNNEST as part of the
    sintax?
    Does for-in-array do what unnset does?
    Yes, which begs the question of why bother at all.  AFAICS this patch
    simply allows you to replace

    for x in select unnest(array_value) loop

    with

    for x in unnest array_value loop

    (plus or minus a parenthesis or so).  I do not think we need to add a
    bunch of code and create even more syntactic ambiguity (FOR loops are
    already on the hairy edge of unparsability) to save people from writing
    "select".
    Pavel's performance argument is imnsho valid. arrays at present are
    the best way to pass data around functions and any optimizations here
    are very welcome.  Given that, is there any way to mitigate your
    concerns on the syntax side?
    Can we get the performance benefit any other way? I hate to think
    that it will still be slow for people using the already-supported
    syntax.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Cédric Villemain at Nov 18, 2010 at 3:51 pm

    2010/11/18 Robert Haas <robertmhaas@gmail.com>:
    On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure wrote:
    On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane wrote:
    Merlin Moncure <mmoncure@gmail.com> writes:
    On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova wrote:
    i will start the review of this one... but before that sorry for
    suggesting this a bit later but about using UNNEST as part of the
    sintax?
    Does for-in-array do what unnset does?
    Yes, which begs the question of why bother at all.  AFAICS this patch
    simply allows you to replace

    for x in select unnest(array_value) loop

    with

    for x in unnest array_value loop

    (plus or minus a parenthesis or so).  I do not think we need to add a
    bunch of code and create even more syntactic ambiguity (FOR loops are
    already on the hairy edge of unparsability) to save people from writing
    "select".
    Pavel's performance argument is imnsho valid. arrays at present are
    the best way to pass data around functions and any optimizations here
    are very welcome.  Given that, is there any way to mitigate your
    concerns on the syntax side?
    Can we get the performance benefit any other way?  I hate to think
    that it will still be slow for people using the already-supported
    syntax.
    If you are able to make unnest() outputting 1st row without detoasting
    last field.

    I think if we have :
    #define DatumGetTextPSlice(X,m,n) ((text *) PG_DETOAST_DATUM_SLICE(X,m,n))
    but for array, most is done

    Pavel, am I correct ?
    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company

    --
    Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers


    --
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
  • Pavel Stehule at Nov 18, 2010 at 3:58 pm

    2010/11/18 Cédric Villemain <cedric.villemain.debian@gmail.com>:
    2010/11/18 Robert Haas <robertmhaas@gmail.com>:
    On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure wrote:
    On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane wrote:
    Merlin Moncure <mmoncure@gmail.com> writes:
    On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova wrote:
    i will start the review of this one... but before that sorry for
    suggesting this a bit later but about using UNNEST as part of the
    sintax?
    Does for-in-array do what unnset does?
    Yes, which begs the question of why bother at all.  AFAICS this patch
    simply allows you to replace

    for x in select unnest(array_value) loop

    with

    for x in unnest array_value loop

    (plus or minus a parenthesis or so).  I do not think we need to add a
    bunch of code and create even more syntactic ambiguity (FOR loops are
    already on the hairy edge of unparsability) to save people from writing
    "select".
    Pavel's performance argument is imnsho valid. arrays at present are
    the best way to pass data around functions and any optimizations here
    are very welcome.  Given that, is there any way to mitigate your
    concerns on the syntax side?
    Can we get the performance benefit any other way?  I hate to think
    that it will still be slow for people using the already-supported
    syntax.
    If you are able to make unnest() outputting 1st row without detoasting
    last field.

    I think if we have :
    #define DatumGetTextPSlice(X,m,n)   ((text *) PG_DETOAST_DATUM_SLICE(X,m,n))
    but for array, most is done

    Pavel, am I correct ?
    yes, it can help, but still if you iterate over complete array, you
    have to do n - detoast ops.

    Pavel
    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company

    --
    Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers


    --
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
  • Cédric Villemain at Nov 18, 2010 at 4:03 pm

    2010/11/18 Pavel Stehule <pavel.stehule@gmail.com>:
    2010/11/18 Cédric Villemain <cedric.villemain.debian@gmail.com>:
    2010/11/18 Robert Haas <robertmhaas@gmail.com>:
    On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure wrote:
    On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane wrote:
    Merlin Moncure <mmoncure@gmail.com> writes:
    On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova wrote:
    i will start the review of this one... but before that sorry for
    suggesting this a bit later but about using UNNEST as part of the
    sintax?
    Does for-in-array do what unnset does?
    Yes, which begs the question of why bother at all.  AFAICS this patch
    simply allows you to replace

    for x in select unnest(array_value) loop

    with

    for x in unnest array_value loop

    (plus or minus a parenthesis or so).  I do not think we need to add a
    bunch of code and create even more syntactic ambiguity (FOR loops are
    already on the hairy edge of unparsability) to save people from writing
    "select".
    Pavel's performance argument is imnsho valid. arrays at present are
    the best way to pass data around functions and any optimizations here
    are very welcome.  Given that, is there any way to mitigate your
    concerns on the syntax side?
    Can we get the performance benefit any other way?  I hate to think
    that it will still be slow for people using the already-supported
    syntax.
    If you are able to make unnest() outputting 1st row without detoasting
    last field.

    I think if we have :
    #define DatumGetTextPSlice(X,m,n)   ((text *) PG_DETOAST_DATUM_SLICE(X,m,n))
    but for array, most is done

    Pavel, am I correct ?
    yes, it can help, but still if you iterate over complete array, you
    have to do n - detoast ops. sure.
    Pavel
    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company

    --
    Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers


    --
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


    --
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
  • Pavel Stehule at Nov 18, 2010 at 3:52 pm

    2010/11/18 Robert Haas <robertmhaas@gmail.com>:
    On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure wrote:
    On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane wrote:
    Merlin Moncure <mmoncure@gmail.com> writes:
    On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova wrote:
    i will start the review of this one... but before that sorry for
    suggesting this a bit later but about using UNNEST as part of the
    sintax?
    Does for-in-array do what unnset does?
    Yes, which begs the question of why bother at all.  AFAICS this patch
    simply allows you to replace

    for x in select unnest(array_value) loop

    with

    for x in unnest array_value loop

    (plus or minus a parenthesis or so).  I do not think we need to add a
    bunch of code and create even more syntactic ambiguity (FOR loops are
    already on the hairy edge of unparsability) to save people from writing
    "select".
    Pavel's performance argument is imnsho valid. arrays at present are
    the best way to pass data around functions and any optimizations here
    are very welcome.  Given that, is there any way to mitigate your
    concerns on the syntax side?
    Can we get the performance benefit any other way?  I hate to think
    that it will still be slow for people using the already-supported
    syntax.
    I afraid so other ways are more difficult with deeper impacts on
    plpgsql executor.

    what is a slow:

    a) repeated detoasting - access with subscripts - maybe detoasted
    values can be cached?
    b) evaluation of SRF expression - maybe call of SRF function can be
    simple expression,
    c) faster evaluation ro query

    The most important is @a. Only a few people uses a FOR IN SELECT
    UNNEST form now. Probably not optimizable on PLpgSQL level is form FOR
    IN SELECT * FROM UNNEST.

    FOR IN ARRAY doesn't impacts on expression executing - this patch is
    just simple and isolated.

    Pavel


    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Nov 18, 2010 at 6:57 pm

    Pavel Stehule writes:
    what is a slow:
    a) repeated detoasting - access with subscripts - maybe detoasted
    values can be cached?
    b) evaluation of SRF expression - maybe call of SRF function can be
    simple expression,
    c) faster evaluation ro query
    The most important is @a.
    Really? Becase AFAICS array_unnest only detoasts the source array once,
    and saves the value between calls.

    array_unnest doesn't currently have any smarts about fetching slices
    of an array. I'm not sure how useful that would be in practice, since
    (1) in most usages you probably run the function to the end and fetch
    all the values anyway; (2) it's hard to see how to optimize that way
    if the elements are varlena, which they most likely are in most usages
    where this could possibly be a win. But if Cedric's use-case is really
    worth optimizing, I'd sure rather see the smarts for it in the general
    purpose array_unnest function instead of buried in plpgsql's FOR logic.

    regards, tom lane
  • Pavel Stehule at Nov 18, 2010 at 7:18 pm

    2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:
    Pavel Stehule <pavel.stehule@gmail.com> writes:
    what is a slow:
    a) repeated detoasting - access with subscripts - maybe detoasted
    values can be cached?
    b) evaluation of SRF expression - maybe call of SRF function can be
    simple expression,
    c) faster evaluation ro query
    The most important is @a.
    Really?  Becase AFAICS array_unnest only detoasts the source array once,
    and saves the value between calls.
    I know.

    this note was a different -only a few people use FOR IN SELECT UNNEST
    for iteration over array. So from Robert's question (what is important
    for current code?) perspective the more significant is access to
    individual fields via subscripts. For example:

    for i in 1..10000 loop
    s := s + A[i];
    end loop

    is slow, when high limit of array is some bigger number > 1000. But
    almost all stored procedures used this pattern. I know so some people
    use a pattern FOR IN SELECT UNNEST, but (for example) I didn't meet
    that developer in Czech Rep. It isn't usual so people can mix SQL and
    PL well.

    It has a practical reasons - using a UNNEST for small arrays is slower.
    array_unnest doesn't currently have any smarts about fetching slices
    of an array.  I'm not sure how useful that would be in practice, since
    (1) in most usages you probably run the function to the end and fetch
    all the values anyway; (2) it's hard to see how to optimize that way
    if the elements are varlena, which they most likely are in most usages
    where this could possibly be a win.  But if Cedric's use-case is really
    worth optimizing, I'd sure rather see the smarts for it in the general
    purpose array_unnest function instead of buried in plpgsql's FOR logic.
    Probably - example with LIKE filter is really specific. But there can
    be a tasks, where you can early break a iteration where you find a
    value higher or less then some constant - it's not too artificial -
    test "IS MEMBER OF"

    Regards

    Pavel
    regards, tom lane
  • Tom Lane at Nov 18, 2010 at 7:25 pm

    Pavel Stehule writes:
    this note was a different -only a few people use FOR IN SELECT UNNEST
    for iteration over array. So from Robert's question (what is important
    for current code?) perspective the more significant is access to
    individual fields via subscripts. For example:
    for i in 1..10000 loop
    s := s + A[i];
    end loop
    is slow, when high limit of array is some bigger number > 1000.
    True, but inventing new FOR syntax isn't going to help people who are
    used to doing that.

    regards, tom lane
  • Pavel Stehule at Nov 18, 2010 at 7:35 pm

    2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:
    Pavel Stehule <pavel.stehule@gmail.com> writes:
    this note was a different -only  a few people use FOR IN SELECT UNNEST
    for iteration over array. So from Robert's question (what is important
    for current code?) perspective the more significant is access to
    individual fields via subscripts. For example:
    for i in 1..10000 loop
    s := s + A[i];
    end loop
    is slow, when high limit of array is some bigger number > 1000.
    True, but inventing new FOR syntax isn't going to help people who are
    used to doing that.
    sure - I don't try it. Any change of this mean significant plpgsql's
    refactoring and significant increasing the size and complexity of
    code. More there can be still some overhead, because subscript can be
    expression. And in almost all cases people dislike to write
    subscripts.
    regards, tom lane
  • Andrew Dunstan at Nov 18, 2010 at 7:31 pm

    On 11/18/2010 02:17 PM, Pavel Stehule wrote:
    -only a few people use FOR IN SELECT UNNEST for iteration over array.
    How on earth do you know that? I use it a lot and I was just
    demonstrating it to a client yesterday, and I'm quite sure he will use
    it a lot too. I bet I'm far from alone.

    cheers

    andrew
  • Pavel Stehule at Nov 18, 2010 at 7:40 pm
    2010/11/18 Andrew Dunstan <andrew@dunslane.net>:
    On 11/18/2010 02:17 PM, Pavel Stehule wrote:

    -only a few people use FOR IN SELECT UNNEST for iteration over array.
    How on earth do you know that? I use it a lot and I was just demonstrating
    it to a client yesterday, and I'm quite sure he will use it a lot too. I bet
    I'm far from alone.
    how much people are active readers and writers in pg_hackers like you? :)

    I didn't say so nobody use it. You, me, David. But I really didn't see
    this pattern here in real applications.

    Regards

    Pavel


    cheers

    andrew
  • Andrew Dunstan at Nov 18, 2010 at 7:45 pm

    On 11/18/2010 02:39 PM, Pavel Stehule wrote:
    2010/11/18 Andrew Dunstan<andrew@dunslane.net>:
    On 11/18/2010 02:17 PM, Pavel Stehule wrote:
    -only a few people use FOR IN SELECT UNNEST for iteration over array.
    How on earth do you know that? I use it a lot and I was just demonstrating
    it to a client yesterday, and I'm quite sure he will use it a lot too. I bet
    I'm far from alone.
    how much people are active readers and writers in pg_hackers like you? :)

    I didn't say so nobody use it. You, me, David. But I really didn't see
    this pattern here in real applications.
    Lots of people are told to use it on IRC. Trust me, it's getting well known.

    cheers

    andrew
  • Pavel Stehule at Nov 18, 2010 at 8:00 pm
    2010/11/18 Andrew Dunstan <andrew@dunslane.net>:
    On 11/18/2010 02:39 PM, Pavel Stehule wrote:

    2010/11/18 Andrew Dunstan<andrew@dunslane.net>:
    On 11/18/2010 02:17 PM, Pavel Stehule wrote:

    -only a few people use FOR IN SELECT UNNEST for iteration over array.
    How on earth do you know that? I use it a lot and I was just
    demonstrating
    it to a client yesterday, and I'm quite sure he will use it a lot too. I
    bet
    I'm far from alone.
    how much people are active readers and writers in pg_hackers like you? :)

    I didn't say so nobody use it. You, me, David. But I really didn't see
    this pattern here in real applications.
    Lots of people are told to use it on IRC. Trust me, it's getting well known.
    can be. but people on IRC are not representative.

    I have about 10 courses of PL/pgSQL per year (about 100 people) -
    almost all my students newer visited IRC. 30% of my students has a
    problem to write a bublesort or some little bit complex code. I meet
    this people. There can be a language barrier or some laziness. Really
    it is surprisingly how too less people are interesting about coding.
    This people has own problems, and usually uses a most classic patter
    that know from programming languages.

    These peoples are "normal" and typical. Some courses I have under some
    big Czech agency - so there are people from banks, industry. Actually
    only I do a courses of PLpgSQL in Czech language - so I think I know
    what people use. For example - only a few people know and use a
    generate_series functions.

    sorry for offtopic :)

    Pavel
    cheers

    andrew
  • Alvaro Herrera at Nov 18, 2010 at 8:12 pm

    Excerpts from Pavel Stehule's message of jue nov 18 17:00:04 -0300 2010:
    2010/11/18 Andrew Dunstan <andrew@dunslane.net>:
    I didn't say so nobody use it. You, me, David. But I really didn't see
    this pattern here in real applications.
    Lots of people are told to use it on IRC. Trust me, it's getting well known.
    can be. but people on IRC are not representative.
    Yeah, that's true. I point out usage of unnest to our customers too,
    but it's much more common to see people not using it, instead relying on
    subscripts. People using Postgres show up unexpectedly from under
    rocks, in the weirdest corners; they rarely consult documentation and
    even more rarely get into IRC or mailing list to get help.

    I fail to see how this supports the FOR-IN-array development though. It
    will just be another unused construct for most people, no?

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Pavel Stehule at Nov 18, 2010 at 8:18 pm

    2010/11/18 Alvaro Herrera <alvherre@commandprompt.com>:
    Excerpts from Pavel Stehule's message of jue nov 18 17:00:04 -0300 2010:
    2010/11/18 Andrew Dunstan <andrew@dunslane.net>:
    I didn't say so nobody use it. You, me, David. But I really didn't see
    this pattern here in real applications.
    Lots of people are told to use it on IRC. Trust me, it's getting well known.
    can be. but people on IRC are not representative.
    Yeah, that's true.  I point out usage of unnest to our customers too,
    but it's much more common to see people not using it, instead relying on
    subscripts.  People using Postgres show up unexpectedly from under
    rocks, in the weirdest corners; they rarely consult documentation and
    even more rarely get into IRC or mailing list to get help.

    I fail to see how this supports the FOR-IN-array development though.  It
    will just be another unused construct for most people, no?
    maybe I don't understand well, but patch FOR-IN-ARRAY has a documentation

    <sect2 id="plpgsql-array-iterating">
    + <title>Looping Through Array</title>
    +.
    + <para>
    + The syntax is:
    + <synopsis>
    + <optional> &lt;&lt;<replaceable>label</replaceable>&gt;&gt; </optional>
    + FOR <replaceable>target</replaceable> IN <replaceable>array
    expression</replaceable
    + <replaceable>statements</replaceable>
    + END LOOP <optional> <replaceable>label</replaceable> </optional>;
    + </synopsis>
    +.
    + The <replaceable>target</replaceable> is a record variable, row variable,
    + or comma-separated list of scalar variables.
    + The <replaceable>target</replaceable> is successively assigned each item
    + of result of the <replaceable>array_expression</replaceable>
    and the loop body
    + executed for each item. Here is an example:
    +.
    + <programlisting>
    + CREATE TYPE mypt AS (x int, y int);
    +.
    + CREATE FUNCTION iterate_over_points() RETURNS void AS $$
    + DECLARE
    + x int; y int;
    + a mypt[] = ARRAY[(10,11),(20,21),(30,31)];
    + BEGIN
    + FOR x, y IN ARRAY a
    + LOOP
    + RAISE NOTICE 'x = %, y = %', x, y;
    + END LOOP;
    + END;
    + $$ LANGUAGE plpgsql;
    + </programlisting>
    +.
    + If the loop is terminated by an <literal>EXIT</> statement, the last
    + assigned item value is still accessible after the loop.
    + </para>
    + </sect2>
    +.

    Pavel
    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Tom Lane at Nov 18, 2010 at 10:29 pm

    Pavel Stehule writes:
    2010/11/18 Alvaro Herrera <alvherre@commandprompt.com>:
    I fail to see how this supports the FOR-IN-array development though.  It
    will just be another unused construct for most people, no?
    maybe I don't understand well, but patch FOR-IN-ARRAY has a documentation
    UNNEST is documented too. Adding still more features doesn't really
    improve matters for people who haven't memorized the documentation;
    it only makes it even harder for them to find out what they should be
    using. (More features != better)

    To my mind, the complaint about subscripting being slow suggests that we
    ought to fix subscripting, not introduce a nonstandard feature that will
    make certain use-cases faster if people rewrite their code to use it.

    I think it would probably not be terribly hard to arrange forcible
    detoasting of an array variable's value the first time it gets
    subscripted, for instance. Of course that only fixes some use-cases;
    but it would help, and it helps without requiring people to change their
    code.

    regards, tom lane
  • Pavel Stehule at Nov 19, 2010 at 6:41 am

    2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:
    Pavel Stehule <pavel.stehule@gmail.com> writes:
    2010/11/18 Alvaro Herrera <alvherre@commandprompt.com>:
    I fail to see how this supports the FOR-IN-array development though.  It
    will just be another unused construct for most people, no?
    maybe I don't understand well, but patch FOR-IN-ARRAY has a documentation
    UNNEST is documented too.  Adding still more features doesn't really
    improve matters for people who haven't memorized the documentation;
    it only makes it even harder for them to find out what they should be
    using.  (More features != better)
    yes, but less user feature doesn't mean less code. Mainly in little
    bit specific environment like plpgsql.
    To my mind, the complaint about subscripting being slow suggests that we
    ought to fix subscripting, not introduce a nonstandard feature that will
    make certain use-cases faster if people rewrite their code to use it.

    I think it would probably not be terribly hard to arrange forcible
    detoasting of an array variable's value the first time it gets
    subscripted, for instance.  Of course that only fixes some use-cases;
    but it would help, and it helps without requiring people to change their
    code.
    This is just one half of problem and isn't simple. Second half is
    "array_seek" - So any access with subscripts means seq reading of
    array's data. Please, look on this part. I am thinking, so this is
    more important, than anything what we discussed before. For fast
    access there is necessary to call a deconstruct_array function. Then
    you can access to subscripts quickly. Actually we have not a control
    for access to items in array, when subscript is used in expression
    (inside PL). So it is very difficult to accelerate speed in area -
    probably it means a subscript expr should be evaluated individually.

    A deconstruct_area is relative expensive function, so you have to have
    a information about a using of array. Without it, and for smaller
    arrays, the optimization can be bad. There isn't any backend
    infrastructure for this decision now.

    I did a profiling

    first example: FOR IN ARRAY

    samples % symbol name
    336 20.6642 exec_eval_expr
    269 16.5437 plpgsql_param_fetch
    229 14.0836 exec_stmts
    225 13.8376 exec_eval_datum
    118 7.2571 exec_assign_value
    91 5.5966 exec_eval_cleanup.clone.10
    88 5.4121 setup_param_list
    72 4.4280 __i686.get_pc_thunk.bx
    65 3.9975 exec_eval_boolean
    47 2.8905 exec_simple_cast_value
    43 2.6445 free_var.clone.2
    28 1.7220 exec_cast_value

    samples % image name symbol name
    1064 16.1188 postgres pglz_decompress
    410 6.2112 postgres AllocSetAlloc
    353 5.3477 postgres MemoryContextAllocZero
    293 4.4387 postgres GetSnapshotData
    290 4.3933 postgres AllocSetFree
    281 4.2569 postgres ExecEvalParamExtern
    223 3.3783 postgres ExecMakeFunctionResultNoSets
    220 3.3328 postgres AllocSetReset
    212 3.2116 postgres UTF8_MatchText
    210 3.1813 postgres LWLockAcquire
    195 2.9541 postgres AllocSetCheck
    195 2.9541 postgres LWLockRelease
    172 2.6057 postgres pfree
    163 2.4693 postgres CopySnapshot
    162 2.4542 postgres list_member_ptr
    144 2.1815 postgres RevalidateCachedPlan
    133 2.0148 postgres PushActiveSnapshot
    121 1.8331 postgres PopActiveSnapshot
    121 1.8331 postgres bms_is_member
    118 1.7876 postgres MemoryContextAlloc
    108 1.6361 postgres textlike
    105 1.5907 postgres AcquireExecutorLocks
    79 1.1968 postgres TransactionIdPrecedes
    76 1.1513 postgres pgstat_end_function_usage
    75 1.1362 postgres pgstat_init_function_usage
    72 1.0907 postgres check_list_invariants

    sample01 - FOR i IN array_lowe()..array_upper() for t1000

    Profiling through timer interrupt
    samples % symbol name
    1039 29.4084 exec_stmts
    723 20.4642 exec_eval_expr
    587 16.6148 exec_eval_datum
    408 11.5483 plpgsql_param_fetch
    176 4.9816 exec_eval_cleanup.clone.10
    167 4.7269 setup_param_list
    159 4.5004 exec_eval_boolean
    128 3.6230 __i686.get_pc_thunk.bx
    66 1.8681 exec_simple_cast_value

    samples % image name symbol name
    312604 84.1141 postgres pglz_decompress
    4800 1.2916 postgres hash_search_with_hash_value
    4799 1.2913 postgres array_seek.clone.0
    2935 0.7897 postgres LWLockAcquire
    2399 0.6455 postgres _bt_compare
    2219 0.5971 postgres LWLockRelease
    1899 0.5110 postgres index_getnext
    1374 0.3697 postgres hash_any
    1257 0.3382 postgres LockAcquireExtended
    1231 0.3312 postgres _bt_checkkeys
    1208 0.3250 postgres AllocSetAlloc
    1158 0.3116 postgres FunctionCall2
    1102 0.2965 postgres toast_fetch_datum

    same for t100

    samples % symbol name
    108 20.6107 exec_eval_expr
    96 18.3206 plpgsql_param_fetch
    92 17.5573 exec_eval_datum
    66 12.5954 exec_stmts
    43 8.2061 setup_param_list
    38 7.2519 __i686.get_pc_thunk.bx
    34 6.4885 exec_eval_cleanup.clone.10
    16 3.0534 exec_simple_cast_value
    12 2.2901 exec_eval_boolean

    samples % image name symbol name
    511 20.4646 postgres array_seek.clone.0
    163 6.5278 postgres ExecEvalParamExtern
    131 5.2463 postgres AllocSetAlloc
    127 5.0861 postgres MemoryContextAllocZero
    113 4.5254 postgres list_member_ptr
    103 4.1249 postgres GetSnapshotData
    95 3.8046 postgres AllocSetFree
    92 3.6844 postgres LWLockAcquire
    80 3.2038 postgres ExecMakeFunctionResultNoSets
    74 2.9636 postgres UTF8_MatchText
    70 2.8034 postgres LWLockRelease
    57 2.2827 postgres ExecEvalArrayRef
    57 2.2827 postgres RevalidateCachedPlan
    53 2.1225 postgres AllocSetReset
    48 1.9223 postgres AllocSetCheck
    47 1.8823 postgres pfree
    41 1.6420 postgres PushActiveSnapshot
    40 1.6019 postgres CopySnapshot
    40 1.6019 postgres bms_is_member
    39 1.5619 postgres PopActiveSnapshot
    37 1.4818 postgres AcquireExecutorLocks
    32 1.2815 postgres array_ref
    31 1.2415 postgres textlike
    28 1.1213 postgres MemoryContextAlloc

    sample3 FOR IN UNNEST

    samples % symbol name
    334 19.1844 exec_eval_expr
    278 15.9678 plpgsql_param_fetch
    246 14.1298 exec_eval_datum
    180 10.3389 exec_stmts
    140 8.0414 exec_assign_value
    107 6.1459 setup_param_list
    97 5.5715 exec_eval_cleanup.clone.10
    97 5.5715 exec_move_row
    84 4.8248 __i686.get_pc_thunk.bx
    53 3.0442 exec_eval_boolean
    42 2.4124 exec_simple_cast_value
    36 2.0678 free_var.clone.2

    samples % image name symbol name
    996 11.5171 postgres pglz_decompress
    507 5.8626 postgres AllocSetAlloc
    494 5.7123 postgres list_member_ptr
    411 4.7525 postgres MemoryContextAllocZero
    344 3.9778 postgres ExecEvalParamExtern
    305 3.5268 postgres GetSnapshotData
    297 3.4343 postgres ExecMakeFunctionResultNoSets
    265 3.0643 postgres AllocSetFree
    250 2.8908 postgres UTF8_MatchText
    242 2.7983 postgres LWLockRelease
    236 2.7290 postgres LWLockAcquire
    210 2.4283 postgres AllocSetReset
    201 2.3242 postgres heap_form_tuple
    198 2.2895 postgres AllocSetCheck
    183 2.1161 postgres pfree
    165 1.9080 postgres ExecProject
    155 1.7923 postgres heap_fill_tuple
    151 1.7461 postgres CopySnapshot
    141 1.6304 postgres RevalidateCachedPlan
    136 1.5726 postgres MemoryContextAlloc
    114 1.3182 postgres PopActiveSnapshot
    108 1.2488 postgres AcquireExecutorLocks
    102 1.1795 postgres ExecMakeFunctionResult
    102 1.1795 postgres pgstat_init_function_usage
    95 1.0985 postgres textlike
    94 1.0870 postgres bms_is_member
    92 1.0638 postgres datumGetSize

    For iteration over large array with subscripts I am thinking so enough
    is a block repeated pglz_decompress. Others optimizations needs a
    hundreds lines (my personal opinion)

    regards

    Pavel Stehule



    regards, tom lane
  • Pavel Stehule at Nov 19, 2010 at 3:51 pm
    I checked my tests and the most important is a remove a repeated detoast.

    postgres=# CREATE OR REPLACE FUNCTION public.filter01(text[], text, integer)
    RETURNS text[]
    LANGUAGE plpgsql
    AS $function$
    DECLARE
    s text[] := '{}';
    l int := 0; i int;
    v text; loc text[] = $1;
    BEGIN
    FOR i IN array_lower(loc,1)..array_upper(loc,1)
    LOOP
    EXIT WHEN l = $3;
    IF loc[i] LIKE $2 THEN
    s := s || loc[i];
    l := l + 1;
    END IF;
    END LOOP;
    RETURN s;
    END;$function$;

    This code is very slow when array is large - tested on n=1000. With
    one small modification can be 20x faster

    DECLARE
    s text[] := '{}';
    l int := 0; i int;
    v text; loc text[] = $1 || '{}'::text[]; --<<< does just detoast and
    docomprimation
    BEGIN

    the final version of test can be:


    so result:

    Don't access to large unmodified array inside cycle, when data comes
    from table (for iteration over A[1000] of text(10)). A speadup is from
    451 sec to 15 sec. This rule can be interesting for PostGIS people,
    because it can be valid for other long varlena values. But still this
    is 2x slower than special statement.

    Regards

    Pavel Stehule

    samples % symbol name
    332 22.1333 exec_eval_expr
    311 20.7333 plpgsql_param_fetch
    267 17.8000 exec_eval_datum
    220 14.6667 exec_stmts
    91 6.0667 setup_param_list
    82 5.4667 exec_eval_cleanup.clone.10
    71 4.7333 __i686.get_pc_thunk.bx
    48 3.2000 exec_simple_cast_value
    43 2.8667 exec_eval_boolean

    samples % symbol name
    4636 37.5994 array_seek.clone.0
    961 7.7940 pglz_decompress
    901 7.3074 list_member_ptr
    443 3.5929 MemoryContextAllocZero
    384 3.1144 AllocSetAlloc
    381 3.0900 ExecEvalParamExtern
    334 2.7088 GetSnapshotData
    255 2.0681 AllocSetFree
    254 2.0600 LWLockRelease
    249 2.0195 ExecMakeFunctionResultNoSets
    249 2.0195 UTF8_MatchText
    234 1.8978 LWLockAcquire
    195 1.5815 AllocSetReset
    167 1.3544 AllocSetCheck
    163 1.3220 pfree
    151 1.2247 ExecEvalArrayRef
    149 1.2084 RevalidateCachedPlan
    138 1.1192 bms_is_member
    126 1.0219 CopySnapshot
  • Andres Freund at Nov 18, 2010 at 11:06 pm

    On Thursday 18 November 2010 21:11:32 Alvaro Herrera wrote:
    Excerpts from Pavel Stehule's message of jue nov 18 17:00:04 -0300 2010:
    2010/11/18 Andrew Dunstan <andrew@dunslane.net>:
    I didn't say so nobody use it. You, me, David. But I really didn't see
    this pattern here in real applications.
    Lots of people are told to use it on IRC. Trust me, it's getting well
    known.
    can be. but people on IRC are not representative.
    Yeah, that's true. I point out usage of unnest to our customers too,
    but it's much more common to see people not using it, instead relying on
    subscripts. People using Postgres show up unexpectedly from under
    rocks, in the weirdest corners; they rarely consult documentation and
    even more rarely get into IRC or mailing list to get help.
    Well, a good reason for that might be that unnest() is pretty new... Most code
    I read has been initially written quite a bit earlier. Seeing 8.4 in
    production is only starting to get common.

    Andres
  • Andrew Dunstan at Nov 18, 2010 at 11:41 pm

    On 11/18/2010 06:06 PM, Andres Freund wrote:
    Well, a good reason for that might be that unnest() is pretty new... Most code
    I read has been initially written quite a bit earlier. Seeing 8.4 in
    production is only starting to get common.
    I guess we must have more adventurous customers than you :-)

    (Incidentally, there is a (slow but still very useful) userland version
    of unnest that works with 8.3.)

    cheers

    andrew
  • Cédric Villemain at Nov 19, 2010 at 1:35 am

    2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:
    Pavel Stehule <pavel.stehule@gmail.com> writes:
    what is a slow:
    a) repeated detoasting - access with subscripts - maybe detoasted
    values can be cached?
    b) evaluation of SRF expression - maybe call of SRF function can be
    simple expression,
    c) faster evaluation ro query
    The most important is @a.
    Really?  Becase AFAICS array_unnest only detoasts the source array once,
    and saves the value between calls.

    array_unnest doesn't currently have any smarts about fetching slices
    of an array.  I'm not sure how useful that would be in practice, since
    (1) in most usages you probably run the function to the end and fetch
    all the values anyway; (2) it's hard to see how to optimize that way
    if the elements are varlena, which they most likely are in most usages
    where this could possibly be a win.  But if Cedric's use-case is really
    worth optimizing, I'd sure rather see the smarts for it in the general
    purpose array_unnest function instead of buried in plpgsql's FOR logic.
    My use case is the following:

    I have text[] data containing around 50k field.
    Executing my array_filter (which is probably not as fast as what Pavel
    did) is the same thing as executing unnest, except that during the
    array walk, I apply a callback function and increment an internal
    counter up to the p_limit parameter when callback function success.
    So that it stops walking the array as soon as p_limit counter is full,
    or there are no more elements to walk to in the array.

    1) At a maximum it is slow like unnest (plus callback overhead), at a
    minimum it find quickly and the gain is huge. Don't have the exact
    numbers right here, but (from memory) the durations are between Oms
    and 45 milliseconds (20M rows, of 50k field text array, not very long
    text < 100 char, 15k calls per second, depending on items to walk in
    the array, linear). WIth just unnest, a minimum is 45 ms per query.

    2) DETOAST_SLICE I don't know the internals here


    I have a concrete usage of what Pavel did.
    And I agree that this is fast and easy way to handle the real issues behind :/
    regards, tom lane

    --
    Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers


    --
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
  • Andrew Dunstan at Nov 18, 2010 at 4:01 pm

    On 11/18/2010 10:33 AM, Robert Haas wrote:
    On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncurewrote:
    Pavel's performance argument is imnsho valid. arrays at present are
    the best way to pass data around functions and any optimizations here
    are very welcome. Given that, is there any way to mitigate your
    concerns on the syntax side?
    Can we get the performance benefit any other way? I hate to think
    that it will still be slow for people using the already-supported
    syntax.
    It's not disastrously slower. AFAICT from a very quick glance over the
    patch, he's getting the speedup by bypassing the normal mechanism for
    evaluating "for x in select ...". So we'd have to special-case that to
    trap calls to unnest in the general form. That would be pretty ugly. Or
    else make unnest and SPI faster. But that's a much bigger project.

    Syntactic sugar is not entirely to be despised, anyway.

    cheers

    andrew
  • Tom Lane at Nov 18, 2010 at 5:36 pm

    Andrew Dunstan writes:
    Syntactic sugar is not entirely to be despised, anyway.
    If it were harmless syntactic sugar I wouldn't be objecting so loudly.
    The problem here is that FOR is a syntactic choke point: it's already
    overloaded with several different sub-syntaxes that are quite difficult
    to separate. Adding another one makes that worse, with the consequences
    that we might misinterpret the user's intent, leading either to
    misleading/unhelpful error messages or unexpected runtime behavior.
    If you consult the archives you can find numerous past instances of
    complaints from people who hit such problems with the existing set of
    FOR sub-syntaxes, so this isn't hypothetical.

    I'm also quite afraid of having a conflict with other future extensions
    of FOR, which we might want to introduce either on our own hook or for
    compatibility with something Oracle might add to PL/SQL in future.

    This might not be the worst place in the entire system to be introducing
    inessential syntactic sugar, but it's certainly one of the top ten.
    I would *much* rather we get the performance benefit by internal
    optimization, and forego inventing syntax.

    regards, tom lane
  • Robert Haas at Nov 18, 2010 at 5:54 pm

    On Thu, Nov 18, 2010 at 12:36 PM, Tom Lane wrote:
    I would *much* rather we get the performance benefit by internal
    optimization, and forego inventing syntax.
    +1.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Pavel Stehule at Nov 18, 2010 at 6:04 pm

    2010/11/18 Robert Haas <robertmhaas@gmail.com>:
    On Thu, Nov 18, 2010 at 12:36 PM, Tom Lane wrote:
    I would *much* rather we get the performance benefit by internal
    optimization, and forego inventing syntax.
    +1.
    any optimization will be about 10-20% slower than direct access. See
    my tests: on large arrays isn't significant if you use a simple
    expression or full query. This is just overhead from building a
    "tuplestore" and access to data via cursor. And you cannot to change a
    SRF functions to returns just array. I would to see any optimization
    on this level, but I think so it's unreal expecting.

    Regards

    Pavel Stehule
    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Robert Haas at Nov 18, 2010 at 6:17 pm

    On Thu, Nov 18, 2010 at 1:03 PM, Pavel Stehule wrote:
    2010/11/18 Robert Haas <robertmhaas@gmail.com>:
    On Thu, Nov 18, 2010 at 12:36 PM, Tom Lane wrote:
    I would *much* rather we get the performance benefit by internal
    optimization, and forego inventing syntax.
    +1.
    any optimization will be about 10-20% slower than direct access. See
    my tests: on large arrays isn't significant if you use a simple
    expression or full query. This is just overhead from building a
    "tuplestore" and access to data via cursor. And you cannot to change a
    SRF functions to returns just array. I would to see any optimization
    on this level, but I think so it's unreal expecting.
    How can you possibly make a general statement like that? What's slow
    is not the syntax; it's what the syntax is making happen under the
    hood.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Pavel Stehule at Nov 18, 2010 at 6:48 pm

    2010/11/18 Robert Haas <robertmhaas@gmail.com>:
    On Thu, Nov 18, 2010 at 1:03 PM, Pavel Stehule wrote:
    2010/11/18 Robert Haas <robertmhaas@gmail.com>:
    On Thu, Nov 18, 2010 at 12:36 PM, Tom Lane wrote:
    I would *much* rather we get the performance benefit by internal
    optimization, and forego inventing syntax.
    +1.
    any optimization will be about 10-20% slower than direct access. See
    my tests: on large arrays isn't significant if you use a simple
    expression or full query. This is just overhead from building a
    "tuplestore" and access to data via cursor. And you cannot to change a
    SRF functions to returns just array. I would to see any optimization
    on this level, but I think so it's unreal expecting.
    How can you possibly make a general statement like that?  What's slow
    is not the syntax; it's what the syntax is making happen under the
    hood.
    ok, it is based on my tests, but it can be subjective. Probably is
    possible to work with a tuplestore as result of SRF function. And
    probably we can read from it without cursor. Maybe we can to teach a
    SRF functions to store values as scalars not as tuple - tuplestore can
    do it, but the we have to have a global state and we must to modify
    buildin functions (not just "unnest" - if the feature should be
    general). But newer we can to ensure a working with only necessary
    data like a special PL statement. "unnest" returns all fields, but
    these fields should not be used. There isn't possible to say - stop, I
    don't need other fields. It's possible just with special PL statement,
    because it is controlled by PL. So it is reason why I don't believe in
    optimizations on PL level.

    Regards

    Pavel Stehule
    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Nov 18, 2010 at 7:28 pm

    Pavel Stehule writes:
    "unnest" returns all fields, but
    these fields should not be used. There isn't possible to say - stop, I
    don't need other fields. It's possible just with special PL statement,
    because it is controlled by PL. So it is reason why I don't believe in
    optimizations on PL level.
    That is complete nonsense. array_unnest doesn't return the whole array
    contents at once, so it's just as capable of being optimized as any
    single-purpose implementation. If you exit the loop early, you just
    don't call it anymore.

    regards, tom lane
  • Pavel Stehule at Nov 18, 2010 at 7:47 pm

    2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:
    Pavel Stehule <pavel.stehule@gmail.com> writes:
    "unnest" returns all fields, but
    these fields should not be used. There isn't possible to say - stop, I
    don't need other fields. It's possible just with special PL statement,
    because it is controlled by PL. So it is reason why I don't believe in
    optimizations on PL level.
    That is complete nonsense.  array_unnest doesn't return the whole array
    contents at once, so it's just as capable of being optimized as any
    single-purpose implementation.  If you exit the loop early, you just
    don't call it anymore.
    no it isn't - actually you cannot to limit a returned set when you
    call SRF function in expression context - if I remember well. We can
    change it - but this is other complexity.

    Pavel


    regards, tom lane
  • Pavel Stehule at Nov 18, 2010 at 5:56 pm

    2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:
    Andrew Dunstan <andrew@dunslane.net> writes:
    Syntactic sugar is not entirely to be despised, anyway.
    If it were harmless syntactic sugar I wouldn't be objecting so loudly.
    The problem here is that FOR is a syntactic choke point: it's already
    overloaded with several different sub-syntaxes that are quite difficult
    to separate.  Adding another one makes that worse, with the consequences
    that we might misinterpret the user's intent, leading either to
    misleading/unhelpful error messages or unexpected runtime behavior.
    If you consult the archives you can find numerous past instances of
    complaints from people who hit such problems with the existing set of
    FOR sub-syntaxes, so this isn't hypothetical.
    yes, this argument is correct - but we can rearange a parser rules
    related to FOR statement. It can be solved.
    I'm also quite afraid of having a conflict with other future extensions
    of FOR, which we might want to introduce either on our own hook or for
    compatibility with something Oracle might add to PL/SQL in future.
    we talked about it last time - and I respect it - syntax is FOR IN
    ARRAY<<< expression
    This might not be the worst place in the entire system to be introducing
    inessential syntactic sugar, but it's certainly one of the top ten.
    I would *much* rather we get the performance benefit by internal
    optimization, and forego inventing syntax.
    Regards

    Pavel

    regards, tom lane
  • Tom Lane at Nov 18, 2010 at 7:16 pm

    Pavel Stehule writes:
    2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:
    The problem here is that FOR is a syntactic choke point: it's already
    overloaded with several different sub-syntaxes that are quite difficult
    to separate.  Adding another one makes that worse, with the consequences
    that we might misinterpret the user's intent, leading either to
    misleading/unhelpful error messages or unexpected runtime behavior.
    yes, this argument is correct - but we can rearange a parser rules
    related to FOR statement. It can be solved.
    No, it can't. The more things that can possibly follow FOR, the less
    likely that you correctly guess which one the user had in mind when
    faced with something that's not quite syntactically correct. Or maybe
    it *is* syntactically correct, only not according to the variant that
    the user thought he was invoking. We've seen bug reports of this sort
    connected with FOR already; in fact I'm pretty sure you've responded to
    a few yourself. Adding more variants *will* make it worse. We need
    a decent return on investment for anything we add here, and this
    proposal just doesn't offer enough benefit.

    regards, tom lane
  • Pavel Stehule at Nov 18, 2010 at 7:32 pm

    2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:
    Pavel Stehule <pavel.stehule@gmail.com> writes:
    2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:
    The problem here is that FOR is a syntactic choke point: it's already
    overloaded with several different sub-syntaxes that are quite difficult
    to separate.  Adding another one makes that worse, with the consequences
    that we might misinterpret the user's intent, leading either to
    misleading/unhelpful error messages or unexpected runtime behavior.
    yes, this argument is correct - but we can rearange a parser rules
    related to FOR statement. It can be solved.
    No, it can't.  The more things that can possibly follow FOR, the less
    likely that you correctly guess which one the user had in mind when
    faced with something that's not quite syntactically correct.  Or maybe
    it *is* syntactically correct, only not according to the variant that
    the user thought he was invoking.  We've seen bug reports of this sort
    connected with FOR already; in fact I'm pretty sure you've responded to
    a few yourself.  Adding more variants *will* make it worse.  We need
    a decent return on investment for anything we add here, and this
    proposal just doesn't offer enough benefit.
    yes I reported a allowing a labels on "wrong" position. But minimally
    this patch must not to change a current behave. It's your idea to use
    keyword "ARRAY" there. Maybe we have just only different view on
    complexity. My proposal increase complexity in parser, your proposal
    in executor. Anybody thinking so other variant is worst. I don't speak
    so we have to have a just FOR IN ARRAY syntax - I though so there was
    a agreement on last discus. We can use a different syntax - but should
    be readable.

    Regards

    Pavel Stehule
    regards, tom lane
  • Valentine Gogichashvili at Nov 22, 2010 at 11:21 am
    Hi,

    with the FOR e IN SELECT UNNEST(a) construct there is an issue again related
    to the unresting of composite type arrays:

    BEGIN;
    CREATE TYPE truple AS (i integer, a text, b text);

    DO $SQL$
    DECLARE
    start_time timestamp;
    t truple;
    ta truple[] := ARRAY( select ROW(s.i, 'A' || (s.i)::text, 'B' ||
    (s.i)::text )::truple from generate_series(1, 10000) as s(i) );
    i integer := 1;
    BEGIN
    start_time := clock_timestamp();
    FOR t IN SELECT UNNEST(ta) LOOP
    raise info 't is %', t;
    i := i + 1;
    END LOOP;
    RAISE INFO 'looped in %', clock_timestamp() - start_time;
    END;
    $SQL$;
    ROLLBACK;

    fails with ERROR: invalid input syntax for integer: "(1,A1,B1)"
    CONTEXT: PL/pgSQL function "inline_code_block" line 8 at FOR over SELECT
    rows

    So to UNNEST such an array one has to SELECT * FROM UNNEST(a) to be able
    loop there like:

    BEGIN;
    CREATE TYPE truple AS (i integer, a text, b text);

    DO $SQL$
    DECLARE
    start_time timestamp;
    t truple;
    ta truple[] := ARRAY( select ROW(s.i, 'A' || (s.i)::text, 'B' ||
    (s.i)::text )::truple from generate_series(1, 10000) as s(i) );
    i integer := 1;
    BEGIN
    start_time := clock_timestamp();
    FOR t IN SELECT * FROM UNNEST(ta) LOOP
    raise info 't is %', t;
    i := i + 1;
    END LOOP;
    RAISE INFO 'looped in %', clock_timestamp() - start_time;
    END;
    $SQL$;
    ROLLBACK;

    Is it a bug or a feature? And if the second, then any work on optimizing FOR
    e IN SELECT UNNEST(a) should probably include FOR e IN SELECT * FROM
    UNNEST(a) statement optimizations.

    Also, would the suggested FOR-IN-ARRAY construct loop in such
    a composite type arrays?

    Best regards,

    -- Valenine Gogichashvili

    On Thu, Nov 18, 2010 at 8:16 PM, Tom Lane wrote:

    Pavel Stehule <pavel.stehule@gmail.com> writes:
    2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:
    The problem here is that FOR is a syntactic choke point: it's already
    overloaded with several different sub-syntaxes that are quite difficult
    to separate. Adding another one makes that worse, with the consequences
    that we might misinterpret the user's intent, leading either to
    misleading/unhelpful error messages or unexpected runtime behavior.
    yes, this argument is correct - but we can rearange a parser rules
    related to FOR statement. It can be solved.
    No, it can't. The more things that can possibly follow FOR, the less
    likely that you correctly guess which one the user had in mind when
    faced with something that's not quite syntactically correct. Or maybe
    it *is* syntactically correct, only not according to the variant that
    the user thought he was invoking. We've seen bug reports of this sort
    connected with FOR already; in fact I'm pretty sure you've responded to
    a few yourself. Adding more variants *will* make it worse. We need
    a decent return on investment for anything we add here, and this
    proposal just doesn't offer enough benefit.

    regards, tom lane

    --
    Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers
  • Robert Haas at Nov 22, 2010 at 2:29 pm

    On Mon, Nov 22, 2010 at 6:21 AM, Valentine Gogichashvili wrote:
    Hi,
    with the FOR e IN SELECT UNNEST(a) construct there is an issue again related
    to the unresting of composite type arrays:
    [ example ]
    Is it a bug or a feature?
    It looks like the problem in this example is that PL/pgsql tries to
    assign the result of unest(ta) to t.i rather than to t as a whole.
    This is pretty ridiculously stupid in this case, but it would make
    sense if the subselect where of the form SELECT a, b, c FROM ...

    I haven't looked at the code to see whether there's a way to make this
    case smarter (it would be nice - I've been bitten by similar problems
    myself) but it's only tangentially related to the patch under
    discussion.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Nov 18, 2010 at 4:09 pm

    Merlin Moncure writes:
    On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane wrote:
    Yes, which begs the question of why bother at all.
    Pavel's performance argument is imnsho valid.
    Well, that argument is unsupported by any evidence, so far as I've seen.

    More to the point, if there is indeed an interesting performance win
    here, we could get the same win by internally optimizing the existing
    syntax. That would provide the benefit to existing code not just
    new code; and it would avoid foreclosing our future options for
    extending FOR in not-so-redundant ways.

    regards, tom lane
  • Pavel Stehule at Nov 18, 2010 at 4:37 pm

    2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:
    Merlin Moncure <mmoncure@gmail.com> writes:
    On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane wrote:
    Yes, which begs the question of why bother at all.
    Pavel's performance argument is imnsho valid.
    Well, that argument is unsupported by any evidence, so far as I've seen.

    More to the point, if there is indeed an interesting performance win
    here, we could get the same win by internally optimizing the existing
    syntax.  That would provide the benefit to existing code not just
    new code; and it would avoid foreclosing our future options for
    extending FOR in not-so-redundant ways.
    sorry, but I don't agree. I don't think, so there are some big space
    for optimizing - and if then it means much more code complexity for
    current expr executor. Next - FOR IN ARRAY takes fields from array on
    request - and it is possible, because a unpacking of array is
    controlled by statement - it's impossible do same when unpacking is
    inside other functions with same effectivity.

    Regards

    Pavel Stehule

    regards, tom lane
  • Tom Lane at Nov 18, 2010 at 5:19 pm

    Pavel Stehule writes:
    2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:
    More to the point, if there is indeed an interesting performance win
    here, we could get the same win by internally optimizing the existing
    syntax.
    sorry, but I don't agree. I don't think, so there are some big space
    for optimizing - and if then it means much more code complexity for
    current expr executor. Next - FOR IN ARRAY takes fields from array on
    request - and it is possible, because a unpacking of array is
    controlled by statement - it's impossible do same when unpacking is
    inside other functions with same effectivity.
    Just because you haven't thought about it doesn't mean it's impossible
    or impractical.

    The first implementation I was thinking of would involve looking at the
    SELECT querytree after parsing to see if it's "SELECT UNNEST(something)"
    --- that is, empty jointree and so on, single tlist item that is an
    invocation of the unnest() function. If it is, you pull out the
    argument expression of unnest() and go from there, with exactly the same
    execution behavior as in the specialized-syntax patch. This is
    perfectly safe if you identify the array_unnest function by OID: since
    it's a built-in function you know exactly what it's supposed to do.

    But having said that, it's still not apparent to me that array_unnest
    itself is markedly slower than what you could hope to do in plpgsql.
    I think the real issue here is that plpgsql's simple-expression code
    can't be used with set-returning expressions, which means that we have
    to go through the vastly more expensive SPI code path. But that
    restriction is largely based on fear of re-using expression trees, which
    is something we fixed a few weeks ago. I think that it would now be
    interesting to look at whether "FOR x IN SELECT simple-expression" could
    use the simple-expression code even when the expression returns set.
    That approach might bring a useful speedup not just for unnest, but for
    many other use-cases as well.

    regards, tom lane
  • Pavel Stehule at Nov 18, 2010 at 5:44 pm

    2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:
    Pavel Stehule <pavel.stehule@gmail.com> writes:
    2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:
    More to the point, if there is indeed an interesting performance win
    here, we could get the same win by internally optimizing the existing
    syntax.
    sorry, but I don't agree. I don't think, so there are some big space
    for optimizing - and if then it means much more code complexity for
    current expr executor. Next - FOR IN ARRAY takes fields from array on
    request - and it is possible, because a unpacking of array is
    controlled by statement - it's impossible do same when unpacking is
    inside other functions with same effectivity.
    Just because you haven't thought about it doesn't mean it's impossible
    or impractical.

    The first implementation I was thinking of would involve looking at the
    SELECT querytree after parsing to see if it's "SELECT UNNEST(something)"
    --- that is, empty jointree and so on, single tlist item that is an
    invocation of the unnest() function.  If it is, you pull out the
    argument expression of unnest() and go from there, with exactly the same
    execution behavior as in the specialized-syntax patch.  This is
    perfectly safe if you identify the array_unnest function by OID: since
    it's a built-in function you know exactly what it's supposed to do.
    this additional control will do slow down for any expression - more -
    somebody can use a form: SELECT FROM unnest(), and this form will be
    slower.
    But having said that, it's still not apparent to me that array_unnest
    itself is markedly slower than what you could hope to do in plpgsql.
    I think the real issue here is that plpgsql's simple-expression code
    can't be used with set-returning expressions, which means that we have
    to go through the vastly more expensive SPI code path.  But that
    restriction is largely based on fear of re-using expression trees, which
    is something we fixed a few weeks ago.  I think that it would now be
    interesting to look at whether "FOR x IN SELECT simple-expression" could
    use the simple-expression code even when the expression returns set.
    That approach might bring a useful speedup not just for unnest, but for
    many other use-cases as well.
    any SRF call must not be faster then direct access to array. There is
    overhead with tuples.

    I don't understand you. Sorry. I don't think, so your objections are objective.

    Regards

    Pavel
    regards, tom lane
  • Jaime Casanova at Nov 22, 2010 at 1:40 pm

    On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova wrote:
    On Thu, Sep 30, 2010 at 7:46 AM, Pavel Stehule wrote:
    Hello

    this patch implement a new iteration construct - iteration over an
    array. The sense of this new iteration is:
    * a simple and cleaner syntax
    i will start the review of this one...
    so, what is the concensus for this patch?
    return with feedback? reject the patch on the grounds that we should
    go fix unnest() if it slow?
    something else?

    the patch itself works as advertised (in functionality) but i haven't
    make much performance tests to see if we actually win something

    --
    Jaime Casanova         www.2ndQuadrant.com
    Professional PostgreSQL: Soporte y capacitación de PostgreSQL
  • Robert Haas at Nov 22, 2010 at 2:37 pm

    On Mon, Nov 22, 2010 at 8:39 AM, Jaime Casanova wrote:
    On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova wrote:
    On Thu, Sep 30, 2010 at 7:46 AM, Pavel Stehule wrote:
    Hello

    this patch implement a new iteration construct - iteration over an
    array. The sense of this new iteration is:
    * a simple and cleaner syntax
    i will start the review of this one...
    so, what is the concensus for this patch?
    return with feedback? reject the patch on the grounds that we should
    go fix unnest() if it slow?
    something else?
    I think it should be marked rejected. I don't think Tom is likely to
    ever be in favor of a syntax change here, and while I hesitate to deal
    in absolutes, I don't think I will be either, and certainly not
    without a lot more work on improving the performance of the existing
    constructs. In particular, this seems like something that really
    ought to be pursued:

    http://archives.postgresql.org/pgsql-hackers/2010-11/msg01177.php

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Pavel Stehule at Nov 22, 2010 at 8:37 pm
    Hello

    I spent last two days a searching how to solve this problem better.
    Probably I removed a issue with toasting. But I found other issue,
    that wasn't discussed before. This issue is only seq access to items
    via array_seek function. I though about some variable that stores a
    last accessed field and last used subscript - but there isn't a good
    place where this info can be stored (maybe in ArrayType). Other issues
    are a arrays with a null bitmap. It is difficult to say if this cache
    can have a positive effect - maybe yes - for large arrays. Problem is
    in possible a increase of price for random access to array. And there
    are not any "hint", that helps with specification about access to
    array.

    I don't think so searching inside expr. plan is a good idea. Is way to
    have more code, more complexity. If we will do it, then more important
    are accelerations of expression var = var + 1; var = var || var; or
    some else.

    So, please, I know, so you and Tom are busy, but try to spend a few
    time about this problem before you are definitely reject this idea.

    With my last patch (removing a toasting issue) the classic access to
    array should be usable. But still any direct access to array from PL
    executor is 20% faster - depends on array type. Maybe it is useless
    discus - and all can be solved with possible support SRF inside simple
    expression, but I don't know when it will be possible.

    Regards

    Pavel Stehule


    *
    * CAUTION: if you change the header for ordinary arrays you will also
    * need to change the headers for oidvector and int2vector!
    */


    I think it should be marked rejected.  I don't think Tom is likely to
    ever be in favor of a syntax change here, and while I hesitate to deal
    in absolutes, I don't think I will be either, and certainly not
    without a lot more work on improving the performance of the existing
    constructs.   In particular, this seems like something that really
    ought to be pursued:

    http://archives.postgresql.org/pgsql-hackers/2010-11/msg01177.php

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Robert Haas at Nov 22, 2010 at 9:59 pm

    On Mon, Nov 22, 2010 at 3:36 PM, Pavel Stehule wrote:
    So, please, I know, so you and Tom are busy, but try to spend a few
    time about this problem before you are definitely reject this idea.
    If I were to spend more time on this problem, what exactly would I
    spend that time doing and how would it help? If I were interested in
    spending time I'd probably spend it pursuing the suggestions Tom
    already made, and that's what I think you should do. But I'm not
    going to do that, because the purpose of the CommitFest is not for me
    to write new patches from scratch that do something vaguely similar to
    what a patch you wrote was trying to do. It's for all of us to review
    and commit the patches that have already been written. You aren't
    helping with that process, so your complaint that we aren't spending
    enough time on your patches would be unfair even if were true, and it
    isn't. The problem with your patch is that it has a design weakness,
    not that it got short shift.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedSep 30, '10 at 12:46p
activeNov 24, '10 at 6:07p
posts64
users11
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase