Hello, list

Our company is creating a ticketing system. Of course the performance
issues are very important to us (as to all of you I guess). To increase
speed of some queries stable functions are used, but somehow they don't
act exactly as I expect, so would you please explain what am I doing (or
expecting) wrong...

First of all I have the stable function witch runs fast and I have no
problems with it at all.
CREATE OR REPLACE FUNCTION web_select_extra_price(prm_price_id integer,
prm_event_id integer, prm_cashier_id integer)
RETURNS numeric AS
'
........ some code here
'
LANGUAGE plpgsql STABLE
COST 100;

Now the test:

1) query without using the function
explain analyze
SELECT thtp_tick_id, price_id,
price_price,
price_color
FROM ticket_price
JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
current_timestamp <= price_date_till)
ORDER BY price_id;

Result:
"Sort (cost=132.47..133.77 rows=518 width=25) (actual time=5.125..5.842
rows=4335 loops=1)"
" Sort Key: ticket_price.price_id"
" Sort Method: quicksort Memory: 433kB"
" -> Nested Loop (cost=0.00..109.12 rows=518 width=25) (actual
time=0.037..3.148 rows=4335 loops=1)"
" -> Index Scan using index_price_event_id on ticket_price
(cost=0.00..8.52 rows=2 width=21) (actual time=0.014..0.026 rows=7 loops=1)"
" Index Cond: (price_event_id = 7820)"
" Filter: ((now() >= price_date) AND (now() <=
price_date_till))"
" -> Index Scan using idx_thtp_price_id on
ticket_has_ticket_price (cost=0.00..47.06 rows=259 width=8) (actual
time=0.013..0.211 rows=619 loops=7)"
" Index Cond: (ticket_has_ticket_price.thtp_price_id =
ticket_price.price_id)"
"Total runtime: 6.425 ms"


2) Query using the function
explain analyze
SELECT thtp_tick_id, price_id,
price_price, web_select_extra_price(price_id, price_event_id, 1),
price_color
FROM ticket_price
JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
current_timestamp <= price_date_till)
ORDER BY price_id;

Result:
"Sort (cost=261.97..263.27 rows=518 width=29) (actual
time=704.224..704.927 rows=4335 loops=1)"
" Sort Key: ticket_price.price_id"
" Sort Method: quicksort Memory: 433kB"
" -> Nested Loop (cost=0.00..238.62 rows=518 width=29) (actual
time=0.272..699.073 rows=4335 loops=1)"
" -> Index Scan using index_price_event_id on ticket_price
(cost=0.00..8.52 rows=2 width=25) (actual time=0.011..0.052 rows=7 loops=1)"
" Index Cond: (price_event_id = 7820)"
" Filter: ((now() >= price_date) AND (now() <=
price_date_till))"
" -> Index Scan using idx_thtp_price_id on
ticket_has_ticket_price (cost=0.00..47.06 rows=259 width=8) (actual
time=0.017..0.582 rows=619 loops=7)"
" Index Cond: (ticket_has_ticket_price.thtp_price_id =
ticket_price.price_id)"
"Total runtime: 705.531 ms"


Now what you can think is that executing web_select_extra_price takes
the difference, but
3) As STABLE function should be executed once for every different set of
parameters I do
SELECT web_select_extra_price(price_id, 7820, 1) FROM (

SELECT distinct price_id
FROM ticket_price
JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
current_timestamp <= price_date_till)
) as qq;

Result:
"Subquery Scan on qq (cost=110.34..110.88 rows=2 width=4) (actual
time=7.265..8.907 rows=7 loops=1)"
" -> HashAggregate (cost=110.34..110.36 rows=2 width=4) (actual
time=6.866..6.873 rows=7 loops=1)"
" -> Nested Loop (cost=0.00..109.05 rows=517 width=4) (actual
time=0.037..4.643 rows=4335 loops=1)"
" -> Index Scan using index_price_event_id on
ticket_price (cost=0.00..8.52 rows=2 width=4) (actual time=0.014..0.038
rows=7 loops=1)"
" Index Cond: (price_event_id = 7820)"
" Filter: ((now() >= price_date) AND (now() <=
price_date_till))"
" -> Index Scan using idx_thtp_price_id on
ticket_has_ticket_price (cost=0.00..47.04 rows=258 width=4) (actual
time=0.019..0.336 rows=619 loops=7)"
" Index Cond: (ticket_has_ticket_price.thtp_price_id
= ticket_price.price_id)"
"Total runtime: 8.966 ms"


You can see the query has only 7 distinct parameter sets to pass to the
function but...
4) Explain analyze
SELECT web_select_extra_price(price_id, 7820, 1)
FROM ticket_price
JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
current_timestamp <= price_date_till)

Result:
"Nested Loop (cost=0.00..238.30 rows=517 width=4) (actual
time=0.365..808.537 rows=4335 loops=1)"
" -> Index Scan using index_price_event_id on ticket_price
(cost=0.00..8.52 rows=2 width=4) (actual time=0.014..0.040 rows=7 loops=1)"
" Index Cond: (price_event_id = 7820)"
" Filter: ((now() >= price_date) AND (now() <= price_date_till))"
" -> Index Scan using idx_thtp_price_id on ticket_has_ticket_price
(cost=0.00..47.04 rows=258 width=4) (actual time=0.016..0.655 rows=619
loops=7)"
" Index Cond: (ticket_has_ticket_price.thtp_price_id =
ticket_price.price_id)"
"Total runtime: 810.143 ms"


So I am totally confused... It seems that selecting 4335 rows is a joke
for Postgresql, but the great job is done then adding one of 7 possible
values to the result set... Please help me understand what I am missing
here?...

Finally the system:
Server
PG: Version string PostgreSQL 9.0.3 on i486-pc-linux-gnu, compiled by
GCC gcc-4.4.real (Debian 4.4.5-10) 4.4.5, 32-bit

Client
Win XP SP3 with pgAdmin 1.12.2.

--
Julius Tuskenis
Programavimo skyriaus vadovas
UAB nSoft
mob. +37068233050

Search Discussions

  • Merlin Moncure at Mar 10, 2011 at 9:14 pm

    On Thu, Mar 10, 2011 at 10:26 AM, Julius Tuskenis wrote:
    Hello, list

    Our company is creating a ticketing system. Of course the performance issues
    are very important to us (as to all of you I guess). To increase speed of
    some queries stable functions are used, but somehow they don't act exactly
    as I expect, so would you please explain what am I doing (or expecting)
    wrong...

    First of all I have the stable function witch runs fast and I have no
    problems with it at all.
    CREATE OR REPLACE FUNCTION web_select_extra_price(prm_price_id integer,
    prm_event_id integer, prm_cashier_id integer)
    RETURNS numeric AS
    '
    ........ some code here
    '
    LANGUAGE plpgsql STABLE
    COST 100;

    Now the test:

    1) query without using the function
    explain analyze
    SELECT thtp_tick_id, price_id,
    price_price,
    price_color
    FROM ticket_price
    JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
    WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
    current_timestamp <= price_date_till)
    ORDER BY price_id;

    Result:
    "Sort  (cost=132.47..133.77 rows=518 width=25) (actual time=5.125..5.842
    rows=4335 loops=1)"
    "  Sort Key: ticket_price.price_id"
    "  Sort Method:  quicksort  Memory: 433kB"
    "  ->  Nested Loop  (cost=0.00..109.12 rows=518 width=25) (actual
    time=0.037..3.148 rows=4335 loops=1)"
    "        ->  Index Scan using index_price_event_id on ticket_price
    (cost=0.00..8.52 rows=2 width=21) (actual time=0.014..0.026 rows=7
    loops=1)"
    "              Index Cond: (price_event_id = 7820)"
    "              Filter: ((now() >= price_date) AND (now() <=
    price_date_till))"
    "        ->  Index Scan using idx_thtp_price_id on ticket_has_ticket_price
    (cost=0.00..47.06 rows=259 width=8) (actual time=0.013..0.211 rows=619
    loops=7)"
    "              Index Cond: (ticket_has_ticket_price.thtp_price_id =
    ticket_price.price_id)"
    "Total runtime: 6.425 ms"


    2) Query using the function
    explain analyze
    SELECT thtp_tick_id, price_id,
    price_price, web_select_extra_price(price_id, price_event_id, 1),
    price_color
    FROM ticket_price
    JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
    WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
    current_timestamp <= price_date_till)
    ORDER BY price_id;

    Result:
    "Sort  (cost=261.97..263.27 rows=518 width=29) (actual time=704.224..704.927
    rows=4335 loops=1)"
    "  Sort Key: ticket_price.price_id"
    "  Sort Method:  quicksort  Memory: 433kB"
    "  ->  Nested Loop  (cost=0.00..238.62 rows=518 width=29) (actual
    time=0.272..699.073 rows=4335 loops=1)"
    "        ->  Index Scan using index_price_event_id on ticket_price
    (cost=0.00..8.52 rows=2 width=25) (actual time=0.011..0.052 rows=7
    loops=1)"
    "              Index Cond: (price_event_id = 7820)"
    "              Filter: ((now() >= price_date) AND (now() <=
    price_date_till))"
    "        ->  Index Scan using idx_thtp_price_id on ticket_has_ticket_price
    (cost=0.00..47.06 rows=259 width=8) (actual time=0.017..0.582 rows=619
    loops=7)"
    "              Index Cond: (ticket_has_ticket_price.thtp_price_id =
    ticket_price.price_id)"
    "Total runtime: 705.531 ms"


    Now what you can think is that executing web_select_extra_price takes the
    difference, but
    3) As STABLE function should be executed once for every different set of
    parameters I do
    SELECT web_select_extra_price(price_id, 7820, 1) FROM (

    SELECT distinct price_id
    FROM ticket_price
    JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
    WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
    current_timestamp <= price_date_till)
    ) as qq;

    Result:
    "Subquery Scan on qq  (cost=110.34..110.88 rows=2 width=4) (actual
    time=7.265..8.907 rows=7 loops=1)"
    "  ->  HashAggregate  (cost=110.34..110.36 rows=2 width=4) (actual
    time=6.866..6.873 rows=7 loops=1)"
    "        ->  Nested Loop  (cost=0.00..109.05 rows=517 width=4) (actual
    time=0.037..4.643 rows=4335 loops=1)"
    "              ->  Index Scan using index_price_event_id on ticket_price
    (cost=0.00..8.52 rows=2 width=4) (actual time=0.014..0.038 rows=7 loops=1)"
    "                    Index Cond: (price_event_id = 7820)"
    "                    Filter: ((now() >= price_date) AND (now() <=
    price_date_till))"
    "              ->  Index Scan using idx_thtp_price_id on
    ticket_has_ticket_price  (cost=0.00..47.04 rows=258 width=4) (actual
    time=0.019..0.336 rows=619 loops=7)"
    "                    Index Cond: (ticket_has_ticket_price.thtp_price_id =
    ticket_price.price_id)"
    "Total runtime: 8.966 ms"


    You can see the query has only 7 distinct parameter sets to pass to the
    function but...
    4)   Explain analyze
    SELECT web_select_extra_price(price_id, 7820, 1)
    FROM ticket_price
    JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
    WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
    current_timestamp <= price_date_till)

    Result:
    "Nested Loop  (cost=0.00..238.30 rows=517 width=4) (actual
    time=0.365..808.537 rows=4335 loops=1)"
    "  ->  Index Scan using index_price_event_id on ticket_price
    (cost=0.00..8.52 rows=2 width=4) (actual time=0.014..0.040 rows=7 loops=1)"
    "        Index Cond: (price_event_id = 7820)"
    "        Filter: ((now() >= price_date) AND (now() <= price_date_till))"
    "  ->  Index Scan using idx_thtp_price_id on ticket_has_ticket_price
    (cost=0.00..47.04 rows=258 width=4) (actual time=0.016..0.655 rows=619
    loops=7)"
    "        Index Cond: (ticket_has_ticket_price.thtp_price_id =
    ticket_price.price_id)"
    "Total runtime: 810.143 ms"


    So I am totally confused... It seems that selecting 4335 rows is a joke for
    Postgresql, but the great job is done then adding one of 7 possible values
    to the result set... Please help me understand what I am missing here?...

    Finally the system:
    Server
    PG: Version string    PostgreSQL 9.0.3 on i486-pc-linux-gnu, compiled by GCC
    gcc-4.4.real (Debian 4.4.5-10) 4.4.5, 32-bit

    Client
    Win XP SP3 with pgAdmin 1.12.2.
    This is a huge problem with non trivial functions in the select list.
    Pushing the result into and a subquery does NOT guarantee that the
    inner result is materialized first. Try a CTE.

    with foo as
    (
    select yadda;
    )
    select func(foo.a), foo.* from foo;

    merlin
  • Julius Tuskenis at Mar 14, 2011 at 8:46 am
    Hello, Merlin

    Thank you for your quick response.

    2011.03.10 23:14, Merlin Moncure rašė:
    This is a huge problem with non trivial functions in the select list.
    Pushing the result into and a subquery does NOT guarantee that the
    inner result is materialized first.
    From the postgresql documentation about STABLE functions: "This
    category allows the optimizer to optimize multiple calls of the function
    to a single call." I thought that this means that optimizer executes the
    function only for now parameter sets and stores results in some "cache"
    and use it if the parameters are already known. I realize this is very
    naive approach and most probably everything is much more complicated. I
    would appreciate if someone would explain the mechanism (or provide with
    some useful link).
    Try a CTE.

    with foo as
    (
    select yadda;
    )
    select func(foo.a), foo.* from foo;
    I'm sorry, but I'm totally new to CTE. Would you please show me how
    should I use the stable function and where the parameters should be put
    to improve the behavior of the optimizer for my problem?

    Thank you in advance

    --
    Julius Tuskenis
    Programavimo skyriaus vadovas
    UAB nSoft
    mob. +37068233050
  • Merlin Moncure at Mar 14, 2011 at 1:41 pm

    On Mon, Mar 14, 2011 at 3:46 AM, Julius Tuskenis wrote:
    Hello, Merlin

    Thank you for your quick response.

    2011.03.10 23:14, Merlin Moncure rašė:

    This is a huge problem with non trivial functions in the select list.
    Pushing the result into and a subquery does NOT guarantee that the
    inner result is materialized first.

    From the postgresql documentation about STABLE functions: "This category
    allows the optimizer to optimize multiple calls of the function to a single
    call." I thought that this means that optimizer executes the function only
    for now parameter sets and stores results in some "cache" and use it if the
    parameters are already known. I realize this is very naive approach and most
    probably everything is much more complicated. I would appreciate if someone
    would explain the mechanism (or provide with some useful link).
    Just because some optimizations can happen doesn't mean they will
    happen or there is even capability to make them happen. There was
    some recent discussion about this very topic here:
    http://postgresql.1045698.n5.nabble.com/function-contants-evaluated-for-every-row-td3278945.html.

    Try a CTE.

    with foo as
    (
    select yadda;
    )
    select func(foo.a), foo.* from foo;

    I'm sorry, but I'm totally new to CTE. Would you please show me how should I
    use the stable function and where the parameters should be put to improve
    the behavior of the optimizer for my problem?
    WITH results as
    (
    SELECT distinct price_id as price_id
    FROM ticket_price
    JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
    WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
    current_timestamp <= price_date_till)
    ) as qq
    )
    SELECT web_select_extra_price(price_id, 7820, 1) from results;


    Another way to fight this is to play with the cost planner hint
    parameter in 'create function', but I prefer the CTE -- it gives
    strong guarantees about order of execution which is what you really
    want. CTEs are great btw, I'd start learning them immediately.

    IMNSHO, this (uncontrolled number of function executions when run via
    field select list) is a common gotcha w/postgres and a FAQ. Also the
    documentation is not very helpful on this point...do you agree CTE is
    the right way to advise handling this problem...is it worth further
    notation?

    merlin
  • Julius Tuskenis at Mar 15, 2011 at 8:04 am
    Hello,

    2011.03.14 15:41, Merlin Moncure rašė:
    WITH results as
    (
    SELECT distinct price_id as price_id
    FROM ticket_price
    JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
    WHERE price_event_id = 7820 and (current_timestamp>= price_date AND
    current_timestamp<= price_date_till)
    ) as qq
    )
    SELECT web_select_extra_price(price_id, 7820, 1) from results;
    Thank you Merlin for your help. I have updated my function to use CTE.
    Although there was no performance improvement (I had the select with
    function using distinct values joined earlyer) it's good to know the
    optimizer will not change the way I want the query to be executed. Thank
    you once again.
    CTEs are great btw, I'd start learning them immediately.
    I am going to do that.
    IMNSHO, this (uncontrolled number of function executions when run via
    field select list) is a common gotcha w/postgres and a FAQ. Also the
    documentation is not very helpful on this point...
    Yes, I totally agree with you. I think sentence like "Although function
    is marked as STABLE or IMMUTABLE the optimizer is not obliged to take
    advantage of these properties." (sorry for my English).
    do you agree CTE is the right way to advise handling this problem...is it worth further
    notation?
    Yes, the CTE worked fine for me. Reading some more on this topic I found
    some comments that the optimizer has no possibility to know how many
    times the function is to be called in such queries (without actually
    executing the query), so there is no way to determine the cost. That
    explains why not the optimal plan was chosen to my query.

    --
    Julius Tuskenis
    Programavimo skyriaus vadovas
    UAB nSoft
    mob. +37068233050
  • Tom Lane at Mar 14, 2011 at 5:17 pm

    Julius Tuskenis writes:
    From the postgresql documentation about STABLE functions: "This
    category allows the optimizer to optimize multiple calls of the function
    to a single call." I thought that this means that optimizer executes the
    function only for now parameter sets and stores results in some "cache"
    and use it if the parameters are already known.
    No, it does not. That function property *allows* the optimizer to
    invoke the function fewer times than would happen in an un-optimized
    query. It does not *require* it to do so. There is no such cache
    mechanism in Postgres, and it's unlikely that there ever will be,
    because it probably would be a net performance loss on average.

    regards, tom lane
  • Julius Tuskenis at Mar 15, 2011 at 8:12 am
    Thank you, Tom for you answer

    2011.03.14 19:17, Tom Lane rašė:
    That function property*allows* the optimizer to
    invoke the function fewer times than would happen in an un-optimized
    query. It does not*require* it to do so.
    Thank you for clearing that for me. I think these 2 sentences in
    documentation
    (http://www.postgresql.org/docs/current/interactive/xfunc-volatility.html)
    would prevent misunderstandings in the future.

    --
    Julius Tuskenis
    Programavimo skyriaus vadovas
    UAB nSoft
    mob. +37068233050
  • Andres Freund at Mar 14, 2011 at 2:08 pm

    On Thursday, March 10, 2011 05:26:00 PM Julius Tuskenis wrote:
    3) As STABLE function should be executed once for every different set of
    parameters
    Thats not true. Thats not what any of the volatility information (like STABLE,
    IMMUTABLE, VOLATILE) does.

    See http://www.postgresql.org/docs/current/interactive/xfunc-volatility.html

    It *does* change how often a function is executed though. I.e.

    SELECT g.i, some_stable_func(1) FROM generate_series(1, 1000) g(i)

    will call some_stable_func only once because it can determine all the
    parameters beforehand.


    Andres

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-performance @
categoriespostgresql
postedMar 10, '11 at 4:26p
activeMar 15, '11 at 8:12a
posts8
users4
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase