Hello

I tested the speed of a SELF JOIN versus a correlated subquery for computing running subtotals:
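The test table and index look roughly like this (a sketch reconstructed from the plans below; the exact column types and the composite index definition are guesses):

create table history (
    id         integer primary key,
    sale_date  date,
    product    varchar(10),
    sale_price integer
);

create index fxxx on history (product, id);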

It's strange: the correlated subquery is faster, but it has a much higher estimated cost:

postgres=# explain analyze select t1.id, t1.sale_date, t1.product, t1.sale_price, sum(t2.sale_price)
from history t1 inner join history t2 on t1.id >= t2.id and t1.product = t2.product
group by t1.id, t1.sale_date, t1.product, t1.sale_price
order by t1.id;
                                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=3678.85..3691.36 rows=5003 width=19) (actual time=1553.575..1560.618 rows=5003 loops=1)
   Sort Key: t1.id
   Sort Method:  quicksort  Memory: 480kB
   ->  HashAggregate  (cost=3308.91..3371.45 rows=5003 width=19) (actual time=1530.276..1540.206 rows=5003 loops=1)
         ->  Nested Loop  (cost=0.00..1708.29 rows=128050 width=19) (actual time=0.264..1034.048 rows=198333 loops=1)
               ->  Seq Scan on history t1  (cost=0.00..78.03 rows=5003 width=15) (actual time=0.077..8.835 rows=5003 loops=1)
               ->  Index Scan using fxxx on history t2  (cost=0.00..0.31 rows=1 width=11) (actual time=0.017..0.087 rows=40 loops=5003)
                     Index Cond: (((t2.product)::text = (t1.product)::text) AND (t1.id >= t2.id))
 Total runtime: 1567.125 ms
(9 rows)

postgres=# explain analyze SELECT sale_date, product, sale_price,
COALESCE((SELECT SUM(sale_price)
FROM history
WHERE product = o.product
AND id <= o.id), 0) AS total
FROM history o;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on history o  (cost=0.00..41532.29 rows=5003 width=15) (actual time=0.073..825.333 rows=5003 loops=1)
   SubPlan
     ->  Aggregate  (cost=8.28..8.29 rows=1 width=4) (actual time=0.158..0.159 rows=1 loops=5003)
           ->  Index Scan using fxxx on history  (cost=0.00..8.27 rows=1 width=4) (actual time=0.018..0.086 rows=40 loops=5003)
                 Index Cond: (((product)::text = ($0)::text) AND (id <= $1))
 Total runtime: 833.213 ms
(6 rows)

postgres=# show effective_cache_size ;
effective_cache_size
----------------------
600MB
(1 row)

postgres=# SHOW shared_buffers ;
shared_buffers
----------------
300MB
(1 row)

Maybe my random_page_cost is too high?
postgres=# SHOW random_page_cost ;
random_page_cost
------------------
4
(1 row)

Time: 0,351 ms
postgres=# set random_page_cost to 2;
SET
Time: 0,330 ms
postgres=# SHOW random_page_cost ;
random_page_cost
------------------
2
(1 row)

Time: 0,320 ms
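SET only changes the value for the current session; to keep a lower value across restarts it would go in postgresql.conf instead, something like this (a sketch using the values from this session, not a tuning recommendation):

    shared_buffers = 300MB
    effective_cache_size = 600MB
    random_page_cost = 2          # default is 4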
postgres=# explain analyze SELECT sale_date, product, sale_price,
COALESCE((SELECT SUM(sale_price)
FROM history
WHERE product = o.product
AND id <= o.id), 0) AS total
FROM history o;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on history o  (cost=0.00..21518.09 rows=5003 width=15) (actual time=0.132..809.701 rows=5003 loops=1)
   SubPlan
     ->  Aggregate  (cost=4.28..4.29 rows=1 width=4) (actual time=0.154..0.155 rows=1 loops=5003)
           ->  Index Scan using fxxx on history  (cost=0.00..4.27 rows=1 width=4) (actual time=0.020..0.088 rows=40 loops=5003)
                 Index Cond: (((product)::text = ($0)::text) AND (id <= $1))
 Total runtime: 817.358 ms

Regards
Pavel Stehule

  • Tom Lane at Mar 16, 2008 at 5:41 pm

    "Pavel Stehule" <pavel.stehule@gmail.com> writes:
    It's strange: the correlated subquery is faster, but it has a much higher estimated cost:
    In the nestloop plan, the estimated cost for the indexscan is discounted
    based on the knowledge that it'll be executed repeatedly:
    ->  Index Scan using fxxx on history t2  (cost=0.00..0.31 rows=1 width=11) (actual time=0.017..0.087 rows=40 loops=5003)
    In the subplan case that doesn't happen:
    ->  Index Scan using fxxx on history  (cost=0.00..8.27 rows=1 width=4) (actual time=0.018..0.086 rows=40 loops=5003)
          Index Cond: (((product)::text = ($0)::text) AND (id <= $1))
    Had the same discount been applied then the estimated costs would be
    pretty nearly in line with reality, if I did the math right.
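    As a rough check, using only the numbers shown in the plans above: the
    nestloop plan charges 78.03 for the seq scan plus 5003 * 0.31 = ~1550
    for the repeated indexscans, close to its 1708.29 total; the subplan
    version is charged the full 8.28 per call, 5003 * 8.28 = ~41400, which
    is where the 41532.29 total comes from, even though per call both
    indexscans actually take about the same time (~0.09 ms for ~40 rows).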

    It'd be nice to do better but I'm not sure how; at the time that we
    create plans for sub-queries we don't really have any way to know how
    often they'll be called by the upper query.

    regards, tom lane
