Hi all,
I'm using 8.2.6 and I'm observing a strange behaviour using
offset and limits.

These are the two queries that are puzzling me:

explain SELECT c.id, tsk, lir, nctr, nctn, ncts, rvel,ecp, pvcp, pvcc,pvcf,pvcl,ldcn,ogtd,sgti
FROM t_OA_2_00_card c JOIN t_OA_2_00_dt dt ON (dt.card_id = c.id)
WHERE ecp=18 AND _to >= 1500 AND _from <= 1550
ORDER BY nctr,nctn,ncts,rvel
offset 0 ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=175044.75..175071.04 rows=10518 width=90)
   ->  Sort  (cost=175044.75..175071.04 rows=10518 width=90)
         Sort Key: c.nctr, c.nctn, c.ncts, c.rvel
         ->  Hash Join  (cost=25830.72..174342.12 rows=10518 width=90)
               Hash Cond: (c.id = dt.card_id)
               ->  Bitmap Heap Scan on t_oa_2_00_card c  (cost=942.36..148457.19 rows=101872 width=90)
                     Recheck Cond: (ecp = 18)
                     ->  Bitmap Index Scan on i7_t_oa_2_00_card  (cost=0.00..916.89 rows=101872 width=0)
                           Index Cond: (ecp = 18)
               ->  Hash  (cost=22743.45..22743.45 rows=171593 width=8)
                     ->  Bitmap Heap Scan on t_oa_2_00_dt dt  (cost=2877.26..22743.45 rows=171593 width=8)
                           Recheck Cond: (_from <= 1550)
                           Filter: (_to >= 1500)
                           ->  Bitmap Index Scan on i_oa_2_00_dt_from  (cost=0.00..2834.36 rows=182546 width=0)
                                 Index Cond: (_from <= 1550)


explain SELECT c.id, tsk, lir, nctr, nctn, ncts, rvel,ecp, pvcp, pvcc,pvcf,pvcl,ldcn,ogtd,sgti
FROM t_OA_2_00_card c JOIN t_OA_2_00_dt dt ON (dt.card_id = c.id)
WHERE ecp=18 AND _to >= 1500 AND _from <= 1550
ORDER BY nctr,nctn,ncts,rvel
offset 0 limit 5;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2125.12 rows=5 width=90)
   ->  Nested Loop  (cost=0.00..4470402.02 rows=10518 width=90)
         ->  Index Scan using i_oa_2_00_card_keys on t_oa_2_00_card c  (cost=0.00..3927779.56 rows=101872 width=90)
               Filter: (ecp = 18)
         ->  Index Scan using i_oa_2_00_dt_for on t_oa_2_00_dt dt  (cost=0.00..5.31 rows=1 width=8)
               Index Cond: (dt.card_id = c.id)
               Filter: ((_to >= 1500) AND (_from <= 1550))


With the limit the query takes minutes to execute, versus a few seconds without it.

What am I missing here?

Regards
Gaetano Mendola


  • Martijn van Oosterhout at Feb 1, 2008 at 11:16 am

    On Fri, Feb 01, 2008 at 12:08:56PM +0100, Gaetano Mendola wrote:

    Hi all,
    I'm using 8.2.6 and I'm observing a strange behaviour using
    offset and limits.
    Please post EXPLAIN ANALYZE output so we can see what's actually taking
    the time.

    Have a nice day,
    --
    Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
    Those who make peaceful revolution impossible will make violent revolution inevitable.
    -- John F Kennedy
  • Gaetano Mendola at Feb 1, 2008 at 11:43 am

    Martijn van Oosterhout wrote:
    On Fri, Feb 01, 2008 at 12:08:56PM +0100, Gaetano Mendola wrote:

    Hi all,
    I'm using 8.2.6 and I'm observing a strange behaviour using
    offset and limits.
    Please post EXPLAIN ANALYZE output so we can see what's actually taking
    the time.
    The EXPLAIN ANALYZE is still running (I launched it 30 minutes ago); I'll post it as soon
    as I have it.

    With nested loops disabled (set enable_nestloop = false), the query with the limit
    now has the same execution time as the one without it.
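
    For reference, that switch can be confined to a single transaction so it only affects the test,
    e.g. (a sketch reusing the query from the first post):

    BEGIN;
    SET LOCAL enable_nestloop = off;   -- planner setting reverts automatically at COMMIT/ROLLBACK
    EXPLAIN ANALYZE
    SELECT c.id, tsk, lir, nctr, nctn, ncts, rvel, ecp, pvcp, pvcc, pvcf, pvcl, ldcn, ogtd, sgti
    FROM t_OA_2_00_card c JOIN t_OA_2_00_dt dt ON (dt.card_id = c.id)
    WHERE ecp = 18 AND _to >= 1500 AND _from <= 1550
    ORDER BY nctr, nctn, ncts, rvel
    OFFSET 0 LIMIT 5;
    ROLLBACK;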

    I don't understand why a LIMIT changes the query plan and, above all, degrades performance.


    Regards
    Gaetano Mendola
  • Gregory Stark at Feb 1, 2008 at 12:35 pm

    "Gaetano Mendola" <mendola@bigfoot.com> writes:

    I don't understand why a LIMIT changes the query plan and, above all, degrades performance.
    Until we see the explain analyze it won't be clear what exactly is going on.
    But in theory a LIMIT can definitely change the plan because the planner knows
    it won't need to generate all the rows to satisfy the LIMIT.

    In the plans you gave note that the plan for the unlimited query has a Sort so
    it has to produce all the records every time. The second query produces the
    records in order so if the LIMIT is satisfied quickly then it can save a lot
    of work.

    It's evidently guessing wrong about the limit being satisfied early. The
    non-indexed restrictions might be pruning out a lot more records than the
    planner expects. Or possibly the table is just full of dead records.
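
    Two quick checks along those lines, sketched against the tables in the plans
    (EXPLAIN ANALYZE, as requested above, remains the definitive tool):

    -- Is the planner's 101872-row estimate for ecp = 18 in the right ballpark?
    SELECT count(*) FROM t_oa_2_00_card WHERE ecp = 18;

    -- How many dead row versions are lying around?
    VACUUM VERBOSE t_oa_2_00_card;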

    --
    Gregory Stark
    EnterpriseDB http://www.enterprisedb.com
    Ask me about EnterpriseDB's PostGIS support!
  • Gaetano Mendola at Feb 1, 2008 at 2:22 pm

    Gregory Stark wrote:
    "Gaetano Mendola" <mendola@bigfoot.com> writes:
    I don't understand why a LIMIT changes the query plan and, above all, degrades performance.
    Until we see the explain analyze it won't be clear what exactly is going on.
    But in theory a LIMIT can definitely change the plan because the planner knows
    it won't need to generate all the rows to satisfy the LIMIT.

    In the plans you gave note that the plan for the unlimited query has a Sort so
    it has to produce all the records every time. The second query produces the
    records in order so if the LIMIT is satisfied quickly then it can save a lot
    of work.

    It's evidently guessing wrong about the limit being satisfied early. The
    non-indexed restrictions might be pruning out a lot more records than the
    planner expects. Or possibly the table is just full of dead records.
    Here is the EXPLAIN ANALYZE result:


    explain analyze SELECT c.id, tsk, lir, nctr, nctn, ncts, rvel,ecp, pvcp, pvcc,pvcf,pvcl,ldcn,ogtd,sgti
    FROM t_OA_2_00_card c JOIN t_OA_2_00_dt dt ON (dt.card_id = c.id)
    WHERE ecp=18 AND _to >= 1500 AND _from <= 1550
    ORDER BY nctr,nctn,ncts,rvel
    offset 0 limit 5;

    QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=0.00..2125.12 rows=5 width=90) (actual time=3399923.424..3399960.174 rows=5 loops=1)
       ->  Nested Loop  (cost=0.00..4470402.02 rows=10518 width=90) (actual time=3399923.420..3399960.156 rows=5 loops=1)
             ->  Index Scan using i_oa_2_00_card_keys on t_oa_2_00_card c  (cost=0.00..3927779.56 rows=101872 width=90) (actual time=3399892.632..3399896.773 rows=50 loops=1)
                   Filter: (ecp = 18)
             ->  Index Scan using i_oa_2_00_dt_for on t_oa_2_00_dt dt  (cost=0.00..5.31 rows=1 width=8) (actual time=1.264..1.264 rows=0 loops=50)
                   Index Cond: (dt.card_id = c.id)
                   Filter: ((_to >= 1500) AND (_from <= 1550))
     Total runtime: 3399960.277 ms


    explain analyze SELECT c.id, tsk, lir, nctr, nctn, ncts, rvel,ecp, pvcp, pvcc,pvcf,pvcl,ldcn,ogtd,sgti
    FROM t_OA_2_00_card c JOIN t_OA_2_00_dt dt ON (dt.card_id = c.id)
    WHERE ecp=18 AND _to >= 1500 AND _from <= 1550
    ORDER BY nctr,nctn,ncts,rvel
    offset 0 ;

    QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=175044.75..175071.04 rows=10518 width=90) (actual time=2425.138..2435.633 rows=3298 loops=1)
       ->  Sort  (cost=175044.75..175071.04 rows=10518 width=90) (actual time=2425.134..2428.812 rows=3298 loops=1)
             Sort Key: c.nctr, c.nctn, c.ncts, c.rvel
             ->  Hash Join  (cost=25830.72..174342.12 rows=10518 width=90) (actual time=797.540..2382.900 rows=3298 loops=1)
                   Hash Cond: (c.id = dt.card_id)
                   ->  Bitmap Heap Scan on t_oa_2_00_card c  (cost=942.36..148457.19 rows=101872 width=90) (actual time=70.212..1507.429 rows=97883 loops=1)
                         Recheck Cond: (ecp = 18)
                         ->  Bitmap Index Scan on i7_t_oa_2_00_card  (cost=0.00..916.89 rows=101872 width=0) (actual time=53.340..53.340 rows=97883 loops=1)
                               Index Cond: (ecp = 18)
                   ->  Hash  (cost=22743.45..22743.45 rows=171593 width=8) (actual time=726.597..726.597 rows=89277 loops=1)
                         ->  Bitmap Heap Scan on t_oa_2_00_dt dt  (cost=2877.26..22743.45 rows=171593 width=8) (actual time=86.181..593.275 rows=89277 loops=1)
                               Recheck Cond: (_from <= 1550)
                               Filter: (_to >= 1500)
                               ->  Bitmap Index Scan on i_oa_2_00_dt_from  (cost=0.00..2834.36 rows=182546 width=0) (actual time=80.863..80.863 rows=201177 loops=1)
                                     Index Cond: (_from <= 1550)
     Total runtime: 2440.396 ms



    Regards
    Gaetano Mendola
  • Tom Lane at Feb 1, 2008 at 3:48 pm

    Gaetano Mendola writes:
    Gregory Stark wrote:
    It's evidently guessing wrong about the limit being satisfied early. The
    non-indexed restrictions might be pruning out a lot more records than the
    planner expects. Or possibly the table is just full of dead records.
    Here is the EXPLAIN ANALYZE result:
    explain analyze SELECT c.id, tsk, lir, nctr, nctn, ncts, rvel,ecp, pvcp, pvcc,pvcf,pvcl,ldcn,ogtd,sgti FROM t_OA_2_00_card c JOIN t_OA_2_00_dt dt ON (dt.card_id = c.id) WHERE ecp=18 AND _to >= 1500 AND _from <= 1550 ORDER BY nctr,nctn,ncts,rvel offset 0 limit 5;
    QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Limit (cost=0.00..2125.12 rows=5 width=90) (actual time=3399923.424..3399960.174 rows=5 loops=1)
    -> Nested Loop (cost=0.00..4470402.02 rows=10518 width=90) (actual time=3399923.420..3399960.156 rows=5 loops=1)
    -> Index Scan using i_oa_2_00_card_keys on t_oa_2_00_card c (cost=0.00..3927779.56 rows=101872 width=90) (actual time=3399892.632..3399896.773 rows=50 loops=1)
    Filter: (ecp = 18)
    -> Index Scan using i_oa_2_00_dt_for on t_oa_2_00_dt dt (cost=0.00..5.31 rows=1 width=8) (actual time=1.264..1.264 rows=0 loops=50)
    Index Cond: (dt.card_id = c.id)
    Filter: ((_to >= 1500) AND (_from <= 1550))
    Total runtime: 3399960.277 ms
    It's guessing that there are 101872 rows altogether that have ecp = 18.
    Is that about right? If not, raising the statistics target for the
    table might fix the problem. If it is about right, then you may be
    stuck --- the problem then could be that the rows with ecp=18 aren't
    uniformly scattered in the i_oa_2_00_card_keys ordering, but are
    clustered near the end.

    Greg's comment about dead rows might be correct too --- the actual
    runtime for the indexscan seems kinda high even if it is scanning most
    of the table. Also, if this query is important enough, clustering
    by that index would improve matters, at the cost of possibly slowing
    down other queries that use other indexes.
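
    In SQL terms those two options could look roughly like this (a sketch: the statistics
    target value is arbitrary, and the CLUSTER line assumes the index meant is
    i_oa_2_00_card_keys, written in the pre-8.3 syntax):

    -- gather more detailed statistics for the ecp column, then re-analyze
    ALTER TABLE t_oa_2_00_card ALTER COLUMN ecp SET STATISTICS 500;
    ANALYZE t_oa_2_00_card;

    -- physically reorder the table by that index (takes an exclusive lock while it runs)
    CLUSTER i_oa_2_00_card_keys ON t_oa_2_00_card;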

    regards, tom lane
  • Greg Stark at Feb 1, 2008 at 4:04 pm

    -> Index Scan using i_oa_2_00_dt_for on t_oa_2_00_dt dt (cost=0.00..5.31 rows=1 width=8) (actual time=1.264..1.264 rows=0 loops=50)
    Index Cond: (dt.card_id = c.id)
    Filter: ((_to >= 1500) AND (_from <= 1550))
    Total runtime: 3399960.277 ms
    Also, are 1500 and 1550 user-supplied parameters or are they part of a small set of possible values? You could consider having a partial index on "card_id WHERE _to >= 1500 AND _from <= 1550". The numbers don't even have to match exactly as long as they include all the records the query needs.

    Another possibility is using something like "cube" from contrib to build a GiST index on (_to, _from). I think you would need to load btree_gist as well for the leading card_id column. It doesn't help every use case though; you would have to experiment.

    But before experimenting with either of those things, what does "VACUUM VERBOSE t_oa_2_00_dt" say?
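
    The partial-index idea could be sketched like this (the index name is made up, and the
    bounds only need to be wide enough to cover the values the query actually uses):

    CREATE INDEX i_oa_2_00_dt_card_part
        ON t_oa_2_00_dt (card_id)
        WHERE _to >= 1500 AND _from <= 1550;

    -- and the bloat check asked about above:
    VACUUM VERBOSE t_oa_2_00_dt;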
  • Tom Lane at Feb 1, 2008 at 4:20 pm

    "Greg Stark" <greg.stark@enterprisedb.com> writes:
    -> Index Scan using i_oa_2_00_dt_for on t_oa_2_00_dt dt (cost=0.00..5.31 rows=1 width=8) (actual time=1.264..1.264 rows=0 loops=50)
    Index Cond: (dt.card_id = c.id)
    Filter: ((_to >= 1500) AND (_from <= 1550))
    Total runtime: 3399960.277 ms
    Also, are 1500 and 1550 user-supplied parameters or are they part of a small set of possible values? You could consider having a partial index on "card_id WHERE _to >= 1500 AND _from <= 1550". The numbers don't even have to match exactly as long as they include all the records the query needs.
    That side of the join isn't where the problem is, though.

    If you're willing to invent new indexes, one on ecp,nctr,nctn,ncts,rvel
    would probably fix the performance issue very nicely.
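
    For example (a sketch; the index name is arbitrary):

    CREATE INDEX i_oa_2_00_card_ecp_sort
        ON t_oa_2_00_card (ecp, nctr, nctn, ncts, rvel);

    With ecp as the leading column and the ORDER BY columns after it, an index scan can
    return the ecp = 18 rows already in the requested order, so the nested loop can stop
    as soon as the LIMIT is satisfied.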

    regards, tom lane
  • Gaetano Mendola at Feb 1, 2008 at 5:18 pm

    Tom Lane wrote:
    "Greg Stark" <greg.stark@enterprisedb.com> writes:
    -> Index Scan using i_oa_2_00_dt_for on t_oa_2_00_dt dt (cost=0.00..5.31 rows=1 width=8) (actual time=1.264..1.264 rows=0 loops=50)
    Index Cond: (dt.card_id = c.id)
    Filter: ((_to >= 1500) AND (_from <= 1550))
    Total runtime: 3399960.277 ms
    Also, are 1500 and 1550 user-supplied parameters or are they part of a small set of possible values? You could consider having a partial index on "card_id WHERE _to >= 1500 AND _from <= 1550". The numbers don't even have to match exactly as long as they include all the records the query needs.
    That side of the join isn't where the problem is, though.

    If you're willing to invent new indexes, one on ecp,nctr,nctn,ncts,rvel
    would probably fix the performance issue very nicely.
    As always you are right: after creating the index "ivan" btree (ecp, nctr, nctn, ncts, rvel),
    the query with the limit now responds in the blink of an eye:

    explain analyze SELECT c.id, tsk, lir, nctr, nctn, ncts, rvel,ecp, pvcp, pvcc,pvcf,pvcl,ldcn,ogtd,sgti
    FROM t_OA_2_00_card c JOIN t_OA_2_00_dt dt ON (dt.card_id = c.id)
    WHERE ecp=18 AND _to >= 1500 AND _from <= 1550
    ORDER BY nctr,nctn,ncts,rvel
    offset 0 limit 5;
    QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=0.00..370.03 rows=5 width=90) (actual time=0.102..0.608 rows=5 loops=1)
       ->  Nested Loop  (cost=0.00..778392.80 rows=10518 width=90) (actual time=0.099..0.594 rows=5 loops=1)
             ->  Index Scan using ivan on t_oa_2_00_card c  (cost=0.00..235770.34 rows=101872 width=90) (actual time=0.024..0.134 rows=50 loops=1)
                   Index Cond: (ecp = 18)
             ->  Index Scan using i_oa_2_00_dt_for on t_oa_2_00_dt dt  (cost=0.00..5.31 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=50)
                   Index Cond: (dt.card_id = c.id)
                   Filter: ((_to >= 1500) AND (_from <= 1550))
     Total runtime: 0.700 ms
    (8 rows)


    Regards
    Gaetano Mendola

Discussion Overview
group: pgsql-hackers
categories: postgresql
posted: Feb 1, '08 at 11:08a
active: Feb 1, '08 at 5:18p
posts: 9
users: 5
website: postgresql.org...
irc: #postgresql
