I have a relatively simple query with a single index on (contract_id, time):

vjtrade=> EXPLAIN SELECT * FROM ticks WHERE contract_id=1 ORDER BY time;
                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Sort  (cost=11684028.44..11761274.94 rows=30898601 width=40)
   Sort Key: "time"
   ->  Bitmap Heap Scan on ticks  (cost=715657.57..6995196.08 rows=30898601 width=40)
         Recheck Cond: (contract_id = 1)
         ->  Bitmap Index Scan on contract_id_time_idx  (cost=0.00..707932.92 rows=30898601 width=0)
               Index Cond: (contract_id = 1)
(6 rows)

This plan doesn't complete in a reasonable amount of time. I end up having
to kill the query after it's been running for over an hour.

If I do a:
SET enable_sort=FALSE;
SET enable_bitmapscan=FALSE;

Then it gives me this plan:

 Index Scan using contract_id_time_idx on ticks  (cost=0.00..117276552.51 rows=30897044 width=40) (actual time=34.025..738583.609 rows=27858174 loops=1)
   Index Cond: (contract_id = 1)
 Total runtime: 742323.102 ms

Notice how different the estimated cost is from the actual time. The row
estimate is pretty good, however.

This is on postgresql 8.3.5 with:
shared_buffers = 512MB
temp_buffers = 256MB
work_mem = 256MB
max_fsm_pages = 153600
effective_cache_size = 1500MB

Is there any way to give postgresql a better estimate of the index scan
time? I tried setting random_page_cost=1, but it still gave me the bitmap
plan.
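
For reference, what matters to the planner is the ratio between the random and
sequential page-cost GUCs, and both can be set per session. An illustrative
sketch (the values are examples, not tuned recommendations):

```sql
-- Illustrative values only: telling the planner that random I/O is nearly
-- as cheap as sequential I/O favors the plain index scan.
SET seq_page_cost = 1.0;      -- the 8.3 default
SET random_page_cost = 1.5;   -- default is 4.0; lower when data is cached
EXPLAIN SELECT * FROM ticks WHERE contract_id = 1 ORDER BY time;
```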

Thanks,
Jonathan Hseu

  • Tom Lane at Mar 5, 2009 at 6:33 pm

    Jonathan Hseu writes:
    Sort  (cost=11684028.44..11761274.94 rows=30898601 width=40)
      Sort Key: "time"
      ->  Bitmap Heap Scan on ticks  (cost=715657.57..6995196.08 rows=30898601 width=40)
            Recheck Cond: (contract_id = 1)
            ->  Bitmap Index Scan on contract_id_time_idx  (cost=0.00..707932.92 rows=30898601 width=0)
                  Index Cond: (contract_id = 1)
    (6 rows)
    This plan doesn't complete in a reasonable amount of time. I end up having
    to kill the query after it's been running for over an hour.
    The bitmap scan should be at least as efficient as the plain indexscan,
    so I suppose the problem is that the sort is slow. What's the datatype
    of "time"? Can this machine actually support 256MB+ work_mem, or is that
    likely to be driving it into swapping?

    You might learn more from enabling trace_sort and watching the
    postmaster log entries it generates. On the whole I think the planner
    isn't making a stupid choice here: sorting a large number of rows
    usually *is* preferable to making an indexscan over them, unless the
    table is remarkably close to being in physical order for the index.
    So it would be worth trying to figure out what the problem with the
    sort is.
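
    trace_sort is an ordinary session GUC in 8.3, so it can be switched on just
    for the problem query; a minimal sketch (the statement is illustrative):

```sql
-- Enable per-session sort tracing; progress lines appear in the server log.
SET trace_sort = on;
SET client_min_messages = log;  -- optionally echo the LOG lines to the client
EXPLAIN ANALYZE SELECT * FROM ticks WHERE contract_id = 1 ORDER BY time;
```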

    regards, tom lane
  • Jonathan Hseu at Mar 6, 2009 at 1:42 am
    Oops, forgot to CC my reply to the list. Sorry if this gets messed up.
    On Thu, Mar 5, 2009 at 12:30 PM, Tom Lane wrote:

    Jonathan Hseu <vomjom@vomjom.net> writes:
    Sort  (cost=11684028.44..11761274.94 rows=30898601 width=40)
      Sort Key: "time"
      ->  Bitmap Heap Scan on ticks  (cost=715657.57..6995196.08 rows=30898601 width=40)
            Recheck Cond: (contract_id = 1)
            ->  Bitmap Index Scan on contract_id_time_idx  (cost=0.00..707932.92 rows=30898601 width=0)
                  Index Cond: (contract_id = 1)
    (6 rows)
    This plan doesn't complete in a reasonable amount of time. I end up having
    to kill the query after it's been running for over an hour.
    The bitmap scan should be at least as efficient as the plain indexscan,
    so I suppose the problem is that the sort is slow. What's the datatype
    of "time"?

    It's a timestamp with time zone and not null.

    Can this machine actually support 256MB+ work_mem, or is that
    likely to be driving it into swapping?

    Yeah, the machine has 4 GB of RAM and isn't even close to swapping at all.
    You might learn more from enabling trace_sort and watching the
    postmaster log entries it generates.

    I got this (I'm not sure how to interpret it, as there doesn't seem to be
    any documentation about it on the web):

    2009-03-05 15:28:27 CST STATEMENT: select * from ticks where contract_id=1 order by time limit 2800000;
    2009-03-05 15:28:30 CST LOG: begin tuple sort: nkeys = 1, workMem = 262144, randomAccess = f
    2009-03-05 15:28:30 CST STATEMENT: explain analyze select * from ticks where contract_id=1 order by time limit 2800000;
    2009-03-05 16:50:31 CST LOG: switching to external sort with 937 tapes: CPU 26.57s/4835.39u sec elapsed 4921.38 sec
    2009-03-05 16:50:31 CST STATEMENT: explain analyze select * from ticks where contract_id=1 order by time limit 2800000;
    2009-03-05 17:00:46 CST LOG: performsort starting: CPU 92.51s/4955.58u sec elapsed 5536.57 sec
    2009-03-05 17:00:46 CST STATEMENT: explain analyze select * from ticks where contract_id=1 order by time limit 2800000;
    2009-03-05 17:00:50 CST LOG: finished writing run 1 to tape 0: CPU 92.86s/4958.30u sec elapsed 5539.78 sec
    2009-03-05 17:00:50 CST STATEMENT: explain analyze select * from ticks where contract_id=1 order by time limit 2800000;
    2009-03-05 17:00:50 CST LOG: finished writing final run 2 to tape 1: CPU 92.88s/4958.40u sec elapsed 5539.90 sec
    2009-03-05 17:00:50 CST STATEMENT: explain analyze select * from ticks where contract_id=1 order by time limit 2800000;
    2009-03-05 17:00:51 CST LOG: performsort done (except 2-way final merge): CPU 92.96s/4958.55u sec elapsed 5541.10 sec
    2009-03-05 17:00:51 CST STATEMENT: explain analyze select * from ticks where contract_id=1 order by time limit 2800000;
    2009-03-05 17:00:58 CST LOG: external sort ended, 204674 disk blocks used: CPU 93.36s/4960.04u sec elapsed 5548.33 sec
    2009-03-05 17:00:58 CST STATEMENT: explain analyze select * from ticks where contract_id=1 order by time limit 2800000;


    On the whole I think the planner
    isn't making a stupid choice here: sorting a large number of rows
    usually *is* preferable to making an indexscan over them, unless the
    table is remarkably close to being in physical order for the index.
    So it would be worth trying to figure out what the problem with the
    sort is.

    I don't really understand this. It seems to me that fetching 30 million rows
    and then sorting them wouldn't be preferable to just fetching them in the
    correct order in the first place, even if that means fetching them in random
    physical order.

    I tried another query with a much smaller result set, and the index scan
    takes 76 seconds, but the bitmap scan & sort takes 1.5 hours. That's quite
    a difference. I'm pretty sure the physical order of the index is very
    different from the physical order of the table. The rows are inserted in
    strict time order (assuming that's how they end up on disk), whereas the
    index, as I understand it, is sorted first by contract_id and then by time.

    Here's both of the EXPLAIN ANALYZEs for the same query:

    => explain analyze select * from ticks where contract_id=1 order by time limit 2800000;
                                                                            QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=10487812.41..10494812.41 rows=2800000 width=40) (actual time=5541109.704..5545345.598 rows=2800000 loops=1)
       ->  Sort  (cost=10487812.41..10565267.29 rows=30981949 width=40) (actual time=5541109.702..5544883.149 rows=2800000 loops=1)
             Sort Key: "time"
             Sort Method:  external merge  Disk: 1637392kB
             ->  Bitmap Heap Scan on ticks  (cost=718724.01..7015201.37 rows=30981949 width=40) (actual time=4874084.105..5465131.997 rows=27917481 loops=1)
                   Recheck Cond: (contract_id = 1)
                   ->  Bitmap Index Scan on contract_id_time_idx  (cost=0.00..710978.52 rows=30981949 width=0) (actual time=4871649.240..4871649.240 rows=27918305 loops=1)
                         Index Cond: (contract_id = 1)
     Total runtime: 5548440.918 ms
    (9 rows)

    => explain analyze select * from ticks where contract_id=1 order by time limit 2800000;
                                                                         QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=0.00..10629028.93 rows=2800000 width=40) (actual time=136.612..75717.675 rows=2800000 loops=1)
       ->  Index Scan using contract_id_time_idx on ticks  (cost=0.00..117628023.89 rows=30986694 width=40) (actual time=136.611..75033.090 rows=2800000 loops=1)
             Index Cond: (contract_id = 1)
     Total runtime: 76081.634 ms
    (4 rows)


    To me, it seems like postgresql thinks that it has to do a random page fetch
    for each row from the index scan and so its prediction is way off. The
    prediction for the other bitmap plan is much closer.
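
A toy sketch of that layout (purely illustrative, not PostgreSQL internals): if
rows are appended in time order across many contracts, then for any single
contract the (contract_id, time) index returns heap pages in ascending order,
so charging each fetch as an independent random page read overcosts the scan:

```python
# Toy model: rows arrive in time order, round-robin across contracts, so
# one contract's rows are spread over almost every heap page -- but an
# index scan in (contract_id, time) order visits those pages in ascending
# order, i.e. nearly sequential I/O rather than random seeks.
rows_per_page = 100
num_contracts = 10
total_rows = 100_000

# Heap pages holding contract 1's rows, listed in index (time) order:
pages = [i // rows_per_page
         for i in range(total_rows) if i % num_contracts == 1]
print(pages == sorted(pages))   # True: fetch order is monotonic
print(len(set(pages)))          # 1000 distinct pages, visited in order
```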

    Thanks,
    Jonathan Hseu
  • Robert Haas at Mar 6, 2009 at 2:33 am

    On Thu, Mar 5, 2009 at 1:30 PM, Tom Lane wrote:
    Jonathan Hseu <vomjom@vomjom.net> writes:
    Sort  (cost=11684028.44..11761274.94 rows=30898601 width=40)
      Sort Key: "time"
      ->  Bitmap Heap Scan on ticks  (cost=715657.57..6995196.08 rows=30898601 width=40)
            Recheck Cond: (contract_id = 1)
            ->  Bitmap Index Scan on contract_id_time_idx  (cost=0.00..707932.92 rows=30898601 width=0)
                  Index Cond: (contract_id = 1)
    (6 rows)
    This plan doesn't complete in a reasonable amount of time.  I end up having
    to kill the query after it's been running for over an hour.
    The bitmap scan should be at least as efficient as the plain indexscan,
    so I suppose the problem is that the sort is slow.  What's the datatype
    of "time"?  Can this machine actually support 256MB+ work_mem, or is that
    likely to be driving it into swapping?

    You might learn more from enabling trace_sort and watching the
    postmaster log entries it generates.  On the whole I think the planner
    isn't making a stupid choice here: sorting a large number of rows
    usually *is* preferable to making an indexscan over them, unless the
    table is remarkably close to being in physical order for the index.

    It seems like this is only likely to be true if most of the data needs
    to be read from a magnetic disk, so that many seeks are involved.
    That might not be the case here, since the machine has an awful lot of
    RAM.
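
Rough arithmetic supports that (the per-tuple overhead below is an assumed
round figure, roughly the heap tuple header plus the item pointer):

```python
# Back-of-envelope heap size from the numbers in this thread.
rows = 30_898_601        # planner row estimate for contract_id = 1
width = 40               # tuple payload width from the plan, in bytes
tuple_overhead = 28      # assumption: ~header + item pointer per row

heap_bytes = rows * (width + tuple_overhead)
print(round(heap_bytes / 2**30, 1), "GB approx")  # ~2.0 GB
# Against 4 GB of RAM (effective_cache_size = 1500MB), much of this can
# stay cached, so few index-scan fetches are true random disk seeks.
```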

    ...Robert

Discussion Overview
group: pgsql-performance @ postgresql.org
posted: Mar 5, 2009
active: Mar 6, 2009
posts: 4
users: 3