Hello all,

I was hoping someone could explain the plan for a statement.

We have a table with a column of longs being used as an index. The
query plan in 8.0 was like this:

# explain select distinct timeseriesid from tbltimeseries where
timeseriesid > 0 order by timeseriesid;
SET
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Unique (cost=0.00..15065908.60 rows=10854026 width=8)
-> Index Scan using idx_timeseris on tbltimeseries
(cost=0.00..15038773.53 rows=10854026 width=8)
Index Cond: (timeseriesid > 0)
(3 rows)



In 8.1, (using the same database after a dump+restore+vacuum+analyze) I
get the following:
# explain select distinct timeseriesid from tbltimeseries where
timeseriesid > 0 order by timeseriesid;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Unique (cost=2717137.08..2771407.21 rows=10854026 width=8)
-> Sort (cost=2717137.08..2744272.14 rows=10854026 width=8)
Sort Key: timeseriesid
-> Bitmap Heap Scan on tbltimeseries
(cost=48714.09..1331000.42 rows=10854026 width=8)
Recheck Cond: (timeseriesid > 0)
-> Bitmap Index Scan on idx_timeseris
(cost=0.00..48714.09 rows=10854026 width=0)
Index Cond: (timeseriesid > 0)
(7 rows)


I'm hoping someone can explain the new query plan (as I'm not sure I
understand what it is doing).

Thanks!

-- Alan

Search Discussions

  • Merlin Moncure at Aug 26, 2005 at 3:12 pm

    Hello all,

    I was hoping someone could explain the plan for a statement.

    We have a table with a column of longs being used as an index. The
    query plan in 8.0 was like this:

    # explain select distinct timeseriesid from tbltimeseries where
    timeseriesid > 0 order by timeseriesid;
    I had the same problem. You probably already have seq scan turned off,
    or the server would be using that. You may have to turn bitmap off or
    rework you query such that the server will use the index. (between?).

    Anyways, distinct is code word for 'bad performance' :). Consider
    laying out tables such that it not necessary, for example set up table
    with RI link. Then you can do this in zero time.

    Good luck!

    Merlin
  • Michael Fuhr at Aug 26, 2005 at 3:16 pm

    On Fri, Aug 26, 2005 at 10:45:07AM -0400, Alan Stange wrote:
    -> Bitmap Heap Scan on tbltimeseries (cost=48714.09..1331000.42 rows=10854026 width=8)
    Recheck Cond: (timeseriesid > 0)
    -> Bitmap Index Scan on idx_timeseris (cost=0.00..48714.09 rows=10854026 width=0)
    Index Cond: (timeseriesid > 0)

    I'm hoping someone can explain the new query plan (as I'm not sure I
    understand what it is doing).
    Search for "bitmap" in the 8.1 Release Notes:

    http://developer.postgresql.org/docs/postgres/release.html#RELEASE-8-1

    You could probably find more detailed discussion in the pgsql-hackers
    archives.

    --
    Michael Fuhr
  • Tom Lane at Aug 26, 2005 at 3:17 pm

    Alan Stange writes:
    Unique (cost=2717137.08..2771407.21 rows=10854026 width=8)
    -> Sort (cost=2717137.08..2744272.14 rows=10854026 width=8)
    Sort Key: timeseriesid
    -> Bitmap Heap Scan on tbltimeseries
    (cost=48714.09..1331000.42 rows=10854026 width=8)
    Recheck Cond: (timeseriesid > 0)
    -> Bitmap Index Scan on idx_timeseris
    (cost=0.00..48714.09 rows=10854026 width=0)
    Index Cond: (timeseriesid > 0)
    (7 rows)
    I'm hoping someone can explain the new query plan (as I'm not sure I
    understand what it is doing).
    The index scan is reading the index to find out which heap tuple IDs
    (TIDs) the index says meet the condition. It returns a bitmap of the
    tuple locations (actually, an array of per-page bitmaps). The heap
    scan goes and fetches the tuples from the table, working in TID order
    to avoid re-reading the same page many times, as can happen for ordinary
    index scans. Since the result isn't sorted, we have to do a sort to get
    it into the correct order for the Unique step.

    Because it avoids random access to the heap, this plan can be a lot
    faster than a regular index scan. I'm not sure at all that 8.1 is
    doing good relative cost estimation yet, though. It would be
    interesting to see EXPLAIN ANALYZE results for both ways. (You can
    use enable_bitmapscan and enable_indexscan to force the planner to pick
    the plan it thinks is slower.)

    regards, tom lane
  • Alan Stange at Aug 26, 2005 at 7:02 pm

    Tom Lane wrote:
    Alan Stange <stange@rentec.com> writes:
    Unique (cost=2717137.08..2771407.21 rows=10854026 width=8)
    -> Sort (cost=2717137.08..2744272.14 rows=10854026 width=8)
    Sort Key: timeseriesid
    -> Bitmap Heap Scan on tbltimeseries
    (cost=48714.09..1331000.42 rows=10854026 width=8)
    Recheck Cond: (timeseriesid > 0)
    -> Bitmap Index Scan on idx_timeseris
    (cost=0.00..48714.09 rows=10854026 width=0)
    Index Cond: (timeseriesid > 0)
    (7 rows)
    I'm hoping someone can explain the new query plan (as I'm not sure I
    understand what it is doing).
    The index scan is reading the index to find out which heap tuple IDs
    (TIDs) the index says meet the condition. It returns a bitmap of the
    tuple locations (actually, an array of per-page bitmaps). The heap
    scan goes and fetches the tuples from the table, working in TID order
    to avoid re-reading the same page many times, as can happen for ordinary
    index scans. Since the result isn't sorted, we have to do a sort to get
    it into the correct order for the Unique step.

    Because it avoids random access to the heap, this plan can be a lot
    faster than a regular index scan. I'm not sure at all that 8.1 is
    doing good relative cost estimation yet, though. It would be
    interesting to see EXPLAIN ANALYZE results for both ways. (You can
    use enable_bitmapscan and enable_indexscan to force the planner to pick
    the plan it thinks is slower.)
    Just to be clear. The index is on the timeseriesid column. Also, We
    usually have the where clause with some non-zero number.

    Anyway, here's the basic query, with variations added on belowe:

    fiasco=# explain analyze select timeseriesid from tbltimeseries where
    timeseriesid > 0;
    QUERY
    PLAN
    ------------------------------------------------------------------------------------------------------------------------------------------------
    Bitmap Heap Scan on tbltimeseries (cost=48906.82..1332935.19
    rows=10905949 width=8) (actual time=16476.337..787480.979 rows=10907853
    loops=1)
    Recheck Cond: (timeseriesid > 0)
    -> Bitmap Index Scan on idx_timeseris (cost=0.00..48906.82
    rows=10905949 width=0) (actual time=16443.585..16443.585 rows=10907853
    loops=1)
    Index Cond: (timeseriesid > 0)
    Total runtime: 791340.341 ms
    (5 rows)



    Now add the order:

    fiasco=# explain analyze select timeseriesid from tbltimeseries where
    timeseriesid > 0 order by timeseriesid;

    QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    Sort (cost=2726087.93..2753352.81 rows=10905949 width=8) (actual
    time=821090.666..826353.054 rows=10913868 loops=1)
    Sort Key: timeseriesid
    -> Bitmap Heap Scan on tbltimeseries (cost=48912.82..1332941.19
    rows=10905949 width=8) (actual time=16353.921..757075.349 rows=10913868
    loops=1)
    Recheck Cond: (timeseriesid > 0)
    -> Bitmap Index Scan on idx_timeseris (cost=0.00..48912.82
    rows=10905949 width=0) (actual time=16335.239..16335.239 rows=10913868
    loops=1)
    Index Cond: (timeseriesid > 0)
    Total runtime: 830829.145 ms
    (7 rows)




    and the distinct:

    fiasco=# explain analyze select distinct timeseriesid from tbltimeseries
    where timeseriesid > 0 order by timeseriesid;

    QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------------------------------------------
    Unique (cost=2726087.93..2780617.68 rows=10905949 width=8) (actual
    time=816938.970..831119.423 rows=10913868 loops=1)
    -> Sort (cost=2726087.93..2753352.81 rows=10905949 width=8) (actual
    time=816938.967..822298.802 rows=10913868 loops=1)
    Sort Key: timeseriesid
    -> Bitmap Heap Scan on tbltimeseries
    (cost=48912.82..1332941.19 rows=10905949 width=8) (actual
    time=15866.736..752851.006 rows=10913868 loops=1)
    Recheck Cond: (timeseriesid > 0)
    -> Bitmap Index Scan on idx_timeseris
    (cost=0.00..48912.82 rows=10905949 width=0) (actual
    time=15852.652..15852.652 rows=10913868 loops=1)
    Index Cond: (timeseriesid > 0)
    Total runtime: 835558.312 ms
    (8 rows)




    Now the usual query from 8.0:

    fiasco=# set enable_bitmapscan=false; explain analyze select distinct
    timeseriesid from tbltimeseries where timeseriesid > 0 order by
    timeseriesid;
    SET

    QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Unique (cost=0.00..14971276.10 rows=10905949 width=8) (actual
    time=24.930..999645.638 rows=10913868 loops=1)
    -> Index Scan using idx_timeseris on tbltimeseries
    (cost=0.00..14944011.22 rows=10905949 width=8) (actual
    time=24.926..989117.882 rows=10913868 loops=1)
    Index Cond: (timeseriesid > 0)
    Total runtime: 1003549.067 ms
    (4 rows)




    And now a sequential scan of the table itself:

    fiasco=# set enable_bitmapscan=false; set enable_indexscan=false;
    explain analyze select distinct timeseriesid from tbltimeseries where
    timeseriesid > 0 order by timeseriesid;
    SET
    SET

    QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    Unique (cost=102677188.75..102731718.49 rows=10905949 width=8) (actual
    time=956783.989..971036.657 rows=10919883 loops=1)
    -> Sort (cost=102677188.75..102704453.62 rows=10905949 width=8)
    (actual time=956783.985..962115.616 rows=10919883 loops=1)
    Sort Key: timeseriesid
    -> Seq Scan on tbltimeseries (cost=100000000.00..101284042.00
    rows=10905949 width=8) (actual time=7.314..893267.030 rows=10919883 loops=1)
    Filter: (timeseriesid > 0)
    Total runtime: 975393.678 ms
    (6 rows)


    For us, the query is best served by the index scan as the ordering comes
    for free and results can be streamed to a client immediately. So, while
    the whole query is a bit slower, the client can begin processing the
    results immediately. The client has three threads which stream in two
    sets of id's and emit delete statements in smaller batches. It can be
    done as one statement, but on our production system that statement can
    run for 10 hours and delete 20M rows...which conflicts with the vacuum
    process. This version can be throttled, stopped and restarted at any
    time and no work is lost compared to a single long running query.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-performance @
categoriespostgresql
postedAug 26, '05 at 2:45p
activeAug 26, '05 at 7:02p
posts5
users4
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase