Hi All,

I have to optimize the following query:

SELECT r.TopFamilyID AS FamilyID FROM CMRules r
WHERE r.WorkspaceID = 18512
GROUP BY r.TopFamilyID ;

The explain plan is as follows:

Group  (cost=509989.19..511518.30 rows=9 width=10) (actual time=1783.102..2362.587 rows=261 loops=1)
  ->  Sort  (cost=509989.19..510753.74 rows=305821 width=10) (actual time=1783.097..2121.378 rows=272211 loops=1)
        Sort Key: topfamilyid
        ->  Bitmap Heap Scan on cmrules r  (cost=14501.36..476896.34 rows=305821 width=10) (actual time=51.507..351.487 rows=272211 loops=1)
              Recheck Cond: (workspaceid = 18512::numeric)
              ->  Bitmap Index Scan on pk_ws_fea_fam_cmrules  (cost=0.00..14424.90 rows=305821 width=0) (actual time=48.097..48.097 rows=272211 loops=1)
                    Index Cond: (workspaceid = 18512::numeric)
Total runtime: 2373.008 ms
(8 rows)
-----------------------------------------------------------------------------------------------------------------
\d CMRules shows the following indexes:

Indexes:
"pk_ws_fea_fam_cmrules" PRIMARY KEY, btree (workspaceid, featureid, topfamilyid, ruleenddate, gid)
"idx_cmrules" btree (topfamilyid)
"idx_gid_ws_cmrules" btree (gid, workspaceid)
-----------------------------------------------------------------------------------------------------------------
SELECT count(distinct r.TopFamilyID) FROM CMRules r WHERE r.WorkspaceID =18512

Gives me 261 Rows

SELECT count(r.TopFamilyID) FROM CMRules r WHERE r.WorkspaceID =18512 ;

Gives me 272 211 Rows

select count(*) from cmrules;

Gives me 17 643 532 Rows


Please suggest how I can optimize this query.

Thanks
Niraj Patel


  • Grzegorz Jaśkiewicz at Dec 8, 2009 at 1:43 pm
    It looks like it might be choosing the wrong plan because it gets the statistics wrong.
    Try increasing the statistics target to 100.
    By the way, what version is it?
  • Niraj patel at Dec 8, 2009 at 1:51 pm
    Hi gryzman,

    I have run vacuum full analyze on the cmrules table. The version of Postgres is 8.2.13. How should I change the stats to 100?

    Thanks



  • Lennin Caro at Dec 8, 2009 at 3:51 pm
    in psql
    mydb=# set default_statistics_target = 100;
  • Kevin Grittner at Dec 8, 2009 at 4:48 pm

    Lennin Caro wrote:

    I have run vacuum full
    That's not usually a good idea. For one thing, it will tend to
    bloat your indexes.

    -Kevin
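    If VACUUM FULL has already bloated the indexes, rebuilding them is one way to recover (a sketch, untested here; note that REINDEX takes an exclusive lock on the table while it runs):

    ```sql
    -- Rebuild all indexes on the table to shed bloat left by VACUUM FULL.
    -- REINDEX locks the table exclusively, so run it in a quiet window.
    REINDEX TABLE cmrules;

    -- For routine maintenance, plain VACUUM ANALYZE (without FULL)
    -- avoids the index-bloat problem in the first place.
    VACUUM ANALYZE cmrules;
    ```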
  • Robert Haas at Dec 8, 2009 at 6:30 pm
    2009/12/8 Lennin Caro <lennin.caro@yahoo.com>
    in psql
    mydb=# set default_statistics_target = 100;
    That's only going to affect the current session. To change it
    permanently, edit postgresql.conf and do pg_ctl reload.

    ...Robert
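    Since raising default_statistics_target affects every column, a narrower alternative is to raise the target only for the filtered column (a sketch, assuming the cmrules table above; ANALYZE must be re-run for the new target to take effect):

    ```sql
    -- Raise the statistics target for just the column in the WHERE clause.
    ALTER TABLE cmrules ALTER COLUMN workspaceid SET STATISTICS 100;

    -- Recollect statistics so the planner sees the finer histogram.
    ANALYZE cmrules;
    ```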
  • Matthew Wakeling at Dec 8, 2009 at 2:03 pm

    On Tue, 8 Dec 2009, niraj patel wrote:
    Group  (cost=509989.19..511518.30 rows=9 width=10) (actual time=1783.102..2362.587 rows=261 loops=1)
      ->  Sort  (cost=509989.19..510753.74 rows=305821 width=10) (actual time=1783.097..2121.378 rows=272211 loops=1)
            Sort Key: topfamilyid
            ->  Bitmap Heap Scan on cmrules r  (cost=14501.36..476896.34 rows=305821 width=10) (actual time=51.507..351.487 rows=272211 loops=1)
                  Recheck Cond: (workspaceid = 18512::numeric)
                  ->  Bitmap Index Scan on pk_ws_fea_fam_cmrules  (cost=0.00..14424.90 rows=305821 width=0) (actual time=48.097..48.097 rows=272211 loops=1)
                        Index Cond: (workspaceid = 18512::numeric)
    Total runtime: 2373.008 ms
    (8 rows)
    select count(*) from  cmrules;

    Gives me 17 643 532 Rows
    Looks good from here. Think about what you're asking the database to do.
    It has to select 272211 rows out of a large table with 17643532 rows. That
    in itself could take a very long time. It is clear that in your EXPLAIN
    this data is already cached, otherwise it would have to perform nigh on
    270000 seeks over the discs, which would take (depending on the disc
    system) something on the order of twenty minutes. Those 272211 rows then
    have to be sorted, which takes a couple of seconds, which again is pretty
    good. The rows are then uniqued, which is really quick, before returning
    the results.

    It's hard to think how you would expect the database to do this any
    faster, really.
    Indexes:
    "pk_ws_fea_fam_cmrules" PRIMARY KEY, btree (workspaceid, featureid, topfamilyid,
    ruleenddate, gid)
    "idx_cmrules" btree (topfamilyid)
    "idx_gid_ws_cmrules" btree (gid, workspaceid)
    You may perhaps benefit from an index on just the workspaceid column, but
    the benefit may be minor.

    You may think of clustering the table on the index, but that will only be
    of benefit if the data is not in the cache.

    The statistics seem to be pretty accurate, predicting 305821 instead of
    272211 rows. The database is not going to easily predict the number of
    unique results (9 instead of 261), but that doesn't affect the query plan
    much, so I wouldn't worry about it.

    I would consider upgrading to Postgres 8.4 if possible, as it does have
    some considerable performance improvements, especially for bitmap index
    scans if you are using a RAID array. I'd also try using "SELECT DISTINCT"
    rather than "GROUP BY" and seeing if that helps.

    Matthew

    --
    Now the reason people powdered their faces back then was to change the values
    "s" and "n" in this equation here. - Computer science lecturer
  • Niraj patel at Dec 8, 2009 at 2:27 pm
    Hi Matthew ,

    Thanks very much for the analysis. It does take 17 seconds to execute when the data is not in cache. I cannot use "distinct" as I have aggregate operators in the select clause of the original query. What I would like to ask is: would partitioning on workspaceid help? Or would some sort of selective index help me?

    Thanks.




  • Matthew Wakeling at Dec 8, 2009 at 2:48 pm

    On Tue, 8 Dec 2009, niraj patel wrote:
    Thanks very much for the analysis. It does take 17 seconds to execute
    when the data is not in cache.
    It sounds like the table is already very much ordered by the workspaceid,
    otherwise this would have taken much longer.
    What I would like to ask is: would partitioning on workspaceid help?
    Or would some sort of selective index help me?
    Depends on how many distinct values of workspaceid there are. I would
    suggest that given how well ordered your table is, and if you aren't doing
    too many writes, then there would be little benefit, and much hassle.

    Matthew

    --
    Now, you would have thought these coefficients would be integers, given that
    we're working out integer results. Using a fraction would seem really
    stupid. Well, I'm quite willing to be stupid here - in fact, I'm going to
    use complex numbers. -- Computer Science Lecturer
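    For reference, if partitioning were still worth trying despite the caveats above, the usual 8.2-era pattern is constraint exclusion over inherited child tables (a sketch only; the child-table name and range are made up, and constraint_exclusion must be enabled for the planner to skip children):

    ```sql
    -- Parent table stays empty; one child per range of workspaceid.
    CREATE TABLE cmrules_ws_18000_19000 (
        CHECK (workspaceid >= 18000 AND workspaceid < 19000)
    ) INHERITS (cmrules);

    -- With constraint_exclusion on, the planner skips children whose
    -- CHECK constraint contradicts the WHERE clause.
    SET constraint_exclusion = on;
    SELECT TopFamilyID FROM cmrules WHERE workspaceid = 18512;
    ```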
