Hi everybody,

I have a performance problem with a query that uses a LIMIT. There are other threads regarding performance issues with LIMIT, but I didn't find useful hints for our problem, and it might
be interesting for other postgres users.


There are only 2 simple tables:

CREATE TABLE newsfeed
(
id varchar(32) PRIMARY KEY,
version int4 NOT NULL,
newsfeed_type varchar(20) NOT NULL,
new_item_count int4 NOT NULL
);
CREATE INDEX IDX_NEWSFEED_TYPE ON newsfeed (newsfeed_type);


CREATE TABLE newsfeed_item
(
id varchar(32) NOT NULL,
item_type varchar(35) NOT NULL,
version int4 NOT NULL,
category varchar(25) NULL,
data1 bytea NULL,
data2 bytea NULL,
date_time timestamp NOT NULL,
guid1 varchar(32) NULL,
guid2 varchar(32) NULL,
guid3 varchar(32) NULL,
id1 int8 NULL,
id2 int8 NULL,
long_value1 int8 NULL,
long_value2 int8 NULL,
long_value3 int8 NULL,
string_value1 varchar(4000) NULL,
string_value2 varchar(500) NULL,
string_value3 varchar(500) NULL,
string_value4 varchar(500) NULL,
string_value5 varchar(500) NULL,
string_value6 varchar(500) NULL,
newsfeed varchar(32) NOT NULL
);
CREATE UNIQUE INDEX newsfeed_item_pkey ON newsfeed_item (id);
CREATE INDEX idx_nfi_guid1 ON newsfeed_item (guid1);
CREATE INDEX idx_nfi_guid2 ON newsfeed_item (guid2);
CREATE INDEX idx_nfi_guid3 ON newsfeed_item (guid3);
CREATE INDEX idx_nfi_id1 ON newsfeed_item (id1);
CREATE INDEX idx_nfi_id2 ON newsfeed_item (id2);
CREATE INDEX idx_nfi_newsfeed ON newsfeed_item (newsfeed);
CREATE INDEX idx_nfi_type ON newsfeed_item (item_type);
CREATE INDEX idx_nfi_datetime ON newsfeed_item (date_time);

newsfeed contains 457036 rows
newsfeed_item contains 5169727 rows

postgres version: 9.0.2
OS: CentOS release 5.5 (Final)


The following query took 4.2 seconds:

-------------------------
select *
from newsfeed_item
where newsfeed in
(
'173ee4dcec0d11de9f4f12313c0018c1','10dabde0f70211df816612313b02054e',
'17841c9af70211df874b12313b02054e','1783fce2f70211df814412313b02054e','1783fdd2f70211df8c1d12313b02054e','178405a2f70211df829212313b02054e',
'178440c6f70211df97c812313b02054e','178416e6f70211dfac3412313b02054e','1783e4aaf70211df9acd12313b02054e','178437e8f70211df8b8512313b02054e',
'1783f54ef70211df81e012313b02054e','178415c4f70211df8f8112313b02054e'
)
order by date_time desc

limit 25
-------------------------

With the LIMIT removed, the query took 60 milliseconds! With the sort order changed to ASC, the query took 44 ms, even with the LIMIT.

Then I tried to create the index on date_time in DESC order (because the result is sorted in descending order), but that did not change anything.
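
For reference, what I tried looked roughly like this (the index name here is just illustrative, not necessarily the exact one I used):

-- sketch of the attempted descending index; the name is illustrative
CREATE INDEX idx_nfi_datetime_desc ON newsfeed_item (date_time DESC);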


Then I removed the index on date_time with the following results:

query with the limit: 40 ms
query without the limit: 60 ms

=> the optimizer seems to choose the wrong index (I performed an ANALYZE on newsfeed_item and a REINDEX before the test). Since I don't currently need
the index on date_time (though I will need it in the near future), I removed it, which is OK for now.
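
In case it is useful to anyone, the statements involved are simply (using the index name from the DDL above):

DROP INDEX idx_nfi_datetime;
-- and later, when the index is needed again:
CREATE INDEX idx_nfi_datetime ON newsfeed_item (date_time);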

------------------------

Here are the EXPLAIN ANALYZE results:

1) the query in descending order with the limit and index on date_time (the slow one):

Limit  (cost=0.00..980.09 rows=25 width=963) (actual time=48.592..4060.779 rows=25 loops=1)
  ->  Index Scan Backward using "IDX_NFI_DATETIME" on newsfeed_item  (cost=0.00..409365.16 rows=10442 width=963) (actual time=48.581..4060.542 rows=25 loops=1)
        Filter: ((newsfeed)::text = ANY ('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
Total runtime: 4060.959 ms


2) the query in descending order without the limit (which is much faster):

Sort  (cost=39575.23..39601.33 rows=10442 width=963) (actual time=15.014..17.038 rows=477 loops=1)
  Sort Key: date_time
  Sort Method: quicksort  Memory: 287kB
  ->  Bitmap Heap Scan on newsfeed_item  (cost=421.41..34450.72 rows=10442 width=963) (actual time=0.644..12.601 rows=477 loops=1)
        Recheck Cond: ((newsfeed)::text = ANY ('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
        ->  Bitmap Index Scan on idx_nfi_newsfeed  (cost=0.00..418.80 rows=10442 width=0) (actual time=0.555..0.555 rows=477 loops=1)
              Index Cond: ((newsfeed)::text = ANY ('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
Total runtime: 19.065 ms

3) the query in ascending order with the limit (which is fast):

Limit  (cost=0.00..980.09 rows=25 width=963) (actual time=0.261..3.704 rows=25 loops=1)
  ->  Index Scan using "IDX_NFI_DATETIME" on newsfeed_item  (cost=0.00..409365.16 rows=10442 width=963) (actual time=0.250..3.495 rows=25 loops=1)
        Filter: ((newsfeed)::text = ANY ('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
Total runtime: 3.854 ms


4) the query after removing the index on date_time, in descending order with the LIMIT (which is also fast):

Limit  (cost=34745.39..34745.45 rows=25 width=963) (actual time=12.855..13.143 rows=25 loops=1)
  ->  Sort  (cost=34745.39..34771.49 rows=10442 width=963) (actual time=12.846..12.946 rows=25 loops=1)
        Sort Key: date_time
        Sort Method: top-N heapsort  Memory: 40kB
        ->  Bitmap Heap Scan on newsfeed_item  (cost=421.41..34450.72 rows=10442 width=963) (actual time=0.622..9.936 rows=477 loops=1)
              Recheck Cond: ((newsfeed)::text = ANY ('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
              ->  Bitmap Index Scan on idx_nfi_newsfeed  (cost=0.00..418.80 rows=10442 width=0) (actual time=0.543..0.543 rows=477 loops=1)
                    Index Cond: ((newsfeed)::text = ANY ('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
Total runtime: 13.318 ms

Is there anything I can do so that I can add the index on date_time back without running into this performance problem?

regards
Dieter


  • Claudio Freire at Apr 12, 2011 at 7:42 am

    On Tue, Apr 12, 2011 at 7:20 AM, Dieter Rehbein wrote:
    Hi everybody,

    I have a performance problem with a query that uses a LIMIT. There are other threads regarding performance issues with LIMIT, but I didn't find useful hints for our problem, and it might
    be interesting for other postgres users.
    Did you perform an ANALYZE or VACUUM ANALYZE?
    Did you try increasing the statistic targets?

    AFAIK, it looks a lot like the planner is missing stats, since it
    estimates the index query on idx_nfi_newsfeed will fetch 10k rows -
    instead of 25.
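
    For concreteness, that first suggestion would be something like:

    VACUUM ANALYZE newsfeed_item;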
  • Dieter Rehbein at Apr 12, 2011 at 8:59 am
    What I did was an ANALYZE, which did not change anything.

    I just executed a VACUUM ANALYZE and now everything performs well. hm, strange.

    thanks
    Dieter



    On 12.04.2011 at 09:42, Claudio Freire wrote:

    On Tue, Apr 12, 2011 at 7:20 AM, Dieter Rehbein
    wrote:
    Hi everybody,

    I have a performance problem with a query that uses a LIMIT. There are other threads regarding performance issues with LIMIT, but I didn't find useful hints for our problem, and it might
    be interesting for other postgres users.
    Did you perform an ANALYZE or VACUUM ANALYZE?
    Did you try increasing the statistic targets?

    AFAIK, it looks a lot like the planner is missing stats, since it
    estimates the index query on idx_nfi_newsfeed will fetch 10k rows -
    instead of 25.
  • Claudio Freire at Apr 12, 2011 at 9:07 am

    On Tue, Apr 12, 2011 at 10:59 AM, Dieter Rehbein wrote:
    I just executed a VACUUM ANALYZE and now everything performs well. hm, strange.
    That probably means you need more statistics - try increasing the
    newsfeed's statistics target count.

    ALTER TABLE newsfeed_item ALTER COLUMN newsfeed SET STATISTICS <n>;

    Try different <n> numbers; you can crank it up to 4000 or perhaps more
    in 9.0, but you should start lower, I guess.
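
    Note that a new target only takes effect once the column's statistics are
    rebuilt, so follow the ALTER with an ANALYZE, e.g.:

    ALTER TABLE newsfeed_item ALTER COLUMN newsfeed SET STATISTICS 1000;
    ANALYZE newsfeed_item;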
  • Tv at Apr 12, 2011 at 9:33 am

    On Tue, Apr 12, 2011 at 10:59 AM, Dieter Rehbein
    wrote:
    I just executed a VACUUM ANALYZE and now everything performs well. hm,
    strange.
    That probably means you need more statistics - try increasing the
    newsfeed's statistics target count.

    ALTER TABLE newsfeed_item ALTER COLUMN newsfeed SET STATISTICS <n>;

    Try different <n> numbers; you can crank it up to 4000 or perhaps more
    in 9.0, but you should start lower, I guess.
    AFAIK the max value is 10000 and the default is 100. Higher numbers mean
    higher overhead, so do not jump to 10000 directly. Set it to 1000 and see
    if that helps, etc.
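
    You can also check what the planner currently knows about the column via
    the standard pg_stats view, e.g.:

    -- inspect the gathered statistics for the filter column
    SELECT n_distinct, null_frac, most_common_vals
    FROM pg_stats
    WHERE tablename = 'newsfeed_item' AND attname = 'newsfeed';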

    regards
    Tomas
  • Dieter Rehbein at Apr 12, 2011 at 9:36 am
    Thanks a lot, guys. I will try that out.

    regards
    Dieter



    On 12.04.2011 at 11:07, Claudio Freire wrote:

    On Tue, Apr 12, 2011 at 10:59 AM, Dieter Rehbein
    wrote:
    I just executed a VACUUM ANALYZE and now everything performs well. hm, strange.
    That probably means you need more statistics - try increasing the
    newsfeed's statistics target count.

    ALTER TABLE newsfeed_item ALTER COLUMN newsfeed SET STATISTICS <n>;

    Try different <n> numbers; you can crank it up to 4000 or perhaps more
    in 9.0, but you should start lower, I guess.
  • Tom Lane at Apr 12, 2011 at 2:38 pm

    Claudio Freire writes:
    Did you try increasing the statistic targets?
    AFAIK, it looks a lot like the planner is missing stats, since it
    estimates the index query on idx_nfi_newsfeed will fetch 10k rows -
    instead of 25.
    BTW, this is the right suggestion, but for the wrong reason. You seem
    to be looking at

    Limit  (cost=0.00..980.09 rows=25 width=963) (actual time=48.592..4060.779 rows=25 loops=1)
      ->  Index Scan Backward using "IDX_NFI_DATETIME" on newsfeed_item  (cost=0.00..409365.16 rows=10442 width=963) (actual time=48.581..4060.542 rows=25 loops=1)

    Here, the actual row count is constrained to 25 because the LIMIT node
    stops calling the indexscan node once it's got 25. So this case proves
    little about whether the planner's estimates are any good. You need to
    check the estimates in the unconstrained plan:

    -> Bitmap Heap Scan on newsfeed_item (cost=421.41..34450.72 rows=10442 width=963) (actual time=0.644..12.601 rows=477 loops=1)

    Here we can see that there really are only 477 rows in the table that
    satisfy the WHERE clause, versus an estimate of 10K. So sure enough,
    the statistics are bad, and an increase in stats target might help.
    But you can't conclude that from an explain that involves LIMIT.

    regards, tom lane
