FAQ

Query planner doesn't use index scan on tsvector GIN index if LIMIT is specifiedQuery planner doesn't use index scan on tsvector GIN index if LIMIT is specified

Darklow
Jan 10, 2012 at 12:30 pm
Some info:
PostgreSQL version: 9.1.2

Table "cache":
Rows count: 3 471 081
Column "tsv" tsvector
Index "cache_tsv" USING gin (tsv)

If i do query like THIS:

*SELECT id FROM table WHERE tsv @@ to_tsquery('test:*');*
It uses index and returns results immediately:

explain analyze
'Bitmap Heap Scan on cache (cost=1441.78..63802.63 rows=19843 width=4)
(actual time=29.309..31.518 rows=1358 loops=1)'
' Recheck Cond: (tsv @@ to_tsquery('test:*'::text))'
' -> Bitmap Index Scan on cache_tsv (cost=0.00..1436.82 rows=19843
width=0) (actual time=28.966..28.966 rows=1559 loops=1)'
' Index Cond: (tsv @@ to_tsquery('test:*'::text))'
'Total runtime: 31.789 ms'


But the performance problems starts when i do the same query specifying
LIMIT.
*SELECT id FROM cache WHERE tsv @@ to_tsquery('test:*') limit 20;*

By some reason index is not used.

explain analyze
'Limit (cost=0.00..356.23 rows=20 width=4) (actual time=7.984..765.550
rows=20 loops=1)'
' -> Seq Scan on cache (cost=0.00..353429.50 rows=19843 width=4) (actual
time=7.982..765.536 rows=20 loops=1)'
' Filter: (tsv @@ to_tsquery('test:*'::text))'
'Total runtime: 765.620 ms'


Some more debug notes:
1) If i set SET enable_seqscan=off; then query uses indexes correctly
2) Also i notified, if i use: to_tsquery('test') without wildcard search
:*, then index is used correctly in both queries, with or without LIMIT

Any ideas how to fix the problem?
Thank you
reply

Search Discussions

2 responses

  • Tom Lane at Jan 10, 2012 at 5:04 pm

    darklow writes:
    But the performance problems starts when i do the same query specifying
    LIMIT.
    *SELECT id FROM cache WHERE tsv @@ to_tsquery('test:*') limit 20;*
    By some reason index is not used.
    It apparently thinks there are enough matches that it might as well just
    seqscan the table and expect to find some matches at random, in less
    time than using the index would take.

    The estimate seems to be off quite a bit, so maybe raising the stats
    target for this column would help.

    regards, tom lane
  • Jesper Krogh at Jan 10, 2012 at 9:43 pm

    On 2012-01-10 18:04, Tom Lane wrote:
    darklow<darklow@gmail.com> writes:
    But the performance problems starts when i do the same query specifying
    LIMIT.
    *SELECT id FROM cache WHERE tsv @@ to_tsquery('test:*') limit 20;*
    By some reason index is not used.
    It apparently thinks there are enough matches that it might as well just
    seqscan the table and expect to find some matches at random, in less
    time than using the index would take.

    The estimate seems to be off quite a bit, so maybe raising the stats
    target for this column would help.
    The cost of matching ts_match_vq against a toasted column
    is not calculated correctly. This is completely parallel with
    http://archives.postgresql.org/pgsql-hackers/2011-11/msg01754.php

    Try raising the cost for ts_match_vq(tsvector,tsquery) that help a bit, but
    its hard to get the cost high enough.

    Raising statistics target helps too..

    --
    Jesper

Related Discussions

Discussion Navigation
viewthread | post

3 users in discussion

Darklow: 1 post Jesper Krogh: 1 post Tom Lane: 1 post