I am using gin index on a tsvector and doing basic search. I see the
row-estimate of the planner to be horribly wrong. It is returning
row-estimate as 4843 for all queries whether it matches zero rows, a
medium number of rows (88,000) or a large number of rows (726,000).

The table has roughly a million docs.

I see a similar problem reported here but thought it was fixed in 9.0
which I am running.

http://archives.postgresql.org/pgsql-hackers/2010-05/msg01389.php

Here is the version info and detailed planner output for all the three
queries:


select version();

version

PostgreSQL 9.0.0 on x86_64-unknown-linux-gnu, compiled by GCC gcc
(Gentoo 4.3.4 p1.1, pie-10.1.5) 4.3.4, 64-bit


Case I: FOR A NON-MATCHING WORD
===============================

explain analyze select count(*) from docmeta,
plainto_tsquery('english', 'dyfdfdf') as qdoc where docvector @@ qdoc;
QUERY
PLAN

Aggregate (cost=20322.17..20322.18 rows=1 width=0) (actual
time=0.058..0.058 rows=1 loops=1)
-> Nested Loop (cost=5300.28..20310.06 rows=4843 width=0) (actual
time=0.055..0.055 rows=0 loops=1)
-> Function Scan on qdoc (cost=0.00..0.01 rows=1 width=32)
(actual time=0.005..0.005 rows=1 loops=1)
-> Bitmap Heap Scan on docmeta (cost=5300.28..20249.51
rows=4843 width=270) (actual time=0.046..0.046 rows=0 loops=1)
Recheck Cond: (docmeta.docvector @@ qdoc.qdoc)
-> Bitmap Index Scan on doc_index (cost=0.00..5299.07
rows=4843 width=0) (actual time=0.044..0.044 rows=0 loops=1)
Index Cond: (docmeta.docvector @@ qdoc.qdoc)
Total runtime: 0.092 ms

CASE II: FOR A MEDIUM-MATCHING WORD
===================================
explain analyze select count(*) from docmeta,
plainto_tsquery('english', 'quit') as qdoc where docvector @@ qdoc;
QUERY
PLAN

Aggregate (cost=20322.17..20322.18 rows=1 width=0) (actual
time=1222.856..1222.857 rows=1 loops=1)
-> Nested Loop (cost=5300.28..20310.06 rows=4843 width=0) (actual
time=639.275..1212.460 rows=88545 loops=1)
-> Function Scan on qdoc (cost=0.00..0.01 rows=1 width=32)
(actual time=0.006..0.007 rows=1 loops=1)
-> Bitmap Heap Scan on docmeta (cost=5300.28..20249.51
rows=4843 width=270) (actual time=639.264..1196.542 rows=88545 loops=1)
Recheck Cond: (docmeta.docvector @@ qdoc.qdoc)
-> Bitmap Index Scan on doc_index (cost=0.00..5299.07
rows=4843 width=0) (actual time=621.877..621.877 rows=88545 loops=1)
Index Cond: (docmeta.docvector @@ qdoc.qdoc)
Total runtime: 1222.907 ms


Case II: FOR A HIGH-MATCHING WORD
=================================

explain analyze select count(*) from docmeta,
plainto_tsquery('english', 'j') as qdoc where docvector @@ qdoc;
QUERY
PLAN

Aggregate (cost=20322.17..20322.18 rows=1 width=0) (actual
time=742.857..742.858 rows=1 loops=1)
-> Nested Loop (cost=5300.28..20310.06 rows=4843 width=0) (actual
time=126.804..660.895 rows=726985 loops=1)
-> Function Scan on qdoc (cost=0.00..0.01 rows=1 width=32)
(actual time=0.004..0.006 rows=1 loops=1)
-> Bitmap Heap Scan on docmeta (cost=5300.28..20249.51
rows=4843 width=270) (actual time=126.795..530.422 rows=726985 loops=1)
Recheck Cond: (docmeta.docvector @@ qdoc.qdoc)
-> Bitmap Index Scan on doc_index (cost=0.00..5299.07
rows=4843 width=0) (actual time=113.742..113.742 rows=726985 loops=1)
Index Cond: (docmeta.docvector @@ qdoc.qdoc)
Total runtime: 742.906 ms

Thanks,
Sushant.

Search Discussions

  • Jan Urbański at Oct 24, 2010 at 12:57 pm

    On 24/10/10 14:44, Sushant Sinha wrote:
    I am using gin index on a tsvector and doing basic search. I see the
    row-estimate of the planner to be horribly wrong. It is returning
    row-estimate as 4843 for all queries whether it matches zero rows, a
    medium number of rows (88,000) or a large number of rows (726,000).

    The table has roughly a million docs.

    I see a similar problem reported here but thought it was fixed in 9.0
    which I am running.
    Hi,

    what's your default statistics target? Could you attach the output of

    select * from pg_stats where tablename = '<the-tablename>';

    and

    select typanalyze from pg_type where typname = 'tsvector';

    Thanks,
    Jan
  • Jan Urbański at Oct 24, 2010 at 1:07 pm

    On 24/10/10 14:44, Sushant Sinha wrote:
    I am using gin index on a tsvector and doing basic search. I see the
    row-estimate of the planner to be horribly wrong. It is returning
    row-estimate as 4843 for all queries whether it matches zero rows, a
    medium number of rows (88,000) or a large number of rows (726,000).

    The table has roughly a million docs.
    explain analyze select count(*) from docmeta,
    plainto_tsquery('english', 'dyfdfdf') as qdoc where docvector @@ qdoc;
    OK, forget my previous message. The problem is that you are doing a join
    using @@ as the operator for the join condition, so the planner uses the
    operator's join selectivity estimate. For @@ the tsmatchjoinsel function
    simply returns 0.005.

    Try doing:

    explain analyze select count(*) from docmeta where docvector @@
    plainto_tsquery('english', 'dyfdfdf');

    It should help.

    Cheers,
    Jan
  • Sushant Sinha at Oct 24, 2010 at 3:26 pm
    Thanks a ton Jan! It works quite correctly. But many tsearch tutorials
    ask tsquery to be placed in 'from' statement and that can cause bad
    plan. Isn't it possible to return the correct number for a join with the
    query as well?

    -Sushant.
    On Sun, 2010-10-24 at 15:07 +0200, Jan Urbański wrote:
    On 24/10/10 14:44, Sushant Sinha wrote:
    I am using gin index on a tsvector and doing basic search. I see the
    row-estimate of the planner to be horribly wrong. It is returning
    row-estimate as 4843 for all queries whether it matches zero rows, a
    medium number of rows (88,000) or a large number of rows (726,000).

    The table has roughly a million docs.
    explain analyze select count(*) from docmeta,
    plainto_tsquery('english', 'dyfdfdf') as qdoc where docvector @@ qdoc;
    OK, forget my previous message. The problem is that you are doing a join
    using @@ as the operator for the join condition, so the planner uses the
    operator's join selectivity estimate. For @@ the tsmatchjoinsel function
    simply returns 0.005.

    Try doing:

    explain analyze select count(*) from docmeta where docvector @@
    plainto_tsquery('english', 'dyfdfdf');

    It should help.

    Cheers,
    Jan

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedOct 24, '10 at 12:44p
activeOct 24, '10 at 3:26p
posts4
users2
websitepostgresql.org...
irc#postgresql

2 users in discussion

Sushant Sinha: 2 posts Jan Urbański: 2 posts

People

Translate

site design / logo © 2021 Grokbase