Hi list,

This simple query shouldn't cause two bitmap index scans:
EXPLAIN select * from test where b='0';

Bitmap Heap Scan on test (cost=1056.68..8200.12 rows=29839 width=314)
Recheck Cond: ((b = 0) AND (b = 0::smallint))
-> BitmapAnd (cost=1056.68..1056.68 rows=5237 width=0)
-> Bitmap Index Scan on test_i_idx (cost=0.00..485.45
rows=29839 width=0)
-> Bitmap Index Scan on test_b_c_idx (cost=0.00..556.06
rows=29839 width=0)
Index Cond: (b = 0::smallint)

One of the indexes is a partial index, and the other is just a simple index.

Apparently, for some reason, the '0' is expanded into both an integer
and a smallint literal and the planner thinks it can reduce rows by
checking the condition twice?

This is how I reproduced the issue:
set enable_indexscan=off;
create table test as select i, (i/30000)::smallint as b, 0::int as c,
repeat('x', 300) as filler from generate_series(1,170000) i;
create index test_i_idx on test (i) where b=0;
create index test_b_c_idx on test (b,c);
analyze test;
explain select * from test where b='0';

Reproduced on PostgreSQL 8.3.15, 8.4.8, 9.0.4, 9.1rc1 and 9.2devel.
However, this issue does NOT occur on 8.2.21

When I write the literal without quotes, I get a more sensible plan:
EXPLAIN select * from test where b=0;

Bitmap Heap Scan on test (cost=493.79..8260.88 rows=30007 width=314)
Recheck Cond: (b = 0)
-> Bitmap Index Scan on test_i_idx (cost=0.00..486.29 rows=30007 width=0)

Also, *before* analyzing the table, I get a good plan:
EXPLAIN select * from test where b='0';

Bitmap Heap Scan on test (cost=18.86..2450.01 rows=850 width=42)
Recheck Cond: (b = 0::smallint)
-> Bitmap Index Scan on test_b_c_idx (cost=0.00..18.64 rows=850 width=0)
Index Cond: (b = 0::smallint)


Regards,
Marti Raudsepp

Search Discussions

  • Tom Lane at Sep 5, 2011 at 6:07 pm

    Marti Raudsepp writes:
    This simple query shouldn't cause two bitmap index scans:
    EXPLAIN select * from test where b='0';
    Bitmap Heap Scan on test (cost=1056.68..8200.12 rows=29839 width=314)
    Recheck Cond: ((b = 0) AND (b = 0::smallint))
    -> BitmapAnd (cost=1056.68..1056.68 rows=5237 width=0)
    -> Bitmap Index Scan on test_i_idx (cost=0.00..485.45
    rows=29839 width=0)
    -> Bitmap Index Scan on test_b_c_idx (cost=0.00..556.06
    rows=29839 width=0)
    Index Cond: (b = 0::smallint)
    One of the indexes is a partial index, and the other is just a simple index.
    Apparently, for some reason, the '0' is expanded into both an integer
    and a smallint literal and the planner thinks it can reduce rows by
    checking the condition twice?
    Yeah, this happens because choose_bitmap_and() compromises between
    planning speed and exact detection of redundant index conditions.
    What we have to start with is WHERE b = 0::smallint, which the planner
    is able to prove implies the index predicate WHERE b = 0::integer,
    so both indexes are considered. But the check for predicate redundancy
    in choose_bitmap_and() only uses simple equality not provability,
    so it does not recognize that the two indexes are entirely redundant.

    I'm not really eager to change that, especially in view of the fact
    that a plain (non bitmap) indexscan is considerably cheaper than any
    of these alternatives in this example.

    I did have an idea while looking at this example --- namely, that
    we could provide some further protection cheaply with this simple hack
    in cost_bitmap_heap_scan:

    diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
    index 7812a8628fc94335aaf1f506c4ea5ebb7960f8d8..c001725267a06063f45bbcde0b09f5784b0f5c3a 100644
    *** a/src/backend/optimizer/path/costsize.c
    --- b/src/backend/optimizer/path/costsize.c
    *************** cost_bitmap_heap_scan(Path *path, Planne
    *** 607,612 ****
    --- 607,622 ----
    */
    tuples_fetched = clamp_row_est(indexSelectivity * baserel->tuples);

    + /*
    + * Disbelieve an estimate that's less than what we previously estimated
    + * for the actual number of rows needed from the table. This can happen
    + * when we are considering a bitmap AND of indexes with redundant
    + * conditions, since it's difficult for the selectivity code to recognize
    + * the redundancy. By clamping the cost estimate this way, we prevent
    + * redundant AND scans from looking cheaper than non-redundant ones.
    + */
    + tuples_fetched = Max(tuples_fetched, baserel->rows);
    +
    T = (baserel->pages > 1) ? (double) baserel->pages : 1.0;

    if (outer_rel != NULL && outer_rel->rows > 1)


    I tested this and it fixes this particular example, by preventing the
    heap scan part of the plan from looking cheaper than it does with just
    one index in use. (The index scan part is of course more expensive
    with extra indexes, so possibilities with extra indexes will lose out.)

    It'd be nice to imagine that this quick and dirty solution obsoletes
    the need for most of the expensive heuristics in choose_bitmap_and,
    but I'm afraid it probably does not. baserel->rows might include the
    effects of some non-index-related WHERE conditions, so the clamp here
    is not tight. Still, it should fix egregious cases like this one,
    so I'm inclined to apply it.
    Reproduced on PostgreSQL 8.3.15, 8.4.8, 9.0.4, 9.1rc1 and 9.2devel.
    However, this issue does NOT occur on 8.2.21
    8.2 doesn't recognize the partial index as applicable (for lack of
    enough understanding of cross-type operator relationships), so it
    doesn't reach the problematic decision.

    regards, tom lane
  • Marti Raudsepp at Sep 6, 2011 at 9:01 am

    On Mon, Sep 5, 2011 at 21:01, Tom Lane wrote:
    What we have to start with is WHERE b = 0::smallint, which the planner
    is able to prove implies the index predicate WHERE b = 0::integer,
    so both indexes are considered.  But the check for predicate redundancy
    in choose_bitmap_and() only uses simple equality not provability,
    so it does not recognize that the two indexes are entirely redundant.
    So it seems the more fundamental issue is that b=0 and b='0'
    conditions are normalized differently when b is smallint.

    Why doesn't this occur when b is bigint, though?
    I'm not really eager to change that, especially in view of the fact
    that a plain (non bitmap) indexscan is considerably cheaper than any
    of these alternatives in this example.
    I did hit this case with a real query, with enable_indexscan allowed.
    I just couldn't figure out how to make a more similar test case.
    +       tuples_fetched = Max(tuples_fetched, baserel->rows);
    I tested this and it fixes this particular example, by preventing the
    heap scan part of the plan from looking cheaper than it does with just
    one index in use.
    Cool, this should take care of the simpler cases.

    Regards,
    Marti
  • Tom Lane at Sep 6, 2011 at 4:51 pm

    Marti Raudsepp writes:
    On Mon, Sep 5, 2011 at 21:01, Tom Lane wrote:
    What we have to start with is WHERE b = 0::smallint, which the planner
    is able to prove implies the index predicate WHERE b = 0::integer,
    so both indexes are considered.  But the check for predicate redundancy
    in choose_bitmap_and() only uses simple equality not provability,
    so it does not recognize that the two indexes are entirely redundant.
    So it seems the more fundamental issue is that b=0 and b='0'
    conditions are normalized differently when b is smallint.
    That's not a "fundamental issue", if by that you mean that it's a bug to
    be fixed.
    Why doesn't this occur when b is bigint, though?
    Looks like the bigint index is enough larger that it's not thought to be
    worth the extra cost to scan. The underlying assumptions are all the
    same though.
    I tested this and it fixes this particular example, by preventing the
    heap scan part of the plan from looking cheaper than it does with just
    one index in use.
    Cool, this should take care of the simpler cases.
    I realized that that patch is no good because it will break estimation
    for inner-indexscan cases, where the selectivity of a bitmap index might
    legitimately be better than what you'd get from the restriction clauses
    alone. Possibly we could adapt the idea to use in choose_bitmap_and,
    but it'll take more thought.

    regards, tom lane
  • Marti Raudsepp at Sep 6, 2011 at 7:37 pm

    On Tue, Sep 6, 2011 at 19:51, Tom Lane wrote:
    Marti Raudsepp <marti@juffo.org> writes:
    So it seems the more fundamental issue is that b=0 and b='0'
    conditions are normalized differently when b is smallint.
    That's not a "fundamental issue", if by that you mean that it's a bug to
    be fixed.
    That was a bad choice of words, I meant "fundamental reason".
    I realized that that patch is no good because it will break estimation
    for inner-indexscan cases, where the selectivity of a bitmap index might
    legitimately be better than what you'd get from the restriction clauses
    alone.  Possibly we could adapt the idea to use in choose_bitmap_and,
    but it'll take more thought.
    Anyway the conditions for triggering this kind of plan are rather
    obscure, so unless there is a simple solution, it's probably not worth
    fixing.

    You need a constant query WHERE expression and partial index that
    mixes b=0 and b='0' syntax, on a non-integer column. The workaround is
    also straightforward: be consistent how you write the expression.

    The planner will only choose such a plan when you have a fairly large
    heap and small indexes, so the heap scan is likely to dominate query
    time anyway. In the original query it was a matter of 66 ms (single
    bitmap scan) vs 82 ms (redundant bitmap scans)

    Regards,
    Marti

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedSep 5, '11 at 8:01a
activeSep 6, '11 at 7:37p
posts5
users2
websitepostgresql.org...
irc#postgresql

2 users in discussion

Marti Raudsepp: 3 posts Tom Lane: 2 posts

People

Translate

site design / logo © 2021 Grokbase