Hi all,

I'm debugging a performance issue that looks like it might actually be an
issue/limitation/parameter/bug in the query planner, but since I couldn't
find anything authoritative on when exactly postgresql is able to use
partial not null indexes I'm not sure that that's the case and I was hopping
someone could give some clarity around that or point to an option I could
tweak that would change this behavior. Anyways the table in question (with
names changed) is below. I'm running postgres 8.4.1

\d
Column | Type | Modifiers
--------------------------+----------+-----------
id | integer | not null
sid | integer |
bid | integer |
m | date | not null
k | integer |
cc | text |
f | integer |
d | smallint |
u | smallint |
f2 | integer |
cm | text |
Indexes:
"scm_pkey" PRIMARY KEY, btree (id) WITH (fillfactor=100)
"index_scm_on_bid" btree (bid) WITH (fillfactor=100) WHERE bid IS NOT
NULL

~35 million rows (about 15 million of which have null bid). There are about
1 million distinct bids (with selectivity ranging from 1 to ~100,000 rows).

The end user selects an arbitrary number of bid's then we run several
queries one of which I include explain analyze output below. For <= 100
bids the planner uses the index and completes in ~35ms, for 101+ bid's the
planner uses a sequence scan and completes in ~45 seconds (3 orders of
magnitude slower). My first thought was that there was a problem with the
statistics/estimation in the planner, but using "set enable seq_scan=off;"
still does not use the index when there's over 100 bid's in the IN clause.
Breaking the IN clause into 2 < 100 element groups does however rescue the
use of the index and the fast performance as does creating a new non-partial
index on bid (i.e. an index "index_scm_on_bid2" btree (bid) WITH
(fillfactor=100) will be used with over 100 bid's). My best guess is that
this is do to some limit on the number of in clause elements the query
planner will check to see if match a partial index before giving up and
assuming it doesn't (if there is such a limit I'd definitely like to make it
larger, at least for this big of a table...). Is this 100 limit documented
anywhere?

Anyways my workarounds are to either split the IN clause into multiple < 10
element groups or switch the index to be across all values rather then not
null. The reason I tend to use not null partial indexes is that I have
another similarly sized table with 20 separately indexed columns each of
which only has about 10000 non-null rows and the disk space savings from the
not null partial index there are huge.

# <= 100 bids
=>explain analyze SELECT * FROM "scm" WHERE ((bid in
(1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044)))
order by m desc limit 100;
Limit (cost=66518.18..66518.43 rows=100 width=229) (actual
time=24.665..24.821 rows=100 loops=1)
-> Sort (cost=66518.18..66563.14 rows=17987 width=229) (actual
time=24.658..24.698 rows=100 loops=1)
Sort Key: m
Sort Method: top-N heapsort Memory: 85kB
-> Bitmap Heap Scan on scm (cost=566.83..65830.73 rows=17987
width=229) (actual time=1.863..12.850 rows=16430 loops=1)
Recheck Cond: (bid = ANY
('{1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044}'::integer[]))
-> Bitmap Index Scan on index_scm_on_bid (cost=0.00..562.33
rows=17987 width=0) (actual time=1.719..1.719 rows=16430 loops=1)
Index Cond: (bid = ANY
('{1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044}'::integer[]))
Total runtime: 24.908 ms

# > 100 bids
# NOTE - this is the same even after running
# set enable_seqscan = off;
# HOWEVER - this will use an index scan and run fast if we create a non
partial index on bid
=>explain analyze SELECT * FROM "scm" WHERE ((bid in
(1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044,1208045)))
order by m desc limit 100;
Limit (cost=3783824.03..3783824.28 rows=100 width=229) (actual
time=31229.140..31229.284 rows=100 loops=1)
-> Sort (cost=3783824.03..3783869.06 rows=18014 width=229) (actual
time=31229.137..31229.193 rows=100 loops=1)
Sort Key: m
Sort Method: top-N heapsort Memory: 82kB
-> Seq Scan on scm (cost=0.00..3783135.55 rows=18014 width=229)
(actual time=97.582..31217.270 rows=16433 loops=1)
Filter: (bid = ANY
('{1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044,1208045}'::integer[]))
Total runtime: 31229.377 ms

# Splitting the IN into <= 100 element pieces works
=>explain analyze SELECT * FROM "scm" WHERE ((bid in
(1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044)))
OR ((bid IN (1208045))) order by m desc limit 100;
Limit (cost=66664.94..66665.19 rows=100 width=229) (actual
time=24.749..24.902 rows=100 loops=1)
-> Sort (cost=66664.94..66709.98 rows=18014 width=229) (actual
time=24.747..24.795 rows=100 loops=1)
Sort Key: m
Sort Method: top-N heapsort Memory: 85kB
-> Bitmap Heap Scan on scm (cost=577.99..65976.46 rows=18014
width=229) (actual time=1.867..12.888 rows=16433 loops=1)
Recheck Cond: ((bid = ANY
('{1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044}'::integer[]))
OR (bid = 1208045))
-> BitmapOr (cost=577.99..577.99 rows=18014 width=0)
(actual time=1.720..1.720 rows=0 loops=1)
-> Bitmap Index Scan on index_scm_on_bid
(cost=0.00..562.33 rows=17987 width=0) (actual time=1.715..1.715 rows=16430
loops=1)
Index Cond: (bid = ANY
('{1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044}'::integer[]))
-> Bitmap Index Scan on index_scm_on_bid
(cost=0.00..6.66 rows=27 width=0) (actual time=0.003..0.003 rows=3 loops=1)
Index Cond: (bid = 1208045)
Total runtime: 24.998 ms


Thanks!
Tim

Search Discussions

  • Tom Lane at Aug 3, 2010 at 9:01 pm

    Timothy Garnett writes:
    ... My first thought was that there was a problem with the
    statistics/estimation in the planner, but using "set enable seq_scan=off;"
    still does not use the index when there's over 100 bid's in the IN clause.
    Breaking the IN clause into 2 < 100 element groups does however rescue the
    use of the index and the fast performance as does creating a new non-partial
    index on bid (i.e. an index "index_scm_on_bid2" btree (bid) WITH
    (fillfactor=100) will be used with over 100 bid's).
    I think you're hitting the code that abandons attempts to prove
    constraints true when the expressions get too large (to avoid O(N^2)
    or worse behavior). Could you just add an explicit AND bid IS NOT NULL
    when you know none of the items in the IN clause will be null?

    regards, tom lane
  • Timothy Garnett at Aug 3, 2010 at 9:14 pm
    Adding the is not null clause does allow the query to use the index again
    (and is a much cleaner workaround in that I don't have to change the indexes
    or rely on any magic number for splitting the in clauses). Also makes sense
    since it more exactly matches the partial indexing condition.

    Thanks Tom!

    Tim

    => SELECT * FROM scm WHERE ((bid in
    (1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044,1208045,1261486,1558142,1558146,1000096733,1000091036,1190958,1261532,1178300,1544212,1000015606,1637671,1261482,1261489,1261483,1875455,1000015596,1558165,1000152148,1000152147,1000152146,1000141594,1000141133,1000172483,1000191484,1000191485,1000196236,1000236337,1000241756,1000242921,1000256842,1000257993,1000270323,1000272820,1000281535,1000297033,1000297039,1000297446,1000301868,1000307196,1000316101,1000331822,1000334293,1000342550,1000352078,1000367699,1000372920,1000373959,1000383317,1000400498,1000405863,1000412281,1000420780,1000430861)))
    AND bid IS NOT NULL ORDER BY m DESC LIMIT 100 OFFSET 0;
    Limit (cost=80925.25..80925.50 rows=100 width=229)
    -> Sort (cost=80925.25..80979.66 rows=21765 width=229)
    Sort Key: m
    -> Bitmap Heap Scan on scm (cost=825.19..80093.41 rows=21765
    width=229)
    Recheck Cond: ((bid = ANY
    ('{1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,117541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,155889,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044,1208045,1261486,1558142,1558146,1000096733,1000091036,1190958,1261532,1178300,1544212,1000015606,1637671,1261482,1261489,1261483,1875455,100001596,1558165,1000152148,1000152147,1000152146,1000141594,1000141133,1000172483,1000191484,1000191485,1000196236,1000236337,1000241756,1000242921,1000256842,1000257993,1000270323,1000272820,1000281535,1000297033,1000297039,1000297446,1000301868,1000307196,100316101,1000331822,1000334293,1000342550,1000352078,1000367699,1000372920,1000373959,1000383317,1000400498,1000405863,1000412281,1000420780,1000430861}'::integer[]))
    AND (bid IS NOT NULL))
    -> Bitmap Index Scan on index_scm_on_bid (cost=0.00..819.75
    rows=21765 width=0)
    Index Cond: (bid = ANY
    ('{1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,18755271177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,158189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,126147,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044,1208045,1261486,1558142,1558146,1000096733,1000091036,1190958,1261532,1178300,1544212,1000015606,1637671,1261482,1261489,1261483,1875455,100015596,1558165,1000152148,1000152147,1000152146,1000141594,1000141133,1000172483,1000191484,1000191485,1000196236,1000236337,1000241756,1000242921,1000256842,1000257993,1000270323,1000272820,1000281535,1000297033,1000297039,1000297446,1000301868,10003071961000316101,1000331822,1000334293,1000342550,1000352078,1000367699,1000372920,1000373959,1000383317,1000400498,1000405863,1000412281,1000420780,1000430861}'::integer[]))
    (7 rows)
    Total runtime: 47.137 ms

    On Tue, Aug 3, 2010 at 5:01 PM, Tom Lane wrote:

    Timothy Garnett <tgarnett@panjiva.com> writes:
    ... My first thought was that there was a problem with the
    statistics/estimation in the planner, but using "set enable
    seq_scan=off;"
    still does not use the index when there's over 100 bid's in the IN clause.
    Breaking the IN clause into 2 < 100 element groups does however rescue the
    use of the index and the fast performance as does creating a new
    non-partial
    index on bid (i.e. an index "index_scm_on_bid2" btree (bid) WITH
    (fillfactor=100) will be used with over 100 bid's).
    I think you're hitting the code that abandons attempts to prove
    constraints true when the expressions get too large (to avoid O(N^2)
    or worse behavior). Could you just add an explicit AND bid IS NOT NULL
    when you know none of the items in the IN clause will be null?

    regards, tom lane
  • Scott Marlowe at Aug 3, 2010 at 9:01 pm

    On Tue, Aug 3, 2010 at 2:03 PM, Timothy Garnett wrote:
    Hi all,

    I'm debugging a performance issue that looks like it might actually be an
    issue/limitation/parameter/bug in the query planner, but since I couldn't
    find anything authoritative on when exactly postgresql is able to use
    partial not null indexes I'm not sure that that's the case and I was hopping
    someone could give some clarity around that or point to an option I could
    tweak that would change this behavior.  Anyways the table in question (with
    names changed) is below.  I'm running postgres 8.4.1
    8.4.1 has some pretty nasty bugs that have since been fixed, and some
    work was done on the planner as well since then. First upgrade and
    see if your problem goes away.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-general @
categoriespostgresql
postedAug 3, '10 at 8:10p
activeAug 3, '10 at 9:14p
posts4
users3
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase