I have run into an issue where the query optimizer chooses the wrong
execution plan when I join two large tables that have been partitioned. I
would really appreciate it if someone could help me out with this. I don't
know whether I've found a bug in the optimizer, or whether there is some
parameter/option I need to set in Postgres. Below, I've included my
execution plans. I'm using Postgres 9.0.3, running on a pretty beefy Linux
server.

My two tables:
-widget: has 4041866 records, and is broken up into 4 partitions (no
records are in the parent table).
-icecream: starts with zero records, but since there could be billions of
icecream records, I will partition it and keep no records in the parent
table.

So I then create my first partition in the icecream table, and load
4041866 records into it.
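For reference, the partitioning follows the usual 9.0 inheritance pattern; a
simplified sketch (the real tables have more columns, so treat the column
list here as illustrative, not the actual schema):

```sql
-- Simplified sketch of the inheritance-based partitioning.
-- Columns other than dataset_id/widget_id/cost are omitted.
CREATE TABLE icecream (
    dataset_id integer NOT NULL,
    widget_id  integer NOT NULL,
    cost       numeric
);

-- One child partition per dataset_id; the parent table stays empty.
CREATE TABLE icecream_part_281 (
    CHECK (dataset_id = 281)
) INHERITS (icecream);

CREATE INDEX icecream_part_281_widget_id_idx
    ON icecream_part_281 (widget_id);
```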

Here is the query I'm using to join the two tables:
explain analyze
SELECT
    r.widget_id,
    r.widget_type_id,
    avg(rc.cost)::double precision AS cost_avg
FROM
    widget r,
    icecream rc
WHERE
    r.widget_type_id = 4
    AND r.widgetset_id = 5
    AND r.widget_id = rc.widget_id
    AND rc.dataset_id = 281
GROUP BY
    r.widget_id, r.chromosome, r.start_pos, r.end_pos, r.widget_type_id;

Here is the corresponding execution plan:


QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=147262.20..147299.12 rows=1136 width=41) (actual
time=31876.290..31904.880 rows=11028 loops=1)
-> Merge Join (cost=95574.83..112841.79 rows=1147347 width=41) (actual
time=31130.870..31832.922 rows=11028 loops=1)
Merge Cond: (r.widget_id = rc.widget_id)
-> Sort (cost=1913.89..1942.27 rows=11352 width=21) (actual
time=56.818..68.701 rows=11028 loops=1)
Sort Key: r.widget_id
Sort Method: quicksort Memory: 1246kB
-> Append (cost=4.28..1149.30 rows=11352 width=21) (actual
time=0.139..40.513 rows=11028 loops=1)
-> Bitmap Heap Scan on widget r (cost=4.28..12.75
rows=1 width=48) (actual time=0.030..0.030 rows=0 loops=1)
Recheck Cond: (widgetset_id = 5)
Filter: (widget_type_id = 4)
-> Bitmap Index Scan on widget_widgetset_id_idx
(cost=0.00..4.28 rows=4 width=0) (actual time=0.023..0.023 rows=0 loops=1)
Index Cond: (widgetset_id = 5)
-> Index Scan using
widget_part_5_widget_widget_type_id_idx on widget_part_5 r
(cost=0.00..1136.55 rows=11351 width=21) (actual time=0.106..18.489
rows=11028 loops=1)
Index Cond: (widget_type_id = 4)
Filter: (widgetset_id = 5)
-> Sort (cost=93660.94..93711.47 rows=20214 width=24) (actual
time=29730.522..30766.354 rows=946140 loops=1)
Sort Key: rc.widget_id
Sort Method: external sort Disk: 165952kB
-> Append (cost=0.00..92215.33 rows=20214 width=24) (actual
time=0.057..13731.204 rows=4041866 loops=1)
-> Seq Scan on icecream rc (cost=0.00..23.00 rows=5
width=24) (actual time=0.002..0.002 rows=0 loops=1)
Filter: (dataset_id = 281)
-> Seq Scan on icecream_part_281 rc
(cost=0.00..92192.33 rows=20209 width=24) (actual time=0.051..5427.730
rows=4041866 loops=1)
Filter: (dataset_id = 281)
Total runtime: 33182.945 ms
(24 rows)


The query does a merge join and takes 33 seconds, but it should take less
than a second. So I then ran: select * from icecream;

Now, when I run the same query again, I get a different, correct execution
plan (a nested loop), and the query takes less than 1 second, as I would
expect.



QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=7223611.41..7223648.33 rows=1136 width=41) (actual
time=392.822..420.166 rows=11028 loops=1)
-> Nested Loop (cost=4.28..341195.22 rows=229413873 width=41) (actual
time=0.231..331.800 rows=11028 loops=1)
Join Filter: (r.widget_id = rc.widget_id)
-> Append (cost=4.28..1149.30 rows=11352 width=21) (actual
time=0.051..50.181 rows=11028 loops=1)
-> Bitmap Heap Scan on widget r (cost=4.28..12.75 rows=1
width=48) (actual time=0.013..0.013 rows=0 loops=1)
Recheck Cond: (widgetset_id = 5)
Filter: (widget_type_id = 4)
-> Bitmap Index Scan on widget_widgetset_id_idx
(cost=0.00..4.28 rows=4 width=0) (actual time=0.007..0.007 rows=0 loops=1)
Index Cond: (widgetset_id = 5)
-> Index Scan using widget_part_5_widget_widget_type_id_idx
on widget_part_5 r (cost=0.00..1136.55 rows=11351 width=21) (actual
time=0.033..21.254 rows=11028 loops=1)
Index Cond: (widget_type_id = 4)
Filter: (widgetset_id = 5)
-> Append (cost=0.00..29.88 rows=6 width=24) (actual
time=0.014..0.018 rows=1 loops=11028)
-> Seq Scan on icecream rc (cost=0.00..23.00 rows=5
width=24) (actual time=0.001..0.001 rows=0 loops=11028)
Filter: (rc.dataset_id = 281)
-> Index Scan using icecream_part_281_widget_id_idx on
icecream_part_281 rc (cost=0.00..6.88 rows=1 width=24) (actual
time=0.009..0.010 rows=1 loops=11028)
Index Cond: (rc.widget_id = r.widget_id)
Filter: (rc.dataset_id = 281)
Total runtime: 431.935 ms
(19 rows)


My guess as to what happened:
-because the icecream parent table has zero records, the query optimizer
chooses the incorrect execution plan
-when I do select * from icecream, the optimizer then learns how many
records are really in the icecream table, across its partitions.

Next, if I run vacuum analyze on the parent table, I again get a wrong/slow
execution plan (this time it uses a hash join). Again, I think this is
because the parent table itself has zero records.
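Concretely, since the parent table is icecream, what I ran was:

```sql
-- Gathers statistics (including, in 9.0+, inheritance-tree summary stats)
-- for the parent table.
VACUUM ANALYZE icecream;
```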



QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=191926.03..191962.95 rows=1136 width=41) (actual
time=28967.567..28994.395 rows=11028 loops=1)
-> Hash Join (cost=166424.79..191585.47 rows=11352 width=41) (actual
time=28539.196..28917.830 rows=11028 loops=1)
Hash Cond: (r.widget_id = rc.widget_id)
-> Append (cost=4.28..1149.30 rows=11352 width=21) (actual
time=0.054..54.068 rows=11028 loops=1)
-> Bitmap Heap Scan on widget r (cost=4.28..12.75 rows=1
width=48) (actual time=0.013..0.013 rows=0 loops=1)
Recheck Cond: (widgetset_id = 5)
Filter: (widget_type_id = 4)
-> Bitmap Index Scan on widget_widgetset_id_idx
(cost=0.00..4.28 rows=4 width=0) (actual time=0.007..0.007 rows=0 loops=1)
Index Cond: (widgetset_id = 5)
-> Index Scan using widget_part_5_widget_widget_type_id_idx
on widget_part_5 r (cost=0.00..1136.55 rows=11351 width=21) (actual
time=0.035..22.419 rows=11028 loops=1)
Index Cond: (widget_type_id = 4)
Filter: (widgetset_id = 5)
-> Hash (cost=92214.73..92214.73 rows=4041823 width=24) (actual
time=28438.419..28438.419 rows=4041866 loops=1)
Buckets: 524288 Batches: 2 Memory Usage: 118449kB
-> Append (cost=0.00..92214.73 rows=4041823 width=24)
(actual time=0.020..14896.908 rows=4041866 loops=1)
-> Seq Scan on icecream rc (cost=0.00..23.00 rows=5
width=24) (actual time=0.002..0.002 rows=0 loops=1)
Filter: (dataset_id = 281)
-> Seq Scan on icecream_part_281 rc
(cost=0.00..92191.73 rows=4041818 width=24) (actual time=0.012..5718.592
rows=4041866 loops=1)
Filter: (dataset_id = 281)
Total runtime: 29007.937 ms
(20 rows)


select * from icecream does not fix this issue.


I could of course disable hash join and merge join to force Postgres to use
a nested loop, but my system joins these two tables often, and I'd rather
not have to set this in every single place:
set enable_mergejoin = off;
set enable_hashjoin = off;
set enable_nestloop = on;
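If I did have to force the plan, scoping the settings to a single transaction
with SET LOCAL would at least avoid changing the session defaults; a sketch:

```sql
BEGIN;
-- SET LOCAL lasts only until COMMIT/ROLLBACK, so queries outside this
-- transaction keep the default planner settings.
SET LOCAL enable_mergejoin = off;
SET LOCAL enable_hashjoin = off;
-- ... run the widget/icecream join here ...
COMMIT;
```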


thanks in advance!!!

Anish


  • Heikki Linnakangas at Jul 9, 2011 at 7:54 am

    On 09.07.2011 00:36, Anish Kejariwal wrote:
    My guess as to what happened:
    -because the icecream parent table has zero records, the query optimizer
    chooses the incorrect execution plan
    -when I do select * from icecream, the optimizer now knows how many records
    are really in the icecream table, by knowing that the icecream table has
    partitions.
    "select * from icecream" won't have any direct effect on the
    optimization of subsequent queries. What probably happened is that
    autoanalyze ran in the background while you ran that select, and
    analyzed some of the partitions. Simply waiting a while would've had the
    same effect.
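    (For what it's worth, you can check whether autoanalyze has visited the
    partitions yet, or analyze one by hand instead of waiting; a sketch using
    the standard statistics views:)

    ```sql
    -- See when (auto)analyze last touched the parent and its partitions.
    SELECT relname, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
    WHERE relname LIKE 'icecream%';

    -- Or analyze a partition manually instead of waiting for autoanalyze.
    ANALYZE icecream_part_281;
    ```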
    Next, if I run vacuum analyze on the parent table, I again get a wrong/slow
    execution plan (this time it uses the hash join). Again, I think this is
    because the parent table itself has zero records.



    QUERY PLAN

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    HashAggregate (cost=191926.03..191962.95 rows=1136 width=41) (actual
    time=28967.567..28994.395 rows=11028 loops=1)
    -> Hash Join (cost=166424.79..191585.47 rows=11352 width=41) (actual
    time=28539.196..28917.830 rows=11028 loops=1)
    Hash Cond: (r.widget_id = rc.widget_id)
    -> Append (cost=4.28..1149.30 rows=11352 width=21) (actual
    time=0.054..54.068 rows=11028 loops=1)
    -> Bitmap Heap Scan on widget r (cost=4.28..12.75 rows=1
    width=48) (actual time=0.013..0.013 rows=0 loops=1)
    Recheck Cond: (widgetset_id = 5)
    Filter: (widget_type_id = 4)
    -> Bitmap Index Scan on widget_widgetset_id_idx
    (cost=0.00..4.28 rows=4 width=0) (actual time=0.007..0.007 rows=0 loops=1)
    Index Cond: (widgetset_id = 5)
    -> Index Scan using widget_part_5_widget_widget_type_id_idx
    on widget_part_5 r (cost=0.00..1136.55 rows=11351 width=21) (actual
    time=0.035..22.419 rows=11028 loops=1)
    Index Cond: (widget_type_id = 4)
    Filter: (widgetset_id = 5)
    -> Hash (cost=92214.73..92214.73 rows=4041823 width=24) (actual
    time=28438.419..28438.419 rows=4041866 loops=1)
    Buckets: 524288 Batches: 2 Memory Usage: 118449kB
    -> Append (cost=0.00..92214.73 rows=4041823 width=24)
    (actual time=0.020..14896.908 rows=4041866 loops=1)
    -> Seq Scan on icecream rc (cost=0.00..23.00 rows=5
    width=24) (actual time=0.002..0.002 rows=0 loops=1)
    Filter: (dataset_id = 281)
    -> Seq Scan on icecream_part_281 rc
    (cost=0.00..92191.73 rows=4041818 width=24) (actual time=0.012..5718.592
    rows=4041866 loops=1)
    Filter: (dataset_id = 281)
    Total runtime: 29007.937 ms
    (20 rows)
    The cost estimates in the above slow plan are pretty accurate, so I
    suspect the cost estimates for the fast plan are not, or the planner
    would choose that.
    I could of course disable hash join and merge join to force postgres to use
    a nested loop, but my system is often joining these two tables, and I'd
    rather not have to set this in every single place.
    set enable_mergejoin=off;
    set enable_hashjoin=off;
    set enable_nestloop = on;
    Can you do explain analyze with these settings? That might give us a
    clue on where it's going wrong.

    Also, I suspect that when you load more data into icecream, the planner
    might start to pick the faster plan, because the seqscan on icecream
    will start to look more expensive compared to the index scan and nested
    loop join in the faster plan.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Tom Lane at Jul 9, 2011 at 5:44 pm

    Heikki Linnakangas writes:
    On 09.07.2011 00:36, Anish Kejariwal wrote:
    My guess as to what happened:
    -because the icecream parent table has zero records, the query optimizer
    chooses the incorrect execution plan
    -when I do select * from icecream, the optimizer now knows how many records
    are really in the icecream table, by knowing that the icecream table has
    partitions.
    "select * from icecream" won't have any direct effect on the
    optimization of subsequent queries. What probably happened is that
    autoanalyze ran in the background while you ran that select, and
    analyzed some of the partitions. Simply waiting a while would've had the
    same effect.
    Yeah. Also, the reason that a manual vacuum on icecream changes things
    yet again is that in 9.0 and up, we have a notion of summary stats
    across the whole inheritance tree, but autoanalyze hasn't been taught to
    gather those. The manual command on the parent table does gather them,
    though.

    So what's happening here is that we suddenly have an accurate idea of
    the size of the join product as a result of having inheritance summary
    stats to estimate with, and that drives the estimated cost of the merge
    or hash join down out of the stratosphere. The estimated cost of the
    nestloop goes down a lot too, but not as much.

    I experimented with a similar case here, and it seems like a lot of the
    remaining error in the nestloop estimate comes from this:
    -> Seq Scan on icecream rc (cost=0.00..23.00 rows=5 width=24) (actual time=0.002..0.002 rows=0 loops=1)
    Filter: (dataset_id = 281)
    The indexscan on the nonempty child partition is estimated at less than
    10 cost units, so this is a *large* fraction of what the planner sees as
    the per-outer-row cost of a nestloop. And with more than 11000 rows on
    the other side of the join, that discourages it from using the nestloop.
    In reality of course this takes negligible time compared to examining
    the child partition.

    Now why is the seqscan cost estimate so large, when actually the parent
    icecream table is totally empty? It's because the planner has been
    taught to never believe that an empty table is empty. If memory serves,
    it's really estimating on an assumption that the table contains 10 pages
    and some corresponding number of rows. This is a reasonable defensive
    posture when dealing with ordinary tables, I think, since most likely
    if the catalogs say the table is empty that's just a leftover from when
    it was created. But maybe we should reconsider the heuristic for tables
    that are members of inheritance trees --- particularly parents of
    inheritance trees.

    I was able to defeat the empty-table heuristic here by doing

    update pg_class set relpages = 1 where relname = 'icecream';

    and then I started getting much more realistic estimates in my test
    case. (It still wanted to use a merge join initially, but after
    knocking down random_page_cost it went to the nestloop.) It would
    be interesting to see what sorts of results Anish gets with that.

    regards, tom lane
  • Anish Kejariwal at Jul 10, 2011 at 3:34 pm
    Thanks Tom and Heikki! I really appreciate your help.

    I went ahead and loaded all the data. In the icecream table, I now have
    ~175 partitions, each with 4041866 records.

    The data finished loading 12 hours ago. I then ran the same query I gave
    you above, and it took 25 seconds because it used the wrong execution
    plan, as expected.

    HashAggregate (cost=27680.90..28045.81 rows=11228 width=41) (actual
    time=24769.190..24817.618 rows=11028 loops=1)
    -> Hash Join (cost=1304.04..18901.88 rows=292634 width=41) (actual
    time=3938.965..24688.718 rows=11028 loops=1)
    Hash Cond: (rc.widget_id = r.widget_id)
    -> Append (cost=0.00..12110.95 rows=292634 width=24) (actual
    time=2854.925..22887.638 rows=309579 loops=1)
    -> Seq Scan on icecream rc (cost=0.00..25.60 rows=1
    width=24) (actual time=0.003..0.003 rows=0 loops=1)
    Filter: ((widgetset_id = 5) AND (dataset_id = 283))
    -> Index Scan using icecream_part_283_widgetset_id_idx on
    icecream_part_283 rc (cost=0.00..12085.35 rows=292633 width=24) (actual
    time=2854.915..21784.769 rows=309579 loops=1)
    Index Cond: (widgetset_id = 5)
    Filter: (dataset_id = 283)
    -> Hash (cost=1163.69..1163.69 rows=11228 width=21) (actual
    time=1083.704..1083.704 rows=11028 loops=1)
    Buckets: 2048 Batches: 1 Memory Usage: 604kB
    -> Append (cost=4.28..1163.69 rows=11228 width=21) (actual
    time=528.216..1066.659 rows=11028 loops=1)
    -> Bitmap Heap Scan on widget r (cost=4.28..12.75
    rows=1 width=48) (actual time=528.017..528.017 rows=0 loops=1)
    Recheck Cond: (widgetset_id = 5)
    Filter: (widget_type_id = 4)
    -> Bitmap Index Scan on widget_widgetset_id_idx
    (cost=0.00..4.28 rows=4 width=0) (actual time=527.995..527.995 rows=0
    loops=1)
    Index Cond: (widgetset_id = 5)
    -> Index Scan using
    widget_part_5_widget_widget_type_id_idx on widget_part_5 r
    (cost=0.00..1150.94 rows=11227 width=21) (actual time=0.191..512.847 rows=11028 loops=1)
    Index Cond: (widget_type_id = 4)
    Filter: (widgetset_id = 5)
    Total runtime: 24844.016 ms
    (21 rows)


    I then changed my enable* options to force it to use a nested loop:
    set enable_mergejoin=off;
    set enable_hashjoin=off;
    set enable_nestloop = on;


    HashAggregate (cost=460004.79..460369.70 rows=11228 width=41) (actual
    time=298.014..341.822 rows=11028 loops=1)
    -> Nested Loop (cost=4.28..338747.85 rows=4041898 width=41) (actual
    time=0.175..248.529 rows=11028 loops=1)
    Join Filter: (r.widget_id = rc.widget_id)
    -> Append (cost=4.28..1163.69 rows=11228 width=21) (actual
    time=0.053..42.532 rows=11028 loops=1)
    -> Bitmap Heap Scan on widget r (cost=4.28..12.75 rows=1
    width=48) (actual time=0.014..0.014 rows=0 loops=1)
    Recheck Cond: (widgetset_id = 5)
    Filter: (widget_type_id = 4)
    -> Bitmap Index Scan on widget_widgetset_id_idx
    (cost=0.00..4.28 rows=4 width=0) (actual time=0.008..0.008 rows=0 loops=1)
    Index Cond: (widgetset_id = 5)
    -> Index Scan using widget_part_5_widget_widget_type_id_idx
    on widget_part_5 r (cost=0.00..1150.94 rows=11227 width=21) (actual
    time=0.032..18.410 rows=11028 loops=1)
    Index Cond: (widget_type_id = 4)
    Filter: (widgetset_id = 5)
    -> Append (cost=0.00..29.99 rows=6 width=24) (actual
    time=0.009..0.012 rows=1 loops=11028)
    -> Seq Scan on icecream rc (cost=0.00..23.00 rows=5
    width=24) (actual time=0.001..0.001 rows=0 loops=11028)
    Filter: (rc.dataset_id = 283)
    -> Index Scan using icecream_part_283_widget_id_idx on
    icecream_part_283 rc (cost=0.00..6.99 rows=1 width=24) (actual
    time=0.004..0.006 rows=1 loops=11028)
    Index Cond: (rc.widget_id = r.widget_id)
    Filter: (rc.dataset_id = 283)
    Total runtime: 361.180 ms
    (19 rows)

    The query was nice and fast as expected.

    I then restored the enable* options to their defaults. The query was slow
    again, taking around ~19 seconds. So this gives us some information about
    whether autoanalyze is running in the background:
    -I don't think waiting makes a difference. I loaded the data 12 hours ago
    and then ran the query, and it was very slow. I then ran the query again
    for a different dataset, and it was also slow. Only once I changed the
    enable* parameters did I get the performance I expected.
    -I would have tried select * from icecream again, but with 176 partitions
    that query is no longer feasible. But I do agree that select * from
    icecream is what caused the statistics to be updated.
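    (By "restored the enable* options to their defaults" I mean, for example:)

    ```sql
    -- Put the planner settings back to their configured defaults.
    RESET enable_mergejoin;
    RESET enable_hashjoin;
    RESET enable_nestloop;
    ```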

    So, then I followed Tom's suggestion to defeat the empty-table heuristic.

    select relpages from pg_class where relname = 'icecream';

    relpages
    ----------

    (1 row)

    Ok, so the planner thinks that the parent table is empty. I then ran:
    update pg_class set relpages = 1 where relname = 'icecream';


    HashAggregate (cost=201199.27..201564.18 rows=11228 width=41) (actual
    time=277.195..304.620 rows=11028 loops=1)
    -> Nested Loop (cost=4.28..79942.45 rows=4041894 width=41) (actual
    time=0.227..231.181 rows=11028 loops=1)
    Join Filter: (r.widget_id = rc.widget_id)
    -> Append (cost=4.28..1163.69 rows=11228 width=21) (actual
    time=0.125..40.834 rows=11028 loops=1)
    -> Bitmap Heap Scan on widget r (cost=4.28..12.75 rows=1
    width=48) (actual time=0.022..0.022 rows=0 loops=1)
    Recheck Cond: (widgetset_id = 5)
    Filter: (widget_type_id = 4)
    -> Bitmap Index Scan on widget_widgetset_id_idx
    (cost=0.00..4.28 rows=4 width=0) (actual time=0.019..0.019 rows=0 loops=1)
    Index Cond: (widgetset_id = 5)
    -> Index Scan using widget_part_5_widget_widget_type_id_idx
    on widget_part_5 r (cost=0.00..1150.94 rows=11227 width=21) (actual
    time=0.100..18.964 rows=11028 loops=1)
    Index Cond: (widget_type_id = 4)
    Filter: (widgetset_id = 5)
    -> Append (cost=0.00..6.99 rows=2 width=24) (actual
    time=0.008..0.012 rows=1 loops=11028)
    -> Seq Scan on icecream rc (cost=0.00..0.00 rows=1
    width=24) (actual time=0.001..0.001 rows=0 loops=11028)
    Filter: (rc.dataset_id = 283)
    -> Index Scan using icecream_part_283_widget_id_idx on
    icecream_part_283 rc (cost=0.00..6.99 rows=1 width=24) (actual
    time=0.004..0.006 rows=1 loops=11028)
    Index Cond: (rc.widget_id = r.widget_id)
    Filter: (rc.dataset_id = 283)
    Total runtime: 318.634 ms
    (19 rows)


    Wow! That fixes it. Thank you so much!!!! I've been struggling with this
    issue for 2-3 days. (Also, in the past I've seen inconsistent performance
    with this query, which may be the result of the planner sometimes choosing
    the wrong plan, but I'll chase that down later.)

    Tom said: But maybe we should reconsider the heuristic for tables that are
    members of inheritance trees --- particularly parents of inheritance trees.

    I agree. I think Postgres should be updated to take this into account. I
    shouldn't have to set relpages to 1 for every empty parent table that I
    have partitioned. Should I file this as a bug/enhancement?

    Also, do I need to worry about autoanalyze/autovacuum setting relpages
    back to zero for the parent icecream table?

    thanks!!!
    Anish


  • Jeremy Harris at Jul 10, 2011 at 4:06 pm

    On 2011-07-09 18:43, Tom Lane wrote:
    Yeah. Also, the reason that a manual vacuum on icecream changes things
    yet again is that in 9.0 and up, we have a notion of summary stats
    across the whole inheritance tree, but autoanalyze hasn't been taught to
    gather those. The manual command on the parent table does gather them,
    though.
    Is stats-gathering significantly more expensive than a full table scan?
    Could a full table scan update stats as a matter of course (or perhaps
    only when enough of the table has changed)?
    --
    Jeremy

Discussion Overview
group: pgsql-performance @ postgresql
posted: Jul 8, '11 at 9:36p
active: Jul 10, '11 at 4:06p
posts: 5
users: 4
website: postgresql.org
irc: #postgresql
