I have a table with 45 million rows, partitioned on a date field. Each of
the partitions has a primary key. There are 3 partitions so far, 15
million records each.

I retrieve up to 200k document IDs (the primary key for each partition) from
the application and put them into a temporary table, which I then join to
the partitioned monster and some other tables. I discovered a strange
thing: the optimizer chooses a hash join, with a full table scan of all
underlying tables, unless I create a primary key on the temporary table,
in which case the appropriate nested-loops join is chosen.

What makes the optimizer do that, and why? I can get around the problem by
disabling hash joins in postgresql.conf, but I don't like that
solution. Not even lowering random_page_cost to the same value as
seq_page_cost helps.

My question is: how does the optimizer calculate statistics for temporary
tables? Surely I am not expected to run VACUUM ANALYZE on the temporary
table after finishing the inserts? How exactly does the optimizer deal
with the temporary tables?

PostgreSQL is 8.4.4 on 64-bit Red Hat 5.5.
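
A sketch of the scenario, using hypothetical names (docs for the
partitioned table, doc_ids for the temporary table; the real schema
differs):

```sql
-- Temporary table holding the document IDs fetched by the application.
CREATE TEMPORARY TABLE doc_ids (doc_id integer);

INSERT INTO doc_ids VALUES (1), (2), (3);  -- in reality up to 200k rows

-- Without a primary key on doc_ids, the plan is a hash join with full
-- scans of all three partitions; with it, nested loops are chosen:
ALTER TABLE doc_ids ADD PRIMARY KEY (doc_id);

EXPLAIN
SELECT d.*
FROM docs d
JOIN doc_ids t ON t.doc_id = d.id;
```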

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com

  • Josh Kupershmidt at Sep 23, 2010 at 2:26 pm

    On Thu, Sep 23, 2010 at 9:33 AM, Mladen Gogala wrote:
    My question is: how does the optimizer calculate statistics for temporary
    tables? Surely I am not expected to run VACUUM ANALYZE on the temporary
    table after finishing the inserts? How exactly does the optimizer deal
    with the temporary tables?
    The advice I've seen says to manually run an ANALYZE on a temporary
    table which you're going to be doing something performance-intensive
    with. Old thread:
    http://archives.postgresql.org/pgsql-general/2004-01/msg01553.php

    and the 9.0 docs briefly mention it as well:
    http://www.postgresql.org/docs/current/static/sql-createtable.html
    saying "... appropriate vacuum and analyze operations should be
    performed via session SQL commands. For example, if a temporary table
    is going to be used in complex queries, it is wise to run ANALYZE on
    the temporary table after it is populated."

    I think the planner assumes some bogus default (1000?) number of rows
    for temporary tables otherwise. Can't find a more-reliable reference,
    but see here:
    http://stackoverflow.com/questions/486154/postgresql-temporary-tables

    I also use temp tables fairly heavily, and I just run an ANALYZE on
    any decently-sized tables I'm going to use in further queries, which
    seems to work well for me.

    Josh
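
    A minimal sketch of the workflow Josh describes, with the same
    hypothetical names as above (doc_ids for the temporary table, docs
    for the partitioned table):

    ```sql
    CREATE TEMPORARY TABLE doc_ids (doc_id integer PRIMARY KEY);

    -- ... populate from the application ...
    INSERT INTO doc_ids VALUES (1), (2), (3);

    -- Collect real row-count and distribution statistics so the planner
    -- does not fall back on its default guess:
    ANALYZE doc_ids;

    EXPLAIN
    SELECT d.*
    FROM docs d
    JOIN doc_ids t ON t.doc_id = d.id;
    ```

    With fresh statistics on the temp table joined to a 45-million-row
    table on an indexed key, the planner should favor a nested-loop join
    over the partitions' primary-key indexes.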
  • Tom Lane at Sep 23, 2010 at 2:37 pm

    Josh Kupershmidt writes:
    On Thu, Sep 23, 2010 at 9:33 AM, Mladen Gogala wrote:
    My question is: how does the optimizer calculate statistics for temporary
    tables? Surely I am not expected to run VACUUM ANALYZE on the temporary
    table after finishing the inserts? How exactly does the optimizer deal
    with the temporary tables?
    The advice I've seen says to manually run an ANALYZE on a temporary
    table which you're going to be doing something performance-intensive
    with.
    Yeah. Autovacuum cannot help you with a temp table --- it can't
    safely access such a table. So the stuff that normally goes on
    "in the background" has to be done explicitly if you need it done.

    regards, tom lane

Discussion Overview
group: pgsql-novice @ postgresql
posted: Sep 23, '10 at 1:33p
active: Sep 23, '10 at 2:37p
posts: 3
users: 3
website: postgresql.org
irc: #postgresql
