From: firstname.lastname@example.org [mailto:pgsql-performance-
email@example.com] On Behalf Of Robert James
Sent: Wednesday, June 01, 2011 5:55 PM
Subject: [PERFORM] CLUSTER versus a dedicated table
Hi. I'm interested in understanding the differences between
CLUSTERing a table and making a dedicated one.
We have a table with about 1 million records. On a given day, only
about 1% of them are of interest. That 1% changes every day (it's
WHERE active_date = today), and so we index and cluster on it.
Even so, the planner shows a very large cost for the Index Scan: about
3500. If I instead do a SELECT INTO temp_table FROM big_table WHERE
active_date = today, and then do SELECT * FROM temp_table, I get a
planned cost of 65. Yet, the actual time for both queries is almost
1. Why is there such a discrepancy between the planner's estimate and
the actual cost?
2. In a case like this, will I in general see a performance gain by
doing a daily SELECT INTO and then querying from that table? My ad hoc
test doesn't indicate I would (despite the planner's prediction), and
I'd rather avoid this if it won't help.
3. In general, does CLUSTER provide all the performance benefits of a
dedicated table? If it doesn't, what does it lack?
1: there could be many reasons for the planner to come up a grossly
inaccurate ESTIMATE for some values. Last time the table was analyzed, is
usually where people start.
2: look at table partitioning it's pretty straight forward and sounds like
it might be a good fit for you. It will however involve some triggers or
rules and check constraints. Table partitioning has some downsides though,
you should be aware of what they are before you commit to it.
3: clustering, from a high level, just reorders the data on disk by a given
index. Depending on your setup keeping it close to that clustered ordering
might be trivial or it might not be. Big tables are relative to different
people, 1M rows might be a big table or it might not be, since you didn't
post the size of the table and indexes we can only guess. Table
partitioning helps most with table maintenance, IMO, but can be very useful
it the constraint exclusion can eliminate a large number of child tables
right off so it doesn't have to traverse large indexes or do lots of random
You will need to post at lot more specific info if you want more specific
help. The guide to reporting slow queries or guide to reporting problems
and start gathering specific information and then post back to the list.