On 8.1, I have a very interesting database where the distribution of
some values in a multi-million-row table is logarithmic (i.e. the most
frequent value is an order of magnitude more frequent than the next
ones). If I analyze the table, the statistics become extremely skewed
towards the most frequent values, and this prevents the planner from
producing good plans for queries that do not target these entries.
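
For readers who want to see what the planner has recorded for such a
column, the pg_stats view can be queried after ANALYZE. This is only a
sketch: the table name "events" and the column name "kind" are
hypothetical and do not come from the post.

```sql
-- Hypothetical names: table "events", skewed column "kind".
ANALYZE events;

-- Show the most common values and their estimated frequencies,
-- plus the planner's estimate of the number of distinct values.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'events'
  AND attname = 'kind';
```

If only a handful of very frequent values show up in most_common_vals,
the rarer values are being estimated from whatever frequency is left over.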

In a recent case, the planner estimated that the number of returned rows
would be ~13% of the table size and, from this bad assumption, generated a
very slow conservative plan that took days to execute. If I set the
statistics target to zero for that table, the planner uses a hardcoded
ratio (seems like 0.5%) for the number of returned rows, and this helps
generate a plan that executes in 3 minutes (still sub-optimal, but not
as bad).
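
A rough sketch of the two steps described above, checking the estimate
and then setting the statistics target to zero. The names "events",
"kind" and 'rare_value' are hypothetical, and the post does not say
which columns were changed or whether previously collected statistics
had to be cleared first.

```sql
-- The planner's row estimate appears as "rows=" in the plan output.
EXPLAIN SELECT * FROM events WHERE kind = 'rare_value';

-- Assumption: the target is set per column. With a target of 0,
-- ANALYZE collects no statistics for that column.
ALTER TABLE events ALTER COLUMN kind SET STATISTICS 0;
ANALYZE events;

-- Re-check the estimate after the change.
EXPLAIN SELECT * FROM events WHERE kind = 'rare_value';
```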

Creating partial indexes for the less frequent cases helps, but this
solution is not flexible enough for our needs as the number of complex
queries grows. We are mostly left with pre-generating a lot of temporary
tables whenever the planner overestimates the number of rows produced
by a subquery (query execution was trimmed from 3 minutes to 30 seconds
using this technique) or using the OFFSET 0 tweak, but it would be nice
if the planner could handle this on its own.
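
The three workarounds mentioned above could look roughly like this. It
is a sketch only: the tables "events" and "event_details", their
columns, and the literal values are all hypothetical stand-ins for the
real schema, which the post does not show.

```sql
-- 1. Partial index that covers only the uncommon values.
--    A query can use it only if its WHERE clause implies the predicate.
CREATE INDEX events_rare_kind_idx ON events (kind)
    WHERE kind <> 'common_value';

-- 2. Materialize the subquery into a temporary table and analyze it,
--    so the following join is planned against its actual contents.
CREATE TEMP TABLE rare_events AS
    SELECT id, kind FROM events WHERE kind = 'rare_value';
ANALYZE rare_events;

SELECT d.*
FROM rare_events r
JOIN event_details d ON d.event_id = r.id;

-- 3. OFFSET 0 keeps the planner from flattening the subquery into the
--    outer query, so it is planned as a separate step.
SELECT d.*
FROM (SELECT id FROM events WHERE kind = 'rare_value' OFFSET 0) AS e
JOIN event_details d ON d.event_id = e.id;
```

The temporary-table variant costs an extra write and an ANALYZE per
query, which is presumably why it gets unwieldy as the number of complex
queries grows.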

Am I missing something obvious? Setting the statistics for this table to
zero seems awkward even if it gives good results.
Jerry.

  • Tom Lane at Jul 7, 2010 at 9:22 pm

    Jerry Gamache writes:
    > On 8.1, I have a very interesting database where the distribution of
    > some values in a multi-million-row table is logarithmic (i.e. the most
    > frequent value is an order of magnitude more frequent than the next
    > ones). If I analyze the table, the statistics become extremely skewed
    > towards the most frequent values and this prevents the planner from
    > giving any good results on queries that do not target these entries.

    Highly skewed distributions are hardly unusual, and I'm not aware that
    the planner is totally incapable of dealing with them. You do need a
    large enough stats target to get down into the tail of the
    distribution (the default target for 8.1 is probably too small for you).
    It might be that there have been some other relevant improvements since
    8.1, too ...

    regards, tom lane
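
As an illustration of the suggestion above, raising the statistics
target for the skewed column might look like the following. The table
and column names are hypothetical, and the values 1000 and 100 are only
examples (8.1 accepts per-column targets up to 1000, and its
default_statistics_target is 10); the right setting depends on how far
into the tail the queries reach.

```sql
-- Hypothetical names: table "events", skewed column "kind".
-- Raise the per-column target so ANALYZE tracks more common values
-- and histogram buckets, then refresh the statistics.
ALTER TABLE events ALTER COLUMN kind SET STATISTICS 1000;
ANALYZE events;

-- Alternative: raise the default for the current session, which
-- applies to columns that have no per-column target of their own.
SET default_statistics_target = 100;
ANALYZE events;

-- Compare the new estimate against the actual row count.
EXPLAIN ANALYZE SELECT * FROM events WHERE kind = 'rare_value';
```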

Discussion Overview
group: pgsql-performance
category: postgresql
posted: Jul 7, 2010 at 9:03 pm
active: Jul 7, 2010 at 9:22 pm
posts: 2
users: 2
website: postgresql.org
irc: #postgresql
