I'm using PostgreSQL 9.0 beta 1. I've got the following table definition:

# \d parts_2576
                                  Table "public.parts_2576"
   Column   |          Type          |                         Modifiers
------------+------------------------+------------------------------------------------------------
 ID         | bigint                 | not null default nextval('"parts_2576_ID_seq"'::regclass)
 binaryID   | character varying(32)  | not null default ''::character varying
 messageID  | character varying(255) | not null default ''::character varying
 subject    | character varying(512) | not null default ''::character varying
 fromname   | character varying(512) | not null default ''::character varying
 date       | bigint                 | default 0::bigint
 partnumber | bigint                 | not null default 0::bigint
 size       | bigint                 | not null default 0::bigint
Indexes:
    "parts_2576_pkey" PRIMARY KEY, btree ("ID")
    "binaryID_2576_idx" btree ("binaryID")
    "date_2576_idx" btree (date)
    "parts_2576_binaryID_idx" btree ("binaryID")

If I run this:

EXPLAIN ANALYZE
SELECT SUM("size") AS totalsize, "binaryID", COUNT(*) AS parttotal,
       MAX("subject") AS subject, MAX("fromname") AS fromname,
       MIN("date") AS mindate
FROM parts_2576
WHERE "binaryID" > '1082fa89fe499741b8271f9c92136f44'
GROUP BY "binaryID"
ORDER BY "binaryID"
LIMIT 400;

I get this:

Limit  (cost=0.00..316895.11 rows=400 width=211) (actual time=3.880..1368.936 rows=400 loops=1)
  ->  GroupAggregate  (cost=0.00..41843621.95 rows=52817 width=211) (actual time=3.872..1367.048 rows=400 loops=1)
        ->  Index Scan using "binaryID_2576_idx" on parts_2576  (cost=0.00..41683754.21 rows=10578624 width=211) (actual time=0.284..130.756 rows=19954 loops=1)
              Index Cond: (("binaryID")::text > '1082fa89fe499741b8271f9c92136f44'::text)
Total runtime: 1370.140 ms

The first thing that strikes me is that the GroupAggregate step shows
it returned 400 rows, matching the limit, but it estimated 52,817
rows. Shouldn't it have already known it would be 400?

I've got an index on "binaryID" (actually, I appear to have two), but I
suspect it's not being used to best effect, since the condition is a
range comparison (every value greater than the one given) rather than a
simple equality match. Is there a way to optimise this, perhaps with a
functional index or something similar?
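
For example (and this is just a guess on my part), would physically
ordering the table by that index help the range scan, something like:

CLUSTER parts_2576 USING "binaryID_2576_idx";
ANALYZE parts_2576;

or is that unlikely to make much difference here? I realise CLUSTER
rewrites the table and takes an exclusive lock, so it's not something
to do casually.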

Obviously this isn't my design (duplicate indexes and mixed-case
column names?), but I'd like to see if I can get things running
faster.
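
At the very least, the two "binaryID" indexes look identical, so I
assume one of them can simply be dropped, e.g.:

DROP INDEX "parts_2576_binaryID_idx";

keeping "binaryID_2576_idx", which is the one the planner is already
using above.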

Thanks

Thom

  • Tom Lane at May 28, 2010 at 6:54 pm

    Thom Brown writes:
    I get this:
    Limit  (cost=0.00..316895.11 rows=400 width=211) (actual time=3.880..1368.936 rows=400 loops=1)
      ->  GroupAggregate  (cost=0.00..41843621.95 rows=52817 width=211) (actual time=3.872..1367.048 rows=400 loops=1)
            ->  Index Scan using "binaryID_2576_idx" on parts_2576  (cost=0.00..41683754.21 rows=10578624 width=211) (actual time=0.284..130.756 rows=19954 loops=1)
                  Index Cond: (("binaryID")::text > '1082fa89fe499741b8271f9c92136f44'::text)
    Total runtime: 1370.140 ms
    The first thing that strikes me is that the GroupAggregate step shows
    it returned 400 rows, matching the limit, but it estimated 52,817
    rows. Shouldn't it have already known it would be 400?
    No. Rowcount estimates are always in terms of what the node would emit
    if allowed to run to completion. Likewise cost. In this case both the
    indexscan and the groupagg are terminated early once they satisfy the
    limit. The planner is expecting this which is why the estimated cost
    for the limit node is way less than those for its inputs.
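
    Roughly speaking, the Limit node's cost is just the GroupAggregate's
    total cost scaled by the fraction of its output the limit needs:

        41843621.95 * (400 / 52817) =~ 316895

    which is where the 316895.11 on the Limit line comes from.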

    That looks like a perfectly reasonable plan from here, though it would
    probably not get chosen with a larger limit or no limit at all, since
    the ultimate costs are pretty large. Essentially this is a fast-start
    plan rather than a lowest-total-cost plan, and that looks like the
    best bet for a small limit value.

    regards, tom lane
  • Thom Brown at May 28, 2010 at 7:06 pm

    On 28 May 2010 19:54, Tom Lane wrote:
    Thom Brown <thombrown@gmail.com> writes:
    I get this:
    Limit  (cost=0.00..316895.11 rows=400 width=211) (actual time=3.880..1368.936 rows=400 loops=1)
      ->  GroupAggregate  (cost=0.00..41843621.95 rows=52817 width=211) (actual time=3.872..1367.048 rows=400 loops=1)
            ->  Index Scan using "binaryID_2576_idx" on parts_2576  (cost=0.00..41683754.21 rows=10578624 width=211) (actual time=0.284..130.756 rows=19954 loops=1)
                  Index Cond: (("binaryID")::text > '1082fa89fe499741b8271f9c92136f44'::text)
    Total runtime: 1370.140 ms
    The first thing that strikes me is that the GroupAggregate step shows
    it returned 400 rows, matching the limit, but it estimated 52,817
    rows. Shouldn't it have already known it would be 400?
    No.  Rowcount estimates are always in terms of what the node would emit
    if allowed to run to completion.  Likewise cost.  In this case both the
    indexscan and the groupagg are terminated early once they satisfy the
    limit.  The planner is expecting this which is why the estimated cost
    for the limit node is way less than those for its inputs.

    That looks like a perfectly reasonable plan from here, though it would
    probably not get chosen with a larger limit or no limit at all, since
    the ultimate costs are pretty large.  Essentially this is a fast-start
    plan rather than a lowest-total-cost plan, and that looks like the
    best bet for a small limit value.

    regards, tom lane
    You're absolutely right; it's not chosen when there's no limit. I see
    what you mean, though, about terminating once it has enough rows. It's
    a shame I can't optimise it, as the real case that runs uses a limit
    of 4000, which takes a long time to complete.
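
    Out of curiosity, I might compare it against the non-fast-start plan
    at that limit by temporarily disabling index scans (purely as a
    diagnostic, not a fix), something like:

    BEGIN;
    SET LOCAL enable_indexscan = off;
    EXPLAIN ANALYZE
    SELECT SUM("size") AS totalsize, "binaryID", COUNT(*) AS parttotal,
           MAX("subject") AS subject, MAX("fromname") AS fromname,
           MIN("date") AS mindate
    FROM parts_2576
    WHERE "binaryID" > '1082fa89fe499741b8271f9c92136f44'
    GROUP BY "binaryID"
    ORDER BY "binaryID"
    LIMIT 4000;
    ROLLBACK;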

    Thanks

    Thom
  • Kevin Grittner at May 29, 2010 at 5:01 pm

    Thom Brown wrote:

    It's a shame I can't optimise it, as the real case that runs uses a
    limit of 4000, which takes a long time to complete.
    Perhaps you should post the real case.

    -Kevin
