The following bug has been logged online:

Bug reference: 5231
Logged by: Thomas Hamilton
Email address: [email protected]
PostgreSQL version: 8.3.8
Operating system: Ubuntu 4.2.4
Description: SELECT DISTINCT poorly implemented vs SELECT ... GROUP
BY
Details:

SELECT DISTINCT does a Sort followed by Unique.

SELECT ... GROUP BY, which is logically equivalent, performs a
HashAggregate.

When run against a large dataset with a small number of distinct results
HashAggregate is an order of magnitude more efficient!

Since the spec does not require DISTINCT to return sorted results, I don't
believe Sort ... Unique will ever be more efficient than HashAggregate.

Therefore, in order to maximize performance, DISTINCT should always be
implemented as HashAggregate.

Search Discussions

  • Joshua Tolley at Dec 3, 2009 at 4:25 pm

    On Thu, Dec 03, 2009 at 03:56:05PM +0000, Thomas Hamilton wrote:

    The following bug has been logged online:

    Bug reference: 5231
    Logged by: Thomas Hamilton
    Email address: [email protected]
    PostgreSQL version: 8.3.8
    Operating system: Ubuntu 4.2.4
    Description: SELECT DISTINCT poorly implemented vs SELECT ... GROUP
    BY
    Details:

    SELECT DISTINCT does a Sort followed by Unique.

    SELECT ... GROUP BY, which is logically equivalent, performs a
    HashAggregate.

    When run against a large dataset with a small number of distinct results
    HashAggregate is an order of magnitude more efficient!

    Since the spec does not require DISTINCT to return sorted results, I don't
    believe Sort ... Unique will ever be more efficient than HashAggregate.

    Therefore, in order to maximize performance, DISTINCT should always be
    implemented as HashAggregate.
    In 8.4 and above, SELECT DISTINCT can use HashAggregates rather than
    Sort/Unique -- though I'm not sure it always does. Anyway, an upgrade should
    handle this for you. However, this isn't really a bug, it's a feature, so
    don't expect to see it in an 8.3.x version.

    -- Josh / eggyknap
  • Alvaro Herrera at Dec 3, 2009 at 5:50 pm

    Thomas Hamilton wrote:

    When run against a large dataset with a small number of distinct results
    HashAggregate is an order of magnitude more efficient!

    Since the spec does not require DISTINCT to return sorted results, I don't
    believe Sort ... Unique will ever be more efficient than HashAggregate.
    This is not a bug -- merely a missing optimizer feature, which as it
    turns out was implemented in 8.4.

    --
    Alvaro Herrera http://www.CommandPrompt.com/
    The PostgreSQL Company - Command Prompt, Inc.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-bugs @
categoriespostgresql
postedDec 3, '09 at 3:56p
activeDec 3, '09 at 5:50p
posts3
users3
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2023 Grokbase