The following bug has been logged online:
Bug reference: 5231
Logged by: Thomas Hamilton
PostgreSQL version: 8.3.8
Operating system: Ubuntu 4.2.4
Description: SELECT DISTINCT poorly implemented vs SELECT ... GROUP BY
Details:
SELECT DISTINCT is planned as a Sort followed by a Unique node.
The logically equivalent SELECT ... GROUP BY is planned as a
HashAggregate.
When run against a large dataset with a small number of distinct results,
HashAggregate is an order of magnitude more efficient!
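The difference between the two plans can be illustrated with a minimal Python sketch. This is not PostgreSQL executor code, and the function names are hypothetical. Sort + Unique must order every input row, which costs O(n log n) comparisons, before it can drop adjacent duplicates. The hash approach makes a single pass over the rows. Its table grows only with the number of distinct values, and its output order is not sorted.

```python
from typing import Iterable, List


def sort_unique(rows: Iterable) -> List:
    """Sketch of Sort -> Unique: sort all rows, then drop adjacent duplicates.

    Every input row takes part in the O(n log n) sort, however few
    distinct values there are.
    """
    out: List = []
    for row in sorted(rows):
        # After sorting, duplicates are adjacent, so comparing with the
        # last emitted value is enough to skip them.
        if not out or out[-1] != row:
            out.append(row)
    return out


def hash_distinct(rows: Iterable) -> List:
    """Sketch of hash-based DISTINCT (the HashAggregate idea).

    One pass with expected O(1) work per row. Memory is proportional to the
    number of distinct values. Output follows first-seen order (Python
    dicts preserve insertion order), not sorted order.
    """
    return list(dict.fromkeys(rows))


rows = [3, 1, 3, 2, 1, 3]
print(sort_unique(rows))    # [1, 2, 3]
print(hash_distinct(rows))  # [3, 1, 2]
```

Both functions return the same set of values. Only the hash version avoids sorting the full input, and only the sort version returns the values in order.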
Since the SQL standard does not require DISTINCT to return sorted results, I don't
believe Sort ... Unique will ever be more efficient than HashAggregate.
Therefore, in order to maximize performance, DISTINCT should always be
implemented as HashAggregate.