I'm trying to migrate an application from an Oracle
backend to PostgreSQL and have a performance question.

The hardware for the database is the same, a SunFire
v120, 2x73GB U2W SCSI disks, 1GB RAM, 650MHz US-IIe
CPU. Running Solaris 8.

The table in question has 541741 rows. Under Oracle,
the query ' select distinct version from vers where
version is not null ' returns 534 rows in 6.14
seconds, with an execution plan showing a table scan
of vers followed by a sort.

The explain output on postgres shows the same
execution with a scan on vers and a sort but the query
time is 78.6 seconds.

The explain output from PostgreSQL is:
QUERY PLAN
---------------------------------------------------------------------------------
Unique (cost=117865.77..120574.48 rows=142
width=132)
-> Sort (cost=117865.77..119220.13 rows=541741
width=132)
Sort Key: "version"
-> Seq Scan on vers (cost=0.00..21367.41
rows=541741 width=132)
Filter: ("version" IS NOT NULL)

I do have an index on the column in question but
neither oracle nor postgresql choose to use it (which
given that we're visiting all rows is perhaps not
surprising).

I'm not as familiar with postgresql as I am with
Oracle but I think I've configured comparible
buffering and sort area sizes, certainly there isn't
much physical IO going on in either case.

What can I do to speed up this query? Other queries
are slightly slower than under Oracle on the same
hardware but nothing like this.

Thanks!

G





___________________________________________________________ALL-NEW Yahoo! Messenger - sooooo many all-new ways to express yourself http://uk.messenger.yahoo.com

Search Discussions

  • Richard Huxton at Jun 18, 2004 at 12:07 pm

    Gary Cowell wrote:

    I'm not as familiar with postgresql as I am with
    Oracle but I think I've configured comparible
    buffering and sort area sizes, certainly there isn't
    much physical IO going on in either case.
    People are going to want to know:
    1. version of PG
    2. explain analyse output, rather than just explain
    3. What values you've used for the postgresql.conf file

    The actual plan from explain analyse isn't going to be much use - as you
    say, a scan of the whole table followed by sorting is the best you'll
    get. However, the actual costs of these steps might say something useful.

    --
    Richard Huxton
    Archonet Ltd
  • Paul Thomas at Jun 18, 2004 at 12:09 pm

    On 18/06/2004 12:31 Gary Cowell wrote:
    [snip]
    I'm not as familiar with postgresql as I am with
    Oracle but I think I've configured comparible
    buffering and sort area sizes, certainly there isn't
    much physical IO going on in either case.

    What can I do to speed up this query? Other queries
    are slightly slower than under Oracle on the same
    hardware but nothing like this.
    Usual questions:

    have you vacuumed the table recently?
    what are your postgresql.conf settings?
    can you show us explain ANALYZE output rather than just explain output?

    --
    Paul Thomas
    +------------------------------+---------------------------------------------+
    Thomas Micro Systems Limited | Software Solutions for
    Business |
    Computer Consultants |
    http://www.thomas-micro-systems-ltd.co.uk |
    +------------------------------+---------------------------------------------+
  • Jeff at Jun 18, 2004 at 12:13 pm

    On Jun 18, 2004, at 7:31 AM, Gary Cowell wrote:

    The explain output on postgres shows the same
    execution with a scan on vers and a sort but the query
    time is 78.6 seconds.
    Does it run just as slow if you run it again?
    It could be a case of the caches being empty
    Oracle but I think I've configured comparible
    buffering and sort area sizes, certainly there isn't
    much physical IO going on in either case.
    Configuring PG like Oracle isn't the best thing in the world. The
    general PG philosophy is to let the OS do all the caching & buffering
    - this is reversed in the Oracle world. In 7.4 the rule of thumb is no
    more than 10k shared_buffers.. beyond that the overhead of maintaining
    it becomes excessive. (This isn't really the case in 7.5)

    Curiously, what are your sort_mem and shared_buffers settings?
  • Shridhar Daithankar at Jun 18, 2004 at 12:19 pm

    Gary Cowell wrote:
    The explain output on postgres shows the same
    execution with a scan on vers and a sort but the query
    time is 78.6 seconds.

    The explain output from PostgreSQL is:
    QUERY PLAN
    ---------------------------------------------------------------------------------
    Unique (cost=117865.77..120574.48 rows=142
    width=132)
    -> Sort (cost=117865.77..119220.13 rows=541741
    width=132)
    Sort Key: "version"
    -> Seq Scan on vers (cost=0.00..21367.41
    rows=541741 width=132)
    Filter: ("version" IS NOT NULL)

    I do have an index on the column in question but
    neither oracle nor postgresql choose to use it (which
    given that we're visiting all rows is perhaps not
    surprising).
    Can you post explain analyze for the same query? It contains actual numbers
    alond side the chosen plan.
    I'm not as familiar with postgresql as I am with
    Oracle but I think I've configured comparible
    buffering and sort area sizes, certainly there isn't
    much physical IO going on in either case.
    Well, for postgresql you should check out

    http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
    http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

    HTH

    Shridhar
  • Gary Cowell at Jun 18, 2004 at 3:47 pm

    Try increasing sort_mem temporarily, and see if that
    makes a difference:
    SET sort_mem = 64000;
    EXPLAIN ANALYSE ...
    I did this (actualy 65536) and got the following:
    pvcsdb=# explain analyze select distinct version from
    vers where version is not null;

    QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------------------
    Unique (cost=117865.77..120574.48 rows=142
    width=132) (actual time=81595.178..86573.228 rows=536
    loops=1)
    -> Sort (cost=117865.77..119220.13 rows=541741
    width=132) (actual time=81595.169..84412.069
    rows=541741 loops=1)
    Sort Key: "version"
    -> Seq Scan on vers (cost=0.00..21367.41
    rows=541741 width=132) (actual time=10.068..7397.374
    rows=541741 loops=1)
    Filter: ("version" IS NOT NULL)
    Total runtime: 86647.495 ms
    (6 rows)


    In response to Tom Lane, I have compared a
    select/order by on the same data in Oracle and PG to
    see if this changes things:


    PG: Time: 67438.536 ms 541741 rows
    Oracle: After an hour and a half I canned it

    So it seems the idea that oracle is dropping duplicate
    rows prior to the sort when using distinct may indeed
    be the case.

    From what I've seen here, it seems that PGs on-disk
    sort performance is exceeding that of Oracle - it's
    just that oracle sorts fewer rows for distinct.






    ___________________________________________________________ALL-NEW Yahoo! Messenger - sooooo many all-new ways to express yourself http://uk.messenger.yahoo.com
  • Tom Lane at Jun 18, 2004 at 8:47 pm

    =?iso-8859-1?q?Gary=20Cowell?= <gary_cowell@yahoo.co.uk> writes:
    So it seems the idea that oracle is dropping duplicate
    rows prior to the sort when using distinct may indeed
    be the case.
    Okay. We won't have any short-term solution for making DISTINCT do that,
    but if you are on PG 7.4 you could get the same effect from using
    GROUP BY: instead of
    select distinct version from vers where version is not null
    try
    select version from vers where version is not null group by version
    You should get a HashAggregate plan out of that, and I'd think it'd be
    pretty quick when there are not many distinct values of version.

    regards, tom lane
  • Gary Cowell at Jun 18, 2004 at 11:06 pm
    --- Tom Lane wrote: >
    =?iso-8859-1?q?Gary=20Cowell?=
    <gary_cowell@yahoo.co.uk> writes:
    So it seems the idea that oracle is dropping duplicate
    rows prior to the sort when using distinct may indeed
    be the case.
    Okay. We won't have any short-term solution for
    making DISTINCT do that,
    but if you are on PG 7.4 you could get the same
    effect from using
    GROUP BY: instead of
    select distinct version from vers where version is
    not null
    try
    select version from vers where version is not null
    group by version
    You should get a HashAggregate plan out of that, and
    I'd think it'd be
    pretty quick when there are not many distinct values
    of version.
    Yeah out of the half million rows there are only ever
    going to be 500 or so distinct values.

    I do indeed get such a plan. It's much faster that
    way. Down to 16 seconds. I'll get the chap to rewrite
    his app to use group by instead of distinct.

    Thanks (everyone) for the top class help!





    ___________________________________________________________ALL-NEW Yahoo! Messenger - sooooo many all-new ways to express yourself http://uk.messenger.yahoo.com

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-performance @
categoriespostgresql
postedJun 18, '04 at 11:32a
activeJun 18, '04 at 11:06p
posts8
users6
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase