FAQ

Performance comparison between Postgres and Greenplum

Suvankar Roy
Jul 13, 2009 at 11:34 am
Hi,

I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB
as well as a Greenplum DB.

The Primary key is a composite one comprising of 2 columns (so_no,
serial_no).

The execution of the following query takes 8214.016 ms in Greenplum but
only 729.134 ms in Postgres.
select * from observation_all order by so_no, serial_no;

I believe that execution time in greenplum should be less compared to
postgres. Can anybody throw some light, it would be of great help.


Regards,

Suvankar Roy
=====-----=====-----=====
Notice: The information contained in this e-mail
message and/or attachments to it may contain
confidential or privileged information. If you are
not the intended recipient, any dissemination, use,
review, distribution, printing or copying of the
information contained in this e-mail message
and/or attachments to it are strictly prohibited. If
you have received this communication in error,
please notify us by reply e-mail or telephone and
immediately and permanently delete the message
and any attachments. Thank you
reply

Search Discussions

11 responses

  • Scott Marlowe at Jul 15, 2009 at 4:56 am
    On Mon, Jul 13, 2009 at 5:23 AM, Suvankar Roywrote:
    Hi,

    I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB as
    well as a Greenplum DB.

    The Primary key is a composite one comprising of 2 columns (so_no,
    serial_no).

    The execution of the following query takes 8214.016 ms in Greenplum but only
    729.134 ms in Postgres.
    select * from observation_all order by so_no, serial_no;

    I believe that execution time in greenplum should be less compared to
    postgres. Can anybody throw some light, it would be of great help.
    What versions are you comparing?
  • Suvankar Roy at Jul 15, 2009 at 5:34 am
    Hi Scott,

    This is what I have got -

    In Greenplum, the following query returns:

    test_db1=# select version();
    version
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    PostgreSQL 8.2.13 (Greenplum Database 3.3.0.1 build 4) on
    i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat
    4.1.2-44) compiled on Jun 4 2009 16:30:49
    (1 row)


    In Postgres, the same query returns:

    postgres=# select version();
    version
    -----------------------------------------------------
    PostgreSQL 8.3.7, compiled by Visual C++ build 1400
    (1 row)

    Regards,

    Suvankar Roy
    Tata Consultancy Services
    Ph:- +91 33 66367352
    Cell:- +91 9434666898




    Scott Marlowe <scott.marlowe@gmail.com>
    07/15/2009 09:10 AM

    To
    Suvankar Roy <suvankar.roy@tcs.com>
    cc
    pgsql-performance@postgresql.org
    Subject
    Re: [PERFORM] Performance comparison between Postgres and Greenplum






    On Mon, Jul 13, 2009 at 5:23 AM, Suvankar Roywrote:
    Hi,

    I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB as
    well as a Greenplum DB.

    The Primary key is a composite one comprising of 2 columns (so_no,
    serial_no).

    The execution of the following query takes 8214.016 ms in Greenplum but only
    729.134 ms in Postgres.
    select * from observation_all order by so_no, serial_no;

    I believe that execution time in greenplum should be less compared to
    postgres. Can anybody throw some light, it would be of great help.
    What versions are you comparing?

    ForwardSourceID:NT00004AAE
    =====-----=====-----=====
    Notice: The information contained in this e-mail
    message and/or attachments to it may contain
    confidential or privileged information. If you are
    not the intended recipient, any dissemination, use,
    review, distribution, printing or copying of the
    information contained in this e-mail message
    and/or attachments to it are strictly prohibited. If
    you have received this communication in error,
    please notify us by reply e-mail or telephone and
    immediately and permanently delete the message
    and any attachments. Thank you
  • Scott Marlowe at Jul 15, 2009 at 9:30 am
    On Tue, Jul 14, 2009 at 11:33 PM, Suvankar Roywrote:
    Hi Scott,

    This is what I have got -
    In Greenplum, version PostgreSQL 8.2.13 (Greenplum Database 3.3.0.1 build 4) on
    i686-pc-linux-gnu, compiled by GCC gcc (GCC)
    In Postgres, version PostgreSQL 8.3.7, compiled by Visual C++ build 140
    (1 row)
    I wouldn't expect 8.2.x to outrun 8.3.x
  • Suvankar Roy at Jul 15, 2009 at 9:39 am
    Hi Scott,

    Thanks for your input Scott.

    But, then being a Massively Parallel Processing Database, is Greenplum not
    expected to outperform versions of Postgres higher than on which it is
    based.

    My notion was that GP 3.3 (based on PostgreSQL 8.2.13) would exceed PG
    8.3.7.

    It seems that I was wrong here.

    Regards,

    Suvankar Roy




    Scott Marlowe <scott.marlowe@gmail.com>
    07/15/2009 03:00 PM

    To
    Suvankar Roy <suvankar.roy@tcs.com>
    cc
    pgsql-performance@postgresql.org
    Subject
    Re: [PERFORM] Performance comparison between Postgres and Greenplum






    On Tue, Jul 14, 2009 at 11:33 PM, Suvankar Roywrote:
    Hi Scott,

    This is what I have got -
    In Greenplum, version PostgreSQL 8.2.13 (Greenplum Database 3.3.0.1
    build 4) on
    i686-pc-linux-gnu, compiled by GCC gcc (GCC)
    In Postgres, version PostgreSQL 8.3.7, compiled by Visual C++ build 1400
    (1 row)
    I wouldn't expect 8.2.x to outrun 8.3.x

    ForwardSourceID:NT00004AD2
    =====-----=====-----=====
    Notice: The information contained in this e-mail
    message and/or attachments to it may contain
    confidential or privileged information. If you are
    not the intended recipient, any dissemination, use,
    review, distribution, printing or copying of the
    information contained in this e-mail message
    and/or attachments to it are strictly prohibited. If
    you have received this communication in error,
    please notify us by reply e-mail or telephone and
    immediately and permanently delete the message
    and any attachments. Thank you
  • Greg Smith at Jul 16, 2009 at 1:17 am

    On Wed, 15 Jul 2009, Scott Marlowe wrote:

    On Tue, Jul 14, 2009 at 11:33 PM, Suvankar Roywrote:
    Hi Scott,

    This is what I have got -
    In Greenplum, version PostgreSQL 8.2.13 (Greenplum Database 3.3.0.1 build 4) on
    i686-pc-linux-gnu, compiled by GCC gcc (GCC)
    In Postgres, version PostgreSQL 8.3.7, compiled by Visual C++ build 1400
    (1 row)
    I wouldn't expect 8.2.x to outrun 8.3.x
    And you can't directly compare performance of a system running Linux with
    one running Windows, even if they're the same hardware. Theoretically,
    Linux should have an advantage, but only if you're accounting for a whole
    stack of other variables.

    --
    * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
    From pgsql-performance-owner@postgresql.org Wed Jul 15 23:37:09 2009
    Received: from maia.hub.org (unknown [200.46.204.183])
    by mail.postgresql.org (Postfix) with ESMTP id 7372E633F4F
    for <pgsql-performance-postgresql.org@mail.postgresql.org>; Wed, 15 Jul 2009 23:37:09 -0300 (ADT)
    Received: from mail.postgresql.org ([200.46.204.86])
    by maia.hub.org (mx1.hub.org [200.46.204.183]) (amavisd-maia, port 10024)
    with ESMTP id 39630-06
    for <pgsql-performance-postgresql.org@mail.postgresql.org>;
    Wed, 15 Jul 2009 23:36:58 -0300 (ADT)
    X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
    Received: from mail-ew0-f223.google.com (mail-ew0-f223.google.com [209.85.219.223])
    by mail.postgresql.org (Postfix) with ESMTP id F374863386F
    for <pgsql-performance@postgresql.org>; Wed, 15 Jul 2009 23:36:55 -0300 (ADT)
    Received: by ewy23 with SMTP id 23so4392589ewy.19
    for <pgsql-performance@postgresql.org>; Wed, 15 Jul 2009 19:36:53 -0700 (PDT)
    DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
    d=gmail.com; s=gamma;
    h=domainkey-signature:mime-version:received:in-reply-to:references
    :date:message-id:subject:from:to:cc:content-type
    :content-transfer-encoding;
    bh=zAd2j/NRQceoCSRzNlsqeSjRgrq+BGQ8vrIESsiGTU8=;
    b=WfO09u5wWjUyoJRb7cEyxT/WM1tjLY4r8zTMVAabVs3QvEQufug8qkFtXLv6dni5Z7
    TIUHUELABFrezvTmPQG9eX1VJgYcBHuNw6PPXzeyKlMCSlRd74Ev6f7VJ7m2FShKPhPv
    aM8B5cfPEd0v2gbgSKS7V6oXXSJphHBEX7xiA=
    DomainKey-Signature: a=rsa-sha1; c=nofws;
    d=gmail.com; s=gamma;
    h=mime-version:in-reply-to:references:date:message-id:subject:from:to
    :cc:content-type:content-transfer-encoding;
    b=fqS+7js4KDYQDotfr5TcBhbo5dIyS2vaYgwwWo4yvSfNoAkOSEoB431WBbhCh6J785
    xyhpVHz5RXyFluxbtEvGjCN/cZcjc7AOrHF75AF9bmj7neoT5xdP9erdCYWqbuSbU/Eq
    G3j6SqJiU57Csoi6VYbELJBi5ia0EnTDL44Nw=
    MIME-Version: 1.0
    Received: by 10.210.81.9 with SMTP id e9mr9033856ebb.68.1247711811972; Wed, 15
    Jul 2009 19:36:51 -0700 (PDT)
    In-Reply-To: <ac116f9a0907151917o6d87d1a8wf3ff2afb088eb47c@mail.gmail.com>
    References: <530068a0907150804p7455348fp4ec264448b9c36bf@mail.gmail.com>
    <C683C36A.A2F8%scott@richrelevance.com>
    <ac116f9a0907151917o6d87d1a8wf3ff2afb088eb47c@mail.gmail.com>
    Date: Wed, 15 Jul 2009 20:36:51 -0600
    Message-ID: <dcc563d10907151936y22d01025qcd27d6420c695d4@mail.gmail.com>
    Subject: Re: cluster index on a table
    From: Scott Marlowe <scott.marlowe@gmail.com>
    To: Justin Pitts <justinpitts@gmail.com>
    Cc: "pgsql-performance@postgresql.org" <pgsql-performance@postgresql.org>,
    Ibrahim Harrani <ibrahim.harrani@gmail.com>, Scott Carey <scott@richrelevance.com>
    Content-Type: text/plain; charset=windows-1252
    Content-Transfer-Encoding: quoted-printable
    X-Virus-Scanned: Maia Mailguard 1.0.1
    X-Spam-Status: No, hits=0 tagged_above=0 required=5 tests=none
    X-Spam-Level:
    X-Archive-Number: 200907/122
    X-Sequence-Number: 34782

    I'd love to see it.

    On Wed, Jul 15, 2009 at 8:17 PM, Justin Pittswrote:
    Is there any interest in adding that (continual/automatic cluster
    order maintenance) to a future release?

    On Wed, Jul 15, 2009 at 8:33 PM, Scott Carey<scott@richrelevance.com> wro=
    te:
    If you have a lot of insert/update/delete activity on a table fillfactor=
    can
    help.

    I don=92t believe that postgres will try and maintain the table in the c=
    luster
    order however.


    On 7/15/09 8:04 AM, "Ibrahim Harrani" wrote:

    Hi,

    thanks for your suggestion.
    Is there any benefit of setting fillfactor to 70 or 80 on this table?



    On Wed, Jun 24, 2009 at 8:42 PM, Scott Marlowe<scott.marlowe@gmail.com>
    wrote:
    As another poster pointed out, you cluster on ONE index and one index
    only. =A0However, you can cluster on a multi-column index.
    --
    Sent via pgsql-performance mailing list (pgsql-performance@postgresql.or=
    g)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-performance


    --=20
    When fascism comes to America, it will be intolerance sold as diversity.
  • Alex Goncharov at Jul 15, 2009 at 12:37 pm
    ,--- You/Suvankar (Mon, 13 Jul 2009 16:53:41 +0530) ----*
    I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB
    as well as a Greenplum DB.

    The Primary key is a composite one comprising of 2 columns (so_no,
    serial_no).

    The execution of the following query takes 8214.016 ms in Greenplum but
    only 729.134 ms in Postgres.
    select * from observation_all order by so_no, serial_no;

    I believe that execution time in greenplum should be less compared to
    postgres. Can anybody throw some light, it would be of great help.
    Why do you believe so?

    Is your data distributed and served by separate segment hosts? By how
    many? Is the network connectivity not a factor? What happens with
    the times if you don't sort your result set?

    -- Alex -- alex-goncharov@comcast.net --
  • Suvankar Roy at Jul 15, 2009 at 1:02 pm
    Hi Alex,

    Yes, I have got 2 segments and a master host. So, in a way processing
    should be faster in Greenplum.

    Actually this is only a sort of Proof of Concept trial that I am carrying
    out to notice differences between greenplum and postgres, if any.

    For other queries though, results are satisfactory or at least comparable,
    like-

    select distinct so_no, serial_no from observation_all;
    in postgres it takes - 1404.238 ms
    in gp it takes - 1217.283 ms


    Regards,

    Suvankar Roy



    Alex Goncharov <alex-goncharov@comcast.net>
    07/15/2009 06:07 PM
    Please respond to
    Alex Goncharov <alex-goncharov@comcast.net>


    To
    Suvankar Roy <suvankar.roy@tcs.com>
    cc
    pgsql-performance@postgresql.org
    Subject
    Re: [PERFORM] Performance comparison between Postgres and Greenplum






    ,--- You/Suvankar (Mon, 13 Jul 2009 16:53:41 +0530) ----*
    I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB
    as well as a Greenplum DB.

    The Primary key is a composite one comprising of 2 columns (so_no,
    serial_no).

    The execution of the following query takes 8214.016 ms in Greenplum but
    only 729.134 ms in Postgres.
    select * from observation_all order by so_no, serial_no;

    I believe that execution time in greenplum should be less compared to
    postgres. Can anybody throw some light, it would be of great help.
    Why do you believe so?

    Is your data distributed and served by separate segment hosts? By how
    many? Is the network connectivity not a factor? What happens with
    the times if you don't sort your result set?

    -- Alex -- alex-goncharov@comcast.net --


    ForwardSourceID:NT00004AF2
    =====-----=====-----=====
    Notice: The information contained in this e-mail
    message and/or attachments to it may contain
    confidential or privileged information. If you are
    not the intended recipient, any dissemination, use,
    review, distribution, printing or copying of the
    information contained in this e-mail message
    and/or attachments to it are strictly prohibited. If
    you have received this communication in error,
    please notify us by reply e-mail or telephone and
    immediately and permanently delete the message
    and any attachments. Thank you
  • Alex Goncharov at Jul 15, 2009 at 1:18 pm
    ,--- You/Suvankar (Wed, 15 Jul 2009 18:32:12 +0530) ----*
    Yes, I have got 2 segments and a master host. So, in a way processing
    should be faster in Greenplum.
    No, it should not: it all depends on your data, SQL statements and
    setup.

    In my own experiments, with small amounts of stored data, PostgreSQL
    beats Greenplum, which doesn't surprise me a bit.

    You need to know where most of the execution time goes -- maybe to
    sorting? And sorting in Greenplum, isn't it done on one machine, the
    master host? Why would that be faster than in PostgreSQL?
    For other queries though, results are satisfactory or at least comparable,
    like-

    select distinct so_no, serial_no from observation_all;
    in postgres it takes - 1404.238 ms
    in gp it takes - 1217.283 ms
    No surprise here: the data is picked by multiple segment hosts and
    never sorted on the master.

    -- Alex -- alex-goncharov@comcast.net --
  • Scott Mead at Jul 15, 2009 at 3:33 pm

    On Wed, Jul 15, 2009 at 9:18 AM, Alex Goncharov wrote:

    ,--- You/Suvankar (Wed, 15 Jul 2009 18:32:12 +0530) ----*
    Yes, I have got 2 segments and a master host. So, in a way processing
    should be faster in Greenplum.
    No, it should not: it all depends on your data, SQL statements and
    setup.

    In my own experiments, with small amounts of stored data, PostgreSQL
    beats Greenplum, which doesn't surprise me a bit.

    Agreed. You're only operating on 99,000 rows. That isn't really
    enough rows to exercise the architecture of shared-nothing clusters.
    Now, I don't know greenplum very well, but I am familiar with another
    warehousing product
    with approximately the same architecture behind
    it. From all the testing I've done, you need to get into the 50
    million plus row range before the architecture starts to be really
    effective. 99,000 rows probably fits completely into memory on the
    machine that you're testing PG with, so your test really isn't fair.
    On one PG box, you're just doing memory reads, and maybe some high-speed
    disk access, on the Greenplum setup, you've got network overhead on top of
    all that. Bottom
    line: You need to do a test with a number of rows that won't fit into
    memory, and won't be very quickly scanned from disk into memory. You
    need a LOT of data.

    --Scott
  • Scott Marlowe at Jul 17, 2009 at 6:24 am
    On Wed, Jul 15, 2009 at 7:02 AM, Suvankar Roywrote:
    Hi Alex,

    Yes, I have got 2 segments and a master host. So, in a way processing should
    be faster in Greenplum.

    Actually this is only a sort of Proof of Concept trial that I am carrying
    out to notice differences between greenplum and postgres, if any.
    You're definitely gonna want more data to test with. I run regular
    vanilla pgsql for stats at work, and we average 0.8M to 2M rows of
    stats every day. We keep them for up to two years. So, when we reach
    our max of two years, we're talking somewhere in the range of a
    billion rows to mess about with.

    During a not so busy day, the 99,000th row entered into stats for
    happens at about 3am. Once they're loaded into memory it takes 435 ms
    to access those 99k rows.

    Start testing in the millions, at a minimum. Hundreds of millions is
    more likely to start showing a difference.
  • Greg Smith at Jul 16, 2009 at 1:14 am

    On Mon, 13 Jul 2009, Suvankar Roy wrote:

    I believe that execution time in greenplum should be less compared to postgres.
    Well, first off you don't even mention which PostgreSQL or Greenplum
    version you're comparing, which leaves a lot of variables we can't account
    for. Second, you'd need to make sure that the two servers had as close to
    identical server parameter configurations as possible to get a fair
    comparison (the postgresql.conf file). Next, you need to make sure the
    data has been loaded and analyzed similarly on the two--try using "VACUUM
    ANALYZE" on both systems before running your query, then "EXPLAIN ANALYZE"
    on both setups to get an idea if they're using the same plan to pull data
    from the disk, you may discover there's a radical different there.

    ...and even if you did all that, this still wouldn't be the right place to
    ask about Greenplum's database product. You'll end up with everyone mad
    at you. Nobody likes have benchmarks that show their product in a bad
    light published, particularly if they aren't completely fair. And this
    list is dedicated to talking about the open-source PostgreSQL versions.
    Your question would be more appropriate to throw in Greenplum's direction.
    The list I gave above is by no means even comprehensive--there are plenty
    of other ways you can end up doing an unfair comparison here (using
    different paritions on the same disk which usually end up with different
    speeds comes to mind).

    --
    * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Related Discussions