Hello Experts,
My application uses Oracle DB, and makes use of OCI interface.
I have been able to develop similar interface using postgreSQL library.
However, I have done some tests but results for PostgreSQL have not been
encouraging for a few of them.

My questions/scenarios are:

1. How does PostgreSQL perform when inserting data into an indexed (type: btree)
table? Is it true that as you add the indexes on a table, the performance
deteriorates significantly whereas Oracle does not show that much performance
decrease. I have tried almost all postgreSQL performance tips available. I want
to have very good "insert" performance (with indexes), "select" performance is
not that important at this point of time.

2. What are the average storage requirements of postgres compared to Oracle? I
inserted upto 1 million records. The storage requirement of postgreSQL is almost
double than that of Oracle.

Thanks in anticipation.

Search Discussions

  • Joshua D. Drake at Oct 25, 2010 at 6:20 pm

    On Mon, 2010-10-25 at 11:12 -0700, Divakar Singh wrote:

    My questions/scenarios are:

    1. How does PostgreSQL perform when inserting data into an indexed
    (type: btree)
    table? Is it true that as you add the indexes on a table, the
    performance
    deteriorates significantly whereas Oracle does not show that much
    performance
    decrease. I have tried almost all postgreSQL performance tips
    available. I want
    to have very good "insert" performance (with indexes), "select"
    performance is
    not that important at this point of time.
    Did you test?
    2. What are the average storage requirements of postgres compared to
    Oracle? I
    inserted upto 1 million records. The storage requirement of postgreSQL
    is almost
    double than that of Oracle.
    What was your table structure?

    Joshua D. Drake


    --
    PostgreSQL.org Major Contributor
    Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
    Consulting, Training, Support, Custom Development, Engineering
    http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
  • Divakar Singh at Oct 25, 2010 at 6:31 pm

    My questions/scenarios are:

    1. How does PostgreSQL perform when inserting data into an indexed
    (type: btree)
    table? Is it true that as you add the indexes on a table, the
    performance
    deteriorates significantly whereas Oracle does not show that much
    performance
    decrease. I have tried almost all postgreSQL performance tips
    available. I want
    to have very good "insert" performance (with indexes), "select"
    performance is
    not that important at this point of time.
    -- Did you test?

    Yes. the performance was comparable when using SQL procedure. However, When I
    used libpq, PostgreSQL performed very bad. There was some difference in
    environment also between these 2 tests, but I am assuming libpq vs SQL was the
    real cause. Or it was something else?
    2. What are the average storage requirements of postgres compared to
    Oracle? I
    inserted upto 1 million records. The storage requirement of postgreSQL
    is almost
    double than that of Oracle.
    -- What was your table structure?

    Some 10-12 columns ( like 2 timestamp, 4 int and 4 varchar), with 5 indexes on
    varchar and int fields including 1 implicit index coz of PK.



    Joshua D. Drake


    --
    PostgreSQL.org Major Contributor
    Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
    Consulting, Training, Support, Custom Development, Engineering
    http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
  • Steve Singer at Oct 25, 2010 at 6:46 pm

    On 10-10-25 02:31 PM, Divakar Singh wrote:
    My questions/scenarios are:

    1. How does PostgreSQL perform when inserting data into an indexed
    (type: btree)
    table? Is it true that as you add the indexes on a table, the
    performance
    deteriorates significantly whereas Oracle does not show that much
    performance
    decrease. I have tried almost all postgreSQL performance tips
    available. I want
    to have very good "insert" performance (with indexes), "select"
    performance is
    not that important at this point of time.
    -- Did you test?

    Yes. the performance was comparable when using SQL procedure. However,
    When I used libpq, PostgreSQL performed very bad. There was some
    difference in environment also between these 2 tests, but I am assuming
    libpq vs SQL was the real cause. Or it was something else?
    So your saying that when you load the data with psql it loads fine, but
    when you load it using libpq it takes much longer?

    How are you using libpq?
    -Are you opening and closing the database connection between each insert?
    -Are you doing all of your inserts as one big transaction or are you
    doing a transaction per insert
    -Are you using prepared statements for your inserts?
    -Are you using the COPY command to load your data or the INSERT command?
    -Are you running your libpq program on the same server as postgresql?
    -How is your libpq program connecting to postgresql, is it using ssl?
    Some 10-12 columns ( like 2 timestamp, 4 int and 4 varchar), with 5
    indexes on varchar and int fields including 1 implicit index coz of PK.
    If your run "VACUUM VERBOSE tablename" on the table, what does it say?

    You also don't mention which version of postgresql your using.

    Joshua D. Drake


    --
    PostgreSQL.org Major Contributor
    Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
    Consulting, Training, Support, Custom Development, Engineering
    http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
  • Divakar Singh at Oct 25, 2010 at 6:52 pm
    Answers:

    How are you using libpq?
    -Are you opening and closing the database connection between each insert?

    [Need to check, will come back on this]

    -Are you doing all of your inserts as one big transaction or are you doing a
    transaction per insert

    [Answer: for C++ program, one insert per transaction in PG as well as Oracle.
    But in stored proc, I think both use only 1 transaction for all inserts]

    -Are you using prepared statements for your inserts?

    [Need to check, will come back on this]

    -Are you using the COPY command to load your data or the INSERT command?

    [No]

    -Are you running your libpq program on the same server as postgresql?

    [Yes]

    -How is your libpq program connecting to postgresql, is it using ssl?

    [No]

    If your run "VACUUM VERBOSE tablename" on the table, what does it say?

    [Need to check, will come back on this]

    You also don't mention which version of postgresql your using.

    [Latest, 9.x]

    Best Regards,
    Divakar




    ________________________________
    From: Steve Singer <ssinger@ca.afilias.info>
    To: Divakar Singh <dpsmails@yahoo.com>
    Cc: jd@commandprompt.com; pgsql-performance@postgresql.org
    Sent: Tue, October 26, 2010 12:16:46 AM
    Subject: Re: [PERFORM] Postgres insert performance and storage requirement
    compared to Oracle
    On 10-10-25 02:31 PM, Divakar Singh wrote:

    My questions/scenarios are:

    1. How does PostgreSQL perform when inserting data into an indexed
    (type: btree)
    table? Is it true that as you add the indexes on a table, the
    performance
    deteriorates significantly whereas Oracle does not show that much
    performance
    decrease. I have tried almost all postgreSQL performance tips
    available. I want
    to have very good "insert" performance (with indexes), "select"
    performance is
    not that important at this point of time.
    -- Did you test?

    Yes. the performance was comparable when using SQL procedure. However,
    When I used libpq, PostgreSQL performed very bad. There was some
    difference in environment also between these 2 tests, but I am assuming
    libpq vs SQL was the real cause. Or it was something else?
    So your saying that when you load the data with psql it loads fine, but
    when you load it using libpq it takes much longer?

    How are you using libpq?
    -Are you opening and closing the database connection between each insert?
    -Are you doing all of your inserts as one big transaction or are you
    doing a transaction per insert
    -Are you using prepared statements for your inserts?
    -Are you using the COPY command to load your data or the INSERT command?
    -Are you running your libpq program on the same server as postgresql?
    -How is your libpq program connecting to postgresql, is it using ssl?
    Some 10-12 columns ( like 2 timestamp, 4 int and 4 varchar), with 5
    indexes on varchar and int fields including 1 implicit index coz of PK.
    If your run "VACUUM VERBOSE tablename" on the table, what does it say?

    You also don't mention which version of postgresql your using.

    Joshua D. Drake


    --
    PostgreSQL.org Major Contributor
    Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
    Consulting, Training, Support, Custom Development, Engineering
    http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

    --
    Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-performance
  • Scott Marlowe at Oct 25, 2010 at 6:26 pm

    On Mon, Oct 25, 2010 at 12:12 PM, Divakar Singh wrote:
    Hello Experts,
    My application uses Oracle DB, and makes use of OCI interface.
    I have been able to develop similar interface using postgreSQL library.
    However, I have done some tests but results for PostgreSQL have not been
    encouraging for a few of them.
    Tell us more about your tests and results please.
  • Divakar Singh at Oct 25, 2010 at 6:36 pm
    Storage test was simple, but the data (seconds taken) for INSERT test for PG vs
    Oracle for 1, 2, 3,4 and 5 indexes was:
    PG:

    25
    30
    37
    42
    45

    Oracle:


    33
    43
    50
    65
    68 Rows inserted: 100,000
    Above results show good INSERT performance of PG when using SQL procedures. But
    performance when I use C++ lib is very bad. I did that test some time back so I
    do not have data for that right now.




    ________________________________
    From: Scott Marlowe <scott.marlowe@gmail.com>
    To: Divakar Singh <dpsmails@yahoo.com>
    Cc: pgsql-performance@postgresql.org
    Sent: Mon, October 25, 2010 11:56:27 PM
    Subject: Re: [PERFORM] Postgres insert performance and storage requirement
    compared to Oracle
    On Mon, Oct 25, 2010 at 12:12 PM, Divakar Singh wrote:
    Hello Experts,
    My application uses Oracle DB, and makes use of OCI interface.
    I have been able to develop similar interface using postgreSQL library.
    However, I have done some tests but results for PostgreSQL have not been
    encouraging for a few of them.
    Tell us more about your tests and results please.
  • Joshua D. Drake at Oct 25, 2010 at 6:39 pm

    On Mon, 2010-10-25 at 11:36 -0700, Divakar Singh wrote:

    68 Rows inserted: 100,000
    Above results show good INSERT performance of PG when using SQL
    procedures. But
    performance when I use C++ lib is very bad. I did that test some time
    back so I
    do not have data for that right now.
    This is interesting, are you using libpq or libpqXX?

    Joshua D. Drake

    --
    PostgreSQL.org Major Contributor
    Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
    Consulting, Training, Support, Custom Development, Engineering
    http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
  • Divakar Singh at Oct 25, 2010 at 6:49 pm
    Hi Joshua,
    I have been only using libpq.
    Is libpqXX better than the other?
    Is there any notable facility in libpqxx which could aid in fast inserts or
    better performance in general?

    Best Regards,
    Divakar




    ________________________________
    From: Joshua D. Drake <jd@commandprompt.com>
    To: Divakar Singh <dpsmails@yahoo.com>
    Cc: Scott Marlowe <scott.marlowe@gmail.com>; pgsql-performance@postgresql.org
    Sent: Tue, October 26, 2010 12:08:52 AM
    Subject: Re: [PERFORM] Postgres insert performance and storage requirement
    compared to Oracle
    On Mon, 2010-10-25 at 11:36 -0700, Divakar Singh wrote:

    68 Rows inserted: 100,000
    Above results show good INSERT performance of PG when using SQL
    procedures. But
    performance when I use C++ lib is very bad. I did that test some time
    back so I
    do not have data for that right now.
    This is interesting, are you using libpq or libpqXX?

    Joshua D. Drake

    --
    PostgreSQL.org Major Contributor
    Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
    Consulting, Training, Support, Custom Development, Engineering
    http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


    --
    Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-performance
  • Mladen Gogala at Oct 25, 2010 at 6:56 pm
    Profiling could tell you where is the time lost and where is your
    program spending time. Having experience with both Oracle and Postgres,
    I don't feel that there is much of a difference in the insert speed. I
    am not using C++, I am using scripting languages like Perl and PHP and,
    as far as inserts go, I don't see much of a difference. I have an
    application which inserts approximately 600,000 records into a
    PostgreSQL 9.0.1 per day, in chunks of up to 60,000 records every hour.
    The table is partitioned and there are indexes on the underlying
    partitions. I haven't noticed any problems with inserts. Also, if I use
    "copy" instead of the "insert" command, I can be even faster. In
    addition to that, PostgreSQL doesn't support index organized tables.

    Divakar Singh wrote:
    Storage test was simple, but the data (seconds taken) for INSERT test
    for PG vs Oracle for 1, 2, 3,4 and 5 indexes was:
    PG:
    25
    30
    37
    42
    45



    Oracle:

    33
    43
    50
    65
    68

    Rows inserted: 100,000
    Above results show good INSERT performance of PG when using SQL
    procedures. But performance when I use C++ lib is very bad. I did that
    test some time back so I do not have data for that right now.

    ------------------------------------------------------------------------
    *From:* Scott Marlowe <scott.marlowe@gmail.com>
    *To:* Divakar Singh <dpsmails@yahoo.com>
    *Cc:* pgsql-performance@postgresql.org
    *Sent:* Mon, October 25, 2010 11:56:27 PM
    *Subject:* Re: [PERFORM] Postgres insert performance and storage
    requirement compared to Oracle
    On Mon, Oct 25, 2010 at 12:12 PM, Divakar Singh wrote:
    Hello Experts,
    My application uses Oracle DB, and makes use of OCI interface.
    I have been able to develop similar interface using postgreSQL library.
    However, I have done some tests but results for PostgreSQL have not been
    encouraging for a few of them.
    Tell us more about your tests and results please.

    --

    Mladen Gogala
    Sr. Oracle DBA
    1500 Broadway
    New York, NY 10036
    (212) 329-5251
    http://www.vmsinfo.com
    The Leader in Integrated Media Intelligence Solutions
  • Scott Marlowe at Oct 25, 2010 at 7:26 pm

    On Mon, Oct 25, 2010 at 12:36 PM, Divakar Singh wrote:
    Storage test was simple, but the data (seconds taken) for INSERT test for PG vs Oracle for 1, 2, 3,4 and 5 indexes was:
    PG:
    25
    30
    37
    42
    45

    Oracle:

    33
    43
    50
    65
    68
    Rows inserted: 100,000
    Above results show good INSERT performance of PG when using SQL procedures. But performance when I use C++ lib is very bad. I did that test some time back so I do not have data for that right now.
    So, assuming I wanted to reproduce your results, can you provide a
    self contained test case that shows these differences? I have always
    gotten really good performance using libpq myself, so I'm looking for
    what it is you might be doing differently from me that would make it
    so slow.
  • Alan Hodgson at Oct 25, 2010 at 7:51 pm

    On October 25, 2010 11:36:24 am Divakar Singh wrote:
    Above results show good INSERT performance of PG when using SQL procedures.
    But performance when I use C++ lib is very bad. I did that test some time
    back so I do not have data for that right now.
    Wrap it in a transaction.
  • Josh Kupershmidt at Oct 25, 2010 at 6:33 pm

    On Mon, Oct 25, 2010 at 2:12 PM, Divakar Singh wrote:
    1. How does PostgreSQL perform when inserting data into an indexed (type:
    btree) table? Is it true that as you add the indexes on a table, the
    performance deteriorates significantly whereas Oracle does not show that
    much performance decrease. I have tried almost all postgreSQL performance
    tips available. I want to have very good "insert" performance (with
    indexes), "select" performance is not that important at this point of time.
    I don't claim to have any experience with Oracle, but this boast
    smells fishy. See for example Figure 3-2 (pp. 57-58) in "The Art of
    SQL", where the author presents simple charts showing the performance
    impact upon INSERTs of adding indexes to a table in Oracle and MySQL:
    they're both in the same ballpark, and the performance impact is
    indeed significant. As Joshua Drake suggests, table schemas and test
    results would help your case.

    Josh
  • Ray Stell at Oct 25, 2010 at 7:22 pm

    On Mon, Oct 25, 2010 at 11:39:30AM -0700, Divakar Singh wrote:
    Thanks Ray,
    Already seen that, but it does not tell about storage requirement compared to
    Oracle. I find it takes 2 times space than oracle.


    Best Regards,
    Divakar
    ________________________________
    From: Ray Stell <stellr@cns.vt.edu>
    To: Divakar Singh <dpsmails@yahoo.com>
    Sent: Tue, October 26, 2010 12:05:23 AM
    Subject: Re: [PERFORM] Postgres insert performance and storage requirement
    compared to Oracle
    On Mon, Oct 25, 2010 at 11:12:40AM -0700, Divakar Singh wrote:

    2. What are the average storage requirements of postgres compared to Oracle? I
    inserted upto 1 million records. The storage requirement of postgreSQL is
    almost

    double than that of Oracle.
    there's a fine manual:
    http://www.postgresql.org/docs/9.0/interactive/storage.html

    Maybe compare to oracle's storage documentation:

    http://download.oracle.com/docs/cd/E11882_01/server.112/e17118/sql_elements001.htm#SQLRF30020
    http://download.oracle.com/docs/cd/E11882_01/server.112/e17120/schema007.htm#ADMIN11622

    I don't believe for a second the byte count is double in pg, but that's just
    a religious expression, I've never counted.
  • Merlin Moncure at Oct 25, 2010 at 8:29 pm

    On Mon, Oct 25, 2010 at 2:12 PM, Divakar Singh wrote:
    Hello Experts,
    My application uses Oracle DB, and makes use of OCI interface.
    I have been able to develop similar interface using postgreSQL library.
    However, I have done some tests but results for PostgreSQL have not been
    encouraging for a few of them.

    My questions/scenarios are:

    1. How does PostgreSQL perform when inserting data into an indexed (type:
    btree) table? Is it true that as you add the indexes on a table, the
    performance deteriorates significantly whereas Oracle does not show that
    much performance decrease. I have tried almost all postgreSQL performance
    tips available. I want to have very good "insert" performance (with
    indexes), "select" performance is not that important at this point of time.

    2. What are the average storage requirements of postgres compared to Oracle?
    I inserted upto 1 million records. The storage requirement of postgreSQL is
    almost double than that of Oracle.
    u
    Thanks in anticipation.
    I ran the following tests w/libpqtypes. While you probably wont end
    up using libpqtypes, it's illustrative to mention it because it's
    generally the easiest way to get data into postgres and by far the
    fastest (excluding 'COPY'). source code follows after the sig (I
    banged it out quite quickly, it's messy!) :-). I am not seeing your
    results.

    via libpqtypes: Inserting, begin..insert..(repeat 1000000x) commit;
    local workstation: 2m24s
    remote server: 8m8s

    via libpqtypes, but stacking array and unstacking on server (this
    could be optimized further by using local prepare):
    local workstation: 43s (io bound)
    remote server: 29s (first million)
    remote server: 29s (second million)
    create index (1.8s) remote
    remote server: 33s (third million, w/index)

    obviously insert at a time tests are network bound. throw a couple of
    indexes in there and you should see some degradation, but nothing too
    terrible.

    merlin
    libpqtypes.esilo.com

    ins1.c (insert at a time)
    #include "libpq-fe.h"
    #include "libpqtypes.h"

    #define INS_COUNT 1000000

    int main()
    {
    int i;

    PGconn *conn = PQconnectdb("host=devdb dbname=postgres port=8071");
    if(PQstatus(conn) != CONNECTION_OK)
    {
    printf("bad connection");
    return -1;
    }

    PQtypesRegister(conn);

    PQexec(conn, "begin");

    for(i=0; i<INS_COUNT; i++)
    {
    PGint4 a=i;
    PGtext b = "some_text";
    PGtimestamp c;
    PGbytea d;

    d.len = 8;
    d.data = b;

    c.date.isbc = 0;
    c.date.year = 2000;
    c.date.mon = 0;
    c.date.mday = 19;
    c.time.hour = 10;
    c.time.min = 41;
    c.time.sec = 6;
    c.time.usec = 0;
    c.time.gmtoff = -18000;

    PGresult *res = PQexecf(conn, "insert into ins_test(a,b,c,d)
    values(%int4, %text, %timestamptz, %bytea)", a, b, &c, &d);

    if(!res)
    {
    printf("got %s\n", PQgeterror());
    return -1;
    }
    PQclear(res);
    }

    PQexec(conn, "commit");

    PQfinish(conn);
    }


    ins2.c (array stack/unstack)
    #include "libpq-fe.h"
    #include "libpqtypes.h"

    #define INS_COUNT 1000000

    int main()
    {
    int i;

    PGconn *conn = PQconnectdb("host=devdb dbname=postgres port=8071");
    PGresult *res;
    if(PQstatus(conn) != CONNECTION_OK)
    {
    printf("bad connection");
    return -1;
    }

    PQtypesRegister(conn);

    PGregisterType type = {"ins_test", NULL, NULL};
    PQregisterComposites(conn, &type, 1);

    PGparam *p = PQparamCreate(conn);
    PGarray arr;
    arr.param = PQparamCreate(conn);
    arr.ndims = 0;

    for(i=0; i<INS_COUNT; i++)
    {
    PGint4 a=i;
    PGtext b = "some_text";
    PGtimestamp c;
    PGbytea d;
    PGparam *i = PQparamCreate(conn);

    d.len = 8;
    d.data = b;

    c.date.isbc = 0;
    c.date.year = 2000;
    c.date.mon = 0;
    c.date.mday = 19;
    c.time.hour = 10;
    c.time.min = 41;
    c.time.sec = 6;
    c.time.usec = 0;
    c.time.gmtoff = -18000;

    PQputf(i, "%int4 %text %timestamptz %bytea", a, b, &c, &d);
    PQputf(arr.param, "%ins_test", i);
    }

    if(!PQputf(p, "%ins_test[]", &arr))
    {
    printf("putf failed: %s\n", PQgeterror());
    return -1;
    }
    res = PQparamExec(conn, p, "insert into ins_test select (unnest($1)).*", 1);

    if(!res)
    {
    printf("got %s\n", PQgeterror());
    return -1;
    }
    PQclear(res);
    PQfinish(conn);
    }
  • Merlin Moncure at Oct 25, 2010 at 8:51 pm

    On Mon, Oct 25, 2010 at 4:28 PM, Merlin Moncure wrote:
    I ran the following tests w/libpqtypes.  While you probably wont end
    up using libpqtypes, it's illustrative to mention it because it's
    generally the easiest way to get data into postgres and by far the
    fastest (excluding 'COPY').  source code follows after the sig (I
    banged it out quite quickly, it's messy!) :-).  I am not seeing your
    results.
    I had a really horrible bug in there -- leaking a param inside the
    array push loop. cleaning it up dropped another 5 seconds or so from
    the 4th million inserted to the remote server!. Using local prepare
    (PQspecPrepare) prob another second or two could be shaved off.

    PGparam *t = PQparamCreate(conn);

    for(i=0; i<INS_COUNT; i++)
    {
    PGint4 a=i;
    PGtext b = "some_text";
    PGtimestamp c;
    PGbytea d;

    d.len = 8;
    d.data = b;

    c.date.isbc = 0;
    c.date.year = 2000;
    c.date.mon = 0;
    c.date.mday = 19;
    c.time.hour = 10;
    c.time.min = 41;
    c.time.sec = 6;
    c.time.usec = 0;
    c.time.gmtoff = -18000;

    PQputf(t, "%int4 %text %timestamptz %bytea", a, b, &c, &d);
    PQputf(arr.param, "%ins_test", t);
    PQparamReset(t);
    }

    merlin
  • Divakar Singh at Oct 26, 2010 at 11:44 am
    Hi Merlin,
    Thanks for your quick input.
    Well 1 difference worth mentioning:
    I am inserting each row in a separate transaction, due to design of my program.

    -Divakar




    ________________________________
    From: Merlin Moncure <mmoncure@gmail.com>
    To: Divakar Singh <dpsmails@yahoo.com>
    Cc: pgsql-performance@postgresql.org
    Sent: Tue, October 26, 2010 2:21:02 AM
    Subject: Re: [PERFORM] Postgres insert performance and storage requirement
    compared to Oracle
    On Mon, Oct 25, 2010 at 4:28 PM, Merlin Moncure wrote:
    I ran the following tests w/libpqtypes. While you probably wont end
    up using libpqtypes, it's illustrative to mention it because it's
    generally the easiest way to get data into postgres and by far the
    fastest (excluding 'COPY'). source code follows after the sig (I
    banged it out quite quickly, it's messy!) :-). I am not seeing your
    results.
    I had a really horrible bug in there -- leaking a param inside the
    array push loop. cleaning it up dropped another 5 seconds or so from
    the 4th million inserted to the remote server!. Using local prepare
    (PQspecPrepare) prob another second or two could be shaved off.

    PGparam *t = PQparamCreate(conn);

    for(i=0; i<INS_COUNT; i++)
    {
    PGint4 a=i;
    PGtext b = "some_text";
    PGtimestamp c;
    PGbytea d;

    d.len = 8;
    d.data = b;

    c.date.isbc = 0;
    c.date.year = 2000;
    c.date.mon = 0;
    c.date.mday = 19;
    c.time.hour = 10;
    c.time.min = 41;
    c.time.sec = 6;
    c.time.usec = 0;
    c.time.gmtoff = -18000;

    PQputf(t, "%int4 %text %timestamptz %bytea", a, b, &c, &d);
    PQputf(arr.param, "%ins_test", t);
    PQparamReset(t);
    }

    merlin

    --
    Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-performance
  • Merlin Moncure at Oct 26, 2010 at 1:20 pm

    On Tue, Oct 26, 2010 at 7:44 AM, Divakar Singh wrote:
    Hi Merlin,
    Thanks for your quick input.
    Well 1 difference worth mentioning:
    I am inserting each row in a separate transaction, due to design of my
    program.
    Well, that right there is going to define your application
    performance. You have basically three major issues -- postgresql
    executes each query synchronously through the protocol, transaction
    overhead, and i/o issues coming from per transaction sync. libpq
    supports asynchronous queries, but only from the clients point of view
    -- so that this only helps if you have non trivial work to do setting
    up each query. The database is inherently capable of doing what you
    want it to do...you may just have to rethink certain things if you
    want to unlock the true power of postgres...

    You have several broad areas of attack:
    *) client side: use prepared queries (PQexecPrepared) possibly
    asynchronously (PQsendPrepared). Reasonably you can expect 5-50%
    speedup if not i/o bound
    *) Stage data to a temp table: temp tables are not wal logged or
    synced. Periodically they can be flushed to a permanent table.
    Possible data loss
    *) Relax sync policy (synchronous_commit/fsync) -- be advised these
    settings are dangerous
    *) Multiple client writers -- as long as you are not i/o bound, you
    will see big improvements in tps from multiple clients
    *) Stage/queue application data before inserting it -- requires
    retooling application, but you can see orders of magnitude jump insert
    performance

    merlin
  • Leonardo Francalanci at Oct 26, 2010 at 3:08 pm

    temp tables are not wal logged or
    synced. Periodically they can be flushed to a permanent table.

    What do you mean with "Periodically they can be flushed to
    a permanent table"? Just doing

    insert into tabb select * from temptable

    or using a proper, per-temporary table command???
  • Merlin Moncure at Oct 26, 2010 at 3:42 pm

    On Tue, Oct 26, 2010 at 11:08 AM, Leonardo Francalanci wrote:
    temp  tables are not wal logged or
    synced.  Periodically they can be flushed  to a permanent table.

    What do you mean with "Periodically they can be flushed  to
    a permanent table"? Just doing

    insert into tabb select * from temptable
    yup, that's exactly what I mean -- this will give you more uniform
    insert performance (your temp table doesn't even need indexes). Every
    N records (say 10000) you send to permanent and truncate the temp
    table. Obviously, this is more fragile approach so weigh the
    pros/cons carefully.

    merlin
  • Mladen Gogala at Oct 26, 2010 at 9:03 pm

    On 10/26/2010 11:41 AM, Merlin Moncure wrote:
    yup, that's exactly what I mean -- this will give you more uniform
    insert performance (your temp table doesn't even need indexes). Every
    N records (say 10000) you send to permanent and truncate the temp
    table. Obviously, this is more fragile approach so weigh the
    pros/cons carefully.

    merlin
    Truncate temporary table? What a horrible advice! All that you need is
    the temporary table to delete rows on commit.

    --

    Mladen Gogala
    Sr. Oracle DBA
    1500 Broadway
    New York, NY 10036
    (212) 329-5251
    http://www.vmsinfo.com
    The Leader in Integrated Media Intelligence Solutions
  • Jon Nelson at Oct 26, 2010 at 9:27 pm

    On Tue, Oct 26, 2010 at 4:02 PM, Mladen Gogala wrote:
    On 10/26/2010 11:41 AM, Merlin Moncure wrote:

    yup, that's exactly what I mean -- this will give you more uniform
    insert performance (your temp table doesn't even need indexes).  Every
    N records (say 10000) you send to permanent and truncate the temp
    table.  Obviously, this is more fragile approach so weigh the
    pros/cons carefully.

    merlin
    Truncate temporary table? What a horrible advice! All that you need is the
    temporary table to delete rows on commit.
    I believe Merlin was suggesting that, after doing 10000 inserts into
    the temporary table, that something like this might work better:

    start loop:
    populate rows in temporary table
    insert from temporary table into permanent table
    truncate temporary table
    loop

    I do something similar, where I COPY data to a temporary table, do
    lots of manipulations, and then perform a series of INSERTS from the
    temporary table into a permanent table.

    --
    Jon
  • Mladen Gogala at Oct 26, 2010 at 9:55 pm

    On 10/26/2010 5:27 PM, Jon Nelson wrote:
    start loop:
    populate rows in temporary table
    insert from temporary table into permanent table
    truncate temporary table
    loop

    I do something similar, where I COPY data to a temporary table, do
    lots of manipulations, and then perform a series of INSERTS from the
    temporary table into a permanent table.
    1) It's definitely not faster because you have to insert into the
    temporary table, in addition to inserting into the permanent table.
    2) This is what I had in mind:

    mgogala=# create table a(c1 int);
    CREATE TABLE
    mgogala=# create temporary table t1(c1 int) on commit delete rows;
    CREATE TABLE
    mgogala=# begin;
    BEGIN
    mgogala=# insert into t1 select generate_series(1,1000);
    INSERT 0 1000
    mgogala=# insert into a select * from t1;
    INSERT 0 1000
    mgogala=# commit;
    COMMIT
    mgogala=# select count(*) from a;
    count
    -------
    1000
    (1 row)

    mgogala=# select count(*) from t1;
    count
    -------

    (1 row)

    The table is created with "on commit obliterate rows" option which means
    that there is no need to do "truncate". The "truncate" command is a
    heavy artillery. Truncating a temporary table is like shooting ducks in
    a duck pond, with a howitzer.

    --

    Mladen Gogala
    Sr. Oracle DBA
    1500 Broadway
    New York, NY 10036
    (212) 329-5251
    http://www.vmsinfo.com
    The Leader in Integrated Media Intelligence Solutions
  • Merlin Moncure at Oct 26, 2010 at 10:14 pm

    On Tue, Oct 26, 2010 at 5:54 PM, Mladen Gogala wrote:
    On 10/26/2010 5:27 PM, Jon Nelson wrote:

    start loop:
    populate rows in temporary table
    insert from temporary table into permanent table
    truncate temporary table
    loop

    I do something similar, where I COPY data to a temporary table, do
    lots of manipulations, and then perform a series of INSERTS from the
    temporary table into a permanent table.
    1) It's definitely not faster because you have to insert into the temporary
    table, in addition to inserting into the permanent table.
    2) This is what I had in mind:

    mgogala=# create table a(c1 int);
    CREATE TABLE
    mgogala=# create temporary table t1(c1 int) on commit delete rows;
    CREATE TABLE
    mgogala=# begin;
    BEGIN
    mgogala=# insert into t1 select generate_series(1,1000);
    INSERT 0 1000
    mgogala=# insert into a select * from t1;
    INSERT 0 1000
    mgogala=# commit;
    COMMIT
    mgogala=# select count(*) from a;
    count
    -------
    1000
    (1 row)

    mgogala=# select count(*) from t1;
    count
    -------

    (1 row)

    The table is created with "on commit obliterate rows" option which means
    that there is no need to do "truncate". The "truncate" command is a heavy
    artillery. Truncating a temporary table is like shooting ducks in a duck
    pond, with a howitzer.
    You are not paying attention ;-). Look upthread: "I am inserting each
    row in a separate transaction, due to design of my program." (also on
    commit/drop is no picnic either, but I digress...)

    merlin
  • Robert Haas at Oct 26, 2010 at 10:50 pm

    On Tue, Oct 26, 2010 at 5:54 PM, Mladen Gogala wrote:
    The table is created with "on commit obliterate rows" option which means
    that there is no need to do "truncate". The "truncate" command is a heavy
    artillery. Truncating a temporary table is like shooting ducks in a duck
    pond, with a howitzer.
    This is just not true. ON COMMIT DELETE ROWS simply arranges for a
    TRUNCATE to happen immediately before each commit. See
    PreCommit_on_commit_actions() in tablecmds.c.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Merlin Moncure at Oct 26, 2010 at 11:17 pm

    On Tue, Oct 26, 2010 at 6:50 PM, Robert Haas wrote:
    On Tue, Oct 26, 2010 at 5:54 PM, Mladen Gogala
    wrote:
    The table is created with "on commit obliterate rows" option which means
    that there is no need to do "truncate". The "truncate" command is a heavy
    artillery. Truncating a temporary table is like shooting ducks in a duck
    pond, with a howitzer.
    This is just not true.  ON COMMIT DELETE ROWS simply arranges for a
    TRUNCATE to happen immediately before each commit.  See
    PreCommit_on_commit_actions() in tablecmds.c.
    quite so. If you are doing anything performance sensitive with 'on
    commit drop', you are better off organizing a cache around
    txid_current() (now(), pid for older pg versions). Skips the writes
    to the system catalogs and truncate.

    merlin
  • Divakar Singh at Oct 27, 2010 at 3:11 am
    Dear All,
    Thanks for your inputs on the insert performance part.
    Any suggestion on storage requirement?
    VACUUM is certainly not an option, because this is something related to
    maintenance AFTER insertion.

    I am talking about the plain storage requirement w.r. to Oracle, which I
    observed is twice of Oracle in case millions of rows are inserted.
    Anybody who tried to analyze the average storage requirement of PG w.r. to
    Oracle?

    Best Regards,
    Divakar




    ________________________________
    From: Merlin Moncure <mmoncure@gmail.com>
    To: Robert Haas <robertmhaas@gmail.com>
    Cc: Mladen Gogala <mladen.gogala@vmsinfo.com>; pgsql-performance@postgresql.org
    Sent: Wed, October 27, 2010 4:46:53 AM
    Subject: Re: [PERFORM] Postgres insert performance and storage requirement
    compared to Oracle
    On Tue, Oct 26, 2010 at 6:50 PM, Robert Haas wrote:
    On Tue, Oct 26, 2010 at 5:54 PM, Mladen Gogala
    wrote:
    The table is created with "on commit obliterate rows" option which means
    that there is no need to do "truncate". The "truncate" command is a heavy
    artillery. Truncating a temporary table is like shooting ducks in a duck
    pond, with a howitzer.
    This is just not true. ON COMMIT DELETE ROWS simply arranges for a
    TRUNCATE to happen immediately before each commit. See
    PreCommit_on_commit_actions() in tablecmds.c.
    quite so. If you are doing anything performance sensitive with 'on
    commit drop', you are better off organizing a cache around
    txid_current() (now(), pid for older pg versions). Skips the writes
    to the system catalogs and truncate.

    merlin

    --
    Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-performance
  • Merlin Moncure at Oct 27, 2010 at 6:06 pm

    On Tue, Oct 26, 2010 at 11:10 PM, Divakar Singh wrote:
    Dear All,
    Thanks for your inputs on the insert performance part.
    Any suggestion on storage requirement?
    VACUUM is certainly not an option, because this is something related to
    maintenance AFTER insertion.
    I am talking about the plain storage requirement w.r. to Oracle, which I
    observed is twice of Oracle in case millions of rows are inserted.
    Anybody who tried to analyze the average storage requirement of PG w.r. to
    Oracle?
    There isn't much you can to about storage use other than avoid stupid
    things (like using char() vs varchar()), smart table layout, toast
    compression, etc. Are you sure this is a problem?

    merlin
  • Divakar Singh at Oct 27, 2010 at 6:14 pm
    yes this is a very clearly visible problem.
    The difference b/w oracle and PG increases with more rows.
    when oracle takes 3 GB, PG takes around 6 GB.
    I only use varchar.
    I will try to use your tips on "smart table layout, toast compression".
    Assuming these suggested options do not have any performance penalty?

    Best Regards,
    Divakar




    ________________________________
    From: Merlin Moncure <mmoncure@gmail.com>
    To: Divakar Singh <dpsmails@yahoo.com>
    Cc: Robert Haas <robertmhaas@gmail.com>; Mladen Gogala
    <mladen.gogala@vmsinfo.com>; pgsql-performance@postgresql.org
    Sent: Wed, October 27, 2010 11:36:00 PM
    Subject: Re: [PERFORM] Postgres insert performance and storage requirement
    compared to Oracle
    On Tue, Oct 26, 2010 at 11:10 PM, Divakar Singh wrote:
    Dear All,
    Thanks for your inputs on the insert performance part.
    Any suggestion on storage requirement?
    VACUUM is certainly not an option, because this is something related to
    maintenance AFTER insertion.
    I am talking about the plain storage requirement w.r. to Oracle, which I
    observed is twice of Oracle in case millions of rows are inserted.
    Anybody who tried to analyze the average storage requirement of PG w.r. to
    Oracle?
    There isn't much you can to about storage use other than avoid stupid
    things (like using char() vs varchar()), smart table layout, toast
    compression, etc. Are you sure this is a problem?

    merlin
  • Merlin Moncure at Oct 27, 2010 at 6:28 pm

    On Wed, Oct 27, 2010 at 2:14 PM, Divakar Singh wrote:
    yes this is a very clearly visible problem.
    The difference b/w oracle and PG increases with more rows.
    when oracle takes 3 GB, PG takes around 6 GB.
    I only use varchar.
    I will try to use your tips on "smart table layout, toast compression".
    Assuming these suggested options do not have any performance penalty?
    These will only be helpful in particular cases, for example if your
    layout is bad :-). toast compression is for dealing with large datums
    (on by default iirc). Also it's very hard to get apples to apples
    comparison test via synthetic insertion benchmark. It's simply not
    the whole story.

    The deal with postgres is that things are pretty optimized and fairly
    unlikely to get a whole lot better than they are today. The table
    layout is pretty optimal already, nulls are bitmaps, data lengths are
    using fancy bitwise length mechanism, etc. Each record in postgres
    has a 20 byte header that has to be factored in to any storage
    estimation, plus the index usage.

    Postgres indexes are pretty compact, and oracle (internals I am not
    familiar with) also has to do MVCC type management, so I am suspecting
    your measurement is off (aka, operator error) or oracle is cheating
    somehow by optimizing away storage requirements somehow via some sort
    of tradeoff. However you still fail to explain why storage size is a
    problem. Are planning to port oracle to postgres on a volume that is
    50% full? :-)
    merlin
  • Jesper Krogh at Oct 27, 2010 at 6:42 pm

    On 2010-10-27 20:28, Merlin Moncure wrote:
    Postgres indexes are pretty compact, and oracle (internals I am not
    familiar with) also has to do MVCC type management, so I am suspecting
    your measurement is off (aka, operator error) or oracle is cheating
    somehow by optimizing away storage requirements somehow via some sort
    of tradeoff. However you still fail to explain why storage size is a
    problem. Are planning to port oracle to postgres on a volume that is
    50% full? :-)
    Pretty ignorant comment.. sorry ..

    But when your database approaches something that is not mainly
    fitting in memory, space directly translates into speed and a more
    compact table utillizes the OS-page cache better. This is both
    true for index and table page caching.

    And the more compact your table the later you hit the stage where
    you cant fit into memory anymore.

    .. but if above isn't issues, then your statements are true.

    --
    Jesper
  • Merlin Moncure at Oct 27, 2010 at 6:51 pm

    On Wed, Oct 27, 2010 at 2:42 PM, Jesper Krogh wrote:
    On 2010-10-27 20:28, Merlin Moncure wrote:

    Postgres indexes are pretty compact, and oracle (internals I am not
    familiar with) also has to do MVCC type management, so I am suspecting
    your measurement is off (aka, operator error) or oracle is cheating
    somehow by optimizing away storage requirements somehow via some sort
    of tradeoff.  However you still fail to explain why storage size is a
    problem.  Are planning to port oracle to postgres on a volume that is
    50% full? :-)
    Pretty ignorant comment.. sorry ..

    But when your database approaches something that is not mainly
    fitting in memory, space directly translates into speed and a more
    compact table utillizes the OS-page cache better. This is both
    true for index and table page caching.

    And the more compact your table the later you hit the stage where
    you cant fit into memory anymore.

    .. but if above isn't issues, then your statements are true.
    Yes, I am quite aware of how the o/s page cache works. All else being
    equal, I more compact database obviously would be preferred. However
    'all else' is not necessarily equal. I can mount my database on bzip
    volume, that must make it faster, right? wrong. I understand the
    postgres storage architecture pretty well, and the low hanging fruit
    having been grabbed further layout compression is only going to come
    as a result of tradeoffs.

    Now, comparing oracle vs postgres, mvcc works differently because
    oracle uses rollback logs while postgres maintains extra/old versions
    in the heap. This will add up to big storage usage based on various
    things, but should not so much be reflected via insert only test.

    merlin
  • Jesper Krogh at Oct 27, 2010 at 7:47 pm

    On 2010-10-27 20:51, Merlin Moncure wrote:

    Yes, I am quite aware of how the o/s page cache works. All else being
    equal, I more compact database obviously would be preferred. However
    'all else' is not necessarily equal. I can mount my database on bzip
    volume, that must make it faster, right? wrong. I understand the
    postgres storage architecture pretty well, and the low hanging fruit
    having been grabbed further layout compression is only going to come
    as a result of tradeoffs.
    Or configureabillity.. Not directly related to overall space consumption
    but I have been working on a patch that would make TOAST* kick in
    earlier in the process, giving a "slimmer" main table with visibillity
    information
    and simple columns and moving larger colums more aggressively to TOAST.

    The overall disadvantage of TOAST is the need for an extra disk seek if
    you actually need the data. If the application rarely needs the large
    columns but often do count/filtering on simple values this will eventually
    lead to a better utillization of the OS-page-cache with a very small
    overhead
    to PG (in terms of code) and 0 overhead in the applications that benefit.

    Keeping in mind that as SSD-drives get more common the "the extra disk seek"
    drops dramatically, but the drive is by itself probably still 100-1000x
    slower than
    main memory, so keeping "the right data" in the OS-cache is also a
    parameter.

    If you deal with data where the individual tuple-size goes up, currently
    TOAST
    first kicks in at 2KB (compressed size) which leads to a very sparse
    main table
    in terms of visibillity information and count and selects on simple values
    will drag a huge amount of data into the cache-layers thats not needed
    there.

    Another suggestion could be to make the compression of text columns kick in
    earlier .. if thats possible. (I dont claim that its achiveable)

    Unless the tuple-header is hugely bloated I have problems creating a
    situation in my
    head where hammering that one can change anything significantly.

    * http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg159726.html

    --
    Jesper
  • Merlin Moncure at Oct 28, 2010 at 1:13 pm

    On Wed, Oct 27, 2010 at 3:47 PM, Jesper Krogh wrote:
    On 2010-10-27 20:51, Merlin Moncure wrote:

    Yes, I am quite aware of how the o/s page cache works.  All else being
    equal, I more compact database obviously would be preferred.  However
    'all else' is not necessarily equal.  I can mount my database on bzip
    volume, that must make it faster, right?  wrong.  I understand the
    postgres storage architecture pretty well, and the low hanging fruit
    having been grabbed further layout compression is only going to come
    as a result of tradeoffs.
    Or configureabillity.. Not directly related to overall space consumption
    but I have been working on a patch that would make TOAST* kick in
    earlier in the process, giving a "slimmer" main table with visibillity
    information
    and simple columns and moving larger colums more aggressively to TOAST.
    Do you have any benchmarks supporting if/when such a change would be beneficial?

    merlin
  • Jesper Krogh at Oct 28, 2010 at 3:29 pm

    On 2010-10-28 15:13, Merlin Moncure wrote:
    On Wed, Oct 27, 2010 at 3:47 PM, Jesper Kroghwrote:
    On 2010-10-27 20:51, Merlin Moncure wrote:

    Yes, I am quite aware of how the o/s page cache works. All else being
    equal, I more compact database obviously would be preferred. However
    'all else' is not necessarily equal. I can mount my database on bzip
    volume, that must make it faster, right? wrong. I understand the
    postgres storage architecture pretty well, and the low hanging fruit
    having been grabbed further layout compression is only going to come
    as a result of tradeoffs.
    Or configureabillity.. Not directly related to overall space consumption
    but I have been working on a patch that would make TOAST* kick in
    earlier in the process, giving a "slimmer" main table with visibillity
    information
    and simple columns and moving larger colums more aggressively to TOAST.
    Do you have any benchmarks supporting if/when such a change would be beneficial?
    On, IO-bound queries it pretty much translates to the ration between
    the toast-table-size vs. the main-table-size.

    Trying to aggressively speed up "select count(*) from table" gives this:
    http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg146153.html
    with shutdown of pg and drop caches inbetween... the "default" select
    count (*) on 50K tuples
    gives 4.613ms (2 tuples pr page) vs. 318ms... (8 tuples pr page).

    PG default is inbetween...


    --
    Jesper
  • Steve Singer at Oct 27, 2010 at 6:51 pm

    On 10-10-27 02:14 PM, Divakar Singh wrote:
    yes this is a very clearly visible problem.
    The difference b/w oracle and PG increases with more rows.
    when oracle takes 3 GB, PG takes around 6 GB.
    I only use varchar.
    I will try to use your tips on "smart table layout, toast compression".
    Assuming these suggested options do not have any performance penalty?
    Best Regards,
    Divakar
    In between test runs are you cleaning out the tables with a "DELETE FROM
    aaaaa" or are you using the TRUNCATE command? Or dropping the table and
    recreating it.

    If your just using DELETE it might be that disk space is still being
    used by the old versions of the rows.

    Also is postgresql using more space than oracle for storing the index
    data or the main table data? and is any particular index larger on
    postgresql compared to Oracle.




    ------------------------------------------------------------------------
    *From:* Merlin Moncure <mmoncure@gmail.com>
    *To:* Divakar Singh <dpsmails@yahoo.com>
    *Cc:* Robert Haas <robertmhaas@gmail.com>; Mladen Gogala
    <mladen.gogala@vmsinfo.com>; pgsql-performance@postgresql.org
    *Sent:* Wed, October 27, 2010 11:36:00 PM
    *Subject:* Re: [PERFORM] Postgres insert performance and storage
    requirement compared to Oracle
    On Tue, Oct 26, 2010 at 11:10 PM, Divakar Singh wrote:
    Dear All,
    Thanks for your inputs on the insert performance part.
    Any suggestion on storage requirement?
    VACUUM is certainly not an option, because this is something related to
    maintenance AFTER insertion.
    I am talking about the plain storage requirement w.r. to Oracle, which I
    observed is twice of Oracle in case millions of rows are inserted.
    Anybody who tried to analyze the average storage requirement of PG w.r. to
    Oracle?
    There isn't much you can to about storage use other than avoid stupid
    things (like using char() vs varchar()), smart table layout, toast
    compression, etc. Are you sure this is a problem?

    merlin
  • Scott Carey at Oct 27, 2010 at 5:47 pm

    On Oct 26, 2010, at 2:54 PM, Mladen Gogala wrote:
    On 10/26/2010 5:27 PM, Jon Nelson wrote:
    start loop:
    populate rows in temporary table
    insert from temporary table into permanent table
    truncate temporary table
    loop

    I do something similar, where I COPY data to a temporary table, do
    lots of manipulations, and then perform a series of INSERTS from the
    temporary table into a permanent table.
    1) It's definitely not faster because you have to insert into the
    temporary table, in addition to inserting into the permanent table.
    It is almost always significantly faster than a direct bulk load into a table.
    * The temp table has no indexes, the final table usually does, bulk operations on indexes are faster than per row operations.
    * The final table might require both updates and inserts, doing these in bulk from a temp stage table is far faster than per row.
    * You don't even have to commit after the merge from the temp table, and can loop until its all done, then commit -- though this can have table/index bloat implications if doing updates.
    2) This is what I had in mind:

    mgogala=# create table a(c1 int);
    CREATE TABLE
    mgogala=# create temporary table t1(c1 int) on commit delete rows;
    CREATE TABLE
    mgogala=# begin;
    BEGIN
    mgogala=# insert into t1 select generate_series(1,1000);
    INSERT 0 1000
    mgogala=# insert into a select * from t1;
    INSERT 0 1000
    mgogala=# commit;
    COMMIT
    mgogala=# select count(*) from a;
    count
    -------
    1000
    (1 row)

    mgogala=# select count(*) from t1;
    count
    -------

    (1 row)

    The table is created with "on commit obliterate rows" option which means
    that there is no need to do "truncate". The "truncate" command is a
    heavy artillery. Truncating a temporary table is like shooting ducks in
    a duck pond, with a howitzer.
    ??? Test it. DELETE is slow, truncate is nearly instantaneous for normal tables. For temp tables its the same thing. Maybe in Oracle TRUNCATE is a howitzer, in Postgres its lightweight. Your loop above requires a commit after every 1000 rows. What if you require that all rows are seen at once or not at all? What if you fail part way through? One big transaction is often a better idea and/or required. Especially in postgres, with no undo-log, bulk inserts in one large transaction work out very well -- usually better than multiple smaller transactions.
    --

    Mladen Gogala
    Sr. Oracle DBA
    1500 Broadway
    New York, NY 10036
    (212) 329-5251
    http://www.vmsinfo.com
    The Leader in Integrated Media Intelligence Solutions




    --
    Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-performance
  • Mladen Gogala at Oct 27, 2010 at 6:07 pm

    On 10/27/2010 1:48 PM, Scott Carey wrote:
    It is almost always significantly faster than a direct bulk load into a table.
    * The temp table has no indexes, the final table usually does, bulk operations on indexes are faster than per row operations.
    * The final table might require both updates and inserts, doing these in bulk from a temp stage table is far faster than per row.
    * You don't even have to commit after the merge from the temp table, and can loop until its all done, then commit -- though this can have table/index bloat implications if doing updates.
    Scott, I find this very hard to believe. If you are inserting into a
    temporary table and then into the target table, you will do 2 inserts
    instead of just one. What you are telling me is that it is faster for me
    to drive from NYC to Washington DC by driving first to Miami and then
    from Miami to DC.
    2) This is what I had in mind:

    mgogala=# create table a(c1 int);
    CREATE TABLE
    mgogala=# create temporary table t1(c1 int) on commit delete rows;
    CREATE TABLE
    mgogala=# begin;
    BEGIN
    mgogala=# insert into t1 select generate_series(1,1000);
    INSERT 0 1000
    mgogala=# insert into a select * from t1;
    INSERT 0 1000
    mgogala=# commit;
    COMMIT
    mgogala=# select count(*) from a;
    count
    -------
    1000
    (1 row)

    mgogala=# select count(*) from t1;
    count
    -------

    (1 row)

    The table is created with "on commit obliterate rows" option which means
    that there is no need to do "truncate". The "truncate" command is a
    heavy artillery. Truncating a temporary table is like shooting ducks in
    a duck pond, with a howitzer.
    ??? Test it. DELETE is slow, truncate is nearly instantaneous for normal tables. For temp tables its the same thing. Maybe in Oracle TRUNCATE is a howitzer, in Postgres its lightweight.
    Truncate has specific list of tasks to do:
    1) lock the table in the exclusive mode to prevent concurrent
    transactions on the table.
    2) Release the file space and update the table headers.
    3) Flush any buffers possibly residing in shared memory.
    4) Repeat the procedures on the indexes.

    Of course, in case of the normal table, all of these changes are logged,
    possibly producing WAL archives. That is still much faster than delete
    which depends on the number of rows that need to be deleted, but not
    exactly lightweight, either. In Postgres, truncate recognizes that the
    table is a temporary table so it makes a few shortcuts, which makes the
    truncate faster.

    1) No need to flush buffers.
    2) Locking requirements are much less stringent.
    3) No WAL archives are produced.

    Temporary tables are completely different beasts in Oracle and Postgres.
    Yes, you are right, truncate of a temporary table is a big no-no in the
    Oracle world, especially in the RAC environment. However, I do find "ON
    COMMIT DELETE ROWS" trick to be more elegant than the truncate. Here is
    the classic Tom Kyte, on the topic of truncating the temporary tables:
    *http://tinyurl.com/29kph3p

    "*NO. truncate is DDL. DDL is expensive. Truncation is something that
    should be done very infrequently.
    Now, I don't mean "turn your truncates into DELETE's" -- that would
    be even worse. I mean -- avoid having
    to truncate or delete every row in the first place. Use a transaction
    based temporary table and upon commit, it'll empty itself."
    Your loop above requires a commit after every 1000 rows. What if you require that all rows are seen at once or not at all? What if you fail part way through? One big transaction is often a better idea and/or required. Especially in postgres, with no undo-log, bulk inserts in one large transaction work out very well -- usually better than multiple smaller transactions.
    I don't contest that. I also prefer to do things in one big transaction,
    if possible.

    --

    Mladen Gogala
    Sr. Oracle DBA
    1500 Broadway
    New York, NY 10036
    (212) 329-5251
    http://www.vmsinfo.com
    The Leader in Integrated Media Intelligence Solutions
  • Merlin Moncure at Oct 27, 2010 at 6:13 pm

    On Wed, Oct 27, 2010 at 2:06 PM, Mladen Gogala wrote:
    Scott, I find this very hard to believe. If you are inserting into a
    temporary table and then into the target table, you will do 2 inserts
    instead of just one. What you are telling me is that it is faster for me to
    drive from NYC to Washington DC by driving first to Miami and then from
    Miami to DC.
    The reason why in one transaction per insert environment staging to
    temp table first is very simple...non temp table inserts have to be
    wal logged and fsync'd. When you batch them into the main table, you
    get more efficient use of WAL and ONE sync operation. This is
    especially advantageous if the inserts are coming fast and furious and
    there are other things going on in the database at the time, or there
    are multiple inserters.

    If you have luxury of batching data in a transaction, you don't have
    to worry about it.

    merlin
  • Ivan Voras at Oct 27, 2010 at 10:13 am

    On 10/26/10 17:41, Merlin Moncure wrote:
    On Tue, Oct 26, 2010 at 11:08 AM, Leonardo Francalanci wrote:
    temp tables are not wal logged or
    synced. Periodically they can be flushed to a permanent table.

    What do you mean with "Periodically they can be flushed to
    a permanent table"? Just doing

    insert into tabb select * from temptable
    yup, that's exactly what I mean -- this will give you more uniform
    In effect, when so much data is in temporary storage, a better option
    would be to simply configure "synchronous_commit = off" (better in the
    sense that the application would not need to be changed). The effects
    are almost the same - in both cases transactions might be lost but the
    database will survive.
  • Merlin Moncure at Oct 27, 2010 at 11:05 am

    On Wed, Oct 27, 2010 at 6:13 AM, Ivan Voras wrote:
    On 10/26/10 17:41, Merlin Moncure wrote:
    On Tue, Oct 26, 2010 at 11:08 AM, Leonardo Francalanci wrote:
    temp  tables are not wal logged or
    synced.  Periodically they can be flushed  to a permanent table.

    What do you mean with "Periodically they can be flushed  to
    a permanent table"? Just doing

    insert into tabb select * from temptable
    yup, that's exactly what I mean -- this will give you more uniform
    In effect, when so much data is in temporary storage, a better option
    would be to simply configure "synchronous_commit = off" (better in the
    sense that the application would not need to be changed). The effects
    are almost the same - in both cases transactions might be lost but the
    database will survive.
    right -- although that's a system wide setting and perhaps other
    tables still require full synchronous fsync. Still -- fair point
    (although I bet you are still going to get better performance going by
    the temp route if only by a hair).

    merlin
  • Robert Haas at Oct 28, 2010 at 2:01 am

    On Wed, Oct 27, 2010 at 6:13 AM, Ivan Voras wrote:
    On 10/26/10 17:41, Merlin Moncure wrote:
    On Tue, Oct 26, 2010 at 11:08 AM, Leonardo Francalanci wrote:
    temp  tables are not wal logged or
    synced.  Periodically they can be flushed  to a permanent table.

    What do you mean with "Periodically they can be flushed  to
    a permanent table"? Just doing

    insert into tabb select * from temptable
    yup, that's exactly what I mean -- this will give you more uniform
    In effect, when so much data is in temporary storage, a better option
    would be to simply configure "synchronous_commit = off" (better in the
    sense that the application would not need to be changed). The effects
    are almost the same - in both cases transactions might be lost but the
    database will survive.
    Gee, I wonder if it would possible for PG to automatically do an
    asynchronous commit of any transaction which touches only temp tables.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Oct 28, 2010 at 3:32 am

    Robert Haas writes:
    Gee, I wonder if it would possible for PG to automatically do an
    asynchronous commit of any transaction which touches only temp tables.
    Hmm ... do we need a commit at all in such a case? If our XID has only
    gone into temp tables, I think we need to write to clog, but we don't
    really need a WAL entry, synced or otherwise.

    regards, tom lane
  • Divakar Singh at Oct 27, 2010 at 2:00 pm
    Hi Steve and other friends,
    Some information you would be interested in:
    I did some further tests using libpq in my code.
    I used a stored proc to insert 100 thousand rows in a table, it took 25 sec
    (almost same as time taken by Oracle PL/SQL and OCI interface).
    Same inserts through libpq take 70 seconds.
    I am inserting all records in a single transaction.
    So, the problem seems to be optimization of usage of libpq in my code.
    I am attaching my code below.
    Is any optimization possible in this?
    Do prepared statements help in cutting down the insert time to half for this
    kind of inserts? One of the major problems with libpq usage is lack of good
    documentation and examples.

    I could not get any good example of prepared stmt usage anywhere.

    //----------------------------------------------------------------------------------------------------------------------------

    /*
    * testlibpq.c
    *
    * Test the C version of libpq, the PostgreSQL frontend library.
    */
    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>
    #include "iostream.h"
    #include "stdio.h"
    #include <time.h>

    static void
    exit_nicely(PGconn *conn)
    {
    PQfinish(conn);
    exit(1);
    }

    int
    main(int argc, char **argv)
    {
    const char *conninfo;
    PGconn *conn;
    PGresult *res;
    int nFields;
    int i=0;
    int howmany=0;
    if (argc<2)
    {
    cout<<"please pass no of records as parameter"<<endl;
    return -1;
    }
    sscanf(argv[1], "%d", &howmany);
    cout<<"inserting "<<howmany<<" records"<<endl;

    time_t mytime1 = time(0);
    cout(&mytime1))<<endl;



    /*
    * If the user supplies a parameter on the command line, use it as the
    * conninfo string; otherwise default to setting dbname=postgres and using
    * environment variables or defaults for all other connection parameters.
    */
    conninfo = "host=x.y.z.a dbname=xyz port=5432 user=sd password=fg" ;


    /* Make a connection to the database */
    conn = PQconnectdb(conninfo);

    /* Check to see that the backend connection was successfully made */
    if (PQstatus(conn) != CONNECTION_OK)
    {
    fprintf(stderr, "Connection to database failed: %s",
    PQerrorMessage(conn));
    exit_nicely(conn);
    }

    /*
    * Our test case here involves using a cursor, for which we must be inside
    * a transaction block. We could do the whole thing with a single
    * PQexec() of "select * from pg_database", but that's too trivial to make
    * a good example.
    */

    /* Start a transaction block */
    res = PQexec(conn, "BEGIN");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
    fprintf(stderr, "BEGIN command failed: %s", PQerrorMessage(conn));
    PQclear(res);
    exit_nicely(conn);
    }

    /*
    * Should PQclear PGresult whenever it is no longer needed to avoid memory
    * leaks
    */
    PQclear(res);

    char query[1024]={0};

    for (; i<howmany;i++ )
    {

    sprintf (query, "INSERT INTO aaaa(a, b, c, d, e, f, g, h, j, k, l, m, n, p)
    VALUES (67, 'ABRAKADABRA', 'ABRAKADABRA', 'ABRAKADABRA', '1-Dec-2010',
    'ABRAKADABRA', 'ABRAKADABRA', 'ABRAKADABRA', '1-Dec-2010', 99999, 99999, %d,
    9999,
    'ABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRA')",
    i);

    res = PQexec(conn, query);

    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
    cout<<"error at iteration "<<i<<":"<<PQresultErrorMessage(res)<<endl;
    PQclear(res);
    break;
    }
    //PQclear(res);


    }

    /* close the portal ... we don't bother to check for errors ... */
    /*res = PQexec(conn, "CLOSE myportal");
    PQclear(res);*/

    /* end the transaction */
    res = PQexec(conn, "END");
    PQclear(res);

    cout<<i<<" records inserted!"<<endl;

    mytime1 = time(0);
    cout(&mytime1))<<endl;


    /* close the connection to the database and cleanup */
    PQfinish(conn);

    return 0;
    }

    //----------------------------------------------------------------------------------------------------------------------------



    Best Regards,
    Divakar




    ________________________________
    From: Divakar Singh <dpsmails@yahoo.com>
    To: Steve Singer <ssinger@ca.afilias.info>
    Cc: jd@commandprompt.com; pgsql-performance@postgresql.org
    Sent: Tue, October 26, 2010 12:22:31 AM
    Subject: Re: [PERFORM] Postgres insert performance and storage requirement
    compared to Oracle



    Answers:

    How are you using libpq?
    -Are you opening and closing the database connection between each insert?

    [Need to check, will come back on this]

    -Are you doing all of your inserts as one big transaction or are you doing a
    transaction per insert

    [Answer: for C++ program, one insert per transaction in PG as well as Oracle.
    But in stored proc, I think both use only 1 transaction for all inserts]

    -Are you using prepared statements for your inserts?

    [Need to check, will come back on this]

    -Are you using the COPY command to load your data or the INSERT command?

    [No]

    -Are you running your libpq program on the same server as postgresql?

    [Yes]

    -How is your libpq program connecting to postgresql, is it using ssl?

    [No]

    If your run "VACUUM VERBOSE tablename" on the table, what does it say?

    [Need to check, will come back on this]

    You also don't mention which version of postgresql your using.

    [Latest, 9.x]

    Best Regards,
    Divakar




    ________________________________
    From: Steve Singer <ssinger@ca.afilias.info>
    To: Divakar Singh <dpsmails@yahoo.com>
    Cc: jd@commandprompt.com; pgsql-performance@postgresql.org
    Sent: Tue, October 26, 2010 12:16:46 AM
    Subject: Re: [PERFORM] Postgres insert performance and storage requirement
    compared to Oracle
    On 10-10-25 02:31 PM, Divakar Singh wrote:

    My questions/scenarios are:

    1. How does PostgreSQL perform when inserting data into an indexed
    (type: btree)
    table? Is it true that as you add the indexes on a table, the
    performance
    deteriorates significantly whereas Oracle does not show that much
    performance
    decrease. I have tried almost all postgreSQL performance tips
    available. I want
    to have very good "insert" performance (with indexes), "select"
    performance is
    not that important at this point of time.
    -- Did you test?

    Yes. the performance was comparable when using SQL procedure. However,
    When I used libpq, PostgreSQL performed very bad. There was some
    difference in environment also between these 2 tests, but I am assuming
    libpq vs SQL was the real cause. Or it was something else?
    So your saying that when you load the data with psql it loads fine, but
    when you load it using libpq it takes much longer?

    How are you using libpq?
    -Are you opening and closing the database connection between each insert?
    -Are you doing all of your inserts as one big transaction or are you
    doing a transaction per insert
    -Are you using prepared statements for your inserts?
    -Are you using the COPY command to load your data or the INSERT command?
    -Are you running your libpq program on the same server as postgresql?
    -How is your libpq program connecting to postgresql, is it using ssl?
    Some 10-12 columns ( like 2 timestamp, 4 int and 4 varchar), with 5
    indexes on varchar and int fields including 1 implicit index coz of PK.
    If your run "VACUUM VERBOSE tablename" on the table, what does it say?

    You also don't mention which version of postgresql your using.

    Joshua D. Drake


    --
    PostgreSQL.org Major Contributor
    Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
    Consulting, Training, Support, Custom Development, Engineering
    http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

    --
    Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-performance
  • Steve Singer at Oct 27, 2010 at 3:42 pm

    On 10-10-27 10:00 AM, Divakar Singh wrote:
    Hi Steve and other friends,
    Some information you would be interested in:
    I did some further tests using libpq in my code.
    I used a stored proc to insert 100 thousand rows in a table, it took 25
    sec (almost same as time taken by Oracle PL/SQL and OCI interface).
    Same inserts through libpq take 70 seconds.
    I am inserting all records in a single transaction.
    So, the problem seems to be optimization of usage of libpq in my code.
    I am attaching my code below.
    Is any optimization possible in this?
    Do prepared statements help in cutting down the insert time to half for
    this kind of inserts? One of the major problems with libpq usage is lack
    of good documentation and examples.
    I could not get any good example of prepared stmt usage anywhere.

    Yes using prepared statements should make this go faster, but your best
    bet might be to use the COPY command. I don't have a PQprepare example
    handy though we probably should add one to the docs.

    The copy command would be used similar to

    PQexec(conn,"COPY TO aaaa (a,b,c,d,e,f,g,h,j,k,l,m,n,p) FROM STDIN WITH
    (DELIMITER ',') ");
    for(; i < howmany;i++)
    {
    sprintf(query,"67,'ABRAKADABRA', 'ABRAKADABRA', 'ABRAKADABRA', '1-Dec
    2010', 'ABRAKADABRA', 'ABRAKADABRA', 'ABRAKADABRA', '1-Dec-2010',
    99999, 99999, %d, 9999,
    'ABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRA'\n",

    i);
    res = PQputCopyData(conn,query,strlen(query);

    }
    PQputCopyEnd(conn,NULL);

    I have not actually tried the above code snippet, it is just to give you
    the general idea.

    You call PQexec with the COPY command outside the loop then at each loop
    iteration you call PQputCopyData with some of the data that gets passed
    to the server.


    You can combine multiple lines on a single PQputCopyData call if you want.

    http://www.postgresql.org/docs/9.0/interactive/sql-copy.html
    http://www.postgresql.org/docs/9.0/interactive/libpq-copy.html






    char query[1024]={0};

    for (; i<howmany;i++ )
    {

    sprintf (query, "INSERT INTO aaaa(a, b, c, d, e, f, g, h, j, k, l, m, n,
    p) VALUES (67, 'ABRAKADABRA', 'ABRAKADABRA', 'ABRAKADABRA',
    '1-Dec-2010', 'ABRAKADABRA', 'ABRAKADABRA', 'ABRAKADABRA', '1-Dec-2010',
    99999, 99999, %d, 9999,
    'ABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRA')",
    i);

    res = PQexec(conn, query);

    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
    cout<<"error at iteration "<<i<<":"<<PQresultErrorMessage(res)<<endl;
    PQclear(res);
    break;
    }
    //PQclear(res);


    }

    /* close the portal ... we don't bother to check for errors ... */
    /*res = PQexec(conn, "CLOSE myportal");
    PQclear(res);*/
    <snip>
  • Reid Thompson at Oct 27, 2010 at 3:49 pm

    On Wed, 2010-10-27 at 11:42 -0400, Steve Singer wrote:
    I don't have a PQprepare example
    handy though we probably should add one to the docs.
    I would like to see this. A several minutes web search didn't turn up
    an example for me either.

    thanks,
    reid
  • David Wilson at Oct 27, 2010 at 3:50 pm

    On Wed, Oct 27, 2010 at 10:00 AM, Divakar Singh wrote:
    for (; i<howmany;i++ )
    {

    sprintf (query, "INSERT INTO aaaa(a, b, c, d, e, f, g, h, j, k, l, m,
    n, p) VALUES (67, 'ABRAKADABRA', 'ABRAKADABRA', 'ABRAKADABRA', '1-Dec-2010',
    'ABRAKADABRA', 'ABRAKADABRA', 'ABRAKADABRA', '1-Dec-2010', 99999, 99999,
    %d, 9999,
    'ABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRA')",
    i);

    res = PQexec(conn, query);

    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
    cout<<"error at iteration
    "<<i<<":"<<PQresultErrorMessage(res)<<endl;
    PQclear(res);
    break;
    }
    //PQclear(res);


    }
    Why is that PQclear(res) commented out? You're leaking result status for
    every insert.


    --
    - David T. Wilson
    david.t.wilson@gmail.com
  • Divakar Singh at Oct 27, 2010 at 4:12 pm

    for (; i<howmany;i++ )
    {

    sprintf (query, "INSERT INTO aaaa(a, b, c, d, e, f, g, h, j, k, l, m, n, p)
    VALUES (67, 'ABRAKADABRA', 'ABRAKADABRA', 'ABRAKADABRA', '1-Dec-2010',
    'ABRAKADABRA', 'ABRAKADABRA', 'ABRAKADABRA', '1-Dec-2010', 99999, 99999, %d,
    9999,
    'ABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRA')",
    i);

    res = PQexec(conn, query);

    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
    cout<<"error at iteration "<<i<<":"<<PQresultErrorMessage(res)<<endl;
    PQclear(res);
    break;
    }
    //PQclear(res);


    }
    Why is that PQclear(res) commented out? You're leaking result status for every
    insert.

    I did that purposely to see if cleanup part is contributing to any performance
    loss.
    Right now in my test, memory leak is not a concern for me but performance is.
    Though I understand that memory leak can also result in performance loss if
    leak is too much.
    However, in this case, commenting or uncommenting this statement did not cause
    any change in performance.
  • Alex Hunsaker at Oct 27, 2010 at 7:45 pm

    On Wed, Oct 27, 2010 at 08:00, Divakar Singh wrote:
    I am attaching my code below.
    Is any optimization possible in this?
    Do prepared statements help in cutting down the insert time to half for this
    kind of inserts?
    In half? not for me. Optimization possible? Sure, using the code you
    pasted (time ./a.out 100000 <method>):
    PQexec: 41s
    PQexecPrepared: 36s
    1 insert statement: 7s
    COPY: 1s
    psql: 256ms

    Basically the above echoes the suggestions of others, use COPY if you can.

    Find the source for the above attached. Its just a very quick
    modified version of what you posted. [ disclaimer the additions I
    added are almost certainly missing some required error checking... ]

    [ psql is fast because the insert is really dumb: insert into aaaa (a,
    b, c, d, e, f, g, h, j, k, l, m, n, p) select 1, 'asdf', 'asdf',
    'asdf', 'asdf', 'asdf', 'asdf', 'asdf', 'asdf', 'asdf', 'asdf',
    'asdf', 'asdf', 'asdf' from generate_series(1, 100000); ]
  • Divakar Singh at Oct 28, 2010 at 3:09 am
    So another question pops up: What method in PostgreSQL does the stored proc use
    when I issue multiple insert (for loop for 100 thousand records) in the stored
    proc?
    It takes half the time compared to the consecutive "insert" using libpq.
    In the backend, does it use COPY or prepared statement? or something else?

    Best Regards,
    Divakar




    ________________________________
    From: Alex Hunsaker <badalex@gmail.com>
    To: Divakar Singh <dpsmails@yahoo.com>
    Cc: Steve Singer <ssinger@ca.afilias.info>; jd@commandprompt.com;
    pgsql-performance@postgresql.org
    Sent: Thu, October 28, 2010 1:15:06 AM
    Subject: Re: [PERFORM] Postgres insert performance and storage requirement
    compared to Oracle
    On Wed, Oct 27, 2010 at 08:00, Divakar Singh wrote:
    I am attaching my code below.
    Is any optimization possible in this?
    Do prepared statements help in cutting down the insert time to half for this
    kind of inserts?
    In half? not for me. Optimization possible? Sure, using the code you
    pasted (time ./a.out 100000 <method>):
    PQexec: 41s
    PQexecPrepared: 36s
    1 insert statement: 7s
    COPY: 1s
    psql: 256ms

    Basically the above echoes the suggestions of others, use COPY if you can.

    Find the source for the above attached. Its just a very quick
    modified version of what you posted. [ disclaimer the additions I
    added are almost certainly missing some required error checking... ]

    [ psql is fast because the insert is really dumb: insert into aaaa (a,
    b, c, d, e, f, g, h, j, k, l, m, n, p) select 1, 'asdf', 'asdf',
    'asdf', 'asdf', 'asdf', 'asdf', 'asdf', 'asdf', 'asdf', 'asdf',
    'asdf', 'asdf', 'asdf' from generate_series(1, 100000); ]
  • Alex Hunsaker at Oct 28, 2010 at 4:24 am

    On Wed, Oct 27, 2010 at 21:08, Divakar Singh wrote:
    So another question pops up: What method in PostgreSQL does the stored proc
    use when I issue multiple insert (for loop for 100 thousand records) in the
    stored proc?
    It uses prepared statements (unless you are using execute). There is
    also the benefit of not being on the network. Assuming 0.3ms avg
    latency, 1 packet per query and 100,000 queries-- thats 30s just from
    latency! Granted this is just a silly estimate that happens to (more
    or less) fit my numbers...

Related Discussions

People

Translate

site design / logo © 2022 Grokbase