Hi, I'm fairly new to postgres and am having trouble finding what I'm looking for. Is there a feature that allows bulk inserts into tables? My setup is Win XPe 2002 SP3 and PostgreSQL 8.3. I need to add entries from a file, where each file contains 250-500 records. The files are created by a third party and read/written as binary. The table looks like the following:

CREATE TABLE data_log_20msec_table
(
log_id bigserial NOT NULL,
timestamp_dbl double precision,
data bytea,
CONSTRAINT data_log_20msec_table_pkey PRIMARY KEY (log_id)
)
WITH (OIDS=FALSE);
ALTER TABLE data_log_20msec_table OWNER TO postgres;
-- Index: data_log_20msec_table_timestamp_index
-- DROP INDEX data_log_20msec_table_timestamp_index;
CREATE INDEX data_log_20msec_table_timestamp_index
ON data_log_20msec_table
USING btree
(timestamp_dbl);
The current method for transferring records from the file to postgres is using a prepared statement that is called iteratively on each record read from the file:

INSERT INTO data_log_20msec_table (timestamp_dbl,data) VALUES ($1::double precision,$2::bytea)
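
A rough sketch of that loop with libpq (assumptions only: an open PGconn *conn, and a hypothetical read_record() helper standing in for the sequential file reading; error handling omitted):

#include <stdio.h>
#include <libpq-fe.h>

/* read_record() is a hypothetical helper: it yields one timestamp and
 * one binary blob per call, returning 0 at end of file. */
int read_record(double *ts, const unsigned char **blob, int *len);

static void load_file(PGconn *conn)
{
    /* prepare once, then execute per record read from the file */
    PGresult *res = PQprepare(conn, "ins_log",
        "INSERT INTO data_log_20msec_table (timestamp_dbl, data) "
        "VALUES ($1::double precision, $2::bytea)", 2, NULL);
    PQclear(res);

    double ts; const unsigned char *blob; int bloblen;
    while (read_record(&ts, &blob, &bloblen))
    {
        char tsbuf[32];
        snprintf(tsbuf, sizeof tsbuf, "%.17g", ts);

        const char *values[2]  = { tsbuf, (const char *) blob };
        int         lengths[2] = { 0, bloblen };   /* length ignored for text */
        int         formats[2] = { 0, 1 };         /* text float8, binary bytea */

        res = PQexecPrepared(conn, "ins_log", 2, values, lengths, formats, 0);
        PQclear(res);
    }
}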

Using COPY is out of the question as the file is not formatted for that and since other operations need to occur, the file needs to be read sequentially anyway.

Any amount of help would be gladly accepted, even if it's pointing me to another thread or somewhere in the manual. Thanks,

Dave Huber


  • Martijn van Oosterhout at Sep 28, 2009 at 3:52 pm

    On Mon, Sep 28, 2009 at 10:38:05AM -0500, Dave Huber wrote:
    Hi, I'm fairly new to postgres and am having trouble finding what I'm
    looking for. Is there a feature that allows bulk inserts into tables?
    My setup is Win XPe 2002 SP3 and PostgreSQL 8.3. I need to add
    entries from a file, where each file contains 250-500 records. The
    files are created by a third party and read/written as binary. The
    table looks like the following: <snip>
    The current method for transferring records from the file to postgres is using a prepared statement that is called iteratively on each record read from the file:

    INSERT INTO data_log_20msec_table (timestamp_dbl,data) VALUES ($1::double precision,$2::bytea)

    Using COPY is out of the question as the file is not formatted for that and since other operations need to occur, the file needs to be read sequentially anyway.
    The usual approach is to use COPY FROM STDIN and then call
    PQputCopyData(). That way you can perform any necessary munging and
    don't require the file to be on disk at all.
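
    Something along these lines (only an untested sketch; it assumes an
    open PGconn *conn and that each record has already been formatted
    into a text row 'line' ending in '\n'; error handling omitted):

    #include <string.h>
    #include <libpq-fe.h>

    /* enter COPY-in mode */
    PGresult *res = PQexec(conn,
        "COPY data_log_20msec_table (timestamp_dbl, data) FROM STDIN");
    if (PQresultStatus(res) != PGRES_COPY_IN)
        { /* handle error */ }
    PQclear(res);

    /* repeat for each record read (and munged) from the file: */
    PQputCopyData(conn, line, (int) strlen(line));

    /* after the last record */
    PQputCopyEnd(conn, NULL);      /* NULL = success, end the COPY */
    res = PQgetResult(conn);       /* final status of the COPY command */
    PQclear(res);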

    Have a nice day,

    --
    Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
    Please line up in a tree and maintain the heap invariant while
    boarding. Thank you for flying nlogn airlines.
  • Sam Mason at Sep 28, 2009 at 8:52 pm

    On Mon, Sep 28, 2009 at 10:38:05AM -0500, Dave Huber wrote:
    Using COPY is out of the question as the file is not formatted for
    that and since other operations need to occur, the file needs to be
    read sequentially anyway.
    Just to expand on what Martijn said; if you can generate a set of EXECUTE
    commands, you can certainly generate a COPY command to insert the same
    data. The advantage is a large drop in parse time for inserting larger
    numbers of rows. As you're saying you want to insert 500 rows, I'd
    suggest at least trying to get COPY working.
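
    For example, one way to turn a timestamp plus a binary blob into a
    single COPY text row (untested sketch; every byte of the blob is
    written as a doubled-backslash octal escape, which COPY and the
    bytea input routine accept for any byte value):

    #include <stdio.h>

    /* Build one tab-separated COPY text row: "<timestamp>\t<bytea>\n".
     * Returns the number of bytes written into out, or -1 if it won't fit. */
    int copy_text_row(char *out, size_t outsz, double ts,
                      const unsigned char *data, int len)
    {
        int n = snprintf(out, outsz, "%.17g\t", ts);
        if (n < 0 || n + 2 > (int) outsz)
            return -1;
        for (int i = 0; i < len; i++)
        {
            if (n + 7 > (int) outsz)
                return -1;
            n += sprintf(out + n, "\\\\%03o", data[i]);   /* emits \\nnn */
        }
        out[n++] = '\n';
        out[n] = '\0';
        return n;
    }

    Each such line can then be handed to PQputCopyData() as in the
    earlier sketch.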
  • Dave Huber at Sep 28, 2009 at 9:36 pm
    Thanks, Sam and Martijn. I am attempting to use the COPY command now. I had misunderstood what was meant by STDIN and assumed I could only use a file for my application and wasn't aware of PQputCopyData(). One assumption I am operating under right now is that the format of the binary file is the same as the buffer in PQputCopyData, including the header. If I am wrong, could someone please let me know? Thanks,

    Dave
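
    (For reference, the stream that COPY ... FROM STDIN BINARY expects
    has its own fixed layout, described on the COPY reference page: an
    11-byte signature, two 32-bit header words, then length-prefixed
    fields per tuple, with the integer parts in network byte order. A
    rough, untested sketch of pushing one such row, assuming htonl/htons
    and that blob/bloblen hold the raw bytea bytes:)

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>          /* htonl/htons; winsock2.h on Windows */
    #include <libpq-fe.h>

    static void put_u32(PGconn *c, uint32_t v)   /* 4 bytes, network order */
    {
        uint32_t be = htonl(v);
        PQputCopyData(c, (char *) &be, 4);
    }

    /* Sends a complete one-row binary COPY stream; call after
     * PQexec(conn, "COPY data_log_20msec_table (timestamp_dbl, data) "
     *              "FROM STDIN BINARY") returns PGRES_COPY_IN.
     * In a real loader the header and trailer would bracket many tuples. */
    static void send_one_row(PGconn *conn, double ts,
                             const unsigned char *blob, int bloblen)
    {
        /* header: 11-byte signature, 32-bit flags = 0, 32-bit extension = 0 */
        PQputCopyData(conn, "PGCOPY\n\377\r\n", 11);   /* 11th byte is the NUL */
        put_u32(conn, 0);
        put_u32(conn, 0);

        uint16_t nfields = htons(2);                   /* fields in this tuple */
        PQputCopyData(conn, (char *) &nfields, 2);

        unsigned char be[8];                           /* float8 as big-endian IEEE 754 */
        memcpy(be, &ts, 8);
        /* ...reverse the 8 bytes here on a little-endian machine... */
        put_u32(conn, 8);
        PQputCopyData(conn, (char *) be, 8);

        put_u32(conn, (uint32_t) bloblen);             /* bytea: raw bytes, no escaping */
        PQputCopyData(conn, (const char *) blob, bloblen);

        uint16_t trailer = htons(0xFFFF);              /* 16-bit -1 ends the stream */
        PQputCopyData(conn, (char *) &trailer, 2);
    }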

    -----Original Message-----
    From: pgsql-general-owner@postgresql.org On Behalf Of Sam Mason
    Sent: Monday, September 28, 2009 3:53 PM
    To: pgsql-general@postgresql.org
    Subject: Re: [GENERAL] bulk inserts
    On Mon, Sep 28, 2009 at 10:38:05AM -0500, Dave Huber wrote:
    Using COPY is out of the question as the file is not formatted for
    that and since other operations need to occur, the file needs to be
    read sequentially anyway.
    Just to expand on what Martijn said; if you can generate a set of EXECUTE
    commands, you can certainly generate a COPY command to insert the same
    data. The advantage is a large drop in parse time for inserting larger
    numbers of rows. As you're saying you want to insert 500 rows, I'd
    suggest at least trying to get COPY working.

    --
    Sam http://samason.me.uk/

  • Sam Mason at Sep 28, 2009 at 10:35 pm

    On Mon, Sep 28, 2009 at 04:35:53PM -0500, Dave Huber wrote:
    One assumption I am operating under right now is
    that the format of the binary file is the same as the buffer in
    PQputCopyData, including the header. If I am wrong, could someone
    please let me know? Thanks,
    I've always used ASCII representations of the data; no need to worry
    about formats then. Not sure what sort of performance considerations
    you have, but it's certainly less of a worry for me. I'm reasonably
    sure the binary format changes more often than you'd like; maybe
    something like:

    http://libpqtypes.esilo.com/

    would help. I can't see anything about COPY support, but it should help
    with other things.
  • Martin Gainty at Sep 29, 2009 at 12:33 am
    INSERTs/UPDATEs are historically slow, especially with autocommit on
    (implied autocommit on): the database writer actually stops any
    processing and applies that one record to the database.
    Most bulk operations such as import/export and COPY are well worth
    their weight, as they apply en masse before any commit... remember,
    the DB actually stops, flushes its buffers to disk, and then resumes...
    the only solution here is to disable autocommit, but be wary: you may
    have hundreds of statements waiting to be committed, and if someone
    then quits your session, all your work is lost.

    good call on copy
    http://www.postgresql.org/docs/8.1/interactive/populate.html

    cheers,
    Martin Gainty



    Date: Mon, 28 Sep 2009 21:52:36 +0100
    From: sam@samason.me.uk
    To: pgsql-general@postgresql.org
    Subject: Re: [GENERAL] bulk inserts
    On Mon, Sep 28, 2009 at 10:38:05AM -0500, Dave Huber wrote:
    Using COPY is out of the question as the file is not formatted for
    that and since other operations need to occur, the file needs to be
    read sequentially anyway.
    Just to expand on what Martijn said; if you can generate a set of EXECUTE
    commands, you can certainly generate a COPY command to insert the same
    data. The advantage is a large drop in parse time for inserting larger
    numbers of rows. As you're saying you want to insert 500 rows, I'd
    suggest at least trying to get COPY working.

    --
    Sam http://samason.me.uk/

  • Sam Mason at Sep 29, 2009 at 10:53 am

    On Mon, Sep 28, 2009 at 08:33:45PM -0400, Martin Gainty wrote:
    INSERTs/UPDATEs are historically slow, especially with autocommit on
    (implied autocommit on): the database writer actually stops any
    processing and applies that one record to the database
    That seems to be overstating the issue somewhat. Each connection is
    still independent and (assuming no locks are taken) will carry on as
    such.
    Most bulk operations such as import/export and COPY are well worth
    their weight, as they apply en masse before any commit...
    ?? I'm not sure what you're implying about the semantics here, but it
    doesn't seem right. COPY doesn't somehow break out of ACID semantics,
    it's only an *optimization* that allows you to get large quantities of
    data into the database faster. The main reason it's faster is because
    parsing CSV data is easier than parsing SQL.

    At least I think that's the only difference; anybody know better?
    remember, the DB actually stops, flushes its buffers to disk, and then
    resumes...
    The DB as a whole does not stop if you issue a commit; just your
    session/connection.
    the only solution here is to disable autocommit, but be wary: you may
    have hundreds of statements waiting to be committed, and if someone
    then quits your session, all your work is lost
    I'm not sure what you're saying here. These are normal transactional
    semantics and are what all ACID databases are specified to do. You need
    to issue a "COMMIT" for data to be committed.
  • Alan Hodgson at Sep 29, 2009 at 3:46 pm

    On Tuesday 29 September 2009, Sam Mason wrote:
    ?? I'm not sure what you're implying about the semantics here, but it
    doesn't seem right. COPY doesn't somehow break out of ACID semantics,
    it's only an *optimization* that allows you to get large quantities of
    data into the database faster. The main reason it's faster is because
    parsing CSV data is easier than parsing SQL.

    At least I think that's the only difference; anybody know better?
    I think a big reason is also that the client can stream the data without
    waiting for a network round trip ack on every statement.


    --
    "No animals were harmed in the recording of this episode. We tried but that
    damn monkey was just too fast."
  • Sam Mason at Sep 29, 2009 at 4:04 pm

    On Tue, Sep 29, 2009 at 08:45:55AM -0700, Alan Hodgson wrote:
    On Tuesday 29 September 2009, Sam Mason wrote:
    it's faster is because
    parsing CSV data is easier than parsing SQL.

    At least I think that's the only difference; anybody know better?
    I think a big reason is also that the client can stream the data without
    waiting for a network round trip ack on every statement.
    I don't think so. I'm pretty sure you can send multiple statements in a
    single round trip. libpq is defined to work in such cases anyway:

    http://www.postgresql.org/docs/current/static/libpq-exec.html
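
    For instance (sketch only; same assumed PGconn *conn as before, the
    values are made up, and PQexec returns only the last command's result):

    /* Several commands in one PQexec() string travel to the server in a
     * single round trip and, absent explicit BEGIN/COMMIT in the string,
     * run as one implicit transaction. */
    PGresult *res = PQexec(conn,
        "INSERT INTO data_log_20msec_table (timestamp_dbl, data) "
        "  VALUES (1254147485.02, decode('0001', 'hex')); "
        "INSERT INTO data_log_20msec_table (timestamp_dbl, data) "
        "  VALUES (1254147485.04, decode('0203', 'hex'));");
    PQclear(res);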
  • Alan Hodgson at Sep 29, 2009 at 4:11 pm

    On Tuesday 29 September 2009, Sam Mason wrote:
    I think a big reason is also that the client can stream the data
    without waiting for a network round trip ack on every statement.
    I don't think so. I'm pretty sure you can send multiple statements in a
    single round trip. libpq is defined to work in such cases anyway:

    http://www.postgresql.org/docs/current/static/libpq-exec.html
    I'm sure you probably _can_, but how many programming loops do so?


    --
    "No animals were harmed in the recording of this episode. We tried but that
    damn monkey was just too fast."
  • Sam Mason at Sep 29, 2009 at 10:57 pm

    On Tue, Sep 29, 2009 at 09:11:19AM -0700, Alan Hodgson wrote:
    On Tuesday 29 September 2009, Sam Mason wrote:
    I'm pretty sure you can send multiple statements in a
    single round trip. libpq is defined to work in such cases anyway:

    http://www.postgresql.org/docs/current/static/libpq-exec.html
    I'm sure you probably _can_, but how many programming loops do so?
    It's not a very big sample, but I'm pretty sure I do! :)
  • Tom Lane at Sep 29, 2009 at 4:18 pm

    Sam Mason writes:
    On Tue, Sep 29, 2009 at 08:45:55AM -0700, Alan Hodgson wrote:
    I think a big reason is also that the client can stream the data without
    waiting for a network round trip ack on every statement.
    I don't think so. I'm pretty sure you can send multiple statements in a
    single round trip.
    You can, but that doesn't scale to megabytes of data (at least not
    well).

    I think the big points are elimination of per-row network and
    transaction commit overhead ... but there are some other optimizations
    in the COPY path too.

    regards, tom lane
  • Sam Mason at Sep 29, 2009 at 11:16 pm

    On Tue, Sep 29, 2009 at 12:17:51PM -0400, Tom Lane wrote:
    Sam Mason <sam@samason.me.uk> writes:
    On Tue, Sep 29, 2009 at 08:45:55AM -0700, Alan Hodgson wrote:
    I think a big reason is also that the client can stream the data without
    waiting for a network round trip ack on every statement.
    I don't think so. I'm pretty sure you can send multiple statements in a
    single round trip.
    You can, but that doesn't scale to megabytes of data (at least not
    well).
    No, but I didn't think that was being talked about. I was thinking
    network round trip time does seem to become a thousand times less
    important when you're putting a thousand statements together. This
    would seem to imply that network latency can be almost arbitrarily
    reduced.
    I think the big points are elimination of per-row network and
    transaction commit overhead ...
    Well, if you start including transaction commit then you've just changed
    semantics away from COPY. I was implicitly thinking of what changes
    when you keep the same semantics as COPY.
    but there are some other optimizations
    in the COPY path too.
    Cool, I'll continue to prefer COPY then!
  • Jasen Betts at Oct 10, 2009 at 10:42 am

    On 2009-09-29, Alan Hodgson wrote:
    On Tuesday 29 September 2009, Sam Mason wrote:
    ?? I'm not sure what you're implying about the semantics here, but it
    doesn't seem right. COPY doesn't somehow break out of ACID semantics,
    it's only an *optimization* that allows you to get large quantities of
    data into the database faster. The main reason it's faster is because
    parsing CSV data is easier than parsing SQL.

    At least I think that's the only difference; anybody know better?
    I think a big reason is also that the client can stream the data without
    waiting for a network round trip ack on every statement.
    a single INSERT statement can insert many rows.

    the win with COPY is more that the data part can be parsed one record
    at a time, whereas for an INSERT the whole statement must be parsed;
    also, the quoting rules are simpler for CSV or pg's tab-delimited
    format.
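
    As an illustration (sketch; values made up, conn assumed as before),
    one round trip and one parse for several rows:

    /* A single multi-row INSERT (VALUES lists are supported since 8.2). */
    PGresult *res = PQexec(conn,
        "INSERT INTO data_log_20msec_table (timestamp_dbl, data) VALUES "
        "  (1254147485.02, decode('0001', 'hex')), "
        "  (1254147485.04, decode('0203', 'hex')), "
        "  (1254147485.06, decode('0405', 'hex'))");
    PQclear(res);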
  • Dave Huber at Sep 29, 2009 at 2:32 pm
    All I have to say is wow! COPY works sooo much faster than the iterative method I was using. Even after having to read the entire binary file and reformat the data into the binary format that postgres needs, it is an order of magnitude faster than using a prepared INSERT. At least that's what my prelim testing is showing me. I will have to let it run for a while with a larger table to see how it performs in the long run, but initially this is very positive.

    Thanks, guys!

    Dave

  • Grzegorz Jaśkiewicz at Sep 29, 2009 at 2:36 pm

    On Tue, Sep 29, 2009 at 3:31 PM, Dave Huber wrote:

    All I have to say is wow! COPY works sooo much faster than the iterative
    method I was using. Even after having to read the entire binary file and
    reformat the data into the binary format that postgres needs, it is an order
    of magnitude faster than using a prepared INSERT. At least that’s what my
    prelim testing is showing me. I will have to let it run for a while with a
    larger table to see how it performs in the long run, but initially this is
    very positive.
    you can also try wrapping the whole bunch of inserts in a transaction block.
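
    Roughly (sketch; per-record work as in the earlier prepared-statement
    loop, with read_record() again standing in for the file reading):

    /* One explicit transaction around the whole batch: a single commit
     * (and WAL flush) instead of one per row. */
    PQclear(PQexec(conn, "BEGIN"));
    while (read_record(&ts, &blob, &bloblen))
    {
        /* ... PQexecPrepared() for this record, as before ... */
    }
    PQclear(PQexec(conn, "COMMIT"));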


    --
    GJ
