Hi,
I've got to import data into a remote database.
I usually get the data as Excel files that don't fit the db structure
with respect to normalisation, so I import them into Access, brush up
the data and push the columns from there into the right tables via ODBC.

For bigger imports (~5000 lines in Excel that get spread over 4-6 tables
in my db) I fetch the remote DB, do the import locally and transfer a
dump back to the remote site when no one uses the server.
This is getting unwieldy as the db grows and the connection is not
really that fast.

I can access the server by ssh, so it might be way faster to run a
prepared SQL file that contains just the new data against the DB on the
console.
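
For instance, after copying the file up (database and file names are
just placeholders):

    psql -d mydb -f import.sql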

Could I build a SQL script that adds the new stuff and sets the foreign
keys on the fly, without needing to know the new IDs beforehand?

Let's suppose I've got a couple of foreign-key-linked tables:
company (company_id serial primary key, name, ...)
person (person_id serial primary key, company_fk, name, ...)
contact (contact_id serial primary key, person_fk, contact_date,
contact_notes, ...)

The SQL script would need to:
1) INSERT INTO company ...
2) look up the last created company_id
3) INSERT INTO person and use the company_id to set company_fk
4) look up the last created person_id
5) INSERT INTO contact and use the person_id to set person_fk
Probably create some other persons with their contacts, then another
company, and so on. Something like the sketch below.
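
I imagine something like this, assuming the serial columns keep
PostgreSQL's default sequence names (table_column_seq) and using
made-up sample values; currval() returns the id most recently generated
by nextval() in the same session, so no extra lookups are needed:

    BEGIN;

    INSERT INTO company (name) VALUES ('Acme Ltd');

    -- currval() picks up the company_id just generated in this session
    INSERT INTO person (company_fk, name)
    VALUES (currval('company_company_id_seq'), 'Alice Example');

    -- same trick one level down for the new person_id
    INSERT INTO contact (person_fk, contact_date, contact_notes)
    VALUES (currval('person_person_id_seq'), '2008-04-15', 'first call');

    COMMIT;

(On PostgreSQL 8.2 and later, INSERT ... RETURNING company_id would be
another way to get a generated key back in the same statement.)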

I do this right now with VBA in Access, but it is painfully slow over
the Internet, as Access constantly has to ask the server for the last
IDs and the net's latency spoils all the fun.


Regards
Andreas


  • Sean Davis at Apr 16, 2008 at 2:05 am

    On Tue, Apr 15, 2008 at 9:54 PM, Andreas wrote:
    [snip]
    Why not avoid Access altogether? Load the data into postgres "as-is"
    into a set of loader tables. Then you can use SQL to do the data
    munging and the inserts, including the foreign key relationships. You
    can use \copy in psql to load the data, so you don't need shell or
    file access on the server.
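
    A rough sketch of that (the loader table, its columns and the file
    name are made up for illustration):

    -- one all-varchar loader table shaped like the spreadsheet
    CREATE TABLE tmp_table (
        company_name  varchar,
        person_name   varchar,
        contact_date  varchar,  -- cast to date later
        contact_notes varchar
    );

    -- \copy reads the file on the client and streams it over the
    -- normal connection; tab-delimited text is the default format
    \copy tmp_table from 'import.txt'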

    Sean
  • Andreas at Apr 16, 2008 at 3:01 am

    Sean Davis wrote:
    [snip]
    Thanks for your suggestion.
    As said, usually I get the data as an xls with wrong column names and
    often enough wrong column types, missing info, or data cluttered up
    in creative ways, too.
    How could I easily import a certain tab within an xls into a
    temporary pg table?

    Then, often the data is a dump of JOINs, so it is not normalised.
    I can't help it; I just get the stuff dumped on me in whatever form
    our customers are able or willing to provide.
    Most of the time it's like this:
    ...
    company_1, ..., person_1, ...
    company_1, ..., person_2, ...
    company_1, ..., person_3, ...
    company_2, ..., person_4, ...
    company_3, ..., person_5, ...
    company_3, ..., person_6, ...
    ...
    I've got to catch the repeating companies to avoid duplicates.
    Until now I try to find some columns that represent a key for the
    company part of the data.
    Then I sort by these key columns, walk over the table, and generate a
    new company only when part of the key changes.

    AFAIK I need to insert the person records right after I create their
    company, because if I first created all the companies and then the
    people, I wouldn't know which company_id belongs to a person.

    I'd love to learn how to do this more cleverly w/o Access.


    regards
    Andreas
  • Sean Davis at Apr 16, 2008 at 11:04 am

    On Tue, Apr 15, 2008 at 11:01 PM, Andreas wrote:
    [snip]

    Thanks for your suggestion.
    As said, usually I get the data as an xls with wrong column names and
    often enough wrong column types, missing info, or data cluttered up
    in creative ways, too.
    How could I easily import a certain tab within an xls into a
    temporary pg table?
    Well, you would need to convert it to tab-delimited text, an easy
    operation. Then create a table based on the structure of the xls
    file; you can use varchar columns here if you like and cast to the
    correct datatypes later.
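
    With an all-varchar loader table the casts can then be applied on the
    way out, e.g. (column names are made up):

    SELECT company_name,
           person_name,
           contact_date::date,   -- fails loudly on malformed dates
           nullif(trim(contact_notes), '') AS contact_notes
    FROM tmp_table;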
    Then, often the data is a dump of JOINs, so it is not normalised.
    I can't help it; I just get the stuff dumped on me in whatever form
    our customers are able or willing to provide.
    Most of the time it's like this:
    ...
    company_1, ..., person_1, ...
    company_1, ..., person_2, ...
    company_1, ..., person_3, ...
    company_2, ..., person_4, ...
    company_3, ..., person_5, ...
    company_3, ..., person_6, ...
    ...
    I've got to catch the repeating companies to avoid duplicates.
    Until now I try to find some columns that represent a key for the
    company part of the data.
    Then I sort by these key columns, walk over the table, and generate a
    new company only when part of the key changes.
    You could do all that, but that would be pretty suboptimal. How about
    a table like this:

    create table company_table (
        id serial primary key,
        company_name varchar unique
    );

    insert into company_table (company_name)
    select distinct company_name
    from tmp_table
    where company_name not in
        (select company_name from company_table);

    This is UNTESTED, but you can hopefully get the idea. You can use SQL
    to take care of finding the unique values (distinct), generate the
    IDs via the serial column while skipping company names that are
    already in the company table (again using SQL), apply casts and
    string functions, etc.
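
    In the same set-based spirit, the person rows can go in afterwards;
    because company_name is unique, a join recovers each newly generated
    company id, so nothing forces you to insert every person right after
    its company. A sketch, assuming one loader row per person:

    insert into person (company_fk, name)
    select c.id, t.person_name
    from tmp_table t
    join company_table c on c.company_name = t.company_name;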

    Hope that is enough direction to get you started.
    Sean

