Hello,

I would like to discuss one design issue. I am inserting a large number of
objects (500,000), and for each of them I must set 3 relationships. The
target relationship tables consist of 6,000, 15,000 and 450 records.

Approaching this solution naively like this:

for (/* 500,000 iterations */) {
    Town town = persistenceService.findTownByMid(townMid);           // town table has 6,000 records
    Street street = persistenceService.findStreetByMid(streetMid);   // 15,000 records
    PostOffice postOffice =
            persistenceService.findPostOfficeByMid(postOfficeMid);   // 450 records

    HouseNumber houseNumber = dataContext.newObject(HouseNumber.class); // 500,000 to be inserted

    houseNumber.setMid(mid);
    houseNumber.setNumber(number);
    houseNumber.setLetter(letter);

    houseNumber.setTown(town);
    houseNumber.setStreet(street);
    houseNumber.setPostOffice(postOffice);

    dataContext.commitChanges();
} // end loop

is not only very slow (1 hour to insert 150,000 records) but also memory
hungry (250 MB for 150,000 records).

For each of the 500,000 house numbers I must do 3 database selects to fetch
the right objects for the relationships. Obviously this is a flawed approach.

In the past I used iteration through data rows to read data quickly in a
memory-efficient way, but there were no relationships involved. So I was
thinking of setting the 3 foreign keys on each house number through an added
FK setter method in my HouseNumber class, with three custom prepopulated
maps, one per relationship, containing [mid, id] pairs as lookup tables.
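A minimal sketch of that lookup-table idea in plain Java (the table contents here are made-up stand-ins; in a real run the maps would be filled once up front from lightweight data-row queries over town, street and post office):

```java
import java.util.HashMap;
import java.util.Map;

public class MidLookup {

    // One prepopulated mid -> primary key map per relationship. Filling these
    // once up front replaces the 3 SELECTs per house number with O(1) lookups.
    static final Map<Integer, Integer> townIdByMid = new HashMap<>();
    static final Map<Integer, Integer> streetIdByMid = new HashMap<>();
    static final Map<Integer, Integer> postOfficeIdByMid = new HashMap<>();

    public static void main(String[] args) {
        // Stub data standing in for the 6,000 / 15,000 / 450-row tables.
        townIdByMid.put(4001, 17);        // mid 4001 maps to town PK 17
        streetIdByMid.put(9002, 230);
        postOfficeIdByMid.put(1100, 5);

        // Inside the insert loop, resolve FKs without touching the database:
        int townFk = townIdByMid.get(4001);
        int streetFk = streetIdByMid.get(9002);
        int postOfficeFk = postOfficeIdByMid.get(1100);

        // houseNumber.setTownId(townFk); etc. via the added FK setters
        System.out.println(townFk + " " + streetFk + " " + postOfficeFk); // 17 230 5
    }
}
```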

Any other idea?

Regards,
bob


  • Andrus Adamchik at Oct 18, 2010 at 3:10 pm
    You can use meaningful FK.

    Alternatively you can pre-cache (either manually in a hash map or via cache groups) Towns, Streets and PostOffices. 22K of records in the cache shouldn't be a big deal. One possible caveat is a relationship back to HouseNumbers that may result in preventing garbage collection of HouseNumbers.

    Andrus
  • Borut Bolčina at Oct 19, 2010 at 7:36 am
    Considering meaningful FK,

My current working setup of the town and street tables looks like this (mid
is unique in both tables and a candidate for the meaningful FK):

CREATE TABLE IF NOT EXISTS `maps`.`town` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `mid` INT UNSIGNED NOT NULL,
  `name` VARCHAR(60) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE INDEX `mid_UNIQUE` (`mid` ASC))
ENGINE = InnoDB

CREATE TABLE IF NOT EXISTS `maps`.`street` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `mid` INT UNSIGNED NOT NULL,
  `name` VARCHAR(50) NULL,
  `town_id` INT UNSIGNED NOT NULL,
  INDEX `town_id` (`town_id` ASC),
  PRIMARY KEY (`id`),
  UNIQUE INDEX `mid_UNIQUE` (`mid` ASC),
  CONSTRAINT `town_id`
    FOREIGN KEY (`town_id`)
    REFERENCES `maps`.`town` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB

And the current relationship from street to town:

<db-relationship name="town" source="street" target="town" toMany="false">
    <db-attribute-pair source="town_id" target="id"/>
</db-relationship>
<obj-relationship name="town" source="Street" target="Town" deleteRule="Nullify" db-relationship-path="town"/>

    and the reverse

<db-relationship name="streets" source="town" target="street" toMany="true">
    <db-attribute-pair source="id" target="town_id"/>
</db-relationship>
<obj-relationship name="streets" source="Town" target="Street" deleteRule="Deny" db-relationship-path="streets"/>

Are you proposing to use target="mid" (instead of "id"), as in:

<db-relationship name="town" source="street" target="town" toMany="false">
    <db-attribute-pair source="town_id" target="mid"/>
</db-relationship>

This way, using

street.setTown(localTown);

does not set the relationship; it inserts NULL for the town_id FK in the
street table. Or did you mean something else?

    -Borut




  • Bruno René Santos at Oct 20, 2010 at 1:05 am
    Hello All,

I have created a pre-persist callback to set a property on my POJO, but when I commit changes the callback is not executed... Any ideas? For instance, the post-add callback works great.

    Thank you
  • Andrus Adamchik at Oct 20, 2010 at 7:32 am
Odd. It should be working. Do you actually see an INSERT statement in the logs, but no callback before it?

    Andrus
  • Bruno René Santos at Oct 20, 2010 at 10:14 am
    Hi Andrus,

I have just found out the problem. I was trying to set a Cayenne-mandatory
table field to a calculated value during pre-persist. I think the problem is
that Cayenne's mandatory validation is executed before pre-persist, which
prevents me from setting the field, since it starts out as null. Do you think
validating after pre-persist could lead to problems? I think it would make
pre-persist more useful, allowing us to calculate mandatory fields.

I made it work by removing the Mandatory check in Cayenne Modeler.

    Regards
    Bruno Santos

  • Andrus Adamchik at Oct 20, 2010 at 10:27 am

    On Oct 20, 2010, at 1:14 PM, Bruno René Santos wrote:

    Do you think validating after
    pre-persist could lead to problems? I think it would help pre-persist to be more
useful, allowing us to calculate mandatory fields.

Maybe a good idea actually. Care to open a Jira? We'll think about it.

    Andrus
  • Bruno René Santos at Oct 20, 2010 at 10:46 am
    Just opened an improvement issue on JIRA.

    Regards
    Bruno Santos

  • Brd at Oct 20, 2010 at 5:58 pm
    Hello all,

What's the best way to do a regular DELETE FROM in raw SQL? I tried a raw
SQL named query in the Modeler, but I get a Null PK error, maybe because no
data rows are returned?

    Thank you
  • Michael Gentry at Oct 20, 2010 at 6:03 pm
    I've done this in the past:

    QueryChain chain = new QueryChain();

    // Add raw SQL templates to the chain.
    chain.addQuery(new SQLTemplate(Customer.class, "DELETE FROM Customers"));
    chain.addQuery(new SQLTemplate(Item.class, "DELETE FROM Items"));
    chain.addQuery(new SQLTemplate(LineItem.class, "DELETE FROM LineItems"));
    chain.addQuery(new SQLTemplate(Order.class, "DELETE FROM Orders"));

    // Execute all of the SQL statements.
    dataContext.performNonSelectingQuery(chain);

Keep in mind that this isn't a good thing to do if you are concerned about
your in-memory object graphs (DataContexts, etc.) suddenly becoming
out-of-sync. Also, if you don't need to do multiple deletes, you don't need
the query chain part.

    mrg



  • Andrus Adamchik at Oct 20, 2010 at 7:28 am

    On Oct 19, 2010, at 10:36 AM, Borut Bolčina wrote:

    This way using

    street.setTown(localTown);

    does not set the relationship, it inserts NULL for town_id FK in the streets
    table. Or did you mean something else?

I probably didn't quite get the explanation of your mapping. Anyway, have you tried setting a meaningful FK, mapped as an ObjAttribute?

    Andrus
  • Borut Bolčina at Nov 4, 2010 at 1:48 pm
    I'm back.

I created object attributes for the 3 foreign keys. Here is an example for one of them:

public void setStreetId(Integer streetId) {
    writeProperty("streetId", streetId);
}

public Integer getStreetId() {
    return (Integer) readProperty("streetId");
}

    and now instead of setting the relationship:

    houseNumber.setStreet(localStreet);

    I do:

    houseNumber.setStreetId(streetMid);

    But now I am getting

    Validation failure for com.acme.maps.model.HouseNumber.street: "street" is
    required.

Nowhere in the mapping file or in the Modeler is there an option to set the
optionality of the relationship, only of attributes. Why am I getting this
validation message?

At the database level the FK column is set to NOT NULL, but that is not
related to Cayenne.

    Hints?

    -Borut


  • Andrus Adamchik at Nov 4, 2010 at 2:38 pm

    On Nov 4, 2010, at 9:48 AM, Borut Bolčina wrote:

    At the database level, the FK attribute is set to "NOT NULL", but this is
not related with the Cayenne.

Yeah, this is what triggers the validation message. I'd say it is a bug: Cayenne should be smarter about meaningful FK + relationship validation. For now you can probably uncheck NOT NULL and file a Jira.

    Andrus
  • Michael Gentry at Oct 18, 2010 at 3:38 pm
    Hi Borut,

    In addition to what Andrus said about caching Towns, Streets, and
    PostOffices, you should also try to do your commits in larger
    groupings. One commit at a time will be slower than you want for
    something like this. Try grouping 100 or so at a time before you
    commit.

    mrg
  • Borut Bolčina at Oct 18, 2010 at 7:35 pm
    Hi,

what makes inserts faster when committing every 100 objects? In either case
the same number of inserts will be generated.

Can Cayenne, besides using templates, be configured to do something like:

INSERT INTO tbl_name (a,b,c) VALUES (1,2,3),(4,5,6),(7,8,9);

That is one insert for 3 rows, which is more efficient on MySQL.

    -Borut

  • Michael Gentry at Oct 18, 2010 at 7:50 pm
    Hi Borut,

If you did 100 at a time, Cayenne would create 1 insert statement with 100
sets of binds instead of your current 100 insert statements with 1 set of
binds each (if you were only doing 100). Admittedly, the insert could
potentially be compiled/cached somewhere (as a PreparedStatement), but you
still have to deal with the database setting up 100 transactions, and your
network I/O is substantially higher one-at-a-time, too.
    Plus, there is a cost involved with Cayenne examining the objects in
    the DataContext for changes (and the objects are growing in your
    DataContext) each commit. So you are scanning for changes much more
    often with one-at-a-time. I'd even suggest throwing away your
    DataContext after you commit the 100 objects and create a new
    DataContext.

    The template thing you mentioned is more-or-less what Cayenne is doing
    with 1 insert statement and 100 binds.

    mrg

    PS. The number 100 is arbitrary. Your sweet spot may be 10, 50, 200,
    500, 1000, etc. Play around with a few values to find what works well
    for you time/memory-wise given your scenario. Values of 1 and 500k
    are going to be less optimal, though. :-)
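The grouped-commit pattern can be sketched without Cayenne at all; here commitChanges() is just a counter standing in for DataContext.commitChanges(), so the effect of the batch size is visible without a database:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class BatchedCommit {

    static int commits = 0;

    // Stand-in for dataContext.commitChanges(); counts flushes.
    static void commitChanges() {
        commits++;
    }

    static void insertAll(List<Integer> rows, int batchSize) {
        int pending = 0;
        for (Integer row : rows) {
            // ... register a new object for this row here ...
            pending++;
            if (pending == batchSize) {
                commitChanges(); // one commit per batchSize objects
                pending = 0;
                // With Cayenne, this is also a good point to throw away the
                // DataContext and create a fresh one to cap memory growth.
            }
        }
        if (pending > 0) {
            commitChanges(); // flush the final partial batch
        }
    }

    public static void main(String[] args) {
        List<Integer> rows = IntStream.range(0, 350).boxed().collect(Collectors.toList());
        insertAll(rows, 100);
        System.out.println(commits); // 4 commits instead of 350
    }
}
```

The batch size of 100 is the same arbitrary starting point suggested above; only measurement will find the sweet spot.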

  • Andrus Adamchik at Oct 18, 2010 at 7:58 pm

    On Oct 18, 2010, at 10:34 PM, Borut Bolčina wrote:
    Hi,

    what makes insert faster when doing it every 100 objects? In each case the
same number of inserts will be generated.

Batching stuff together is usually faster (this relates to the JDBC batching described below, or more generally to batching multiple Cayenne operations in a single transaction). I can't put an exact number on it, but there is definitely some Cayenne, JDBC and DB overhead in processing each commit that is unrelated to the number of objects committed.
    Can, beside using templates, Cayenne be configured to do something like:

    INSERT INTO *tbl_name* (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);


That is one insert for 3 rows, which is more efficient (mysql).

Cayenne supports standard JDBC batching (i.e. generating queries with multiple sets of parameters, like the above, using the JDBC batching API). This is controlled by DbAdapter.supportsBatchUpdates(). The Oracle, PostgreSQL and SQLServer adapters all return true; the MySQL adapter returns false. Back in the day batch updates weren't an option on MySQL; I'm not sure what the status of that is now. Even if MySQL supports it, the driver may not. We may need to research it.

In the meantime you can experiment with forcing this flag on MySQLAdapter (or use a custom adapter subclass to do that) and give us feedback. I'll be glad to change the default for MySQL if it actually works with modern drivers.

    Andrus
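For comparison, the multi-row VALUES form Borut asked about can be generated with plain string building (a sketch only; Cayenne's actual batching goes through the JDBC batch API on a PreparedStatement, not string concatenation):

```java
import java.util.Collections;

public class MultiRowInsert {

    // Builds "INSERT INTO t (a,b,c) VALUES (?,?,?),(?,?,?),..." -- the form
    // MySQL executes efficiently as a single statement.
    static String multiRowInsert(String table, String[] cols, int rowCount) {
        String row = "(" + String.join(",", Collections.nCopies(cols.length, "?")) + ")";
        StringBuilder sql = new StringBuilder("INSERT INTO ").append(table)
                .append(" (").append(String.join(",", cols)).append(") VALUES ");
        for (int i = 0; i < rowCount; i++) {
            if (i > 0) {
                sql.append(",");
            }
            sql.append(row);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(multiRowInsert("tbl_name", new String[] {"a", "b", "c"}, 3));
        // INSERT INTO tbl_name (a,b,c) VALUES (?,?,?),(?,?,?),(?,?,?)
    }
}
```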

Discussion Overview
group: user@cayenne.apache.org
category: cayenne
posted: Oct 18, 2010 at 12:47 PM
active: Nov 4, 2010 at 2:38 PM
posts: 17
users: 5
website: cayenne.apache.org
