FAQ
Dear list,

I have another question. My application converts a huge MS Access
database to a set of XML files by traversing some really big tables with
Select queries and also querying related tables to collect the
information needed for the output. Performance is horribly bad, so I did
some profiling and found that it spends 75% of its CPU time in
org.apache.cayenne.access.ExternalTransaction.close() and another 21% in
org.apache.cayenne.conn.DriverDataSource.getConnection(). Only 2% are
spent in SelectAction.performAction().

So it appears that it opens and closes a new connection for every single
select query. I must be doing something wrong here! Where can I tell it
to re-use connections? Or make one big transaction as it is read-only
anyway? Any other hints? I hope I can reduce the processing time to 4%
of what it is now...

Thanks,
Wernke





This email (including any attachments) may contain confidential and/or privileged
information or information otherwise protected from disclosure. If you are not the
intended recipient, please notify the sender immediately, do not copy this message
or any attachments and do not use it or any purpose or disclose its content to any
person, but delete this message and any attachments from your system.
RHEA System S.A. (RHEA) disclaims any and all liability if this email transmission
was virus corrupted, altered or falsified.

If a Non Disclosure Agreement (NDA) exists between RHEA and the receiving
organization, then all contents shall be considered proprietary and covered under
the NDA.

Search Discussions

  • Durchholz, Joachim at Jan 24, 2012 at 4:18 pm

    I have another question. My application converts a huge
    MS Access database to a set of XML files by traversing
    some really big tables with Select queries and also
    querying related tables to collect the information
    needed for the output. Performance is horribly bad,
    so I did some profiling and found that it spends 75%
    of its CPU time in
    org.apache.cayenne.access.ExternalTransaction.close()
    and another 21% in
    org.apache.cayenne.conn.DriverDataSource.getConnection().
    Phew, that's bad.
    Is the connection pool active?
  • Robert Zeigler at Jan 24, 2012 at 8:40 pm
    Are you using query prefetching?

    Robert
    On Jan 24, 2012, at 1/2410:11 AM , Wernke zur Borg wrote:


    Dear list,

    I have another question. My application converts a huge MS Access
    database to a set of XML files by traversing some really big tables with
    Select queries and also querying related tables to collect the
    information needed for the output. Performance is horribly bad, so I did
    some profiling and found that it spends 75% of its CPU time in
    org.apache.cayenne.access.ExternalTransaction.close() and another 21% in
    org.apache.cayenne.conn.DriverDataSource.getConnection(). Only 2% are
    spent in SelectAction.performAction().

    So it appears that it opens and closes a new connection for every single
    select query. I must be doing something wrong here! Where can I tell it
    to re-use connections? Or make one big transaction as it is read-only
    anyway? Any other hints? I hope I can reduce the processing time to 4%
    of what it is now...

    Thanks,
    Wernke





    This email (including any attachments) may contain confidential and/or privileged
    information or information otherwise protected from disclosure. If you are not the
    intended recipient, please notify the sender immediately, do not copy this message
    or any attachments and do not use it or any purpose or disclose its content to any
    person, but delete this message and any attachments from your system.
    RHEA System S.A. (RHEA) disclaims any and all liability if this email transmission
    was virus corrupted, altered or falsified.

    If a Non Disclosure Agreement (NDA) exists between RHEA and the receiving
    organization, then all contents shall be considered proprietary and covered under
    the NDA.
  • Wernke zur Borg at Jan 25, 2012 at 7:57 am

    On 2012-01-24 17:17, Durchholz, Joachim wrote:
    Is the connection pool active?
    Connection pool settings do not change the behaviour, I tried various
    values for min/max.

    On 2012-01-24 21:39, Robert Zeigler wrote:
    Are you using query prefetching?
    Adding prefetching does not change the behaviour.

    Any further hints will be appreciated, thanks!
    On Jan 24, 2012, at 1/2410:11 AM , Wernke zur Borg wrote:

    Dear list,

    I have another question. My application converts a huge MS Access
    database to a set of XML files by traversing some really big tables with
    Select queries and also querying related tables to collect the
    information needed for the output. Performance is horribly bad, so I did
    some profiling and found that it spends 75% of its CPU time in
    org.apache.cayenne.access.ExternalTransaction.close() and another 21% in
    org.apache.cayenne.conn.DriverDataSource.getConnection(). Only 2% are
    spent in SelectAction.performAction().

    So it appears that it opens and closes a new connection for every single
    select query. I must be doing something wrong here! Where can I tell it
    to re-use connections? Or make one big transaction as it is read-only
    anyway? Any other hints? I hope I can reduce the processing time to 4%
    of what it is now...

    Thanks,
    Wernke


    This email (including any attachments) may contain confidential and/or privileged
    information or information otherwise protected from disclosure. If you are not the
    intended recipient, please notify the sender immediately, do not copy this message
    or any attachments and do not use it or any purpose or disclose its content to any
    person, but delete this message and any attachments from your system.
    RHEA System S.A. (RHEA) disclaims any and all liability if this email transmission
    was virus corrupted, altered or falsified.

    If a Non Disclosure Agreement (NDA) exists between RHEA and the receiving
    organization, then all contents shall be considered proprietary and covered under
    the NDA.
  • Wernke zur Borg at Jan 25, 2012 at 9:02 am
    To give you a picture here is a screenshot of the profiler showing that
    one connection is opened and closed for every query.



    The question I do not understand is why does DataNode.performQueries()
    unconditionally close the current connection?

    Otherwise I could probably provide a custom DataSource class overriding
    getConnection() to re-use connections.

    Wernke

    On 2012-01-24 17:11, Wernke zur Borg wrote:
    Dear list,

    I have another question. My application converts a huge MS Access
    database to a set of XML files by traversing some really big tables with
    Select queries and also querying related tables to collect the
    information needed for the output. Performance is horribly bad, so I did
    some profiling and found that it spends 75% of its CPU time in
    org.apache.cayenne.access.ExternalTransaction.close() and another 21% in
    org.apache.cayenne.conn.DriverDataSource.getConnection(). Only 2% are
    spent in SelectAction.performAction().

    So it appears that it opens and closes a new connection for every single
    select query. I must be doing something wrong here! Where can I tell it
    to re-use connections? Or make one big transaction as it is read-only
    anyway? Any other hints? I hope I can reduce the processing time to 4%
    of what it is now...

    Thanks,
    Wernke


    This email (including any attachments) may contain confidential and/or privileged
    information or information otherwise protected from disclosure. If you are not the
    intended recipient, please notify the sender immediately, do not copy this message
    or any attachments and do not use it or any purpose or disclose its content to any
    person, but delete this message and any attachments from your system.
    RHEA System S.A. (RHEA) disclaims any and all liability if this email transmission
    was virus corrupted, altered or falsified.

    If a Non Disclosure Agreement (NDA) exists between RHEA and the receiving
    organization, then all contents shall be considered proprietary and covered under
    the NDA.
  • Wernke zur Borg at Jan 25, 2012 at 9:46 am
    I solved the problem by wrapping the entire processing into one
    transaction, following "User-Defined Transaction Scope" in
    http://cayenne.apache.org/doc/understanding-transactions.html.

    // embed the whole thing in one single transaction
    DataDomain d = getRuntime().getDataDomain();
    Transaction tx = d.createTransaction();
    Transaction.bindThreadTransaction(tx);

    try {
    // do all processing here
    } finally {
    Transaction.bindThreadTransaction(null);
    }

    No need to worry about rollbacks as everything is read-only.
    Processing time has gone down from one hour to one minute !!!
    Thanks to everybody,
    Wernke

    On 2012-01-25 10:01, Wernke zur Borg wrote:

    To give you a picture here is a screenshot of the profiler showing
    that one connection is opened and closed for every query.



    The question I do not understand is why does DataNode.performQueries()
    unconditionally close the current connection?

    Otherwise I could probably provide a custom DataSource class
    overriding getConnection() to re-use connections.

    Wernke

    On 2012-01-24 17:11, Wernke zur Borg wrote:
    Dear list,

    I have another question. My application converts a huge MS Access
    database to a set of XML files by traversing some really big tables with
    Select queries and also querying related tables to collect the
    information needed for the output. Performance is horribly bad, so I did
    some profiling and found that it spends 75% of its CPU time in
    org.apache.cayenne.access.ExternalTransaction.close() and another 21% in
    org.apache.cayenne.conn.DriverDataSource.getConnection(). Only 2% are
    spent in SelectAction.performAction().

    So it appears that it opens and closes a new connection for every single
    select query. I must be doing something wrong here! Where can I tell it
    to re-use connections? Or make one big transaction as it is read-only
    anyway? Any other hints? I hope I can reduce the processing time to 4%
    of what it is now...

    Thanks,
    Wernke


    This email (including any attachments) may contain confidential and/or privileged
    information or information otherwise protected from disclosure. If you are not the
    intended recipient, please notify the sender immediately, do not copy this message
    or any attachments and do not use it or any purpose or disclose its content to any
    person, but delete this message and any attachments from your system.
    RHEA System S.A. (RHEA) disclaims any and all liability if this email transmission
    was virus corrupted, altered or falsified.

    If a Non Disclosure Agreement (NDA) exists between RHEA and the receiving
    organization, then all contents shall be considered proprietary and covered under
    the NDA.
  • Durchholz, Joachim at Jan 25, 2012 at 1:21 pm
    It closes the transaction, not the connection (it's ExternalTransaction.close). You fixed that for now by putting everything into a single transaction, but I suppose it will come back to bite you once business logic dictates many small transactions (maybe in a later project, so this may not be relevant for you right now - it's just something to keep in mind).

    There's no good reason why a getConnection from a connection pool should take up more time than running any SQL statement.
    Either the connection isn't taken from the connection pool. Which makes me suspect the connection pool isn't active - maybe something is misconfigured.
    Or Cayenne's getConnection is unreasonably slow.
    It would be nice to know what actually caused the problem.

    N.B. writing your own DataSource that reuses connections would be just a reinvention of connection pooling. You don't want to do that - connection pooling is far more involved than meets the eye; doing it well will certainly make you go over the time budget, and doing it not-so-well is a waste of time. (The issues I'm aware of are: keeping connections open in advance to reduce latency, which means opening connections in a separate thread and avoiding race conditions; dealing with connection loss due to networking outages; dealing with connection loss due to server-side time-outs after periods of inactivity; avoiding unnecessary work in a transaction after the connection was gone; distinguishing whether to silently reestablish connection or report back a failed connection, that despends on whether you actually had any SQL activity after establishing the connection. There might be more.)

    Regards,
    Jo

    ________________________________

    From: Wernke zur Borg
    Sent: Wednesday, January 25, 2012 10:02 AM
    To: user@cayenne.apache.org
    Subject: Re: Performance question



    To give you a picture here is a screenshot of the profiler showing that one connection is opened and closed for every query.



    The question I do not understand is why does DataNode.performQueries() unconditionally close the current connection?

    Otherwise I could probably provide a custom DataSource class overriding getConnection() to re-use connections.

    Wernke
  • Wernke zur Borg at Jan 25, 2012 at 1:36 pm
    Hi Jo,

    thanks for your valuable explanations. I also found my error w.r.t.
    Connection.close(), that's why I got the idea with the one big
    transaction. I do not see why a single reading application with no
    concurrent writers should use multiple transactions anyway. Of course we
    would have to rethink/redesign this in case concurrent writers would
    eventually be added. But this is a simple offline tool, so there is
    currently no danger.

    And I can assure you, I will not reinvent the connection pooling of
    DataSource. Perhaps I'll find the time to investigate the issue why the
    pooling apparently did not work for me. I will let you know.

    Thanks again for your contributions to the list.

    Wernke

    On 2012-01-25 14:21, Durchholz, Joachim wrote:
    It closes the /transaction/, not the /connection/ (it's
    /ExternalTransaction/.close). You fixed that for now by putting
    everything into a single transaction, but I suppose it will come back
    to bite you once business logic dictates many small transactions
    (maybe in a later project, so this may not be relevant for you right
    now - it's just something to keep in mind).
    There's no good reason why a getConnection from a connection pool
    should take up more time than running any SQL statement.
    Either the connection isn't taken from the connection pool. Which
    makes me suspect the connection pool isn't active - maybe something is
    misconfigured.
    Or Cayenne's getConnection is unreasonably slow.
    It would be nice to know what actually caused the problem.
    N.B. writing your own DataSource that reuses connections would be just
    a reinvention of connection pooling. You don't want to do that -
    connection pooling is far more involved than meets the eye; doing it
    well will certainly make you go over the time budget, and doing it
    not-so-well is a waste of time. (The issues I'm aware of are: keeping
    connections open in advance to reduce latency, which means opening
    connections in a separate thread and avoiding race conditions; dealing
    with connection loss due to networking outages; dealing with
    connection loss due to server-side time-outs after periods of
    inactivity; avoiding unnecessary work in a transaction after the
    connection was gone; distinguishing whether to silently reestablish
    connection or report back a failed connection, that despends on
    whether you actually had any SQL activity after establishing the
    connection. There might be more.)
    Regards,
    Jo

    ------------------------------------------------------------------------
    *From:* Wernke zur Borg
    *Sent:* Wednesday, January 25, 2012 10:02 AM
    *To:* user@cayenne.apache.org
    *Subject:* Re: Performance question


    To give you a picture here is a screenshot of the profiler showing
    that one connection is opened and closed for every query.



    The question I do not understand is why does DataNode.performQueries()
    unconditionally close the current connection?

    Otherwise I could probably provide a custom DataSource class
    overriding getConnection() to re-use connections.

    Wernke

    --
    /Wernke zur Borg
    *Rhea System S.A.*
    Robert-Bosch-Str. 7
    64293 Darmstadt / Germany
    Tel. +49-6151-8709591/


    This email (including any attachments) may contain confidential and/or privileged
    information or information otherwise protected from disclosure. If you are not the
    intended recipient, please notify the sender immediately, do not copy this message
    or any attachments and do not use it or any purpose or disclose its content to any
    person, but delete this message and any attachments from your system.
    RHEA System S.A. (RHEA) disclaims any and all liability if this email transmission
    was virus corrupted, altered or falsified.

    If a Non Disclosure Agreement (NDA) exists between RHEA and the receiving
    organization, then all contents shall be considered proprietary and covered under
    the NDA.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categoriescayenne
postedJan 24, '12 at 4:12p
activeJan 25, '12 at 1:36p
posts8
users3
websitecayenne.apache.org

People

Translate

site design / logo © 2022 Grokbase