We just moved a large production instance of ours from Oracle to
Postgres 8.0.3 on Linux. When running on Oracle, the machine hummed
along using about 5% of the CPU, easily handling the fairly constant
load; after moving the data to Postgres, the machine was pretty much
maxed out on CPU and could no longer keep up with the transaction
volume. On a hunch I switched the JDBC driver to the V2 protocol,
and the load on the machine dropped back to what it was when using
Oracle and everything was fine.



Now obviously I have found a workaround for the performance problem,
but I really don't want to rely on using the V2 protocol forever, and
I don't want to have to recommend to our customers that they need to
run with the V2 protocol. So I would like to resolve the problem and
be able to move back to a default configuration with the V3 protocol
and the benefits thereof.



The problem is that I don't really know where to begin to debug a
problem like this. In our development and testing environments we have
not seen performance problems with the V3 protocol in the JDBC driver,
but they don't come close to approaching the transaction volume of
this production instance.



What I see when running the V3 protocol under 'top' is that the postgres
processes are routinely using 15% or more of the CPU each; when running
the V2 protocol they use more like 0.3%.



Does anyone have any suggestions on an approach to debug a problem like
this?



Thanks,

--Barry


  • Tom Lane at Aug 17, 2005 at 5:01 am

    "Barry Lind" <[email protected]> writes:
    ... On a hunch I switched the jdbc driver to using the V2 protocol
    and the load on the machine dropped down to what it was when using
    Oracle and everything was fine.
    First knee-jerk reaction is that it's an optimization problem stemming
    from V3 protocol feeding parameterized queries to the backend where V2
    did not, and the planner being unable to cope :-(
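
    For illustration, a minimal sketch (via JDBC; the orders table and
    customer_id column are hypothetical, and this assumes EXPLAIN EXECUTE
    is available on 8.0) of comparing the plan chosen when the planner can
    see the literal value against the generic plan chosen for a
    parameterized prepared statement:

        import java.sql.*;

        public class PlanCheck {
            // Print both plans so they can be compared side by side.
            public static void comparePlans(Connection conn) throws SQLException {
                Statement st = conn.createStatement();
                // Plan when the planner sees the literal value and can use
                // the column statistics:
                ResultSet r1 = st.executeQuery(
                    "EXPLAIN SELECT * FROM orders WHERE customer_id = 42");
                while (r1.next()) System.out.println(r1.getString(1));
                // Generic plan for the same query prepared with an unknown
                // parameter value, roughly what the V3 extended protocol can
                // feed to the backend:
                st.execute("PREPARE q(bigint) AS " +
                           "SELECT * FROM orders WHERE customer_id = $1");
                ResultSet r2 = st.executeQuery("EXPLAIN EXECUTE q(42)");
                while (r2.next()) System.out.println(r2.getString(1));
                st.execute("DEALLOCATE q");
                st.close();
            }
        }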

    Can you identify the specific queries causing the problem?

    regards, tom lane
  • Barry Lind at Aug 17, 2005 at 5:43 am
    That was my suspicion as well, which is why I tried the V2 protocol.

    I do not know of any specific queries that are causing the problem. As
    I monitor 'top' I see processes using a significant amount of CPU while
    running SELECT, UPDATE and DELETE statements, which leads me to believe
    that it isn't any one specific query.

    How does one identify, on a live system, the specific queries that are
    running slow, especially with the V3 protocol and when the system is
    executing about 100 queries a second (which makes turning on any sort
    of statement logging very verbose)? (I just subscribed to the
    performance list, so this has probably been answered many times before
    on this list.)

    I haven't tried to track down a performance problem like this on
    Postgres before. Since most of our large customers run Oracle, that is
    where I have the knowledge to figure something like this out.

    Thanks,
    --Barry


  • Dave Cramer at Aug 17, 2005 at 12:59 pm
    Barry,


    One way to do this is to turn on logging for statements that run longer
    than a certain duration: set log_min_duration_statement in the config
    file, and only statements taking more than that many milliseconds will
    be logged.
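
    A minimal sketch (assuming an open JDBC Connection) of checking what
    the server will currently log; the value itself lives in
    postgresql.conf:

        import java.sql.*;

        public class LogThresholdCheck {
            // Prints the current slow-statement threshold. In postgresql.conf
            // this would be e.g. log_min_duration_statement = 250
            // (milliseconds; -1 disables it, 0 logs every statement's duration).
            public static void show(Connection conn) throws SQLException {
                Statement st = conn.createStatement();
                ResultSet rs = st.executeQuery("SHOW log_min_duration_statement");
                if (rs.next()) {
                    System.out.println("log_min_duration_statement = " + rs.getString(1));
                }
                rs.close();
                st.close();
            }
        }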

    There's a tool called IronEye SQL that monitors JDBC calls.

    http://www.irongrid.com/

    Unfortunately I am getting DNS errors from that site right now. I do
    have a copy of their code if you need it.

    Dave


  • Mudfoot at Aug 17, 2005 at 8:14 am
    Quoting Barry Lind <[email protected]>:

    <snip>

    What I see when running the V3 protocol under 'top' is that the postgres
    processes are routinely using 15% or more of the CPU each, when running
    the V2 protocol they use more like 0.3%.



    Does anyone have any suggestions on an approach to debug a problem like
    this?
    Tracing system calls is a good starting point: truss on Solaris, strace
    on Linux (Red Hat anyway), ktrace on BSD (for example, strace -c -p
    <backend pid> gives a per-system-call summary for a single backend).
    The difference between 0.3% and 15% CPU utilization under similar load
    will very likely (though not certainly) show up as noticeably different
    system call activity.

    If you notice a difference in system call activity, that should provide
    a hint as to where the inefficiency lies. It's possible to spin the CPU
    up without making any system calls, but system call tracing can be done
    quickly and any interesting patterns should emerge fairly fast.

    This method is a good starting point for troubleshooting just about any funny
    process activity. And it comes with the added benefit of not having to know
    ahead of time about the specific matter at hand (JDBC implementation, in this
    case). :-) That's having your cake and eating it, too.
  • Csaba Nagy at Aug 17, 2005 at 9:30 am
    Barry,

    I have had a similar experience, moving a big Oracle database to
    Postgres 8.0.3 on Linux.
    The first impact was similar: huge performance problems.
    The main problem was bad planner choices. The cause in our case was bad
    parameter types in the JDBC set methods (I guess you use Java). For
    Oracle we used the NUMERIC type to set primary keys, but the Postgres
    id column type was BIGINT, and in that case it simply refused to use
    the index. Imagine that kicking in on a 100 million row table... a
    sequential scan started a few times a second; that made the DB
    unusable. So we fixed the code to keep using NUMERIC for Oracle and to
    use BIGINT for Postgres, and that is very important on setNull calls
    too.
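
    A minimal sketch (the accounts table and id column are hypothetical)
    of binding the key with a type that matches the BIGINT column,
    including the setNull case:

        import java.sql.*;

        public class TypedBind {
            // Look up a row by its BIGINT primary key. Binding the value as
            // NUMERIC (e.g. setBigDecimal / Types.NUMERIC) was what kept the
            // index from being used in our case; setLong / Types.BIGINT
            // matches the column type.
            public static ResultSet findAccount(Connection conn, Long id)
                    throws SQLException {
                PreparedStatement ps = conn.prepareStatement(
                    "SELECT * FROM accounts WHERE id = ?");
                if (id != null) {
                    ps.setLong(1, id.longValue());   // binds as int8 / BIGINT
                } else {
                    ps.setNull(1, Types.BIGINT);     // not Types.NUMERIC
                }
                return ps.executeQuery();
            }
        }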

    One very useful tool was the following query:

    prepare ps as
    SELECT procpid, substring(current_query for 97),
    to_char((now()-query_start), 'HH24:MI:SS') as t
    FROM pg_stat_activity
    where current_query not like '%<insufficient%'
    and current_query not like '%IDLE%' order by t desc;

    Then you just run "execute ps;" in psql, and it will show you the
    queries that have already been running for a while.

    Other problems were caused by complex queries where more than two
    tables were joined. For Oracle we were giving "hints" in the form of
    special comments to point it to the right index and the right plan, but
    that's not an option for Postgres (yet?). So the fix in this case was
    to use explicit JOIN syntax, which does influence the Postgres
    planner's choices. This fixed another class of issues for us...
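
    A minimal sketch (the tables and join columns are hypothetical) of
    rewriting an implicit comma join as explicit JOINs; with
    join_collapse_limit set to 1 the planner follows the join order as
    written, which is one way to steer it when there are no Oracle-style
    hints:

        import java.sql.*;

        public class ExplicitJoins {
            public static ResultSet itemsForCustomer(Connection conn, long customerId)
                    throws SQLException {
                // Honor the JOIN order exactly as written in the query below.
                Statement set = conn.createStatement();
                set.execute("SET join_collapse_limit = 1");
                set.close();
                PreparedStatement ps = conn.prepareStatement(
                    "SELECT i.* " +
                    "FROM customers c " +
                    "JOIN orders o ON o.customer_id = c.id " +
                    "JOIN order_items i ON i.order_id = o.id " +
                    "WHERE c.id = ?");
                ps.setLong(1, customerId);
                return ps.executeQuery();
            }
        }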

    Another problem: if you want to avoid worst-case plans, and a generic
    plan is good enough for all cases, then you might force the use of
    server-side prepared statements in all cases. I had to do that; a lot
    of queries were performing very badly without it. Maybe that could also
    be solved by raising the statistics targets where needed, but in my
    case the generic plan was always good enough, by design; we rely on the
    DB picking a good generic plan in all cases. One typical example for us
    would be a LIMIT query which selects 20 rows out of 100 million, with a
    where clause which actually selects only 1 row for the last chunk... it
    was going for an index scan, but on the wrong index. The right index
    would have selected exactly that 1 row; the wrong one had to cruise
    through a few million rows... the LIMIT fooled the planner into
    thinking it would get 20 rows quickly. When I forced the use of a
    prepared statement, it went for the right index and all was good.
    I actually set this in our connection pool:
    ((PGConnection)connection).setPrepareThreshold(1);
    but it is possible to set/reset it at the statement level too; I just
    haven't found any query I should do that for yet... the DB is steady
    now.
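
    A minimal sketch of the statement-level override (assuming the driver's
    org.postgresql.PGStatement interface, the statement-side counterpart of
    PGConnection; the query is hypothetical):

        import java.sql.*;
        import org.postgresql.PGStatement;

        public class PerStatementPrepare {
            public static ResultSet findEvent(Connection conn, long id)
                    throws SQLException {
                PreparedStatement ps = conn.prepareStatement(
                    "SELECT * FROM events WHERE id = ?");
                // Use a server-side prepared statement from the first
                // execution for this statement only, regardless of the
                // connection-level default.
                ((PGStatement) ps).setPrepareThreshold(1);
                ps.setLong(1, id);
                return ps.executeQuery();
            }
        }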

    Another issue: on Oracle we had some function-based indexes that
    returned NULL for uninteresting rows, to keep the index small. This is
    easier to implement in Postgres using a partial index, which has much
    simpler syntax than the Oracle hack and is easier to handle. The catch
    was that we needed to change the WHERE clause compared to Oracle so
    that Postgres actually picks the partial index. There are cases where
    the planner can't figure out that it can use the index, especially if
    you use prepared statements and one of the parameters appears in the
    index condition. In those cases you need to add the proper restriction
    to the WHERE clause to point Postgres at the partial index. Using
    partial indexes speeds up inserts and updates on those tables, and can
    speed up some selects too.
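
    A minimal sketch (hypothetical table and columns) of a partial index
    plus a query whose WHERE clause repeats the index predicate so the
    planner can match it:

        import java.sql.*;

        public class PartialIndexExample {
            public static ResultSet openOrders(Connection conn, long customerId)
                    throws SQLException {
                // Index only the "interesting" rows, keeping the index small.
                Statement st = conn.createStatement();
                st.execute("CREATE INDEX orders_open_idx ON orders (customer_id) " +
                           "WHERE closed_at IS NULL");
                st.close();
                // Restate the index predicate explicitly so the planner can
                // see that the partial index applies.
                PreparedStatement ps = conn.prepareStatement(
                    "SELECT * FROM orders " +
                    "WHERE customer_id = ? AND closed_at IS NULL");
                ps.setLong(1, customerId);
                return ps.executeQuery();
            }
        }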

    Hmmm... that's about all I recall now... beside the Postgres admin
    stuff, have you analyzed your data after the import? I forgot to do
    that at first, and almost reverted back to Oracle... and after a few
    days it was also very clear that running the auto-vacuum daemon is a
    must :-)
    And: for big data sets it is important to tweak all the performance
    settings in the config file, otherwise you get surprises. We had been
    running a smaller instance of the same code on Postgres for quite a
    while before deciding to migrate a big one, and that was cruising along
    happily with the default settings, so the first time we needed to do
    any optimization was with a data set with a lot of data in it...
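
    A minimal sketch of refreshing planner statistics from JDBC right after
    a bulk import (ANALYZE with no table name covers the whole database):

        import java.sql.*;

        public class PostImport {
            public static void refreshStatistics(Connection conn) throws SQLException {
                Statement st = conn.createStatement();
                // Recompute planner statistics for all tables so the planner
                // has real numbers to work with instead of defaults.
                st.execute("ANALYZE");
                st.close();
            }
        }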

    HTH,
    Csaba.

