I have made a similar experience, moving a big Oracle data base to
Postgres 8.03 on linux.
The first impact was similar, huge performance problems.
The main problem was bad planner choices. The cause in our case: bad
parameter types in the jdbc set methods (I guess you use Java). For
oracle we used the NUMERIC type to set primary keys, but the postgres id
type used was BIGINT, and it just refused to use the index in this case.
Imagine that kicking in on a 100 million rows table... a sequential scan
started a few times a second, now that made the DB unusable.
So we fixed the code that for oracle continues to use NUMERIC and for
postgres it uses BIGINT, and that is very important on setNull calls
One very useful tool was the following query:
prepare ps as
SELECT procpid, substring(current_query for 97),
to_char((now()-query_start), 'HH24:MI:SS') as t
where current_query not like '%<insufficient%'
and current_query not like '%IDLE%' order by t desc;
Then you just "execute ps;" in psql, and it will show you the queries
which are already running for a while.
Other problems were caused by complex queries, where more than 2 tables
were joined. For oracle we were giving "hints" in the form of special
comments, to point to the right index, right plan, but that's not an
option for postgres (yet ?). So the fix in this case was to use explicit
joins which do influence the postgres planner choices. This fixed
another class of issues for us...
Another problem: if you want to avoid worst-case plans, and do away with
a generic plan for all cases, then you might force the usage of server
side prepare statements in all cases. I had to do that, a lot of queries
were performing very badly without this. Now maybe that could be solved
by raising the statistics targets where needed, but in my case the
generic plan was always good enough, by design. We rely on the DB
picking a good generic plan in all cases. One typical example for us
would be: a limit query which select 20 rows out of 100 million, with a
where clause which actually selects 1 row out of it for the last
chunk... it was going for an index scan, but on the wrong index. The
right index would have selected that exactly 1 row, the wrong one had to
cruise through a few million rows... the limit fooled the planner that
it will get 20 rows quickly. Now when I forced the usage of a prepared
statement, it went for the right index and all was good.
I actually set this in our connection pool:
but it is possible to set/reset it on a statement level, I just didn't
find any query I should to do it for yet... the DB is steady now.
Another issue was that we've had some functional indexes on oracle
returning null for uninteresting rows, to lower the index size. This is
easier to implement on postgres using a partial index, which has a lot
simpler syntax than the oracle hack, and it is easier to handle. The
catch was that we needed to change the where clause compared to oracle
so that postgres picks the partial index indeed. There are cases where
the planner can't figure out that it can use the index, especially if
you use prepared statements and one of the parameters is used in the
index condition. In this case it is needed to add the proper restriction
to the where clause to point postgres to use the partial index. Using
partial indexes speeds up the inserts and updates on those tables, and
could speed up some selects too.
Hmmm... that's about what I recall now... beside the postgres admin
stuff, have you analyzed your data after import ? I forgot to do that at
first, and almost reverted again back to oracle... and then after a few
days it was very clear that running the auto-vacuum daemon is also a
And: for big data sets is important to tweak all performance settings in
the config file, otherwise you get surprises. We've been running a
smaller instance of the same code on postgres for quite a while before
deciding to migrate a big one, and that was cruising along happily with
the default settings, so the first time we needed to do optimizations
was when using a data set with a lot of data in it...
On Wed, 2005-08-17 at 06:42, Barry Lind wrote:
We just moved a large production instance of ours from Oracle to
Postgres 8.0.3 on linux. When running on Oracle the machine hummed
along using about 5% of the CPU easily handling the fairly constant
load, after moving the data to Postgres the machine was pretty much
maxed out on CPU and could no longer keep up with the transaction
volume. On a hunch I switched the jdbc driver to using the V2
protocol and the load on the machine dropped down to what it was when
using Oracle and everything was fine.
Now obviously I have found a work around for the performance problem,
but I really don’t want to rely on using the V2 protocol forever, and
don’t want to have to recommend to our customers that they need to run
with the V2 protocol. So I would like to resolve the problem and be
able to move back to a default configuration with the V3 protocol and
the benefits thereof.
The problem is that I don’t really know where to begin to debug a
problem like this. In development environments and testing
environments we have not seen performance problems with the V3
protocol in the jdbc driver. But they don’t come close to approaching
the transaction volume of this production instance.
What I see when running the V3 protocol under ‘top’ is that the
postgres processes are routinely using 15% or more of the CPU each,
when running the V2 protocol they use more like 0.3%.
Does anyone have any suggestions on an approach to debug a problem