Sean Shanny
Hardware: Apple G5 dual 2.0 with 8GB memory attached via dual fibre
channel to a fully loaded 3.5TB XRaid. The XRaid is configured as two 7
disk hardware based RAID5 sets software striped to form a RAID50 set.
The DB, WALS, etc are all on that file set. Running OSX journaled file
system Running postgres 7.4.1. OSX Server 10.3.2 Postgres is compiled
locally with '--enable-recode' '--enable-multibyte=UNICODE'
'CFLAGS=-mcpu=970 -mtune=970 -mpowerpc64 -O3'
Have you tried altering the blocksize to a higher value? Say 32K?
max_connections = 100
Why have you set this to 100 when you have typically 1-3 users?
sort_mem = 256000 # min 64, size in KB
If you have only 1-3 users, then that value seems reasonable.
The query is

SELECT t1.id, t2.md5, t2.url FROM referral_temp t2 LEFT OUTER JOIN
d_referral t1 ON t2.md5 = t1.referral_md5;


\d d_referral
id | integer | not null
referral_md5 | text | not null
referral_raw_url | text | not null
referral_host | text |
referral_path | text |
referral_query | text |
job_control_number | integer | not null


\d referral_temp
md5 | text |
url | text |
Have you looked at using reversed indexes, as per recent postings in
[performance]? These seemed to help considerably with lookup speed when
using a large URL database, which seems to be your situation here.

...
Jeff Boes writes
We have a large (several million row) table with a field containing
URLs. Now, funny thing about URLs: they mostly start with a common
substring ("http://www."). But not all the rows start with this, so we
can't just lop off the first N characters. However, we noticed some time
ago that an index on this field wasn't as effective as an index on the
REVERSE of the field. So ...

CREATE OR REPLACE FUNCTION fn_urlrev(text) returns text as '
return reverse(lc($_[0]))
' language 'plperl' with (iscachable,isstrict);

and then

CREATE UNIQUE INDEX ix_links_3 ON links
(fn_urlrev(path_base));
You have 2 CPUs: have you tried splitting your input data file into two
tables, then executing the same query simultaneously, to split the
processing? If you get the correct plan, you should use roughly the same
I/O but use all of the available CPU power.

I'm sure we'd all be interested in your further results!

Best Regards, Simon Riggs

Search Discussions

Discussion Posts

Previous

Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 10 of 11 | next ›
Discussion Overview
grouppgsql-performance @
categoriespostgresql
postedFeb 20, '04 at 7:17p
activeFeb 24, '04 at 12:54a
posts11
users5
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2021 Grokbase