I currently have a query that calculates a sum of a field grouping by some
fields in other tables, like this:

select a.aid, b.bid, c.cid, sum(d.amount)
from a, b, c, d
where a.aid = b.aid
and b.bid = c.bid
and c.cid = d.cid
group by a.aid, b.bid, c.cid;

Those ids are all primary keys of their respective tables. A has about 20
entries, B has about 2000, C has about 3000 and D has about 2 million.

What happens is that a huge join is done, sorted and then aggregated. The
sort takes a huge amount of memory and slows everything down.

My idea is to have the planner do the equivalent of:

select d.cid, sum(d.amount) from d;

Which will be about 3,000 rows, and *then* do the join with A, B and C to
produce the final result. As a bonus (at least under the current
arrangement), the result would be sorted by d.cid, lending itself to merge

This works because (as far as I can see) sum(x) == sum(sum(x)). Is this
feasable or am I missing some corner cases?

Generalising is harder. count(x) == sum(count(x)) and avg doesn't work
obviously at all. However, avg = sum/count, both of which can be simplified
according to the earlier rules. Naturally you could handcraft this but it
would possibly be nicer for the planner to do it.

Basically, it works because by sorting earlier the width is less thus not as
much memory is required to do the sorting. If my logic is wrong, let me know.
When I get back home I'll have to check whether the amount of required
memory is taken into account for a sort.
Martijn van Oosterhout <kleptog@svana.org>
Magnetism, electricity and motion are like a three-for-two special offer:
if you have two of them, the third one comes free.

Search Discussions

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-general @
postedOct 4, '01 at 3:44a
activeOct 4, '01 at 3:44a

1 user in discussion

Martijn van Oosterhout: 1 post



site design / logo © 2022 Grokbase