Can I chip in? I've looked in the past at the way various databases
handle this. Most just go and read the data, though Informix does seem
to keep a permanent record of the number of rows in a table... which
probably adds overhead you don't really want.

SELECT count(*) could be evaluated against any available index, since
all that is required is to count the rows. That would be significantly
faster than a full table scan, and accurate too. You'd simply count the
index pointers after evaluating any WHERE clause against the indexed
column values, so it would only work for fairly simple count(*)s.
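To illustrate the index idea, here is a minimal Python sketch using a
hypothetical ToyIndex class: a sorted list of keys with parallel row
pointers, standing in for a real B-tree. A range predicate on the
indexed column is resolved with two binary searches, and the count is
just the distance between them, so the table rows are never read. It is
a toy under stated assumptions: it ignores row visibility entirely.

```python
import bisect


class ToyIndex:
    """Hypothetical sorted index: keys[i] is paired with pointers[i].

    Toy model only; it does not check whether the row behind each
    pointer is visible to the current transaction.
    """

    def __init__(self, rows, column):
        pairs = sorted((row[column], tid) for tid, row in enumerate(rows))
        self.keys = [key for key, _ in pairs]
        self.pointers = [tid for _, tid in pairs]

    def count_range(self, lo, hi):
        """count(*) WHERE lo <= column <= hi, answered from the index alone."""
        left = bisect.bisect_left(self.keys, lo)
        right = bisect.bisect_right(self.keys, hi)
        # The matching pointers form one contiguous run; we count them
        # without fetching any of the rows they reference.
        return right - left


rows = [{"id": i, "price": i % 10} for i in range(100)]
idx = ToyIndex(rows, "price")
print(idx.count_range(3, 4))  # 20
```

Anything the index can't evaluate (a WHERE clause on an unindexed column,
say) would force a visit to the rows, which is why this only helps for
fairly simple count(*)s.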

Why not implement estimated_count as a dictionary lookup, directly using
the value recorded there by ANALYZE? That would be the easiest way to
reuse existing code, and it would give you access to many previously
calculated values.
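A minimal sketch of the dictionary-lookup version, assuming a toy
in-memory catalog (catalog, analyze() and estimated_count() here are
hypothetical names, not PostgreSQL code). In PostgreSQL itself, the
closest existing figure is the row estimate that ANALYZE and VACUUM
store in pg_class.reltuples. The sketch shows that the lookup is
constant time but only as fresh as the last analyze:

```python
# Hypothetical in-memory "system catalog": table name -> stored stats.
catalog = {}


def analyze(table_name, rows):
    """Record the row count seen when statistics are gathered.

    This toy stores the exact length; a real ANALYZE works from a sample.
    """
    catalog[table_name] = {"row_count": len(rows)}


def estimated_count(table_name):
    """Constant-time lookup; only as fresh as the last analyze() call."""
    return catalog[table_name]["row_count"]


orders = list(range(1000))
analyze("orders", orders)
orders.extend(range(50))  # rows added after statistics were gathered
print(estimated_count("orders"), len(orders))  # 1000 1050
```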

This whole area offers major performance improvements, with lots of
cross-over with the materialized view sub-project.

Could you say a little more about why you wanted to achieve this?

Best Regards

Simon Riggs
2nd Quadrant
+44-7900-255520

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
On Behalf Of Mark Kirkwood
Sent: Monday, December 29, 2003 08:36
To: Randolf Richardson
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] *sigh*

*growl* - it sounds like the business... and I was all set to code it.
However, after delving into Pg's aggregation structure a bit, I found it
suffers a fatal flaw:

There appears to be no way to avoid visiting every row when defining an
aggregate (even if you do nothing with each one), which defeats the
whole point of my suggestion (i.e. avoiding the visit to every row).

To make the original idea work would require amending the definition of
Pg aggregates to introduce "fake" aggregates that don't actually get
evaluated for every row. At this point I am not sure whether this sort of
modification is possible or reasonable - others who know, feel free to
chip in :-)
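A toy model of the flaw, assuming a hypothetical run_aggregate() driver
loosely patterned on how a PostgreSQL aggregate is declared: a state
transition function, an initial state and an optional final function
(SFUNC, INITCOND and FINALFUNC in CREATE AGGREGATE). The driver hands
every row to the transition function, so even one that ignores its input
still costs a visit per row; the visited counter makes that explicit.
The run_fake_aggregate() helper is an equally hypothetical picture of the
"fake" aggregate idea. This is an illustration, not the executor's code.

```python
def run_aggregate(rows, sfunc, initcond, finalfunc=None):
    """Drive an aggregate with one sfunc call per input row."""
    state = initcond
    visited = 0
    for row in rows:
        visited += 1  # the row has been fetched either way
        state = sfunc(state, row)
    result = finalfunc(state) if finalfunc is not None else state
    return result, visited


def count_sfunc(state, row):
    return state + 1


def noop_sfunc(state, row):
    return state  # ignores the row, but it was still visited


def run_fake_aggregate(shortcut, table_name):
    """Hypothetical 'fake' aggregate: answered by a shortcut, zero rows visited."""
    return shortcut(table_name), 0


print(run_aggregate(range(5), count_sfunc, 0))      # (5, 5)
print(run_aggregate(range(5), noop_sfunc, 0))       # (0, 5)
print(run_fake_aggregate(lambda t: 42, "orders"))   # (42, 0)
```

The no-op case is the crux: the per-row cost lives in the driver loop,
not in the transition function, so no choice of SFUNC can remove it.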

regards

Mark

Randolf Richardson wrote:
"markir@paradise.net.nz (Mark Kirkwood)" wrote in
comp.databases.postgresql.hackers:

[sNip]

How about:

Implement a function "estimated_count" that can be used instead of
"count". It could use something like the algorithm in
src/backend/commands/analyze.c to get a reasonably accurate pseudo-count
quickly.
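For the estimate itself, here is a sketch of one simple approach: sample
a few pages at random, average their row counts and scale up by the total
number of pages. This is a swapped-in textbook technique, not the
algorithm in src/backend/commands/analyze.c; the list of per-page row
counts and the sample_estimate_count() name are assumptions for
illustration.

```python
import random


def sample_estimate_count(pages, sample_size, seed=0):
    """Estimate total rows from a random sample of pages.

    pages is a hypothetical list where pages[i] is the number of rows
    on page i; only the sampled entries are inspected.
    """
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    if not pages:
        return 0
    rng = random.Random(seed)
    k = min(sample_size, len(pages))
    sampled = rng.sample(range(len(pages)), k)
    rows_seen = sum(pages[i] for i in sampled)
    # Scale the average rows-per-sampled-page up to the full page count.
    return round(rows_seen / k * len(pages))


print(sample_estimate_count([100] * 1000, sample_size=30))  # 100000
```

The cost depends on the sample size rather than the table size, which is
what makes it fast; the price is that the answer is only approximate when
pages are unevenly filled.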

The advantage of this approach is that "count" still means (exact)
count (for your xact snapshot anyway). Then the situation becomes:

Want a fast count? - use estimated_count(*)
Want an exact count? - use count(*)

I think this is an excellent solution.


