psql cannot \i a file whose name contains a blank character. Why doesn't
it take the rest of the command line as the name of the file?!? Why
doesn't it use any of the standard command line parsing libraries which
respect "quoting" and \e\s\c\a\p\i\n\g?!? psql \i even autocompletes
the file name it then fails to load...

Also will the BUG which causes postgresql to execute a sequential scan
when using min()/max()/count() ever be fixed? min()/max() can be
rewritten as SELECT $column ORDER BY $column ASC/DESC LIMIT 1 but this
should be done by the database, NOT by the user!
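
For example, with an illustrative table and column, the manual rewrite looks like:

-- instead of:  SELECT max(price) FROM items;
SELECT price FROM items ORDER BY price DESC LIMIT 1;
-- instead of:  SELECT min(price) FROM items;
SELECT price FROM items ORDER BY price ASC LIMIT 1;
-- (if price can be NULL, add WHERE price IS NOT NULL so the result
--  matches max()/min(), which ignore NULLs)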

Tom

  • Peter Eisentraut at Oct 18, 2003 at 5:46 pm

    Thomas Zehetbauer writes:

    psql cannot \i a file whose name contains a blank character.
    \i 'blah blah blah'

    --
    Peter Eisentraut peter_e@gmx.net
  • Greg Stark at Oct 19, 2003 at 7:28 am

    Thomas Zehetbauer writes:

    Also will the BUG which causes postgresql to execute a sequential scan
    when using min()/max()/count() ever be fixed? min()/max() can be
    rewritten as SELECT $column ORDER BY $column ASC/DESC LIMIT 1 but this
    should be done by the database, NOT by the user!
    First of all, you should take COUNT() out of that list. While MIN/MAX could be
    implemented to take advantage of indexes like "DISTINCT ON" (however it's much
    more complex than your rewrite indicates), COUNT() *cannot* be done that way.

    Nobody is currently working on this or planning to work on this soon. So no,
    at least currently it appears this issue will not be changed. Postgresql is
    open source and this is the hackers mailing list. Feel free to contribute a
    patch.

    --
    greg
  • Joshua D. Drake at Oct 19, 2003 at 5:00 pm

    Greg Stark wrote:
    Thomas Zehetbauer <thomasz@hostmaster.org> writes:


    Also will the BUG which causes postgresql to execute a sequential scan
    when using min()/max()/count() ever be fixed? min()/max() can be
    rewritten as SELECT $column ORDER BY $column ASC/DESC LIMIT 1 but this
    should be done by the database, NOT by the user!
    I would add that this is not a bug as much as a feature request. count() works.
    It may not be as feature-filled as we would like (e.g., it won't use an index),
    but it does work.
    Nobody is currently working on this or planning to work on this soon. So no,
    at least currently it appears this issue will not be changed. Postgresql is
    open source and this is the hackers mailing list. Feel free to contribute a
    patch.

    Personally I think there are greater things that need to be patched than
    count(), as you can implement procedures on your own to deliver faster counts.


    Sincerely,

    Joshua D. Drake

    --
    Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
    Postgresql support, programming shared hosting and dedicated hosting.
    +1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com
    Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org
  • Greg Stark at Oct 19, 2003 at 7:00 pm

    "Joshua D. Drake" <jd@commandprompt.com> writes:

    Greg Stark wrote:
    Thomas Zehetbauer <thomasz@hostmaster.org> writes:

    Also will the BUG which causes postgresql to execute a sequential scan
    when using min()/max()/count() ever be fixed? min()/max() can be
    rewritten as SELECT $column ORDER BY $column ASC/DESC LIMIT 1 but this
    should be done by the database, NOT by the user!
    I would add that this is not a bug as much as a feature request. count() works.
    It may not be as feature-filled as we would like (e.g., it won't use an index),
    but it does work.
    count will use an index just fine where it's useful. If you say "select
    count(*) where foo = ?" and there's an index on foo it will use the index. If
    there's a partial index that helps with that clause it'll consider that too.

    You're thinking of min/max. min/max can use an index to avoid traversing all
    of the table. count(*) has to see all the rows to count them.

    To optimize count effectively would require a very powerful materialized view
    infrastructure with incremental updates. Something I don't believe any
    database has, and that I doubt postgres will get any time soon.

    You can implement it with triggers, which would be effectively equivalent to
    what mysql does, but then you would be introducing a massive point of
    contention and deadlocks.

    --
    greg
  • Randolf Richardson at Nov 29, 2003 at 6:27 am
    [sNip]
    I would add that this is not a bug as much as a feature request. count() works.
    It may not be as feature-filled as we would like (e.g., it won't use an index),
    but it does work.
    count will use an index just fine where it's useful. If you say "select
    count(*) where foo = ?" and there's an index on foo it will use the
    index. If there's a partial index that helps with that clause it'll
    consider that too.

    You're thinking of min/max. min/max can use an index to avoid traversing
    all of the table. count(*) has to see all the rows to count them.

    To optimize count effectively would require a very powerful materialized
    view infrastructure with incremental updates. Something I don't believe
    any database has, and that I doubt postgres will get any time soon.

    You can implement it with triggers, which would be effectively
    equivalent to what mysql does, but then you would be introducing a
    massive point of contention and deadlocks.
    What about adding a "total number of rows" value to the internal
    header of each table which gets incremented/decremented after each row
    INSERT/DELETE has been committed? This way, a generic "count(*)" by itself
    could simply return this value without any delay at all.

    --
    Randolf Richardson - rr@8x.ca
    Vancouver, British Columbia, Canada

    Please do not eMail me directly when responding
    to my postings in the newsgroups.
  • Doug McNaught at Nov 29, 2003 at 7:41 pm

    Randolf Richardson writes:

    What about adding a "total number of rows" value to the internal
    header of each table which gets incremented/decremented after each row
    INSERT/DELETE has been committed? This way, a generic "count(*)" by itself
    could simply return this value without any delay at all.
    Because different sessions have a (validly) different concept of what
    that number should be, due to MVCC.
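    For instance (a sketch using a made-up table t), two concurrent sessions can
    each see a different, equally valid count:

    -- session A:
    BEGIN;
    INSERT INTO t VALUES (1);
    SELECT count(*) FROM t;   -- A's count includes its own uncommitted row

    -- session B, at the same moment:
    SELECT count(*) FROM t;   -- B's count does not include A's uncommitted row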

    -Doug
  • Gaetano Mendola at Nov 30, 2003 at 2:23 am

    Doug McNaught wrote:
    Randolf Richardson <rr@8x.ca> writes:

    What about adding a "total number of rows" value to the internal
    header of each table which gets incremented/decremented after each row
    INSERT/DELETE has been committed? This way, a generic "count(*)" by itself
    could simply return this value without any delay at all.

    Because different sessions have a (validly) different concept of what
    that number should be, due to MVCC.
    The count(*) information can be versioned too, am I wrong? I'm able to
    create a trigger that stores the count(*) information in a special table,
    so why not implement the same thing built in?




    Regards
    Gaetano Mendola
  • Doug McNaught at Nov 30, 2003 at 3:37 am

    Gaetano Mendola writes:

    Doug McNaught wrote:
    Because different sessions have a (validly) different concept of what
    that number should be, due to MVCC.
    The count(*) information can be versioned too, am I wrong? I'm able to
    create a trigger that stores the count(*) information in a special table,
    so why not implement the same thing built in?
    This has been discussed extensively before (last time was a few months
    ago); please search the archives if you're curious.

    -Doug
  • Chris Browne at Nov 30, 2003 at 4:27 am

    Oops! mendola@bigfoot.com (Gaetano Mendola) was seen spray-painting on a wall:
    Doug McNaught wrote:
    Randolf Richardson <rr@8x.ca> writes:
    What about adding a "total number of rows" value to the internal
    header of each table which gets incremented/decremented after each row
    INSERT/DELETE has been committed? This way, a generic "count(*)" by itself
    could simply return this value without any delay at all.
    Because different sessions have a (validly) different concept of what
    that number should be, due to MVCC.
    The count(*) information can be versioned too, am I wrong? I'm able to
    create a trigger that stores the count(*) information in a special table,
    so why not implement the same thing built in?
    You could do this; the cost would be pretty high, as there would be an
    extra table update done every time rows were inserted or deleted.
    Doing it on _every_ table would be a heavy overhead that is not
    worthwhile.

    What might make sense would be to set up triggers to do this on those
    large tables where you frequently need COUNT(*).
    --
    "cbbrowne","@","cbbrowne.com"
    http://www.ntlug.org/~cbbrowne/lisp.html
    As of next month, MACLISP "/" will be flushed in favor of "\".
    Please update the WORLD.
  • Greg Stark at Nov 30, 2003 at 8:01 am

    Gaetano Mendola writes:

    The count(*) information can be versioned too, am I wrong? I'm able to
    create a trigger that stores the count(*) information in a special table,
    so why not implement the same thing built in?
    Then every insert or delete would have to lock that count. Nobody else would
    be able to insert or delete any records until you either commit or roll back.

    That would lead to much lower concurrency, much more contention for locks, and
    tons of deadlocks.

    --
    greg
  • Randolf Richardson at Dec 3, 2003 at 5:28 am

    The count(*) information can be versioned too, am I wrong? I'm able to
    create a trigger that stores the count(*) information in a special table,
    so why not implement the same thing built in?
    Then every insert or delete would have to lock that count. Nobody else
    would be able to insert or delete any records until you either commit or
    roll back.

    That would lead to much lower concurrency, much more contention for
    locks, and tons of deadlocks.
    What about queueing all these updates for a separate low-priority
    thread? The thread would be the only one with access to update this field.

    --
    Randolf Richardson - rr@8x.ca
    Vancouver, British Columbia, Canada

    Please do not eMail me directly when responding
    to my postings in the newsgroups.
  • Chris Browne at Dec 3, 2003 at 6:28 am

    Martha Stewart called it a Good Thing when Randolf Richardson wrote:
    The count(*) information can be versioned too, am I wrong? I'm able to
    create a trigger that stores the count(*) information in a special table,
    so why not implement the same thing built in?
    Then every insert or delete would have to lock that count. Nobody else
    would be able to insert or delete any records until you either commit or
    roll back.

    That would lead to much lower concurrency, much more contention for
    locks, and tons of deadlocks.
    What about queueing all these updates for a separate
    low-priority thread? The thread would be the only one with access
    to update this field.
    If updates are "queued," then how do you get to use them if the
    "update thread" isn't running because it's not high enough in
    priority?

    I am not being facetious.

    The one way that is expected to be successful would be to have a
    trigger that, upon seeing an insert of 5 rows to table "ABC", puts,
    into table "count_detail", something like:

    insert into count_detail (table, value) values ('ABC', 5);

    You then replace
    select count(*) from abc;

    with
    select sum(value) from count_detail where table = 'ABC';

    The "low priority" thread would be a process that does something akin
    to vacuuming, where it would replace the contents of the table every
    so often...

    for curr_table in (select table from count_detail) do
        new_total = select sum(value) from count_detail
                    where table = curr_table;
        delete from count_detail where table = curr_table;
        insert into count_detail (table, value)
            values (curr_table, new_total);
    done

    The point of this being to try to keep the number of rows to 1 per
    table.
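
    A minimal sketch of such a trigger (all names here are illustrative; a
    "tablename" column is used instead of "table", which is a reserved word):

    create table count_detail (tablename text, value bigint);

    create function track_count() returns trigger as '
    begin
        if TG_OP = ''INSERT'' then
            insert into count_detail (tablename, value) values (TG_RELNAME, 1);
        else  -- delete
            insert into count_detail (tablename, value) values (TG_RELNAME, -1);
        end if;
        return null;
    end;
    ' language plpgsql;

    create trigger abc_count after insert or delete on abc
        for each row execute procedure track_count();

    -- and then, instead of  select count(*) from abc;
    select sum(value) from count_detail where tablename = 'abc';

    A per-row trigger records one +1/-1 delta row per change; the periodic
    compaction pass described above then collapses those back to roughly one
    row per table.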

    Note that this gets _real_ expensive for tables that see lots of
    single row inserts and deletes. There isn't a cheaper way that will
    actually account for the true numbers of records that have been
    committed.

    For a small table, it will be cheaper to walk through and calculate
    count(*) directly from the tuples themselves.

    The situation where it may be worthwhile to do this is a table which
    is rather large (thus count(*) is expensive) where there is some
    special reason to truly care how many rows there are in the table.
    For _most_ tables, it seems unlikely that this will be true. For
    _most_ tables, it is absolutely not worth the cost of tracking the
    information.
    --
    (format nil "~S@~S" "cbbrowne" "acm.org")
    http://cbbrowne.com/info/spreadsheets.html
    Predestination was doomed from the start.
  • Mark Kirkwood at Dec 3, 2003 at 8:35 am
    How about:

    Implement a function "estimated_count" that can be used instead of
    "count". It could use something like the algorithm in
    src/backend/commands/analyze.c to get a reasonably accurate pseudo count
    quickly.

    The advantage of this approach is that "count" still means (exact)count
    (for your xact snapshot anyway). Then the situation becomes:

    Want a fast count? - use estimated_count(*)
    Want an exact count - use count(*)

    regards

    Mark

    Christopher Browne wrote:
    For a small table, it will be cheaper to walk through and calculate
    count(*) directly from the tuples themselves.

    The situation where it may be worthwhile to do this is a table which
    is rather large (thus count(*) is expensive) where there is some
    special reason to truly care how many rows there are in the table.
    For _most_ tables, it seems unlikely that this will be true. For
    _most_ tables, it is absolutely not worth the cost of tracking the
    information.
  • Shridhar Daithankar at Dec 4, 2003 at 7:12 am

    On Wednesday 03 December 2003 13:59, Mark Kirkwood wrote:
    How about:

    Implement a function "estimated_count" that can be used instead of
    "count". It could use something like the algorithm in
    src/backend/commands/analyze.c to get a reasonably accurate pseudo count
    quickly.

    The advantage of this approach is that "count" still means (exact)count
    (for your xact snapshot anyway). Then the situation becomes:

    Want a fast count? - use estimated_count(*)
    Want an exact count - use count(*)
    Something like select reltuples from pg_class where relname='foo'?

    Shridhar
  • Mark Kirkwood at Dec 7, 2003 at 6:41 am

    Shridhar Daithankar wrote:
    Something like select reltuples from pg_class where relname='foo'?

    Shridhar
    [chuckles] - I had envisaged something more accurate than the last
    ANALYZE; "estimate_count" would effectively *do* acquire_sample_rows()
    then and there for you...

    regards

    Mark
  • Randolf Richardson at Dec 12, 2003 at 8:00 pm
    "markir@paradise.net.nz (Mark Kirkwood)" wrote in
    comp.databases.postgresql.hackers:

    [sNip]
    How about:

    Implement a function "estimated_count" that can be used instead of
    "count". It could use something like the algorithm in
    src/backend/commands/analyze.c to get a reasonably accurate pseudo count
    quickly.

    The advantage of this approach is that "count" still means (exact)count
    (for your xact snapshot anyway). Then the situation becomes:

    Want a fast count? - use estimated_count(*)
    Want an exact count - use count(*)
    I think this is an excellent solution.

    --
    Randolf Richardson - rr@8x.ca
    Vancouver, British Columbia, Canada

    Please do not eMail me directly when responding
    to my postings in the newsgroups.
  • Paul Punett at Dec 22, 2003 at 10:36 am
    Hi,

    I need to write a tab-separated text file such that the first row contains
    the number of records in the table.
    I insert the first row with '0' (zero) as the first column and the rest of
    the columns NULL. Then, at the end of writing records to the table, I select
    count(*) from the table into a variable and update the first record with the
    count returned.
    Unfortunately, after the update the first row becomes the last row, and hence
    COPY TO FileName sends the count as the last record.

    I need the count as the first record. Any suggestions please?
    The documentation says indexing does not affect copy order.

    I am developing in C++ with PostGre on Windows. I need to port to Linux
    later. Any suggestions on linking C++ code to PostGre (queries & functions)?

    Thanks
    Paul
  • Jeroen T. Vermeulen at Dec 22, 2003 at 1:51 pm

    On Mon, Dec 22, 2003 at 10:35:08AM -0000, Paul Punett wrote:

    I need to write a tab-separated text file such that the first row contains
    the number of records in the table.
    Whether COPY does what you want may depend on what you want to do with
    special characters. If your table contains strings with strange
    characters like newline, tab, non-ASCII characters etc. then COPY will
    replace them by escape sequences. I guess in most cases you won't
    have any problems with this, but it's a thing to keep in mind.

    I need the count as the first record. Any suggestions please?
    The documentation says indexing does not affect copy order.
    Rather than tricking COPY into generating your file format, you may
    want to use COPY TO STDOUT and do some processing on the lines you
    get from that.

    I am developing in C++ with PostGre on Windows. I need to port to Linux
    later. Any suggestions on linking C++ code to PostGre (queries & functions)?
    Try libpqxx (http://pqxx.tk/). Use the tablereader class to read raw
    lines from your table and write them to your file. Something like this
    should do the trick:

    connection c(myoptions);
    transaction<serializable> t(c);
    result count = t.exec("select count(*) from " + mytable);
    myfile << count[0][0] << endl;
    tablereader reader(t, mytable);
    string line;
    while (reader.get_raw_line(line)) myfile << line << endl;

    This may be a bit slower than a direct COPY because the data has to go
    through your program rather than directly to the file, but it gives you
    more control over the file's format.

    (I used a serializable transaction here because otherwise rows may be
    added or deleted by somebody else at just the wrong moment, e.g. after
    the count but before we read the table)


    Jeroen

    PS - It's Postgres or PostgreSQL, not PostGre!
  • Tom Lane at Dec 22, 2003 at 3:13 pm

    "Paul Punett" <paul.punett@shazamteam.com> writes:
    I need the count as the first record. Any suggestions please?
    SQL does not guarantee any particular ordering of rows in a table.
    You cannot do what you're doing and expect it to be reliable.

    You could do something like this: add a sequence-number column to your
    table and then do "SELECT ... ORDER BY seqno" to extract the data in
    a controlled order.
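
    A rough sketch of that approach (all table and column names invented for
    illustration):

    CREATE TABLE export_data (seqno serial, col1 text, col2 text);
    INSERT INTO export_data (seqno, col1, col2) VALUES (0, '0', NULL);  -- placeholder for the count
    -- ... load the real rows; they get seqno 1, 2, 3, ...
    UPDATE export_data
       SET col1 = (SELECT count(*) FROM export_data WHERE seqno > 0)::text
     WHERE seqno = 0;
    SELECT col1, col2 FROM export_data ORDER BY seqno;  -- the count row comes out first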

    regards, tom lane
  • Paulo Scardine at Dec 22, 2003 at 3:33 pm
    SELECT x, y FROM
      (SELECT 1 AS ord, COUNT(*) AS x, NULL AS y FROM tablex
       UNION ALL
       SELECT 2, x, y FROM tablex) AS t
    ORDER BY ord;

    Maybe you will have to do some explicit casting depending on the field
    types.

    --
    Paulo Scardine

    ----- Original Message -----
    From: "Paul Punett" <paul.punett@shazamteam.com>
    To: <pgsql-hackers@postgresql.org>
    Sent: Monday, December 22, 2003 8:35 AM
    Subject: [HACKERS] COPY TABLE TO

    Hi,

    I need to write a tab-separated text file such that the first row contains
    the number of records in the table.
    I insert the first row with '0' (zero) as the first column and the rest of
    the columns NULL. Then, at the end of writing records to the table, I select
    count(*) from the table into a variable and update the first record with the
    count returned.
    Unfortunately, after the update the first row becomes the last row, and hence
    COPY TO FileName sends the count as the last record.

    I need the count as the first record. Any suggestions please?
    The documentation says indexing does not affect copy order.

    I am developing in C++ with PostGre on Windows. I need to port to Linux
    later. Any suggestions on linking C++ code to PostGre (queries &
    functions)?
    Thanks
    Paul


  • Mark Kirkwood at Dec 29, 2003 at 8:35 am
    *growl* - it sounds like the business...and I was all set to code it;
    however, after delving into Pg's aggregation structure a bit, it suffers
    from a fatal flaw:

    There appears to be no way to avoid visiting every row when defining an
    aggregate (even if you do nothing on each one) -- which defeats the
    whole point of my suggestion (i.e. avoiding the visit to every row).

    To make the original idea work requires amending the definition of Pg
    aggregates to introduce "fake" aggregates that don't actually get
    evaluated for every row. At this point I am not sure if this sort of
    modification is possible or reasonable - others who know feel free to
    chip in :-)

    regards

    Mark

    Randolf Richardson wrote:
    "markir@paradise.net.nz (Mark Kirkwood)" wrote in
    comp.databases.postgresql.hackers:

    [sNip]

    How about:

    Implement a function "estimated_count" that can be used instead of
    "count". It could use something like the algorithm in
    src/backend/commands/analyze.c to get a reasonably accurate pseudo count
    quickly.

    The advantage of this approach is that "count" still means (exact)count
    (for your xact snapshot anyway). Then the situation becomes:

    Want a fast count? - use estimated_count(*)
    Want an exact count - use count(*)
    I think this is an excellent solution.

  • Simon Riggs at Dec 29, 2003 at 10:35 am
    Can I chip in? I've had a look in the past at the way various databases
    perform this. Most just go and read the data, though Informix does seem
    to keep a permanent record of the number of rows in a table...which
    probably adds overhead you don't really want.

    Select count(*) could be evaluated against any available index
    sub-tables, since all that is required is to count the rows. That would
    be significantly faster than a full file scan and accurate too. You'd
    simply count the pointers, after evaluating any WHERE clause against the
    indexed col values - so it won't work except for fairly simple
    count(*)'s.

    Why not implement estimated_count as a dictionary lookup, directly using
    the value recorded there by the analyze? That would be the easiest way
    to reuse existing code and give you access to many previously calculated
    values.

    This whole area is a major performance improver, with lots of
    cross-overs with the materialized view sub-project.

    Could you say a little more about why you wanted to achieve this?

    Best Regards

    Simon Riggs
    2nd Quadrant
    +44-7900-255520

    -----Original Message-----
    From: pgsql-hackers-owner@postgresql.org
    On Behalf Of Mark Kirkwood
    Sent: Monday, December 29, 2003 08:36
    To: Randolf Richardson
    Cc: pgsql-hackers@postgresql.org
    Subject: Re: [HACKERS] *sigh*

    *growl* - it sounds like the business...and I was all set to code it;
    however, after delving into Pg's aggregation structure a bit, it suffers
    from a fatal flaw:

    There appears to be no way to avoid visiting every row when defining an
    aggregate (even if you do nothing on each one) -- which defeats the
    whole point of my suggestion (i.e. avoiding the visit to every row).

    To make the original idea work requires amending the definition of Pg
    aggregates to introduce "fake" aggregates that don't actually get
    evaluated for every row. At this point I am not sure if this sort of
    modification is possible or reasonable - others who know feel free to
    chip in :-)

    regards

    Mark

    Randolf Richardson wrote:
    "markir@paradise.net.nz (Mark Kirkwood)" wrote in
    comp.databases.postgresql.hackers:

    [sNip]

    How about:

    Implement a function "estimated_count" that can be used instead of
    "count". It could use something like the algorithm in
    src/backend/commands/analyze.c to get a reasonably accurate pseudo count
    quickly.

    The advantage of this approach is that "count" still means
    (exact)count
    (for your xact snapshot anyway). Then the situation becomes:

    Want a fast count? - use estimated_count(*)
    Want an exact count - use count(*)
    I think this is an excellent solution.


  • Mark Kirkwood at Dec 30, 2003 at 7:46 am
    Could certainly do that - a scalar function that returns reltuples from
    pg_class. I was hoping to do 2 additional things:

    i) provide better accuracy than the last ANALYZE
    ii) make it behave like an aggregate

    So I wanted to be able to use estimated_count as you would use count, i.e.:

    SELECT estimated_count() FROM rel

    returns 1 row, whereas the scalar function:

    SELECT estimated_count('rel') FROM rel

    returns the result as many times as there are rows in rel - of course
    you would use

    SELECT estimated_count('rel')

    but hopefully you see what I mean!

    BTW, the scalar function is pretty simple to achieve - here is a basic
    example that ignores schema qualifiers:

    CREATE FUNCTION estimated_count(text) RETURNS real AS '
    SELECT reltuples FROM pg_class WHERE relname = $1;
    ' LANGUAGE SQL;
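
    For example (with a made-up table name; the value is only as fresh as the
    last VACUUM/ANALYZE):

    SELECT estimated_count('orders');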


    cheers

    Mark

    Simon Riggs wrote:
    Why not implement estimated_count as a dictionary lookup, directly using
    the value recorded there by the analyze? That would be the easiest way
    to reuse existing code and give you access to many previously calculated
    values.

  • Neil Conway at Jan 4, 2004 at 4:23 pm

    "Simon Riggs" <simon@2ndquadrant.com> writes:
    Select count(*) could be evaluated against any available index
    sub-tables, since all that is required is to count the rows. That would
    be significantly faster than a full file scan and accurate too.
    PostgreSQL stores MVCC information in heap tuples only, so index-only
    plans such as you're suggesting can't be used (i.e. we need to check
    the heap tuple to see if a particular index entry is visible to the
    current transaction).

    -Neil
  • Randolf Richardson at Dec 12, 2003 at 8:00 pm
    "Christopher Browne <cbbrowne@acm.org>" wrote in
    comp.databases.postgresql.hackers:
    Martha Stewart called it a Good Thing when Randolf Richardson wrote:
    [sNip]
    What about queueing all these updates for a separate
    low-priority thread? The thread would be the only one with access
    to update this field.
    If updates are "queued," then how do you get to use them if the
    "update thread" isn't running because it's not high enough in
    priority?
    That would be an administration issue. This background process would
    need to have enough priority in order for this to be functional, yet could
    also be completely disabled by administrators who know their systems don't
    need to use "count(*)" at all.

    Also, if the thread was well-designed, then it could combine all the
    queued entries for a single table first in order to reduce disk I/O when
    updating each table.
    I am not being facetious.
    Oh, I see that. Don't worry, I know better than to take things
    personally on newsgroups -- go ahead and be blunt if you like. =D
    The one way that is expected to be successful would be to have a
    trigger that, upon seeing an insert of 5 rows to table "ABC", puts,
    into table "count_detail", something like:

    insert into count_detail (table, value) values ('ABC', 5);

    You then replace
    select count(*) from abc;

    with
    select sum(value) from count_detail where table = 'ABC';

    The "low priority" thread would be a process that does something akin
    to vacuuming, where it would replace the contents of the table every
    so often...

    for curr_table in (select table from count_detail) do
        new_total = select sum(value) from count_detail
                    where table = curr_table;
        delete from count_detail where table = curr_table;
        insert into count_detail (table, value)
            values (curr_table, new_total);
    done

    The point of this being to try to keep the number of rows to 1 per
    table.
    Interesting. A different way of solving the same problem, but
    wouldn't it require more disk I/O on the table being updated than a
    separate tracker would?
    Note that this gets _real_ expensive for tables that see lots of
    single row inserts and deletes. There isn't a cheaper way that will
    actually account for the true numbers of records that have been
    committed.

    For a small table, it will be cheaper to walk through and calculate
    count(*) directly from the tuples themselves.

    The situation where it may be worthwhile to do this is a table which
    is rather large (thus count(*) is expensive) where there is some
    special reason to truly care how many rows there are in the table.
    For _most_ tables, it seems unlikely that this will be true. For
    _most_ tables, it is absolutely not worth the cost of tracking the
    information.
    Ah, but that's the point -- do we truly care how many rows are in the
    table, or is the purpose of "count(*)" to just give us a general idea?

    This statistic would be delayed because it's being updated by a
    background process, thus "count" won't always be accurate, but at least it
    won't be slow -- it could be the fastest "count" in the industry! =)

    --
    Randolf Richardson - rr@8x.ca
    Vancouver, British Columbia, Canada

    Please do not eMail me directly when responding
    to my postings in the newsgroups.
  • Christoph Haller at Dec 3, 2003 at 12:06 pm
    Fairly good idea IMHO, especially considering Christopher's point
    about the unlikeliness of needing an exact count anyway.

    Regards, Christoph
    How about:

    Implement a function "estimated_count" that can be used instead of
    "count". It could use something like the algorithm in
    src/backend/commands/analyze.c to get a reasonably accurate pseudo count
    quickly.

    The advantage of this approach is that "count" still means (exact)count
    (for your xact snapshot anyway). Then the situation becomes:

    Want a fast count? - use estimated_count(*)
    Want an exact count - use count(*)

    regards

    Mark

    Christopher Browne wrote:
    For a small table, it will be cheaper to walk through and calculate
    count(*) directly from the tuples themselves.

    The situation where it may be worthwhile to do this is a table which
    is rather large (thus count(*) is expensive) where there is some
    special reason to truly care how many rows there are in the table.
    For _most_ tables, it seems unlikely that this will be true. For
    _most_ tables, it is absolutely not worth the cost of tracking the
    information.
