Hello

I have a table that has a boolean column telling if that particular
record has been billed already. So most of the time that column holds
only true values.

To select the unbilled record I use the following query:

# select * where billed = false order by calldate;

and defined the following index:

# Create index cdr_billed_index on cdr (billed) where billed = false;

This query took surprisingly long, and explain seems to be telling
me the query is using a sequential scan:

=# explain select cdr.* from cdr where billed = false order by calldate;

QUERY PLAN
-------------------------------------------------------------------
Sort (cost=37448.75..37526.94 rows=31273 width=465)
Sort Key: calldate
-> Seq Scan on cdr (cost=0.00..20323.81 rows=31273 width=465)
Filter: (billed = false)

How can I ensure the patial billing index will be used?

thanks for any insights,
Ron

Search Discussions

  • Michael Fuhr at Feb 12, 2007 at 2:28 am

    On Mon, Feb 12, 2007 at 01:23:13AM +0100, Ron Arts wrote:
    I have a table that has a boolean column telling if that particular
    record has been billed already. So most of the time that column holds
    only true values.
    How often is "most of the time"? What are the results of the following
    queries?

    select version();
    select count(*) from cdr;
    select count(*) from cdr where billed = false;
    =# explain select cdr.* from cdr where billed = false order by calldate;

    QUERY PLAN
    -------------------------------------------------------------------
    Sort (cost=37448.75..37526.94 rows=31273 width=465)
    Sort Key: calldate
    -> Seq Scan on cdr (cost=0.00..20323.81 rows=31273 width=465)
    Filter: (billed = false)

    How can I ensure the patial billing index will be used?
    If using the index would be slower than a sequential scan then you
    don't want the query to use the index. Let's see how accurate the
    row count estimates are and whether using an index really would be
    faster -- please post the output of the following queries:

    set enable_seqscan to on;
    explain analyze select cdr.* from cdr where billed = false order by calldate;
    set enable_seqscan to off;
    explain analyze select cdr.* from cdr where billed = false order by calldate;

    If the number of rows returned isn't close to the planner's estimate
    (31273 in the output you showed) then trying running ANALYZE or
    VACUUM ANALYZE on the table, then run the above statements again.

    --
    Michael Fuhr
  • Michael Fuhr at Feb 12, 2007 at 2:43 am

    On Sun, Feb 11, 2007 at 07:27:29PM -0700, Michael Fuhr wrote:
    If using the index would be slower than a sequential scan then you
    don't want the query to use the index. Let's see how accurate the
    row count estimates are and whether using an index really would be
    faster -- please post the output of the following queries:
    Also, what's your hardware configuration and what non-default
    settings do you have in postgresql.conf? In particular, how much
    memory do you have and what values do you have for shared_buffers,
    effective_cache_size, work_mem, and random_page_cost?

    --
    Michael Fuhr
  • Ron Arts at Feb 12, 2007 at 8:08 am

    Michael Fuhr schreef:
    On Mon, Feb 12, 2007 at 01:23:13AM +0100, Ron Arts wrote:
    I have a table that has a boolean column telling if that particular
    record has been billed already. So most of the time that column holds
    only true values.
    How often is "most of the time"? What are the results of the following
    queries?
    Michael,

    most of the time means: 9 out of ten times.
    select version();
    # select version();
    version
    -------------------------------------------------------------------------------------------------------------------------------
    PostgreSQL 7.4.7 on i386-redhat-linux-gnu, compiled by GCC i386-redhat-linux-gcc (GCC) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)
    (1 row)
    select count(*) from cdr;
    # select count(*) from cdr;
    count
    -------
    62547
    (1 row)
    select count(*) from cdr where billed = false;
    # select count(*) from cdr where billed = false;
    count
    -------
    3
    (1 row)


    I am worried that is is doing a sequential scan. On production systems
    the cdr table might contain of millions of records.

    Thanks,
    Ron
    =# explain select cdr.* from cdr where billed = false order by calldate;

    QUERY PLAN
    -------------------------------------------------------------------
    Sort (cost=37448.75..37526.94 rows=31273 width=465)
    Sort Key: calldate
    -> Seq Scan on cdr (cost=0.00..20323.81 rows=31273 width=465)
    Filter: (billed = false)

    How can I ensure the patial billing index will be used?
    If using the index would be slower than a sequential scan then you
    don't want the query to use the index. Let's see how accurate the
    row count estimates are and whether using an index really would be
    faster -- please post the output of the following queries:

    set enable_seqscan to on;
    explain analyze select cdr.* from cdr where billed = false order by calldate;
    set enable_seqscan to off;
    explain analyze select cdr.* from cdr where billed = false order by calldate;

    If the number of rows returned isn't close to the planner's estimate
    (31273 in the output you showed) then trying running ANALYZE or
    VACUUM ANALYZE on the table, then run the above statements again.
  • Tom Lane at Feb 12, 2007 at 2:23 pm

    Ron Arts writes:
    # select count(*) from cdr where billed = false;
    count
    -------
    3
    (1 row)
    =# explain select cdr.* from cdr where billed = false order by calldate;

    QUERY PLAN
    -------------------------------------------------------------------
    Sort (cost=37448.75..37526.94 rows=31273 width=465)
    Sort Key: calldate
    -> Seq Scan on cdr (cost=0.00..20323.81 rows=31273 width=465)
    ^^^^^^^^^^
    Filter: (billed = false)
    You haven't ANALYZEd this table (lately, or perhaps ever). The planner
    is thus working with a default selectivity estimate (which I think is
    50% for a bool column --- that seems to match your numbers anyway), and
    that leads it to the conclusion that a seqscan is the right thing.
    Which it would indeed be, if half the table has to be retrieved.

    regards, tom lane
  • Ron Arts at Feb 12, 2007 at 4:24 pm

    Tom Lane schreef:
    Ron Arts <ron.arts@neonova.nl> writes:
    # select count(*) from cdr where billed = false;
    count
    -------
    3
    (1 row)
    =# explain select cdr.* from cdr where billed = false order by calldate;

    QUERY PLAN
    -------------------------------------------------------------------
    Sort (cost=37448.75..37526.94 rows=31273 width=465)
    Sort Key: calldate
    -> Seq Scan on cdr (cost=0.00..20323.81 rows=31273 width=465)
    ^^^^^^^^^^
    Filter: (billed = false)
    You haven't ANALYZEd this table (lately, or perhaps ever). The planner
    is thus working with a default selectivity estimate (which I think is
    50% for a bool column --- that seems to match your numbers anyway), and
    that leads it to the conclusion that a seqscan is the right thing.
    Which it would indeed be, if half the table has to be retrieved.

    regards, tom lane
    Oh man,

    I totally forgot about analyze because I have it running daily from a script.
    And of course the script was broken, and I didn't notice.

    Thanks!

    Ron



    --
    NeoNova BV, The Netherlands
    Professional internet and VoIP solutions

    http://www.neonova.nl Kruislaan 419 1098 VA Amsterdam
    info: 020-5628292 servicedesk: 020-5628292 fax: 020-5628291
    KvK Amsterdam 34151241

    The following disclaimer applies to this email:
    http://www.neonova.nl/maildisclaimer

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-novice @
categoriespostgresql
postedFeb 12, '07 at 12:43a
activeFeb 12, '07 at 4:24p
posts6
users3
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase