I have been testing the performance of PostgreSQL using the simple tool
found at http://benchw.sourceforge.net, but I have found that all the
queries it runs execute with sequential scans. The website hosting the
code shows example execution plans that use indexes.

When I disable sequential scans, query 0 and query 1 run faster (
http://benchw.sourceforge.net/benchw_results_postgres_history.html ) by
using the indexes, as suggested by the website.

I have tried increasing effective_cache_size and reducing
random_page_cost to try to force the optimiser to use the index, but it
always chooses the sequential scan.
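
A minimal sketch of that kind of tuning (the values below are
illustrative, not the exact ones tried; note that in 8.1
effective_cache_size is measured in 8KB pages):

SET effective_cache_size = 50000;  -- ~400MB, expressed in 8KB pages
SET random_page_cost = 2.0;        -- default is 4.0; lower values favour index scans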

What is the best way to force the use of indexes in these queries?
Currently testing with version 8.1.4.

Regards

Robin Smith

  • Robin C Smith at Jul 21, 2006 at 12:02 pm
    More information from the query:-

    explain analyze
    SELECT
        d0.dmth,
        count(f.fval)
    FROM
        dim0 AS d0,
        fact0 AS f
    WHERE d0.d0key = f.d0key
      AND d0.ddate BETWEEN '2010-01-01' AND '2010-12-28'
    GROUP BY
        d0.dmth
    ;

    QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------------
    HashAggregate  (cost=336998.83..336998.84 rows=1 width=8) (actual time=33823.124..33823.134 rows=12 loops=1)
      ->  Hash Join  (cost=214.83..335343.83 rows=331000 width=8) (actual time=61.065..33605.343 rows=336000 loops=1)
            Hash Cond: ("outer".d0key = "inner".d0key)
            ->  Seq Scan on fact0 f  (cost=0.00..281819.00 rows=10000000 width=8) (actual time=12.766..28945.036 rows=10000000 loops=1)
            ->  Hash  (cost=214.00..214.00 rows=331 width=8) (actual time=31.120..31.120 rows=336 loops=1)
                  ->  Seq Scan on dim0 d0  (cost=0.00..214.00 rows=331 width=8) (actual time=26.362..30.895 rows=336 loops=1)
                        Filter: ((ddate >= '2010-01-01'::date) AND (ddate <= '2010-12-28'::date))
    Total runtime: 33823.220 ms
    (8 rows)


    benchw=# \d fact0
            Table "public.fact0"
     Column |          Type          | Modifiers
    --------+------------------------+-----------
     d0key  | integer                | not null
     d1key  | integer                | not null
     d2key  | integer                | not null
     fval   | integer                | not null
     ffill  | character varying(100) | not null
    Indexes:
        "fact0_d0key" btree (d0key)
        "fact0_d1key" btree (d1key)
        "fact0_d2key" btree (d2key)

    benchw=# \d dim0
        Table "public.dim0"
     Column |  Type   | Modifiers
    --------+---------+-----------
     d0key  | integer | not null
     ddate  | date    | not null
     dyr    | integer | not null
     dmth   | integer | not null
     dday   | integer | not null
    Indexes:
        "dim0_d0key" UNIQUE, btree (d0key)

    The example on the web site has the following execution plan:-

    QUERY PLAN
    --------------------------------------------------------------------------------------------
    HashAggregate  (cost=286953.94..286953.94 rows=1 width=8)
      ->  Nested Loop  (cost=0.00..285268.93 rows=337002 width=8)
            ->  Seq Scan on dim0 d0  (cost=0.00..219.00 rows=337 width=8)
                  Filter: ((ddate >= '2010-01-01'::date) AND (ddate <= '2010-12-28'::date))
            ->  Index Scan using fact0_d0key on fact0 f  (cost=0.00..833.07 rows=1022 width=8)
                  Index Cond: ("outer".d0key = f.d0key)

    It uses the index on the join condition.

    When I disable the sequential scan with:-

    SET enable_seqscan TO off;

    The execution plan looks like:-

    QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------------------
    HashAggregate  (cost=648831.52..648831.53 rows=1 width=8) (actual time=19155.060..19155.071 rows=12 loops=1)
      ->  Nested Loop  (cost=7.51..647176.52 rows=331000 width=8) (actual time=97.878..18943.155 rows=336000 loops=1)
            ->  Index Scan using dim0_d0key on dim0 d0  (cost=0.00..248.00 rows=331 width=8) (actual time=40.467..55.780 rows=336 loops=1)
                  Filter: ((ddate >= '2010-01-01'::date) AND (ddate <= '2010-12-28'::date))
            ->  Bitmap Heap Scan on fact0 f  (cost=7.51..1941.94 rows=1002 width=8) (actual time=0.991..55.391 rows=1000 loops=336)
                  Recheck Cond: ("outer".d0key = f.d0key)
                  ->  Bitmap Index Scan on fact0_d0key  (cost=0.00..7.51 rows=1002 width=0) (actual time=0.583..0.583 rows=1000 loops=336)
                        Index Cond: ("outer".d0key = f.d0key)
    Total runtime: 19155.176 ms
    (9 rows)

    The query now takes 19 seconds, down from 34 seconds, although the
    execution plan doesn't match the example from the website.

    Regards

    Robin
    -----Original Message-----
    From: Peter Eisentraut
    Sent: 21 July 2006 12:46
    To: pgsql-performance@postgresql.org
    Cc: Smith,R,Robin,XJE4JA C
    Subject: Re: [PERFORM] Forcing using index instead of sequential scan?


    robin.c.smith@bt.com wrote:
    What is the best way to force the use of indexes in these queries?
    Well, the brute-force method is to use SET enable_seqscan TO off, but if
    you want to get to the bottom of this, you should look at or post the
    EXPLAIN ANALYZE output of the offending queries.
  • Robin C Smith at Jul 21, 2006 at 12:10 pm
    The tables have all been analysed.

    I set the work_mem to 500000 and it still doesn't use the index :-(

    Regards

    Robin

    -----Original Message-----
    From: Richard Huxton
    Sent: 21 July 2006 12:54
    To: Smith,R,Robin,XJE4JA C
    Subject: Re: [PERFORM] Forcing using index instead of sequential scan?


    robin.c.smith@bt.com wrote:
    I have been testing the performance of PostgreSQL using the simple
    tool found at http://benchw.sourceforge.net, but I have found that
    all the queries it runs execute with sequential scans. The website
    hosting the code shows example execution plans that use indexes.

    When I disable sequential scans, query 0 and query 1 run faster (
    http://benchw.sourceforge.net/benchw_results_postgres_history.html )
    by using the indexes, as suggested by the website.

    I have tried increasing effective_cache_size and reducing
    random_page_cost to try to force the optimiser to use the index, but
    it always chooses the sequential scan.

    What is the best way to force the use of indexes in these queries?
    Currently testing with version 8.1.4.
    Well, you don't want to be forcing it if possible. Ideally, PG should be
    able to figure out what to use itself.

    In the case of query0 and query1 as shown on your web-page I'd expect a
    sequential scan of dim0 then access via the index on fact0. Reasons why
    this might not be happening include:
    1. Inaccurate stats - ANALYSE your tables.
    2. Insufficient memory for sorting etc. - issue SET work_mem=XXX before
       the query and try increased values.
    3. Other parameters are out of whack. For example, effective_cache_size
       doesn't change how much cache PG uses; it tells PG how much the O.S.
       will cache. You might find http://www.powerpostgresql.com/PerfList
       is a good quick introduction.


    So - ANALYSE your tables
    http://www.postgresql.org/docs/8.1/static/sql-analyze.html

    Then post EXPLAIN ANALYSE for the queries and we'll see what they're
    doing.
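
    Put concretely, those suggestions amount to a psql session along
    these lines (a sketch only; the table names are from benchw as shown
    earlier in the thread, and the work_mem value is just an example to
    vary):

    -- Refresh the planner statistics for both tables in the join
    ANALYZE dim0;
    ANALYZE fact0;

    -- In 8.1 work_mem is measured in KB, so 65536 means 64MB (example value)
    SET work_mem = 65536;

    -- Re-check which plan the optimiser now picks
    EXPLAIN ANALYZE
    SELECT d0.dmth, count(f.fval)
    FROM dim0 AS d0, fact0 AS f
    WHERE d0.d0key = f.d0key
      AND d0.ddate BETWEEN '2010-01-01' AND '2010-12-28'
    GROUP BY d0.dmth;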
    --
    Richard Huxton
    Archonet Ltd
  • Peter Eisentraut at Jul 21, 2006 at 12:16 pm

    robin.c.smith@bt.com wrote:
    What is the best way to force the use of indexes in these queries?
    Well, the brute-force method is to use SET enable_seqscan TO off, but if
    you want to get to the bottom of this, you should look at or post the
    EXPLAIN ANALYZE output of the offending queries.
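
    A minimal sketch of that brute-force approach, scoped to a single
    transaction so the setting doesn't leak into the rest of the session
    (using the benchw query posted earlier in the thread):

    BEGIN;
    SET LOCAL enable_seqscan TO off;  -- planner now treats seqscans as prohibitively expensive
    EXPLAIN ANALYZE
    SELECT d0.dmth, count(f.fval)
    FROM dim0 AS d0, fact0 AS f
    WHERE d0.d0key = f.d0key
      AND d0.ddate BETWEEN '2010-01-01' AND '2010-12-28'
    GROUP BY d0.dmth;
    COMMIT;  -- SET LOCAL reverts automatically at transaction end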
  • Tom Lane at Jul 22, 2006 at 4:52 pm

    <robin.c.smith@bt.com> writes:
    I have been testing the performance of PostgreSQL using the simple tool
    found at http://benchw.sourceforge.net, but I have found that all the
    queries it runs execute with sequential scans. The website hosting the
    code shows example execution plans that use indexes.
    The reason the website gets indexscans is that he's fooled with the
    planner cost parameters. In particular I see that benchw's
    documentation suggests
    effective_cache_size = 48000
    random_page_cost = 0.8
    The latter is physically silly but it's a pretty effective thumb on the
    scales if you want to force indexscan usage.

    The real issue here is caching across successive queries, an effect that
    Postgres doesn't deal with very well at the moment. If you run these
    queries from a standing start (freshly booted machine) you'll likely
    find that the indexscan plan is indeed slower than the seqscan/hash
    plan, just like the planner thinks. I get about 52 sec for query0
    with an indexscan vs about 35 sec for the seqscan. However, successive
    executions of the seqscan plan stay at about 35 sec, whereas the
    indexscan plan drops to 2 sec(!). This is because the fraction of the
    table touched by the indexscan plan is small enough to fit in my
    machine's RAM --- I can see by das blinkenlights (and also vmstat) that
    there's no I/O going on at all during re-executions of the indexscan.
    If I run the seqscan and then the indexscan, the indexscan takes about
    28 sec, so there's still some useful cached data even though the seqscan
    read more stuff than fits in RAM. (Note: this is with Fedora Core 5,
    YMMV depending on your kernel's cache algorithms.)

    In a real-world situation it's unlikely you'd just re-execute the same
    query over and over, so this benchmark is really too simplistic to trust
    very far as an indicator of what to do in practice.

    I find that CVS tip will choose the indexscan for query0 if I set
    effective_cache_size to 62500 (ie, half a gigabyte, or half of this
    machine's RAM) and set random_page_cost to 1.5 or less.

    If you want the planner to work on the assumption that everything's
    cached, set effective_cache_size to a large value and set
    random_page_cost to 1.0 --- you might also want to increase the CPU
    cost settings, reflecting the fact that I/O is cheaper relative to
    CPU effort than the default settings assume. However, if your database
    is too large to fit in RAM then these are likely to be pretty bad
    settings. Many people compromise with a random_page_cost around 2
    or so.
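
    As a sketch, the "assume everything's cached" setup described above
    might look like this (the effective_cache_size value is borrowed from
    Tom's example; the CPU cost values are illustrative, raised from the
    8.1 defaults of cpu_tuple_cost = 0.01 and cpu_index_tuple_cost =
    0.005):

    SET effective_cache_size = 62500;  -- ~500MB, expressed in 8KB pages
    SET random_page_cost = 1.0;        -- random reads assumed no costlier than sequential
    SET cpu_tuple_cost = 0.03;         -- make CPU work dearer relative to (cheap, cached) I/O
    SET cpu_index_tuple_cost = 0.015;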

    regards, tom lane
  • Craig A. James at Jul 22, 2006 at 6:27 pm

    The real issue here is caching across successive queries, an effect that
    Postgres doesn't deal with very well at the moment. If you run these
    queries from a standing start (freshly booted machine) you'll likely
    find that the indexscan plan is indeed slower than the seqscan/hash
    plan, just like the planner thinks.
    Here's a little trick I learned to speed up this test.

    find / -type f -exec grep foobar {} \;

    This causes massive file-system activity and flushes all files that the kernel has cached. If you run this between each Postgres test (let it run for a couple minutes), it gives you an apples-to-apples comparison between successive benchmarks, and eliminates the effects of caching.

    If you run this as a regular user (NOT super-user or 'postgres'), you won't have permission to access your Postgres files, so you're guaranteed they'll be flushed from the cache.

    Craig
  • Gregory Stark at Jul 22, 2006 at 11:15 pm

    "Craig A. James" <cjames@modgraph-usa.com> writes:

    This causes massive file-system activity and flushes all files that the
    kernel has cached. If you run this between each Postgres test (let it run
    for a couple minutes), it gives you an apples-to-apples comparison between
    successive benchmarks, and eliminates the effects of caching.
    On Linux, at least, the best way to flush the cache is to unmount and
    then mount the filesystem. This requires putting the data files on a
    partition that you aren't otherwise using and shutting down postgres.

    Note that "nothing cached" isn't necessarily a more accurate model than
    "everything cached". In reality many databases *do* in fact run the same
    queries over and over again, though often with some parameters different each
    time. But the upper pages of most indexes and many of the common leaf pages
    and heap pages will in fact be cached.


    --
    Gregory Stark
    EnterpriseDB http://www.enterprisedb.com
  • Michael Stone at Jul 23, 2006 at 11:29 am

    On Sat, Jul 22, 2006 at 10:26:53AM -0700, Craig A. James wrote:
    This causes massive file-system activity and flushes all files that the
    kernel has cached. If you run this between each Postgres test (let it run
    for a couple minutes), it gives you an apples-to-apples comparison between
    successive benchmarks, and eliminates the effects of caching.
    That assumes a system with small RAM or an unusually large system
    installation. Unmounting is a much more reliable mechanism.

    Mike Stone
  • Craig A. James at Jul 23, 2006 at 3:04 pm

    Michael Stone wrote:
    On Sat, Jul 22, 2006 at 10:26:53AM -0700, Craig A. James wrote:
    This causes massive file-system activity and flushes all files that
    the kernel has cached. If you run this between each Postgres test
    (let it run for a couple minutes), it gives you an apples-to-apples
    comparison between successive benchmarks, and eliminates the effects
    of caching.
    That assumes a system with small RAM or an unusually large system
    installation. Unmounting is a much more reliable mechanism.
    Indeed, but it only works if you can. For example, in my small-ish installation, my WAL and system tables are mounted on the root disk. Or someone might not have super-user access.

    Craig
  • Mark Kirkwood at Jul 23, 2006 at 3:44 am

    Tom Lane wrote:
    <robin.c.smith@bt.com> writes:
    I have been testing the performance of PostgreSQL using the simple tool
    found at http://benchw.sourceforge.net, but I have found that all the
    queries it runs execute with sequential scans. The website hosting the
    code shows example execution plans that use indexes.
    The reason the website gets indexscans is that he's fooled with the
    planner cost parameters. In particular I see that...(snipped)
    Indeed I did - probably should have discussed that alteration better in
    the documentation for the test suite!

    In addition, I was a bit naughty in running the benchmark using size 1
    (i.e. about 1G) on a box with 2G of RAM, as this meant that (on the
    machine I was using then, anyway) indexscans on query 0 and 1 were
    *always* better than the sequential options.

    A better test is to use a size factor of 2 x physical RAM, as then the
    planner's defaults make more sense! (Unless of course you *want* to
    model a data mart smaller than physical RAM.)

    Best wishes

    Mark
  • Mark Kirkwood at Jul 23, 2006 at 3:51 am

    robin.c.smith@bt.com wrote:
    I have been testing the performance of PostgreSQL using the simple tool
    found at http://benchw.sourceforge.net, but I have found that all the
    queries it runs execute with sequential scans. The website hosting the
    code shows example execution plans that use indexes.

    When I disable sequential scans, query 0 and query 1 run faster (
    http://benchw.sourceforge.net/benchw_results_postgres_history.html )
    by using the indexes, as suggested by the website.

    I have tried increasing effective_cache_size and reducing
    random_page_cost to try to force the optimiser to use the index, but
    it always chooses the sequential scan.

    What is the best way to force the use of indexes in these queries?
    Currently testing with version 8.1.4.
    Hi Robin,


    Being responsible for this piece of software, I should try to help; I
    only saw this now, sorry (nice to see someone using this).

    Unless you really want to reproduce the numbers on the website, it is
    best to test with Benchw's scale factor at least 2 x your physical RAM,
    as this makes the planner's defaults work more sensibly (and models
    *most* real-world data warehouse situations better!).

    Cheers

    Mark
