I have a table with a number of columns.

I perform

Select *
from table
order by a,b

There is an index on a,b which is clustered (as well as indexes on a and b
alone).
I have issued the cluster and anyalze commands.

Nevertheless, PostgreSQL performs a Sequential Scan on the table and then
performs a sort.

Am I missing something?

Jonathan Blitz

Search Discussions

  • Thom Brown at Nov 22, 2009 at 1:11 pm
    2009/11/22 Jonathan Blitz <jblitz@013.net>
    I have a table with a number of columns.

    I perform

    Select *
    from table
    order by a,b

    There is an index on a,b which is clustered (as well as indexes on a and b
    alone).
    I have issued the cluster and anyalze commands.

    Nevertheless, PostgreSQL performs a Sequential Scan on the table and then
    performs a sort.
    Am I missing something?

    Jonathan Blitz
    It depends on firstly the size of the table, and also the distribution of
    data in columns a and b. If the stats for that table knows that the table
    has a natural order (i.e. they happen to be in roughly the order you've
    asked for them in), or the table isn't big enough to warrant using an index,
    then it won't bother using one. It will pick whichever it believes to be
    the most efficient method.

    Regards

    Thom
  • Craig Ringer at Nov 22, 2009 at 1:25 pm

    On 22/11/2009 8:50 PM, Jonathan Blitz wrote:
    I have a table with a number of columns.

    I perform

    Select *
    from table
    order by a,b

    There is an index on a,b which is clustered (as well as indexes on a and
    b alone).
    I have issued the cluster and anyalze commands.

    Nevertheless, PostgreSQL performs a Sequential Scan on the table and
    then performs a sort.
    PostgreSQL's query planner probably thinks it'll be faster to read the
    pages off the disk sequentially then sort them in memory. To use an
    index instead, Pg would have to read the whole index from disk
    (sequentially) then fetch all the pages off the disk in a probably
    near-random order. So it'd be doing more disk I/O, and much more of it
    would be random I/O, which is a LOT slower.

    So Pg does it the fast way, reading the table into memory then sorting
    it there.

    The most important thing to understand is that sometimes, a sequential
    scan is just the fastest way to do the job.

    I suspect you're working on the assumption that Pg can get all the data
    it needs from the index, so it doesn't need to read the tables proper.
    In some other database systems this *might* be possible if you had an
    index on fields "a" and "b" and issued a "select a,b from table" instead
    of a "select *". PostgreSQL, though, can not do this. PostgreSQL's
    indexes do not contain all the information required to return values
    from queries, only enough information to find the places in the main
    tables where those values are to be found.

    If you want to know more and understand why that's the case, search for
    the phrase "covered index" and the words "index visibility". Suffice it
    to say that there are pretty good reasons why it works how it does, and
    there would be very large downsides to changing how it works as well as
    large technical problems to solve to even make it possible. It's to do
    with the trade-off between update/insert/delete speeds and query speeds,
    the cost of "fatter" indexes taking longer to read from disk, and lots more.

    By the way, if you want to test out different query plans for a query to
    see which way is faster, you can use the "enable_" parameters like
    "enable_seqscan", "enable_hashjoin" etc to control how PostgreSQL
    performs queries. There's *LOTS* to be learned about this in the mailing
    list archives. You should also read the following page:

    http://www.postgresql.org/docs/current/static/runtime-config-query.html

    but understand that the planner method configuration parameters are
    intended mostly for testing and performance analysis, not for production
    use.

    If you find a query that's lots faster with a particular enable_
    parameter set to "off", try increasing your statistics targets on the
    tables / columns of interest, re-ANALYZEing, and re-testing. See these
    pages re statistics:

    http://www.postgresql.org/docs/current/static/using-explain.html
    http://www.postgresql.org/docs/current/static/planner-stats.html
    http://www.postgresql.org/docs/current/static/planner-stats-details.html

    If after increasing your stats targets the planner still picks a vastly
    slower plan, consider posting to the mailing list with the full output
    of "EXPLAIN ANALYZE SELECT myquery....", the full exact text of your
    query, and your table schema as shown by "\d tablename" in psql. Someone
    may be able to help you or at least explain why it's happening.

    --
    Craig Ringer
  • Jonathan Blitz at Nov 22, 2009 at 1:35 pm
    Many thanks.
    I'll give it a try and see what happens.

    -----Original Message-----
    From: Craig Ringer
    Sent: Sunday, November 22, 2009 3:25 PM
    To: Jonathan Blitz
    Cc: pgsql-performance@postgresql.org
    Subject: Re: [PERFORM] Why is the query not using the index for sorting?
    On 22/11/2009 8:50 PM, Jonathan Blitz wrote:
    I have a table with a number of columns.

    I perform

    Select *
    from table
    order by a,b

    There is an index on a,b which is clustered (as well as indexes on a
    and b alone).
    I have issued the cluster and anyalze commands.

    Nevertheless, PostgreSQL performs a Sequential Scan on the table and
    then performs a sort.
    PostgreSQL's query planner probably thinks it'll be faster to read the pages
    off the disk sequentially then sort them in memory. To use an index instead,
    Pg would have to read the whole index from disk
    (sequentially) then fetch all the pages off the disk in a probably
    near-random order. So it'd be doing more disk I/O, and much more of it would
    be random I/O, which is a LOT slower.

    So Pg does it the fast way, reading the table into memory then sorting it
    there.

    The most important thing to understand is that sometimes, a sequential scan
    is just the fastest way to do the job.

    I suspect you're working on the assumption that Pg can get all the data it
    needs from the index, so it doesn't need to read the tables proper.
    In some other database systems this *might* be possible if you had an index
    on fields "a" and "b" and issued a "select a,b from table" instead of a
    "select *". PostgreSQL, though, can not do this. PostgreSQL's indexes do not
    contain all the information required to return values from queries, only
    enough information to find the places in the main tables where those values
    are to be found.

    If you want to know more and understand why that's the case, search for the
    phrase "covered index" and the words "index visibility". Suffice it to say
    that there are pretty good reasons why it works how it does, and there would
    be very large downsides to changing how it works as well as large technical
    problems to solve to even make it possible. It's to do with the trade-off
    between update/insert/delete speeds and query speeds, the cost of "fatter"
    indexes taking longer to read from disk, and lots more.

    By the way, if you want to test out different query plans for a query to see
    which way is faster, you can use the "enable_" parameters like
    "enable_seqscan", "enable_hashjoin" etc to control how PostgreSQL performs
    queries. There's *LOTS* to be learned about this in the mailing list
    archives. You should also read the following page:

    http://www.postgresql.org/docs/current/static/runtime-config-query.html

    but understand that the planner method configuration parameters are intended
    mostly for testing and performance analysis, not for production use.

    If you find a query that's lots faster with a particular enable_ parameter
    set to "off", try increasing your statistics targets on the tables / columns
    of interest, re-ANALYZEing, and re-testing. See these pages re statistics:

    http://www.postgresql.org/docs/current/static/using-explain.html
    http://www.postgresql.org/docs/current/static/planner-stats.html
    http://www.postgresql.org/docs/current/static/planner-stats-details.html

    If after increasing your stats targets the planner still picks a vastly
    slower plan, consider posting to the mailing list with the full output of
    "EXPLAIN ANALYZE SELECT myquery....", the full exact text of your query, and
    your table schema as shown by "\d tablename" in psql. Someone may be able to
    help you or at least explain why it's happening.

    --
    Craig Ringer
    No virus found in this incoming message.
    Checked by AVG - www.avg.com
    Version: 9.0.709 / Virus Database: 270.14.76/2517 - Release Date: 11/21/09
    21:41:00
  • Matthew Wakeling at Nov 23, 2009 at 11:00 am

    On Sun, 22 Nov 2009, Jonathan Blitz wrote:
    I have a table with a number of columns.

    I perform

    Select *
    from table
    order by a,b

    There is an index on a,b which is clustered (as well as indexes on a and b alone).
    I have issued the cluster and anyalze commands.
    Did you analyse *after* creating the index and clustering, or before?

    Matthew

    --
    [About NP-completeness] These are the problems that make efficient use of
    the Fairy Godmother. -- Computer Science Lecturer
  • Jonathan Blitz at Nov 23, 2009 at 2:11 pm
    Definitely after.

    Jonathan

    -----Original Message-----
    From: pgsql-performance-owner@postgresql.org
    On Behalf Of Matthew
    Wakeling
    Sent: Monday, November 23, 2009 1:00 PM
    To: Jonathan Blitz
    Cc: pgsql-performance@postgresql.org
    Subject: Re: [PERFORM] Why is the query not using the index for sorting?
    On Sun, 22 Nov 2009, Jonathan Blitz wrote:
    I have a table with a number of columns.

    I perform

    Select *
    from table
    order by a,b

    There is an index on a,b which is clustered (as well as indexes on a and b alone).
    I have issued the cluster and anyalze commands.
    Did you analyse *after* creating the index and clustering, or before?

    Matthew

    --
    [About NP-completeness] These are the problems that make efficient use of
    the Fairy Godmother. -- Computer Science Lecturer
    --
    Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-performance
    No virus found in this incoming message.
    Checked by AVG - www.avg.com
    Version: 9.0.709 / Virus Database: 270.14.76/2517 - Release Date: 11/22/09
    21:40:00

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-performance @
categoriespostgresql
postedNov 22, '09 at 1:07p
activeNov 23, '09 at 2:11p
posts6
users4
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase