FAQ
I need to get the latest entry of a large table matching a certain criteria.
This is my query:

SELECT * FROM "data" WHERE "data"."fk" = 238496 ORDER BY "data"."id" DESC
LIMIT 1

This query is quite slow. If I do a explain on it, it seems that it uses an
Index Scan Backward.

If I omit the order by on the query:

SELECT * FROM "data" WHERE "data"."fk" = 238496 LIMIT 1

It is very fast. And the explain says that it uses Index scan. This is also
very fast if there aren't any matches. But I've read that I'm not guaranteed
to get the correct match If I do not use a order by, postgres just returns
its fastest possible match. Is this right? But will not the fastest possible
match always be the first match in the index? Is there another way to make
the order by query go faster?

Thanks!

Odd-R.

Search Discussions

  • James David Smith at Jun 29, 2011 at 2:23 pm
    Dear Odd,

    I am only a novice, but from what I understand from your email, you
    want to write a query that selects the newest record from the table
    for a certain set of data? To do this, I would use the MIN () function
    on a column such as 'time_of_entry' or 'order_id' if the 'order_id' is
    a sequence? Perhaps something like the below maybe...? Though my
    construction of the query is probably incorrect - I'm only learning!

    SELECT *
    FROM "data"
    WHERE
    "data"."fk" = '238496'
    AND
    "data"."id" = (SELECT MIN("data"."id")

    Cheers

    James



    On 29 June 2011 14:48, Odd Hogstad wrote:
    I need to get the latest entry of a large table matching a certain criteria.
    This is my query:

    SELECT * FROM "data" WHERE "data"."fk" = 238496 ORDER BY "data"."id" DESC
    LIMIT 1

    This query is quite slow. If I do a explain on it, it seems that it uses an
    Index Scan Backward.

    If I omit the order by on the query:

    SELECT * FROM "data" WHERE "data"."fk" = 238496 LIMIT 1

    It is very fast. And the explain says that it uses Index scan. This is also
    very fast if there aren't any matches. But I've read that I'm not guaranteed
    to get the correct match If I do not use a order by, postgres just returns
    its fastest possible match. Is this right? But will not the fastest possible
    match always be the first match in the index? Is there another way to make
    the order by query go faster?

    Thanks!

    Odd-R.

  • Jean-Yves F. Barbier at Jun 29, 2011 at 2:27 pm

    On Wed, 29 Jun 2011 15:48:56 +0200, Odd Hogstad wrote:

    SELECT * FROM "data" WHERE "data"."fk" = 238496 ORDER BY "data"."id" DESC
    LIMIT 1

    This query is quite slow. If I do a explain on it, it seems that it uses an
    Index Scan Backward.

    If I omit the order by on the query:

    SELECT * FROM "data" WHERE "data"."fk" = 238496 LIMIT 1

    It is very fast. And the explain says that it uses Index scan. This is also
    very fast if there aren't any matches. But I've read that I'm not guaranteed
    to get the correct match If I do not use a order by, postgres just returns
    its fastest possible match. Is this right? But will not the fastest possible
    match always be the first match in the index? Is there another way to make
    the order by query go faster?
    Unfortunately (and AFAIK), you don't have any other solution as you want the
    *latest* row; may be often clustering this table in this order would help a
    bit.
    Perhaps creating fragmented indexes could also help (1 >= data.fk < 50001, and so on)

    JY
    --
    He asked me if I knew what time it was -- I said yes, but not right now.
    -- Steven Wright
  • Odd Hogstad at Jun 29, 2011 at 2:42 pm

    SELECT * FROM "data" WHERE "data"."fk" = 238496 ORDER BY "data"."id" DESC
    LIMIT 1

    This query is quite slow. If I do a explain on it, it seems that it uses an
    Index Scan Backward.

    If I omit the order by on the query:

    SELECT * FROM "data" WHERE "data"."fk" = 238496 LIMIT 1

    It is very fast. And the explain says that it uses Index scan. This is also
    very fast if there aren't any matches. But I've read that I'm not
    guaranteed
    to get the correct match If I do not use a order by, postgres just returns
    its fastest possible match. Is this right? But will not the fastest possible
    match always be the first match in the index? Is there another way to make
    the order by query go faster?
    Unfortunately (and AFAIK), you don't have any other solution as you want
    the
    *latest* row; may be often clustering this table in this order would help a
    bit.
    Perhaps creating fragmented indexes could also help (1 >= data.fk < 50001,
    and so on)
    From the docs:

    By default, B-tree indexes store their entries in ascending order with nulls
    last. This means that a forward scan of an index on column x produces output
    satisfying ORDER BY x (or more verbosely, ORDER BY x ASC NULLS LAST). The
    index can also be scanned backward, producing output satisfying ORDER BY x
    DESC (or more verbosely, ORDER BY x DESC NULLS FIRST, since NULLS FIRST is
    the default for ORDER BY DESC).

    Doesn't this mean that when I'm not using the order by clause, and it uses a
    Index Scan, I will always get the latest value in return? Also I don't
    understand why the order by query is scanning backwards, when the record I
    want is in the other end?

    Thanks!

    Odd-R.
  • Jean-Yves F. Barbier at Jun 29, 2011 at 3:03 pm
    On Wed, 29 Jun 2011 16:42:36 +0200, Odd Hogstad wrote:

    ...
    By default, B-tree indexes store their entries in ascending order with nulls
    last. This means that a forward scan of an index on column x produces output
    satisfying ORDER BY x (or more verbosely, ORDER BY x ASC NULLS LAST). The
    index can also be scanned backward, producing output satisfying ORDER BY x
    DESC (or more verbosely, ORDER BY x DESC NULLS FIRST, since NULLS FIRST is
    the default for ORDER BY DESC).

    Doesn't this mean that when I'm not using the order by clause, and it uses a
    Index Scan, I will always get the latest value in return?
    Yes, but you're ordering by column id while you seek value into column fk.

    This is very different from the doc you quote: in your case, ordering by id
    returns you all values of fk BUT the ordering of fk is absolutely undefined.

    And, as I suppose fk stands for Foreign Key, you have many row using same
    values for fk (?)

    Also I don't
    understand why the order by query is scanning backwards, when the record I
    want is in the other end?
    Because id is the primary key (I guess:) and ordering DESC puts id latest
    rows first in list, so limiting select to 1 returns the last one.

    --
    BOFH excuse #287:
    Telecommunications is downshifting.
  • Odd Hogstad at Jun 29, 2011 at 3:19 pm
    2011/6/29 Jean-Yves F. Barbier <12ukwn@gmail.com>
    On Wed, 29 Jun 2011 16:42:36 +0200, Odd Hogstad wrote:

    ...
    By default, B-tree indexes store their entries in ascending order with nulls
    last. This means that a forward scan of an index on column x produces output
    satisfying ORDER BY x (or more verbosely, ORDER BY x ASC NULLS LAST). The
    index can also be scanned backward, producing output satisfying ORDER BY x
    DESC (or more verbosely, ORDER BY x DESC NULLS FIRST, since NULLS FIRST is
    the default for ORDER BY DESC).

    Doesn't this mean that when I'm not using the order by clause, and it uses a
    Index Scan, I will always get the latest value in return?
    Yes, but you're ordering by column id while you seek value into column fk.

    This is very different from the doc you quote: in your case, ordering by id
    returns you all values of fk BUT the ordering of fk is absolutely
    undefined.
    And, as I suppose fk stands for Foreign Key, you have many row using same
    values for fk (?)
    The ordering of the fk doesn't matter to me now. Yes, there might be (and
    are) several ones with the same value for this. I just want the latest added
    one that matches. And I don't understand why this is not always the first
    one matching a forward scan, as new entries are put in front?

    Also I don't
    understand why the order by query is scanning backwards, when the record I
    want is in the other end?
    Because id is the primary key (I guess:) and ordering DESC puts id latest
    rows first in list, so limiting select to 1 returns the last one.

    Then why is it slow?
  • Jean-Yves F. Barbier at Jun 29, 2011 at 3:38 pm
    On Wed, 29 Jun 2011 17:19:00 +0200, Odd Hogstad wrote:

    ...
    The ordering of the fk doesn't matter to me now.
    It should: it gives the condition that let you get the latest fk...
    Yes, there might be (and
    are) several ones with the same value for this. I just want the latest added
    one that matches. And I don't understand why this is not always the first
    one matching a forward scan, as new entries are put in front?
    Because, according to the docs, such a query as you told about *cannot*
    guaranty the order of the rows (logical: you ask for all fk=111 but nothing
    except sorting on id can insure you'll have fk rows in the right order.)
    Also I don't
    understand why the order by query is scanning backwards, when the record I
    want is in the other end?
    Take a sheet of paper and a pencil, write the whole shebang down, make this
    model run'by'hand and you'll see why.
    Because id is the primary key (I guess:) and ordering DESC puts id latest
    rows first in list, so limiting select to 1 returns the last one.
    Anyway, Tom gave you the answer to speed up your query.

    --
    <stu> Stupid nick highlighting
    <stu> Whenever someone starts with "stupid" it highlights the nick. Hmm.
    -- #Debian
  • Odd Hogstad at Jun 30, 2011 at 10:01 am
    I need to get the latest entry of a large table matching a certain
    criteria.
    This is my query:
    SELECT * FROM "data" WHERE "data"."fk" = 238496 ORDER BY "data"."id" DESC
    LIMIT 1
    A two-column index on (fk, id) would help for this. If you think about
    the ordering of the index entries you'll see why.
    I added an index for this, and now it goes much faster, thanks. The ones
    that contain data is now superfast, 1-2 ms, but the ones that does not
    match, and returns empty sets, are considerably slower, 150 -200 ms. Is this
    something I have to live with, or are there any tricks that will make these
    queries fast to?

    Thanks again!

    Odd-R.
  • Tom Lane at Jun 29, 2011 at 3:06 pm

    Odd Hogstad writes:
    I need to get the latest entry of a large table matching a certain criteria.
    This is my query:
    SELECT * FROM "data" WHERE "data"."fk" = 238496 ORDER BY "data"."id" DESC
    LIMIT 1
    A two-column index on (fk, id) would help for this. If you think about
    the ordering of the index entries you'll see why.

    regards, tom lane

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-novice @
categoriespostgresql
postedJun 29, '11 at 1:56p
activeJun 30, '11 at 10:01a
posts9
users4
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase