Dear Gurus,

Version: 7.4.6

I use a query on a heavily indexed table which picks a wrong index
unexpectedly. Since this query is used in response to certain user
interactions thousands of times in succession (with different constants),
500ms is not affordable for us. I can easily work around this, but I'd like
to understand the root of the problem.

Basically, there are two relevant indexes:
- muvelet_vonalkod_muvelet btree (muvelet, ..., idopont)
- muvelet_vonalkod_pk3 btree (idopont, ...)

Query is:
SELECT idopont WHERE muvelet = x ORDER BY idopont LIMIT 1.

I expected the planner to choose the index on muvelet, then sort by idopont.
Instead, it took the other index. I think there is heavy correlation since
muvelet references to a sequenced pkey and idopont is a timestamp (both
increase with passing time). May that be a cause?

See full table description and explain analyze results at end of the email.


TIA,
--
G.

---- table :
Table "public.muvelet_vonalkod"
Column | Type | Modifiers
------------+--------------------------+-----------------------------------
az | integer | not null def. nextval('...')
olvaso_nev | character varying | not null
vonalkod | character varying | not null
mozgasnem | integer | not null
idopont | timestamp with time zone | not null
muvelet | integer |
minoseg | integer | not null
cikk | integer |
muszakhely | integer |
muszakkod | integer |
muszaknap | date |
repre | boolean | not null default false
hiba | integer | not null default 0
Indexes:
"muvelet_vonalkod_pkey" primary key, btree (az)
"muvelet_vonalkod_pk2" unique, btree (olvaso_nev, idopont)
"muvelet_vonalkod_muvelet" btree
(muvelet, mozgasnem, vonalkod, olvaso_nev, idopont)
"muvelet_vonalkod_pk3" btree (idopont, olvaso_nev)
"muvelet_vonalkod_vonalkod" btree
(vonalkod, mozgasnem, olvaso_nev, idopont)
Foreign-key constraints:
"$1" FOREIGN KEY (mozgasnem) REFERENCES mozgasnem(az)
"$2" FOREIGN KEY (muvelet) REFERENCES muvelet(az)
"$3" FOREIGN KEY (minoseg) REFERENCES minoseg(az)
"$4" FOREIGN KEY (cikk) REFERENCES cikk(az)
"$5" FOREIGN KEY (muszakhely) REFERENCES hely(az)
"$6" FOREIGN KEY (muszakkod) REFERENCES muszakkod(az)
"muvelet_vonalkod_muszak_fk"
FOREIGN KEY (muszakhely, muszaknap, muszakkod)
REFERENCES muszak(hely, nap, muszakkod)
Triggers:
muvelet_vonalkod_aiud AFTER INSERT OR DELETE OR UPDATE ON
muvelet_vonalkod FOR EACH ROW EXECUTE PROCEDURE muvelet_vonalkod_aiud()
muvelet_vonalkod_biu BEFORE INSERT OR UPDATE ON muvelet_vonalkod FOR
EACH ROW EXECUTE PROCEDURE muvelet_vonalkod_biu()
muvelet_vonalkod_noty AFTER INSERT OR DELETE OR UPDATE ON
muvelet_vonalkod FOR EACH ROW EXECUTE PROCEDURE muvelet_vonalkod_noty()


-- original query, limit
# explain analyze
select idopont from muvelet_vonalkod
where muvelet=6859 order by idopont
limit 1;
QUERY PLAN

----------------------------------------------------------------------------
Limit (cost=0.00..25.71 rows=1 width=8) (actual time=579.528..579.529
rows=1 loops=1)
-> Index Scan using muvelet_vonalkod_pk3 on muvelet_vonalkod
(cost=0.00..8304.42 rows=323 width=8) (actual time=579.522..579.522 rows=1
loops=1)
Filter: (muvelet = 6859)
Total runtime: 579.606 ms
(4 rows)

-- however, if I omit the limit clause:
# explain analyze
select idopont from muvelet_vonalkod
where muvelet=6859 order by idopont;
QUERY PLAN

---------------------------------------------------------------------------
Sort (cost=405.41..405.73 rows=323 width=8) (actual time=1.295..1.395
rows=360 loops=1)
Sort Key: idopont
-> Index Scan using muvelet_vonalkod_muvelet on muvelet_vonalkod
(cost=0.00..400.03 rows=323 width=8) (actual time=0.049..0.855 rows=360 loops=1)
Index Cond: (muvelet = 6859)
Total runtime: 1.566 ms
(5 rows)

-- workaround 1: the planner is hard to trick...
# explain analyze
select idopont from
(select idopont from muvelet_vonalkod
where muvelet=6859) foo
order by idopont limit 1;
QUERY PLAN

---------------------------------------------------------------------------
Limit (cost=0.00..25.71 rows=1 width=8) (actual time=584.403..584.404
rows=1 loops=1)
-> Index Scan using muvelet_vonalkod_pk3 on muvelet_vonalkod
(cost=0.00..8304.42 rows=323 width=8) (actual time=584.397..584.397 rows=1
loops=1)
Filter: (muvelet = 6859)
Total runtime: 584.482 ms
(4 rows)

-- workaround 2: quite ugly but seems to work (at least for this
-- one test case):
# explain analyze
select idopont from
(select idopont from muvelet_vonalkod
where muvelet=6859 order by idopont) foo
order by idopont limit 1;
QUERY PLAN

---------------------------------------------------------------------------
Limit (cost=405.41..405.42 rows=1 width=8) (actual time=1.754..1.755
rows=1 loops=1)
-> Subquery Scan foo (cost=405.41..407.35 rows=323 width=8) (actual
time=1.751..1.751 rows=1 loops=1)
-> Sort (cost=405.41..405.73 rows=323 width=8) (actual
time=1.746..1.746 rows=1 loops=1)
Sort Key: idopont
-> Index Scan using muvelet_vonalkod_muvelet on
muvelet_vonalkod (cost=0.00..400.03 rows=323 width=8) (actual
time=0.377..1.359 rows=360 loops=1)
Index Cond: (muvelet = 6859)
Total runtime: 1.853 ms
(7 rows)

Search Discussions

  • Michael Fuhr at Dec 21, 2005 at 6:51 pm

    On Wed, Dec 21, 2005 at 07:03:00PM +0100, Sz?cs Gbor wrote:
    Version: 7.4.6 [...]
    Query is:
    SELECT idopont WHERE muvelet = x ORDER BY idopont LIMIT 1.

    I expected the planner to choose the index on muvelet, then sort by idopont.
    Instead, it took the other index.
    I think the planner is guessing that since you're ordering on
    idopont, scanning the idopont index will find the first matching
    row faster than using the muvelet index would. In many cases that's
    a good bet, but in this case the guess is wrong and you end up with
    a suboptimal plan.

    I just ran some tests with 8.1.1 and it chose the better plan for
    a query similar to what you're doing. One of the developers could
    probably explain why; maybe it's because of the changes that allow
    better use of multicolumn indexes. Try 8.1.1 if you can and see
    if you get better results.
    -- workaround 2: quite ugly but seems to work (at least for this
    -- one test case):
    # explain analyze
    select idopont from
    (select idopont from muvelet_vonalkod
    where muvelet=6859 order by idopont) foo
    order by idopont limit 1;
    Another workaround is to use OFFSET 0 in the subquery.

    --
    Michael Fuhr
  • Tom Lane at Dec 21, 2005 at 7:34 pm

    =?ISO-8859-2?Q?Sz=FBcs_G=E1bor?= <surrano@gmail.com> writes:
    Query is:
    SELECT idopont WHERE muvelet = x ORDER BY idopont LIMIT 1.
    Much the best solution for this would be to have an index on
    (muvelet, idopont)
    --- perhaps you can reorder the columns of "muvelet_vonalkod_muvelet"
    instead of making a whole new index --- and then say

    SELECT idopont WHERE muvelet = x ORDER BY muvelet, idopont LIMIT 1

    PG 8.1 can apply such an index to your original query, but older
    versions will need the help of the modified ORDER BY to recognize
    that the index is usable.

    regards, tom lane
  • Szűcs Gábor at Dec 22, 2005 at 12:59 pm
    Dear Tom,
    On 2005.12.21. 20:34, Tom Lane wrote:
    =?ISO-8859-2?Q?Sz=FBcs_G=E1bor?= <surrano@gmail.com> writes:
    Query is:
    SELECT idopont WHERE muvelet = x ORDER BY idopont LIMIT 1.
    Much the best solution for this would be to have an index on
    (muvelet, idopont)
    --- perhaps you can reorder the columns of "muvelet_vonalkod_muvelet"
    instead of making a whole new index --- and then say

    SELECT idopont WHERE muvelet = x ORDER BY muvelet, idopont LIMIT 1
    I was far too tired yesterday evening to produce such a clean solution but
    finally came to this conclusion this morning :) Even without the new index,
    it picks the index on muvelet, which decreases time to ~1.5ms. The new index
    takes it down to 0.1ms.

    However, this has a problem; namely, what if I don't (or can't) tell the
    exact int value in the WHERE clause? In general: will the following query:

    SELECT indexed_ts_field FROM table WHERE indexed_int_field IN (100,200)
    -- or even: indexed_int_field BETWEEN 100 AND 200
    ORDER BY indexed_ts_field LIMIT n

    always pick the index on the timestamp field, or does it depend on something
    else, say the limit size n and the attributes' statistics?
    PG 8.1 can apply such an index to your original query, but older
    versions will need the help of the modified ORDER BY to recognize
    that the index is usable.
    So the direct cause is that 7.x planners prefer ORDER BY to WHERE when
    picking indexes? But only when there is a LIMIT clause present?

    I'd like to know how much of our code should I review; if it's explicitly
    connected to LIMIT, I'd probably have to check far less code.

    --
    G.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-performance @
categoriespostgresql
postedDec 21, '05 at 6:02p
activeDec 22, '05 at 12:59p
posts4
users3
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase