Hi,

I have a query, which runs fast for one id (query 1)
and slow for other id (query 2)
though both plans and cost are same except
these two qeries return different number of rows.

explain analyze
SELECT *
FROM user U LEFT JOIN user_timestamps T USING
(user_id), user_alias A
WHERE U.user_id = A.user_id AND A.domain_id=7551070;

\g

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=234.22..61015.98 rows=12 width=238)
(actual time=7.73..7.73 rows=0 loops=1)
Merge Cond: ("outer".user_id = "inner".user_id)
-> Merge Join (cost=0.00..58585.67 rows=909864
width=180) (actual time=0.07..0.07 rows=1 loops=1)
Merge Cond: ("outer".user_id =
"inner".user_id)
-> Index Scan using user_pkey on user u
(cost=0.00..29714.99 rows=909864 width=156) (actual
time=0.04..0.04 rows=1 loops=1)
-> Index Scan using user_timestamps_uid_idx
on user_timestamps t (cost=0.00..16006.05 rows=706896
width=24) (actual time=0.02..0.02 rows=1 loops=1)
-> Sort (cost=234.22..234.25 rows=12 width=58)
(actual time=7.65..7.65 rows=0 loops=1)
Sort Key: a.user_id
-> Seq Scan on user_alias a
(cost=0.00..234.00 rows=12 width=58) (actual
time=7.61..7.61 rows=0 loops=1)
Filter: (domain_id = 7551070)
Total runtime: 7.96 msec
(11 rows)

explain analyze
SELECT *
FROM user U LEFT JOIN user_timestamps T USING
(user_id), user_alias A
WHERE U.user_id = A.user_id AND
A.domain_id=2005921193;
\g

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=247.92..61035.28 rows=332
width=238) (actual time=94511.70..95127.94 rows=493
loops=1)
Merge Cond: ("outer".user_id = "inner".user_id)
-> Merge Join (cost=0.00..58585.67 rows=909864
width=180) (actual time=6.43..93591.06 rows=897655
loops=1)
Merge Cond: ("outer".user_id =
"inner".user_id)
-> Index Scan using user_pkey on user u
(cost=0.00..29714.99 rows=909864 width=156) (actual
time=6.29..55634.85 rows=897655 loops=1)
-> Index Scan using user_timestamps_uid_idx
on user_timestamps t (cost=0.00..16006.05 rows=706896
width=24) (actual time=0.10..20331.13 rows=700466
loops=1)
-> Sort (cost=247.92..248.75 rows=332 width=58)
(actual time=10.76..11.17 rows=493 loops=1)
Sort Key: a.user_id
-> Seq Scan on user_alias a
(cost=0.00..234.00 rows=332 width=58) (actual
time=7.43..9.86 rows=493 loops=1)
Filter: (domain_id = 2005921193)
Total runtime: 95128.74 msec
(11 rows)

I also know if I change the order of 2nd query, it
will run much faster:

explain analyze
SELECT *
FROM (user_alias A JOIN user U USING (user_id) ) LEFT
JOIN user_timestamps T USING (user_id)
WHERE A.domain_id=2005921193;
\g

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..2302.31 rows=332 width=238)
(actual time=15.32..256.54 rows=493 loops=1)
-> Nested Loop (cost=0.00..1263.43 rows=332
width=214) (actual time=15.17..130.58 rows=493
loops=1)
-> Seq Scan on user_alias a
(cost=0.00..234.00 rows=332 width=58) (actual
time=15.04..21.01 rows=493 loops=1)
Filter: (domain_id = 2005921193)
-> Index Scan using user_pkey on user u
(cost=0.00..3.08 rows=1 width=156) (actual
time=0.17..0.17 rows=1 loops=493)
Index Cond: ("outer".user_id =
u.user_id)
-> Index Scan using user_timestamps_uid_idx on
user_timestamps t (cost=0.00..3.11 rows=1 width=24)
(actual time=0.16..0.23 rows=1 loops=493)
Index Cond: ("outer".user_id = t.user_id)
Total runtime: 257.79 msec
(9 rows)



user with 911932 rows
user_id - PK

user_timestamps with 708851 rows
user_id - FK with index

user_alias with 9689 rows
user_id - FK with index
domain_id - no index on this column

My questions are:
1. Why 1st "Merge Join" in 2nd query gets actual
rows=897655 while 1st "Merge Join" in 1st query is
actual rows=1?

If I know the answer, I will understand:
Why 1st "Merge Join" in 2nd query took so longer time
than 1st "Merge Join" in 1st query?

2. Why PG optimzer is not smart enough to use 3rd
(nested Loop) plan?

Thanks,





__________________________________
Do you Yahoo!?
Vote for the stars of Yahoo!'s next ad campaign!
http://advision.webevents.yahoo.com/yahoo/votelifeengine/

Search Discussions

  • Tom Lane at Jul 17, 2004 at 4:43 am

    Litao Wu writes:
    SELECT *
    FROM user U LEFT JOIN user_timestamps T USING
    (user_id), user_alias A
    WHERE U.user_id = A.user_id AND A.domain_id=7551070;
    Ick. Try changing the join order, perhaps

    SELECT *
    FROM (user U JOIN user_alias A ON (U.user_id = A.user_id))
    LEFT JOIN user_timestamps T USING (user_id)
    WHERE A.domain_id=7551070;

    As you have it, the entire LEFT JOIN has to be formed first,
    and the useful restriction clause only gets applied later.

    The fact that the case with 7551070 finishes quickly is just
    blind luck --- the slow case is much more representative.

    regards, tom lane

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-performance @
categoriespostgresql
postedJul 16, '04 at 9:36p
activeJul 17, '04 at 4:43a
posts2
users2
websitepostgresql.org
irc#postgresql

2 users in discussion

Litao Wu: 1 post Tom Lane: 1 post

People

Translate

site design / logo © 2022 Grokbase