Hi, all.

I have a query, looking like this:
SELECT
pub_date
FROM
tubesite_object
INNER JOIN tubesite_image
ON tubesite_image.object_ptr_id = tubesite_object.id
WHERE
tubesite_object.site_id = 8
AND tubesite_object.pub_date < E'2011-07-12 13:25:00'
ORDER BY
tubesite_object.pub_date ASC
LIMIT 21;


That query takes cca 10-15 seconds to run. Here is query plan:

Limit (cost=0.00..415.91 rows=21 width=8) (actual
time=11263.089..11263.089 rows=0 loops=1)
-> Nested Loop (cost=0.00..186249.55 rows=9404 width=8) (actual
time=11263.087..11263.087 rows=0 loops=1)
-> Index Scan using tubesite_object_pub_date_idx on
tubesite_object (cost=0.00..183007.09 rows=9404 width=12) (actual
time=0.024..11059.487 rows=9374 loops=1)
Index Cond: (pub_date < '2011-07-12
13:25:00-05'::timestamp with time zone)
Filter: (site_id = 8)
-> Index Scan using tubesite_image_pkey on tubesite_image
(cost=0.00..0.33 rows=1 width=4) (actual time=0.021..0.021 rows=0
loops=9374)
Index Cond: (tubesite_image.object_ptr_id =
tubesite_object.id)
Total runtime: 11263.141 ms


This query runs quickly (around second or two) when there is only few
connections to the database. Once I have 50-80 connections (200 is the
limit, although I never have more than 120-150 connections), that query
takes around 10-15 seconds.

But, if I disable nestedloops, here is the query plan:

Limit (cost=22683.45..22683.51 rows=21 width=8) (actual
time=136.009..136.009 rows=0 loops=1)
-> Sort (cost=22683.45..22706.96 rows=9404 width=8) (actual
time=136.007..136.007 rows=0 loops=1)
Sort Key: tubesite_object.pub_date
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=946.51..22429.91 rows=9404 width=8)
(actual time=135.934..135.934 rows=0 loops=1)
Hash Cond: (tubesite_object.id =
tubesite_image.object_ptr_id)
-> Bitmap Heap Scan on tubesite_object
(cost=545.40..21828.97 rows=9404 width=12) (actual time=20.874..104.075
rows=9374 loops=1)
Recheck Cond: (site_id = 8)
Filter: (pub_date < '2011-07-12
13:25:00-05'::timestamp with time zone)
-> Bitmap Index Scan on tubesite_object_site_id
(cost=0.00..543.05 rows=9404 width=0) (actual time=18.789..18.789
rows=9374 loops=1)
Index Cond: (site_id = 8)
-> Hash (cost=215.49..215.49 rows=14849 width=4)
(actual time=21.068..21.068 rows=14849 loops=1)
-> Seq Scan on tubesite_image (cost=0.00..215.49
rows=14849 width=4) (actual time=0.029..9.073 rows=14849 loops=1)
Total runtime: 136.287 ms


Now, if I disable nested loops in postgres.conf, then my load average on
the server goes skyhigh (i presume because a lot of other queries are
now being planned incorrectly).

I have set up default_statistics_target to 2000, and have vacumed and
analyzed the database.

Here are the other options I have set up in postgresql.conf (that differ
from the default settings):

version | PostgreSQL 8.4.8 on x86_64-pc-linux-gnu,
compiled by GCC gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2, 64-bit
checkpoint_segments | 64
default_statistics_target | 2000
effective_cache_size | 20GB
external_pid_file | /var/run/postgresql/8.4-main.pid
lc_collate | en_US.UTF-8
lc_ctype | en_US.UTF-8
listen_addresses | *
log_autovacuum_min_duration | 0
log_checkpoints | on
log_line_prefix | %t [%p]: [%l-1]
log_min_duration_statement | 1s
maintenance_work_mem | 256MB
max_connections | 200
max_stack_depth | 3MB
port | 5432
server_encoding | UTF8
shared_buffers | 2GB
statement_timeout | 30min
temp_buffers | 4096
TimeZone | localtime
track_activity_query_size | 2048
unix_socket_directory | /var/run/postgresql
wal_buffers | 128MB
work_mem | 64MB



Why is planner using NestedLoops, that is, what can I do to make him NOT
to use NestedLoops (other than issuing SET enable_nestloop TO false;
before each query) ?

Mario

Search Discussions

  • Tom Lane at Jul 12, 2011 at 8:04 pm

    Mario Splivalo writes:
    Limit (cost=0.00..415.91 rows=21 width=8) (actual
    time=11263.089..11263.089 rows=0 loops=1)
    -> Nested Loop (cost=0.00..186249.55 rows=9404 width=8) (actual
    time=11263.087..11263.087 rows=0 loops=1)
    Why is planner using NestedLoops,
    Because it thinks the LIMIT will kick in and end the query when the join
    is only 21/9404ths (ie, a fraction of a percent) complete. A NestLoop
    results in saving a lot of work in that situation, whereas hash-and-sort
    has to do the whole join despite the LIMIT.

    What you need to look into is why the estimated join size is 9400 rows
    when the actual join size is zero. Are both tables ANALYZEd? Are you
    intentionally selecting rows that have no join partners?

    regards, tom lane
  • Mario Splivalo at Jul 12, 2011 at 10:06 pm

    On 07/12/2011 10:04 PM, Tom Lane wrote:
    Mario Splivalo<mario.splivalo@megafon.hr> writes:
    Limit (cost=0.00..415.91 rows=21 width=8) (actual
    time=11263.089..11263.089 rows=0 loops=1)
    -> Nested Loop (cost=0.00..186249.55 rows=9404 width=8) (actual
    time=11263.087..11263.087 rows=0 loops=1)
    Why is planner using NestedLoops,
    Because it thinks the LIMIT will kick in and end the query when the join
    is only 21/9404ths (ie, a fraction of a percent) complete. A NestLoop
    results in saving a lot of work in that situation, whereas hash-and-sort
    has to do the whole join despite the LIMIT.

    What you need to look into is why the estimated join size is 9400 rows
    when the actual join size is zero. Are both tables ANALYZEd? Are you
    intentionally selecting rows that have no join partners?
    Hi, Tom.

    Yes, both tables have been ANALYZEd. What do you mean, intentilnaly
    selecting rows taht have no join partners?

    Mario
  • Tom Lane at Jul 12, 2011 at 10:39 pm

    Mario Splivalo writes:
    On 07/12/2011 10:04 PM, Tom Lane wrote:
    What you need to look into is why the estimated join size is 9400 rows
    when the actual join size is zero. Are both tables ANALYZEd? Are you
    intentionally selecting rows that have no join partners?
    Yes, both tables have been ANALYZEd. What do you mean, intentilnaly
    selecting rows taht have no join partners?
    I'm wondering why the actual join size is zero. That seems like a
    rather unexpected case for a query like this.

    regards, tom lane
  • Mario Splivalo at Jul 12, 2011 at 11:57 pm

    On 07/13/2011 12:39 AM, Tom Lane wrote:
    Mario Splivalo<mario.splivalo@megafon.hr> writes:
    On 07/12/2011 10:04 PM, Tom Lane wrote:
    What you need to look into is why the estimated join size is 9400 rows
    when the actual join size is zero. Are both tables ANALYZEd? Are you
    intentionally selecting rows that have no join partners?
    Yes, both tables have been ANALYZEd. What do you mean, intentilnaly
    selecting rows taht have no join partners?
    I'm wondering why the actual join size is zero. That seems like a
    rather unexpected case for a query like this.
    It is true that this particular query returns 0 rows. But it's created
    by django, and I can't do much to alter it.

    Mario
  • Mario Splivalo at Jul 13, 2011 at 12:53 am

    On 07/13/2011 12:39 AM, Tom Lane wrote:
    Mario Splivalo<mario.splivalo@megafon.hr> writes:
    On 07/12/2011 10:04 PM, Tom Lane wrote:
    What you need to look into is why the estimated join size is 9400 rows
    when the actual join size is zero. Are both tables ANALYZEd? Are you
    intentionally selecting rows that have no join partners?
    Yes, both tables have been ANALYZEd. What do you mean, intentilnaly
    selecting rows taht have no join partners?
    I'm wondering why the actual join size is zero. That seems like a
    rather unexpected case for a query like this.
    Yes, seems that planer gets confused by LIMIT. This query:

    select * from tubesite_object join tubesite_image on id=object_ptr_id
    where site_id = 8 and pub_date < '2011-07-12 13:25:00' order by pub_date
    desc ;

    Does not choose Nested Loop, and is done instantly (20 ms), and returns
    no rows. However, if I add LIMIT at the end, it chooses NestedLoop and
    it takes 500ms if I'm alone on the server, and 10+ seconds if there 50+
    connections on the server.

    Mario
  • Mario Splivalo at Jul 13, 2011 at 2:04 am

    On 07/13/2011 02:53 AM, Mario Splivalo wrote:
    On 07/13/2011 12:39 AM, Tom Lane wrote:
    Mario Splivalo<mario.splivalo@megafon.hr> writes:
    On 07/12/2011 10:04 PM, Tom Lane wrote:
    What you need to look into is why the estimated join size is 9400 rows
    when the actual join size is zero. Are both tables ANALYZEd? Are you
    intentionally selecting rows that have no join partners?
    Yes, both tables have been ANALYZEd. What do you mean, intentilnaly
    selecting rows taht have no join partners?
    I'm wondering why the actual join size is zero. That seems like a
    rather unexpected case for a query like this.
    Yes, seems that planer gets confused by LIMIT. This query:

    select * from tubesite_object join tubesite_image on id=object_ptr_id
    where site_id = 8 and pub_date < '2011-07-12 13:25:00' order by pub_date
    desc ;

    Does not choose Nested Loop, and is done instantly (20 ms), and returns
    no rows. However, if I add LIMIT at the end, it chooses NestedLoop and
    it takes 500ms if I'm alone on the server, and 10+ seconds if there 50+
    connections on the server.
    As explained/suggested by RhodiumToad on IRC, adding composite index on
    (site_id, pub_date) made nestedloop query finish in around 100 seconds!

    Thank you!

    Mario
  • Clem Dickey at Jul 12, 2011 at 8:13 pm

    On 07/12/2011 11:11 AM, Mario Splivalo wrote:
    Hi, all.

    I have a query, looking like this:
    SELECT
    pub_date
    FROM
    tubesite_object
    INNER JOIN tubesite_image
    ON tubesite_image.object_ptr_id = tubesite_object.id
    WHERE
    tubesite_object.site_id = 8
    AND tubesite_object.pub_date < E'2011-07-12 13:25:00'
    ORDER BY
    tubesite_object.pub_date ASC
    LIMIT 21;
    Why is planner using NestedLoops, that is, what can I do to make him NOT
    to use NestedLoops (other than issuing SET enable_nestloop TO false;
    before each query) ?
    The planner is using a nested loops because the startup overhead is
    less, and it think that it will only have run a small 0.2% (21/9404) of
    the loops before reaching your limit of 21 results. In fact it has to
    run all the loops, because there are 0 results. (Is that what you expected?)

    Try a using CTE to make the planner think you are going to use all the
    rows of the joined table. That may cause the planner to use a merge
    join, which has higher startup cost (sort) but less overall cost if it
    the join will not finish early.

    WITH t AS (
    SELECT tubesite_object.site_id AS site_id,
    tubesite_object.pub_date as pub_date
    FROM tubesite_object
    INNER JOIN tubesite_image
    ON tubesite_image.object_ptr_id = tubesite_object.id
    )
    SELECT pub_date
    FROM t
    WHERE t.site_id = 8 AND t.pub_date < E'2011-07-12 13:25:00'
    ORDER BY t.pub_date ASC LIMIT 21;

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-performance @
categoriespostgresql
postedJul 12, '11 at 6:18p
activeJul 13, '11 at 2:04a
posts8
users3
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase