FAQ

[Oracle-l] Subquery (IN) is more efficient than JOIN

Matt McClernon
May 31, 2011 at 11:16 pm
In the test case below the subquery (IN) SQL visits 25% less buffers than the join query.  The :B1 bind variable is an array of strings.
any ideas why the IN does so much less work than the JOIN for the same row count and same plan..?

SQL ID: 52a8u971qm7tqPlan Hash: 3545774334SELECT /*+ RULE */ DOM_NAMEFROM DOMAINS, TABLE(CAST(:B1 AS DOMAIN_LIST)) DL WHERE DOM_NAME = DL.COLUMN_VALUE^_at_call     count       cpu    elapsed       disk      query    current        rows------- ------  -------- ---------- ---------- ---------- ----------  ----------Parse        1      0.00       0.00          0          0          0           0Execute      1      0.01       0.01          0          0          0           0Fetch        1      0.86       0.86          0     200047          0      115195------- ------  -------- ---------- ---------- ---------- ----------  ----------total        3      0.87       0.87          0     200047          0      115195
Misses in library cache during parse: 0Optimizer mode: RULEParsing user id: 63     (recursive depth: 1)

Rows     Row Source Operation-------  --------------------------------------------------- 115195  NESTED LOOPS  (cr=200047 pr=0 pw=0 time=6355 us)  99704   COLLECTION ITERATOR PICKLER FETCH (cr=0 pr=0 pw=0 time=284 us) 115195   INDEX RANGE SCAN DOM_NAME_IDX (cr=200047 pr=0 pw=0 time=3191 us)(object id 54309)
********************************************************************************

SQL ID: brdpfk7gs8cm7Plan Hash: 2750862036SELECT DOM_NAMEFROM DOMAINS WHERE DOM_NAME IN (SELECT COLUMN_VALUE FROM TABLE(CAST(:B1 AS  DOMAIN_LIST)))

call     count       cpu    elapsed       disk      query    current        rows------- ------  -------- ---------- ---------- ---------- ----------  ----------Parse        1      0.00       0.00          0          0          0           0Execute      1      0.01       0.01          0          0          0           0Fetch        1      0.78       0.78          0     157986          0      115195------- ------  -------- ---------- ---------- ---------- ----------  ----------total        3      0.80       0.80          0     157986          0      115195
Misses in library cache during parse: 0Optimizer mode: ALL_ROWSParsing user id: 63     (recursive depth: 1)

Rows     Row Source Operation-------  --------------------------------------------------- 115195  NESTED LOOPS  (cr=157986 pr=0 pw=0 time=4720 us cost=536 size=6141 card=267)  99704   SORT UNIQUE (cr=0 pr=0 pw=0 time=417 us)  99704    COLLECTION ITERATOR PICKLER FETCH (cr=0 pr=0 pw=0 time=259 us) 115195   INDEX RANGE SCAN DOM_NAME_IDX (cr=157986 pr=0 pw=0 time=3353 us cost=2 size=21 card=1)(object id 54309)
reply

Search Discussions

6 responses

  • Jonathan Lewis at Jun 1, 2011 at 9:09 am
    Your output was hard to read, but the CBO IN operation has a SORT UNIQUE step
    that the simple RBO join doesn't have.

    This means the inner table was visited in key order for the join, which may have
    allowed the run-time engine to keep more index blocks pinned while accessing the
    data.

    Run the query two or three times in each version, and check the statistic for
    "buffer is pinned count". I think you'll find that the drop in "session logical
    reads" corresponds to an increase in "buffer is pinned count".

    Regards

    Jonathan Lewis
    http://jonathanlewis.wordpress.com

    Original Message -----
    From: "Matt McClernon"
    To:
    Sent: Wednesday, June 01, 2011 12:16 AM
    Subject: Subquery (IN) is more efficient than JOIN

    In the test case below the subquery (IN) SQL visits 25% less buffers than the
    join query. The :B1 bind variable is an array of strings.
    any ideas why the IN does so much less work than the JOIN for the same row count
    and same plan..?

    SQL ID: 52a8u971qm7tqPlan Hash: 3545774334SELECT /*+ RULE */ DOM_NAMEFROM
    DOMAINS, TABLE(CAST(:B1 AS DOMAIN_LIST)) DL WHERE DOM_NAME =

    DL.COLUMN_VALUE^_at_call count cpu elapsed disk query current
    rows------- ------ -------- ---------- ---------- ---------- ---------- ----------Parse
    1 0.00 0.00 0 0 0 0Execute 1 0.01 0.01 0 0 0 0Fetch 1 0.86 0.86 0 200047 0
    115195------- ------ -------- ---------- ---------- ---------- ---------- ----------total
    3 0..87 0.87 0 200047 0 115195
    Misses in library cache during parse: 0Optimizer mode: RULEParsing user id: 63
    (recursive depth: 1)
    Rows Row Source

    Operation------- --------------------------------------------------- 115195

    NESTED LOOPS (cr=200047 pr=0 pw=0 time=6355 us) 99704 COLLECTION ITERATOR
    PICKLER FETCH (cr=0 pr=0 pw=0 time=284 us) 115195 INDEX RANGE SCAN DOM_NAME_IDX
    (cr=200047 pr=0 pw=0 time=3191 us)(object id 54309)

    SQL ID: brdpfk7gs8cm7Plan Hash: 2750862036SELECT DOM_NAMEFROM DOMAINS WHERE
    DOM_NAME IN (SELECT COLUMN_VALUE FROM TABLE(CAST(:B1 AS DOMAIN_LIST)))

    call count cpu elapsed disk query current
    rows------- ------ -------- ---------- ---------- ---------- ---------- ----------Parse
    1 0.00 0.00 0 0 0 0Execute 1 0.01 0.01 0 0 0 0Fetch 1 0.78 0.78 0 157986 0
    115195------- ------ -------- ---------- ---------- ---------- ---------- ----------total
    3 0.80 0.80 0 157986 0 115195
    Misses in library cache during parse: 0Optimizer mode: ALL_ROWSParsing user id:
    63 (recursive depth: 1)
    Rows Row Source

    Operation------- --------------------------------------------------- 115195
    NESTED LOOPS (cr=157986 pr=0 pw=0 time=4720 us cost=536 size=6141 card=267)

    99704 SORT UNIQUE (cr=0 pr=0 pw=0 time=417 us) 99704 COLLECTION ITERATOR PICKLER
    FETCH (cr=0 pr=0 pw=0 time=259 us) 115195 INDEX RANGE SCAN DOM_NAME_IDX
    (cr=157986 pr=0 pw=0 time=3353 us cost=2 size=21 card=1)(object id 54309)

    --
    http://www.freelists.org/webpage/oracle-l

    No virus found in this message.
    Checked by AVG - www.avg.com
    Version: 10.0.1375 / Virus Database: 1509/3672 - Release Date: 05/31/11
  • Jonathan Lewis at Jun 1, 2011 at 9:22 am
    I should have pointed out that the two queries aren't logically equivalent.

    Your collection won't have a uniqueness constraint so it could contain
    duplicates, which would be eliminated by the CBO IN version, but not by the RBO
    simple join. The "rows" output from the Rowsource Operation report suggest that
    the data in the collection happened to be unique in your example.

    If you're a member of the IOUG I wrote an article about optimising a two-table
    join for the last issue of their journal - the option for using this rewrite was
    one of the ones I mentioned.

    Regards

    Jonathan Lewis
    http://jonathanlewis.wordpress.com

    Original Message -----
    From: "Matt McClernon"
    To:
    Sent: Wednesday, June 01, 2011 12:16 AM
    Subject: Subquery (IN) is more efficient than JOIN

    In the test case below the subquery (IN) SQL visits 25% less buffers than the
    join query. The :B1 bind variable is an array of strings.
    any ideas why the IN does so much less work than the JOIN for the same row count
    and same plan..?
  • Matt McClernon at Jun 1, 2011 at 11:01 pm
    Your output was hard to read, but the CBO IN operation has a SORT UNIQUE step that the simple RBO join doesn't have.
    hmm, it must have got mangled when I pasted it in, does this look any better:

    SQL ID: 52a8u971qm7tqPlan Hash: 3545774334SELECT /*+ RULE */ DOM_NAMEFROM DOMAINS, TABLE(CAST(:B1 AS DOMAIN_LIST)) DL WHERE DOM_NAME = DL.COLUMN_VALUE^_at_call     count       cpu    elapsed       disk      query    current        rows------- ------  -------- ---------- ---------- ---------- ----------  ----------Parse        1      0.00       0.00          0          0          0           0Execute      1      0.01       0.01          0          0          0           0Fetch        1      0.86       0.86          0     200047          0      115195------- ------  -------- ---------- ---------- ---------- ----------  ----------total        3      0.87       0.87          0     200047          0      115195
    Misses in library cache during parse: 0Optimizer mode: RULEParsing user id: 63     (recursive depth: 1)

    Rows     Row Source Operation-------  --------------------------------------------------- 115195  NESTED LOOPS  (cr=200047 pr=0 pw=0 time=6355 us)  99704   COLLECTION ITERATOR PICKLER FETCH (cr=0 pr=0 pw=0 time=284 us) 115195   INDEX RANGE SCAN DOM_NAME_IDX (cr=200047 pr=0 pw=0 time=3191 us)(object id 54309)

    SQL ID: brdpfk7gs8cm7Plan Hash: 2750862036SELECT DOM_NAMEFROM DOMAINS WHERE DOM_NAME IN (SELECT COLUMN_VALUE FROM TABLE(CAST(:B1 AS  DOMAIN_LIST)))

    call     count       cpu    elapsed       disk      query    current        rows------- ------  -------- ---------- ---------- ---------- ----------  ----------Parse        1      0.00       0.00          0          0          0           0Execute      1      0.01       0.01          0          0          0           0Fetch        1      0.78       0.78          0     157986          0      115195------- ------  -------- ---------- ---------- ---------- ----------  ----------total        3      0.80       0.80          0     157986          0      115195
    Misses in library cache during parse: 0Optimizer mode: ALL_ROWSParsing user id: 63     (recursive depth: 1)

    Rows     Row Source Operation-------  --------------------------------------------------- 115195  NESTED LOOPS  (cr=157986 pr=0 pw=0 time=4720 us cost=536 size=6141 card=267)  99704   SORT UNIQUE (cr=0 pr=0 pw=0 time=417 us)  99704    COLLECTION ITERATOR PICKLER FETCH (cr=0 pr=0 pw=0 time=259 us) 115195   INDEX RANGE SCAN DOM_NAME_IDX (cr=157986 pr=0 pw=0 time=3353 us cost=2 size=21 card=1)(object id 54309)
    This means the inner table was visited in key order for the join, which may have allowed the run-time engine to keep more index blocks >  pinned while accessing the data.
    OK, sounds reasonable
    Run the query two or three times in each version, and check the statistic for "buffer is pinned count". I think you'll find that the drop in "session logical reads" corresponds to an increase in "buffer >  is pinned count".
    I ran the 2 queries using Tom Kytes RUNSTATS harness and there was a significant difference in 'buffer is pinned count':
    STAT...shared hash latch upgra           0      11,049      11,049STAT...buffer is pinned count       99,997      73,056     -26,941STAT...consistent gets - exami     100,016      73,075     -26,941STAT...consistent gets from ca     100,679      73,678     -27,001STAT...no work - consistent re     100,672      73,671     -27,001STAT...session logical reads       200,720     157,835     -42,885STAT...consistent gets             200,695     157,802     -42,893STAT...consistent gets from ca     200,695     157,802     -42,893LATCH.cache buffers chains         301,507     242,665     -58,842STAT...sorts (rows)                      0     100,000     100,000STAT...session pga memory        9,764,864   3,342,336  -6,422,528

    --
    http://www.freelists.org/webpage/oracle-l
  • Bill Ferguson at Jun 2, 2011 at 12:48 pm

    On Wed, Jun 1, 2011 at 5:01 PM, Matt McClernon wrote:
    Your output was hard to read, but the CBO IN operation has a SORT UNIQUE step that the simple RBO join doesn't have.
    hmm, it must have got mangled when I pasted it in, does this look any better:
    Depending on mail clients used, the problem is probably due to HTML
    formatting. Maybe wrapping output with tags would work?

    To experiment, I'll try a little sample below:
    <----- there is a pre tag here

    column_1 column_2 Column_3 <---- each 'header' has five spaces
    value1 value2 value3 <---- the first

    letter was lined up under the first letter of the header
    <----- the ending pre tag

    So, depending on what this looks like after I click send 'may' help
    explain formatting problems?
  • Matt McClernon at Jun 1, 2011 at 11:09 pm

    I should have pointed out that the two queries aren't logically equivalent.
    Your collection won't have a uniqueness constraint so it could contain duplicates, which would be eliminated by the CBO IN version, but not by the RBO simple join. The "rows" output from the Rowsource >  Operation report suggest that the data in the collection happened to be unique in your example.
    Spot on, I made sure that the contents of the array were unique
    My first thoughts were that the IN would stop working once it had gotten its first match, whereas the JOIN would continue until it had checked all rows in the driven table.  In my mind that would reduce the CPU work done but wouldn't necessarily reduce the logical read count because the extra rows that the JOIN would check would all be in buffers that had already been visited.
    Matt

    --
    http://www.freelists.org/webpage/oracle-l
  • Jonathan Lewis at Jun 2, 2011 at 7:11 am
    It;'s a valid consideration, and could work in some cases - it's another one
    that I mentioned in the article.
    Technically you would aim to get Oracle to transform from IN to EXISTS and then
    run the subquery as a filter:
    In your case adding the hint /*+ no_unnest */ should be sufficient.

    You may find that this increases the CPU, though, because it probably casts the
    collection for every row in DOMAINS. The effect on buffer visits will depend
    pretty much on how Oracle drives through DOMAINS.
    Regards

    Jonathan Lewis
    http://jonathanlewis.wordpress.com

    ] ----- Original Message -----
    ] From: "Matt McClernon"
    ] To:;
    ] Sent: Thursday, June 02, 2011 12:09 AM
    ] Subject: Re: Subquery (IN) is more efficient than JOIN

    ] My first thoughts were that the IN would stop working once it had gotten its
    first match, whereas the JOIN would continue until it had checked all rows in
    the driven table. In my mind that would reduce the CPU work done but wouldn't
    necessarily reduce the logical read count because the extra rows that the JOIN
    would check would all be in buffers that had already been visited.

Related Discussions

Discussion Navigation
viewthread | post