Is it impossible for Oracle to output these queries the same - a completely different execution plan worked

I have the following two very similar subquery statements. I have highlighted the difference with **.

1

SELECT DISTINCT name
FROM person, nameindex n
WHERE person.id1    ='0812'
AND person.id2    =n.id2
AND person.id1    =n.id1
AND n.phonetic IN
  (SELECT n2.phonetic
  FROM nameindex n2
  WHERE n2.id1=person.id1 **
  GROUP BY n2.phonetic
 HAVING COUNT(*) BETWEEN 4 AND 500)

      

2:

SELECT DISTINCT name
FROM person, nameindex n
WHERE person.id1    ='0812'
AND person.id2    =n.id2
AND person.id1    =n.id1
AND n.phonetic IN
  (SELECT n2.phonetic
  FROM nameindex n2
  WHERE n2.id1='0812'  **
  GROUP BY n2.phonetic
 HAVING COUNT(*) BETWEEN 4 AND 500)

      

I think the oracle could infer that it person.id1

should be constant 0812

in the subquery. However, both queries give extremely different execution plans and costs (1: cost 4404211855, and 2: cost: 36237). Why is this?

This is an analytical query rather than OLTP, so there are no indexes for this particular query.

(Query background: Get the names of persons within id1 = '0812' that have a phonetic entry in the index name table for which there are 4 to 500 entries.)

+3


source to share


1 answer


I followed a similar query with the following setup:

CREATE TABLE person (id1, id2, NAME) AS 
   SELECT to_char(mod(ROWNUM, 1000), 'fm0000'), ROWNUM,
          dbms_random.string('A',10)
     FROM dual 
   CONNECT BY LEVEL <= 1e6;
CREATE TABLE nameindex (id1, id2, phonetic) AS
   SELECT id1, id2, to_char(dbms_random.value(1, 200), 'fm000')
     FROM person;

      

I found that your first request creates the following plan:

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 291343677
--------------------------------------------------------------------------------
| Id  | Operation             | Name      | Rows  | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           |     1 |  2040 |   331K  (2)| 01:06:2
|   1 |  HASH UNIQUE          |           |     1 |  2040 |   331K  (2)| 01:06:2
|*  2 |   FILTER              |           |       |       |            |
|*  3 |    HASH JOIN          |           |   891 |  1775K|  1750   (2)| 00:00:2
|*  4 |     TABLE ACCESS FULL | NAMEINDEX |   892 | 18732 |   739   (2)| 00:00:0
|*  5 |     TABLE ACCESS FULL | PERSON    |  1395 |  2750K|  1010   (2)| 00:00:1
|*  6 |    FILTER             |           |       |       |            |
|   7 |     HASH GROUP BY     |           |  9550 | 76400 |   740   (2)| 00:00:0
|*  8 |      TABLE ACCESS FULL| NAMEINDEX |  9550 | 76400 |   739   (2)| 00:00:0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( EXISTS (SELECT 0 FROM "NAMEINDEX" "N2" WHERE "N2"."ID1"=:B1
              GROUP BY "N2"."PHONETIC" HAVING "N2"."PHONETIC"=:B2 AND COUNT(*)>=
              COUNT(*)<=500))
   3 - access("PERSON"."ID2"="N"."ID2" AND "PERSON"."ID1"="N"."ID1")
   4 - filter("N"."ID1"='0812')
   5 - filter("PERSON"."ID1"='0812')
   6 - filter("N2"."PHONETIC"=:B1 AND COUNT(*)>=4 AND COUNT(*)<=500)
   8 - filter("N2"."ID1"=:B1)

      

As you can see that the IN semi-join is rewritten as EXISTS, which produces the same plan as this query:



SELECT DISTINCT NAME
  FROM person, nameindex n
 WHERE person.id1 = '0812'
   AND person.id2 = n.id2
   AND person.id1 = n.id1
   AND EXISTS (SELECT NULL
                 FROM nameindex n2
                WHERE n2.id1 = person.id1
                  AND n2.phonetic = n.phonetic
                GROUP BY n2.phonetic
               HAVING COUNT(*) BETWEEN 4 AND 500);

      

Here you can see that the subquery is NOT constant and therefore is calculated for each row of the main query, resulting in a less optimal execution plan.

I suggest you use all join columns in the GROUP BY when you are using an aggregate semi-join. The following query creates the optimal plan:

SELECT DISTINCT NAME
  FROM person, nameindex n
 WHERE person.id1 = '0812'
   AND person.id2 = n.id2
   AND person.id1 = n.id1
   AND (n.id1, n.phonetic) IN (SELECT n2.id1, n2.phonetic
                                 FROM nameindex n2
                                GROUP BY n2.id1, n2.phonetic
                               HAVING COUNT(*) BETWEEN 4 AND 500);

      

+4


source







All Articles