请教in和exists的区别

shiri512003 · 发表于 2008-4-14 11:49

[php]11.5.3.4 Use of EXISTS versus IN for Subqueries
In certain circumstances, it is better to use IN rather than EXISTS. In general, if the selective predicate is in the subquery, then use IN. If the selective predicate is in the parent query, then use EXISTS.

Note:

This discussion is most applicable in an OLTP environment, where the access paths either to the parent SQL or subquery are through indexed columns with high selectivity. In a DSS environment, there can be low selectivity in the parent SQL or subquery, and there might not be any indexes on the join columns. In a DSS environment, consider using semijoins for the EXISTS case.
See Also:

Oracle Database Data Warehousing Guide
Sometimes, Oracle can rewrite a subquery when used with an IN clause to take advantage of selectivity specified in the subquery. This is most beneficial when the most selective filter appears in the subquery and there are indexes on the join columns. Conversely, using EXISTS is beneficial when the most selective filter is in the parent query. This allows the selective predicates in the parent query to be applied before filtering the rows against the EXISTS criteria.

Note:

You should verify the optimizer cost of the statement with the actual number of resources used (BUFFER_GETS, DISK_READS, CPU_TIME from V$SQLSTATS or V$SQLAREA). Situations such as data skew (without the use of histograms) can adversely affect the optimizer's estimated cost for an operation.
"Example 1: Using IN - Selective Filters in the Subquery" and "Example 2: Using EXISTS - Selective Predicate in the Parent" are two examples that demonstrate the benefits of IN and EXISTS. Both examples use the same schema with the following characteristics:

There is a unique index on the employees.employee_id field.

There is an index on the orders.customer_id field.

There is an index on the employees.department_id field.

The employees table has 27,000 rows.

The orders table has 10,000 rows.

The OE and HR schemas, which own these segments, were both analyzed with COMPUTE.

11.5.3.4.1 Example 1: Using IN - Selective Filters in the Subquery
This example demonstrates how rewriting a query to use IN can improve performance. This query identifies all employees who have placed orders on behalf of customer 144.

The following SQL statement uses EXISTS:

SELECT /* EXISTS example */
      e.employee_id, e.first_name, e.last_name, e.salary
  FROM employees e
WHERE EXISTS (SELECT 1 FROM orders o                /* Note 1 */
               WHERE e.employee_id = o.sales_rep_id /* Note 2 */
                  AND o.customer_id = 144);          /* Note 3 */

Notes:

Note 1: This shows the line containing EXISTS.

Note 2: This shows the line that makes the subquery a correlated subquery.

Note 3: This shows the line where the correlated subqueries include the highly selective predicate customer_id = number.

The following plan output is the execution plan (from V$SQL_PLAN) for the preceding statement. The plan requires a full table scan of the employees table, returning many rows. Each of these rows is then filtered against the orders table (through an index).

ID OPERATION          OPTIONS       OBJECT_NAME          OPT    COST
---- -------------------- --------------- ---------------------- --- ----------
0 SELECT STATEMENT                                           CHO
1  FILTER
2 TABLE ACCESS    FULL          EMPLOYEES             ANA       155
3 TABLE ACCESS    BY INDEX ROWID  ORDERS                ANA       3
4 INDEX          RANGE SCAN    ORD_CUSTOMER_IX       ANA       1

Rewriting the statement using IN results in significantly fewer resources used.

The SQL statement using IN:

SELECT /* IN example */
      e.employee_id, e.first_name, e.last_name, e.salary
FROM employees e
WHERE e.employee_id IN (SELECT o.sales_rep_id       /* Note 4 */
                           FROM orders o
                        WHERE o.customer_id = 144);  /* Note 3 */

Note:

Note 3: This shows the line where the correlated subqueries include the highly selective predicate customer_id = number

Note 4: This indicates that an IN is being used. The subquery is no longer correlated, because the IN clause replaces the join in the subquery.

The following plan output is the execution plan (from V$SQL_PLAN) for the preceding statement. The optimizer rewrites the subquery into a view, which is then joined through a unique index to the employees table. This results in a significantly better plan, because the view (that is, subquery) has a selective predicate, thus returning only a few employee_ids. These employee_ids are then used to access the employees table through the unique index.

ID OPERATION          OPTIONS       OBJECT_NAME          OPT    COST
---- -------------------- --------------- ---------------------- --- ----------
0 SELECT STATEMENT                                           CHO
1  NESTED LOOPS                                                          5
2 VIEW                                                                3
3 SORT             UNIQUE                                           3
4    TABLE ACCESS    FULL          ORDERS                ANA       1
5 TABLE ACCESS    BY INDEX ROWID  EMPLOYEES             ANA       1
6 INDEX          UNIQUE SCAN    EMP_EMP_ID_PK       ANA
11.5.3.4.2 Example 2: Using EXISTS - Selective Predicate in the Parent
This example demonstrates how rewriting a query to use EXISTS can improve performance. This query identifies all employees from department 80 who are sales reps who have placed orders.

The following SQL statement uses IN:

SELECT /* IN example */
      e.employee_id, e.first_name, e.last_name, e.department_id, e.salary
FROM employees e
WHERE e.department_id = 80                                  /* Note 5 */
   AND e.job_id       = 'SA_REP'                            /* Note 6 */
   AND e.employee_id IN (SELECT o.sales_rep_id FROM orders o); /* Note 4 */
Note:

Note 4: This indicates that an IN is being used. The subquery is no longer correlated, because the IN clause replaces the join in the subquery.

Note 5 and 6: These are the selective predicates in the parent SQL.

The following plan output is the execution plan (from V$SQL_PLAN) for the preceding statement. The SQL statement was rewritten by the optimizer to use a view on the orders table, which requires sorting the data to return all unique employee_ids existing in the orders table. Because there is no predicate, many employee_ids are returned. The large list of resulting employee_ids are then used to access the employees table through the unique index.

ID OPERATION          OPTIONS       OBJECT_NAME          OPT    COST
---- -------------------- --------------- ---------------------- --- ----------
0 SELECT STATEMENT                                           CHO
1  NESTED LOOPS                                                       125
2 VIEW                                                                116
3 SORT             UNIQUE                                           116
4    TABLE ACCESS    FULL          ORDERS                ANA       40
5 TABLE ACCESS    BY INDEX ROWID  EMPLOYEES             ANA       1
6 INDEX          UNIQUE SCAN    EMP_EMP_ID_PK       ANA

The following SQL statement uses EXISTS:

SELECT /* EXISTS example */
      e.employee_id, e.first_name, e.last_name, e.salary
FROM employees e
WHERE e.department_id = 80                         /* Note 5 */
   AND e.job_id       = 'SA_REP'                   /* Note 6 */
   AND EXISTS (SELECT 1                            /* Note 1 */
               FROM orders o
               WHERE e.employee_id = o.sales_rep_id);  /* Note 2 */

Note:

Note 1: This shows the line containing EXISTS.

Note 2: This shows the line that makes the subquery a correlated subquery.

Note 5 & 6:These are the selective predicates in the parent SQL.

The following plan output is the execution plan (from V$SQL_PLAN) for the preceding statement. The cost of the plan is reduced by rewriting the SQL statement to use an EXISTS. This plan is more effective, because two indexes are used to satisfy the predicates in the parent query, thus returning only a few employee_ids. The employee_ids are then used to access the orders table through an index.

ID OPERATION          OPTIONS       OBJECT_NAME          OPT    COST
---- -------------------- --------------- ---------------------- --- ----------
0 SELECT STATEMENT                                           CHO
1  FILTER
2 TABLE ACCESS    BY INDEX ROWID  EMPLOYEES             ANA       98
3 AND-EQUAL
4    INDEX          RANGE SCAN    EMP_JOB_IX          ANA
5    INDEX          RANGE SCAN    EMP_DEPARTMENT_IX    ANA
6 INDEX             RANGE SCAN    ORD_SALES_REP_IX    ANA       8
Note:

An even more efficient approach is to have a concatenated index on department_id and job_id. This eliminates the need to access two indexes and reduces the resources used.

[/php]

shiri512003 · 发表于 2008-4-14 11:53

原帖由 anlinew 于 2008-4-14 11:49 发表

这个例子不太符合这个role

不是吧
好像很相符的
因为lz的parent query里是没有任何条件的，而选择性条件在children query里

anlinew · 发表于 2008-4-14 12:11

原帖由 shiri512003 于 2008-4-14 11:53 发表

不是吧
好像很相符的
因为lz的parent query里是没有任何条件的，而选择性条件在children query里

[php]
select id, role_id, login
from tb_user e
where Exists (Select /*+ UNNEST */
ID
From tb_role f
where e.role_id = f.id
Connect By Prior Id = parent_id
Start With Id = 1);
-----
[/php]
试试这个
如果不做unnest，in一样会filter
这里的问题其实是这种sql下exists为何不unnest的问题

anlinew · 发表于 2008-4-14 12:16

顺便请各位帮个忙，谁能看这个：
@ INTERNAL Note 250646.1 SubQuery Unnesting - IN SubQuery
@ INTERNAL Note 258228.1 SubQuery Unnesting - NOT IN SubQuery
@ INTERNAL Note 258676.1 SubQuery Unnesting - EXISTS SubQuery .
帮贴一个

paulyibinyi · 发表于 2008-4-14 12:24

原帖由 anlinew 于 2008-4-14 12:11 发表

[php]
select id, role_id, login
from tb_user e
where Exists (Select /*+ UNNEST */
ID
From tb_role f
where e.role_id = f.id
Connect By Prior Id = parent_id
Start With Id = 1);
-----
[/php]
试试这个
如果不做unnest，in一样会filter
这里的问题其实是这种sql下exists为何不unnest的问题

加了这个提示执行计划和exists以前一样

shiri512003 · 发表于 2008-4-14 12:58

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1292 Card=428 Bytes=
      9416)

1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'tb_user' (Cost=8 Card=428 Byte
      s=9416)

3 1 FILTER
4 3 CONNECT BY (WITH FILTERING)



Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=14 Card=18 Bytes=630
      )

1 0 HASH JOIN (SEMI) (Cost=14 Card=18 Bytes=630)
2 1 TABLE ACCESS (FULL) OF 'tb_user' (Cost=8 Card=8556 Byt
      es=188232)

3 1 VIEW OF 'VW_NSO_1' (Cost=3 Card=18 Bytes=234)
4 3 CONNECT BY (WITH FILTERING)

日月明王 · 发表于 2008-4-14 12:59

Use a /*+ NO_UNNEST */ hint in the subquery

棉花糖ONE · 发表于 2008-4-14 13:06

select id, role_id, login
from tb_user e
where Exists (Select /*+ hash_sj */ ID
From tb_role f
where e.role_id = f.id
Connect By Prior Id = parent_id
Start With Id = 1);
看看

paulyibinyi · 发表于 2008-4-14 13:26

原帖由 日月明王 于 2008-4-14 12:59 发表
Use a /*+ NO_UNNEST */ hint in the subquery

加的提示都没起作用

paulyibinyi · 发表于 2008-4-14 13:27

原帖由 棉花糖ONE 于 2008-4-14 13:06 发表
select id, role_id, login
from tb_user e
where Exists (Select /*+ hash_sj */ ID
From tb_role f
where e.role_id = f.id
Connect By Prior Id = parent_id
Start With Id = 1);
看看

也没起作用,执行计划还是和以前一样