|
[php]11.5.3.4 Use of EXISTS versus IN for Subqueries
In certain circumstances, it is better to use IN rather than EXISTS. In general, if the selective predicate is in the subquery, then use IN. If the selective predicate is in the parent query, then use EXISTS.
Note:
This discussion is most applicable in an OLTP environment, where the access paths either to the parent SQL or subquery are through indexed columns with high selectivity. In a DSS environment, there can be low selectivity in the parent SQL or subquery, and there might not be any indexes on the join columns. In a DSS environment, consider using semijoins for the EXISTS case.
See Also:
Oracle Database Data Warehousing Guide
Sometimes, Oracle can rewrite a subquery when used with an IN clause to take advantage of selectivity specified in the subquery. This is most beneficial when the most selective filter appears in the subquery and there are indexes on the join columns. Conversely, using EXISTS is beneficial when the most selective filter is in the parent query. This allows the selective predicates in the parent query to be applied before filtering the rows against the EXISTS criteria.
Note:
You should verify the optimizer cost of the statement with the actual number of resources used (BUFFER_GETS, DISK_READS, CPU_TIME from V$SQLSTATS or V$SQLAREA). Situations such as data skew (without the use of histograms) can adversely affect the optimizer's estimated cost for an operation.
"Example 1: Using IN - Selective Filters in the Subquery" and "Example 2: Using EXISTS - Selective Predicate in the Parent" are two examples that demonstrate the benefits of IN and EXISTS. Both examples use the same schema with the following characteristics:
There is a unique index on the employees.employee_id field.
There is an index on the orders.customer_id field.
There is an index on the employees.department_id field.
The employees table has 27,000 rows.
The orders table has 10,000 rows.
The OE and HR schemas, which own these segments, were both analyzed with COMPUTE.
11.5.3.4.1 Example 1: Using IN - Selective Filters in the Subquery
This example demonstrates how rewriting a query to use IN can improve performance. This query identifies all employees who have placed orders on behalf of customer 144.
The following SQL statement uses EXISTS:
SELECT /* EXISTS example */
e.employee_id, e.first_name, e.last_name, e.salary
FROM employees e
WHERE EXISTS (SELECT 1 FROM orders o /* Note 1 */
WHERE e.employee_id = o.sales_rep_id /* Note 2 */
AND o.customer_id = 144); /* Note 3 */
Notes:
Note 1: This shows the line containing EXISTS.
Note 2: This shows the line that makes the subquery a correlated subquery.
Note 3: This shows the line where the correlated subqueries include the highly selective predicate customer_id = number.
The following plan output is the execution plan (from V$SQL_PLAN) for the preceding statement. The plan requires a full table scan of the employees table, returning many rows. Each of these rows is then filtered against the orders table (through an index).
ID OPERATION OPTIONS OBJECT_NAME OPT COST
---- -------------------- --------------- ---------------------- --- ----------
0 SELECT STATEMENT CHO
1 FILTER
2 TABLE ACCESS FULL EMPLOYEES ANA 155
3 TABLE ACCESS BY INDEX ROWID ORDERS ANA 3
4 INDEX RANGE SCAN ORD_CUSTOMER_IX ANA 1
Rewriting the statement using IN results in significantly fewer resources used.
The SQL statement using IN:
SELECT /* IN example */
e.employee_id, e.first_name, e.last_name, e.salary
FROM employees e
WHERE e.employee_id IN (SELECT o.sales_rep_id /* Note 4 */
FROM orders o
WHERE o.customer_id = 144); /* Note 3 */
Note:
Note 3: This shows the line where the correlated subqueries include the highly selective predicate customer_id = number
Note 4: This indicates that an IN is being used. The subquery is no longer correlated, because the IN clause replaces the join in the subquery.
The following plan output is the execution plan (from V$SQL_PLAN) for the preceding statement. The optimizer rewrites the subquery into a view, which is then joined through a unique index to the employees table. This results in a significantly better plan, because the view (that is, subquery) has a selective predicate, thus returning only a few employee_ids. These employee_ids are then used to access the employees table through the unique index.
ID OPERATION OPTIONS OBJECT_NAME OPT COST
---- -------------------- --------------- ---------------------- --- ----------
0 SELECT STATEMENT CHO
1 NESTED LOOPS 5
2 VIEW 3
3 SORT UNIQUE 3
4 TABLE ACCESS FULL ORDERS ANA 1
5 TABLE ACCESS BY INDEX ROWID EMPLOYEES ANA 1
6 INDEX UNIQUE SCAN EMP_EMP_ID_PK ANA
11.5.3.4.2 Example 2: Using EXISTS - Selective Predicate in the Parent
This example demonstrates how rewriting a query to use EXISTS can improve performance. This query identifies all employees from department 80 who are sales reps who have placed orders.
The following SQL statement uses IN:
SELECT /* IN example */
e.employee_id, e.first_name, e.last_name, e.department_id, e.salary
FROM employees e
WHERE e.department_id = 80 /* Note 5 */
AND e.job_id = 'SA_REP' /* Note 6 */
AND e.employee_id IN (SELECT o.sales_rep_id FROM orders o); /* Note 4 */
Note:
Note 4: This indicates that an IN is being used. The subquery is no longer correlated, because the IN clause replaces the join in the subquery.
Note 5 and 6: These are the selective predicates in the parent SQL.
The following plan output is the execution plan (from V$SQL_PLAN) for the preceding statement. The SQL statement was rewritten by the optimizer to use a view on the orders table, which requires sorting the data to return all unique employee_ids existing in the orders table. Because there is no predicate, many employee_ids are returned. The large list of resulting employee_ids are then used to access the employees table through the unique index.
ID OPERATION OPTIONS OBJECT_NAME OPT COST
---- -------------------- --------------- ---------------------- --- ----------
0 SELECT STATEMENT CHO
1 NESTED LOOPS 125
2 VIEW 116
3 SORT UNIQUE 116
4 TABLE ACCESS FULL ORDERS ANA 40
5 TABLE ACCESS BY INDEX ROWID EMPLOYEES ANA 1
6 INDEX UNIQUE SCAN EMP_EMP_ID_PK ANA
The following SQL statement uses EXISTS:
SELECT /* EXISTS example */
e.employee_id, e.first_name, e.last_name, e.salary
FROM employees e
WHERE e.department_id = 80 /* Note 5 */
AND e.job_id = 'SA_REP' /* Note 6 */
AND EXISTS (SELECT 1 /* Note 1 */
FROM orders o
WHERE e.employee_id = o.sales_rep_id); /* Note 2 */
Note:
Note 1: This shows the line containing EXISTS.
Note 2: This shows the line that makes the subquery a correlated subquery.
Note 5 & 6:These are the selective predicates in the parent SQL.
The following plan output is the execution plan (from V$SQL_PLAN) for the preceding statement. The cost of the plan is reduced by rewriting the SQL statement to use an EXISTS. This plan is more effective, because two indexes are used to satisfy the predicates in the parent query, thus returning only a few employee_ids. The employee_ids are then used to access the orders table through an index.
ID OPERATION OPTIONS OBJECT_NAME OPT COST
---- -------------------- --------------- ---------------------- --- ----------
0 SELECT STATEMENT CHO
1 FILTER
2 TABLE ACCESS BY INDEX ROWID EMPLOYEES ANA 98
3 AND-EQUAL
4 INDEX RANGE SCAN EMP_JOB_IX ANA
5 INDEX RANGE SCAN EMP_DEPARTMENT_IX ANA
6 INDEX RANGE SCAN ORD_SALES_REP_IX ANA 8
Note:
An even more efficient approach is to have a concatenated index on department_id and job_id. This eliminates the need to access two indexes and reduces the resources used.
[/php] |
|