<高性能SQL调优精要与案例解析>中10.4部分SQL改写代码

sqysl · 发表于 2021-4-6 09:11

newkid 发表于 2021-4-4 04:40
惭愧，我都没注意到闫老师在本版发了样章。确实写得比较粗糙了点，有很多内容没有展开。关于10.4.7那个OR改 ...

帖子有点久了，谢谢各位大师点评。

〇〇 · 发表于 2021-4-7 06:07

没换工作，电脑换成arm64 CPU了

newkid · 发表于 2021-4-7 08:22

这两天我也会写个点评，欢迎讨论。

newkid · 发表于 2021-4-7 22:14

10.4 SQL 语句改写

10.4.1 消除视图
改写前：
SQL> create view v_test as select * from tab1 where col1=...;
SQL> select * from tab2 t2,tab3 t3,v_test v where t2.col1=t3.col1 and t3.col1=v.col1 and v.col2=...;
改写后：
SQL> select * from tab2 t2,tab3 t3,tab1 t1
where t2.col1=t3.col1 and t3.col1=t1.col1 and t1.col2=... and t1.col1=...;

(点评：这样做的好处在哪里？一般来说，定义视图是为了重用代码，隐藏复杂逻辑。这个例子改写后代码变多了，性能上也不见得提升。在12C已经有一个系统功能可以自动替你做这样的事：DBMS_UTILITY.expand_sql_text)

10.4.2 标量子查询改为外连接
改写前：
SQL> select t1.col1,
         ( select col2
            from ( select t2.col1,sum(t2.col2) col2 from tab2 t2 group by t2.col1 ) t3
            where t3.col1=t1.col2
         ) col2_sum
   from tab1 t1
where t1.col3=...;
改写后：
SQL> select t1.col1,t3.col2 col2_sum
   from tab1 t1, ( select t2.col1,sum(t2.col2) col2 from tab2 t2 group by t2.col1 ) t3
   where t1.col2=t3.col1(+) and t1.col3=...;

(点评：标量子查询好不好，取决于调用次数有多少以及执行一次代价有多少。如果外层结果行数不多，或者驱动子查询所用的值没有多少个（这样可以利用行间缓存），每次查询都能高效利用索引，那么有时候比改写成外连接还更好。
)

10.4.3 updatepdate改为改为merge into
改写前：
SQL> update tab1 t1 set col2= ( select col2 from tab2 t2 where t1.col1=t2.col1 ) where t1.col3=...;
改写后：
SQL> merge into tab1 t1 using tab2 t2 on ( t1.col1=t2.col1 ) when matched then update set t1.col2=t2.col2 where t1.col3=...;

(点评：注意改写之前，假设T2中找不到相应数据，那么T1.COL2是会被修改成NULL的。所以不能够直接 USING TAB2，而是要USING (select ... from tab1 t1 left join tab2 t2 on t1.col1=t2.col1 where t1.col3=...)
如果改写前WHERE条件里面有EXISTS条件保证T2一定能找到数据，那么可以直接 USING TAB2。
)

10.4.4 正确使用分析函数
改写前：
SQL> select t1.col1 from tab1 t1 where t1.col2> ( select avg(col2) from tab1 t2 where t2.col3=t1.col3 );
改写后：
SQL> select t2.col1 from ( select t1.col1,t1.col2,avg(t1.col2) over(partition by c3) avg_col2 from tab1 t1 ) t2
where t2.col2>t2.avg_col2;

10.4.5 with as去除多次扫描
改写前：
SQL> select t1.col1,t1.col2
   from tab1 t1
   where t1.col3 in ( select col2 from tab2 where col1=... )
         and t1.col4 in  ( select col3 from tab2 where col1=... and col2>... )
         and t1.col5 in ( select col4 from tab2 where col1=... and col2>… and col3<... );

改写后：
SQL> with t1 as ( select col2,col3,col4 from tab2 where col1=... )
      , t2 as ( select col3 from t1 where col2>... )
      , t3 as ( select col4 from t1 where col2>... and col3<... )
select distint t4.col1,t4.col2
from tab1 t4,t1,t2,t3
   where t4.col3=t1.col2 and t4.col4=t2.col3 and t4.col5=t3.col4;

(点评：这个改写是不对的，加上distinct之后和原来不等价了。原查询可能返回重复的col1,col2结果。正确的改法是t1,t2,t3分别加上distinct。但是t1为了照顾t2和t3加上了额外的列，distinct也不行，要在join之前做。
with t1 as ( select col2,col3,col4 from tab2 where col1=... )
      , t2 as ( select distinct col3 from t1 where col2>... )
      , t3 as ( select distinct col4 from t1 where col2>... and col3<... )
select t4.col1,t4.col2
from tab1 t4,(select distinct col2 from t1) t1,t2,t3
   where t4.col3=t1.col2 and t4.col4=t2.col3 and t4.col5=t3.col4;
)

10.4.6 union 改为or
改写前：
SQL> select t1.col1
from tab1 t1
where t1.col2=...
      and t1.col3 in ( select col1 from tab2 t2 where col2=... union select col1 from tab2 t3 where col3>... );

改写后：
SQL> select t1.col1
   from tab1 t1
      where t1.col2=...
         and t1.col3 in ( select col1 from tab2 t2 where col2=... or col3>...);

10.4.7 or改为union
改写前：
SQL> select t1.col1
   from tab1 t1
   where t1.col2=... and t1.col3 in ( select col1 from tab2 t2 where col2=... or col3>...);

改写后：
SQL> select t1.col1
   from tab1 t1
   where t1.col2=... and t1.col3 in ( select col1 from tab2 t2 where col2=... union select col1 from tab2 t3 where col3>... );

(点评：此处改写的是 IN 子查询里面的代码，这样做是没有问题，否则要注意UNION是会对整个结果集去重的，可能和OR写法不等价)

10.4.8 in改为join
改写前：
SQL> select t1.col1 from tab1 t1 where t1.col2 in ( select col1 from tab2 where col2=... and col3>... );

改写后：
SQL> select distinct t1.col1 from tab1 t1,tab2 t2 where t1.col2=t2.col1 and col2=... and col3>...;
(点评：类似10.4.5，distinct不能放在这里，而是应该放在t2。现在的优化器会自动做这类改写。
select t1.col1
from tab1 t1,
( select distinct col1 from tab2 where col2=... and col3>... ) t2
where t1.col2=t2.col1
)

10.4.9 in改为改为exists
改写前：
SQL> select t1.col1 from tab1 t1 where t1.col2 in ( select col1 from tab2 where col2=... );

改写后：
SQL> select t1.col1 from tab1 t1 where exists ( select 1 from tab2 t2 where t1.col2=t2.col1 and t2.col2=... );

10.4.10
not in改为改为not exists
改写前：
SQL> select col1
   from tab1
   where col2=...
      and col3 not in ( select col1 from tab2 where col2=… and col1 is not null);

改写后：
SQL> select col1
   from tab1 t1
   where col2=...
         and not exists ( select 1 from tab2 t2 where col2=... and t1.col3=t2.col1)
         and col3 is not null;

10.4.11 not exists改为not in
改写前：
SQL> select col1
   from tab1 t1
   where col2=...
         and not exists ( select 1 from tab2 t2 where col2=... and t1.col3=t2.col1)
         and col3 is not null;

改写后：
SQL> select col1
   from tab1
   where col2=...
         and col3 not in ( select col1 from tab2 where col2=… and col1 is not null);

10.4.12 exists改为join
改写前：
SQL> select * from tab1 t1 where col2=... and exists ( select * from tab2 t2 where t1.col3=t2.col1 and t2.col2=... );

改写后：
SQL> select distinct t1.*
   from tab1 t1,tab2 t2 where t1.col2=... and t1.col3=t2.col1 and t2.col2=...;

(点评：因为是select *, 假设t1有唯一键，那么加上distinct是没问题的。但是在一些情况下，效率上不如在 join 之前做distinct:
select t1.*
   from tab1 t1,
      (select distinct col1 from tab2 where col2=...) t2
where t1.col2=... and t1.col3=t2.col1;
)

10.4.13 not exists改为join
改写前：
SQL> select col1 from tab1 t1 where col2=... and not exists
   ( select 1 from tab2 t2 where col1=... and t1.col3=t2.col2 );

改写后：
SQL> select t1.col1
   from tab1 t1,tab2 t2
   where t1.col2=... and t2.col1=... and t1.col3=t2.col2(+) and t2.col2 is null;

(点评：严谨一点应该说改为outer join
这个改法是错的，最后有t2.col1=... 这个条件，已经变成inner join了。应该写t2.col1(+)=...
推荐用ansi join语法：
select t1.col1
   from tab1 t1 left join tab2 t2 on t2.col1=... and t1.col3=t2.col2
   where t1.col2=... and t2.col2 is null;
)

10.4.14 join改为exists
改写前：
SQL> select distinct t1.col1 from tab1 t1,tab2 t2 where t1.col2=... and t2.col1=... and t1.col3=t2.col2;

改写后：
SQL> select distinct col1
   from tab1
   where col2=... and exists ( select 1 from tab2 t2 where t2.col1=... and t1.col3=t2.col2 );

10.4.15 join改为not exists
改写前：
SQL> select t1.col1 from tab1 t1,tab2 t2 where t1.col2=... and t2.col1=... and t1.col3=t2.col2(+) and t2.col2 is null;

改写后：
SQL> select col1 from tab1 t1 where col2=... and not exists ( select 1 from tab2 t2 where col1=... and t1.col3=t2.col2 );

(点评：参见10.4.13)

10.4.16 改写为集合运算符
改写前：
SQL> select col1
   from tab1
   where col2=... and col3 in ( select col1 from tab2 where col2=... )
      and col3 in ( select col1 from tab3 where col2... and col3>... );

改写后：
SQL> select col1
from tab1
where col2=...
      and col3 in ( select col1 from tab2 where col2=... intersect select col1 from tab3 where col2=... and col3>... );

(点评：把in子查询的intersect改成inner join可以提高效率)

[原创] <高性能SQL调优精要与案例解析>中10.4部分SQL改写代码