还是查询优化的问题，可能有点怪

newkid · 发表于 2021-7-21 04:00

sql_tigerliu 发表于 2021-7-20 19:55
这个数据量用这个写法确实非常简单高效, 如果range_b表的数据量达到几百万(比如IP地址库), 这个时候这个写 ...

楼主说B表有2000行，做nested loop驱动表是很合适的。
不过我很疑惑，这么简单的解决办法为什么楼主没有发现呢？难道原数据还有什么奇怪之处？现在离原帖都快过去二十年了，我们也只能纸上谈兵而已。

jihuyao · 发表于 2021-7-21 09:37

I doubt about Oracle optimizer here get into much details on B ID ranges. So based on business design and scenarios either merge join (expand and sort B first) or anti-join (consolidate B rows and find gaps) may be more efficient.

newkid · 发表于 2021-7-22 00:17

如果不加HINTS那就是MERGE JOIN:

Plan hash value: 4188551763

-----------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Rows  | Bytes | Cost (%CPU)| Time    |
-----------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT       |             | 96M|  1740M|  1470  (68)| 00:00:01 |
| 1 |  LOAD TABLE CONVENTIONAL | RANGE_RESULT |    |    |          |       |
| 2 | MERGE JOIN          |             | 96M|  1740M|  1470  (68)| 00:00:01 |
| 3 | SORT JOIN          |             | 200K|  1171K| 476 (1)| 00:00:01 |
| 4 |    INDEX FULL SCAN    | A_IDX       | 200K|  1171K| 476 (1)| 00:00:01 |
|*  5 | FILTER             |             |    |    |          |       |
|*  6 |    SORT JOIN          |             |  2000 | 26000 |    4  (25)| 00:00:01 |
| 7 |    TABLE ACCESS FULL | RANGE_B    |  2000 | 26000 |    3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - filter("A"."ID"<="B"."END_ID")
6 - access(INTERNAL_FUNCTION("A"."ID")>=INTERNAL_FUNCTION("B"."START_ID"))
   filter(INTERNAL_FUNCTION("A"."ID")>=INTERNAL_FUNCTION("B"."START_ID"))

这是典型的不等连接的计划。

但是你说的 ANTI JOIN 我看不懂？你是说把空范围找出来然后用 NOT IN/NOT EXISTS? 注意这里要求取得B表的TYPE列，所以 ANTI JOIN是不行的。

newkid · 发表于 2021-7-22 00:47

还有一个现象，如果刘兄那个标量子查询转移到PL/SQL函数中（12C之前只能这么做），就变得慢了很多：

create or replace function f_get_type (p_id in number) return number
as
  v_ret number;
begin
  select case when start_id<=p_id then type else null end as type
into v_ret
      from
      (select /*+ index(range_b idx_range_b_1) */ type,start_id
      from range_b
      where end_id>=p_id
      order by end_id
      ) where rownum=1;
  return v_ret;

end f_get_type;
/

insert into range_result(id,type)
select *
from
(
select id,f_get_type(id) as type
from range_a a
) where type is not null;

111824 rows created.

Elapsed: 00:00:11.41

CONTEXT SWITCH的开销也太厉害了。

jihuyao · 发表于 2021-7-22 09:09

Yes anti-join has that problem here. And by design type col should be in A table to avoid possible multiple types for each ID after join unless there are some unknown business meaning behind. I am also curious on merge join. It should perform ok because I realized it only needs sorting B first (no need to expand ) on start and end columns.

newkid · 发表于 2021-7-22 21:18

楼主明确说了B表的范围不会有重复，所以连接后一行A表不会变成多行。正因为有这个前提我们才能改用标量子查询，因为标量子查询不能返回超过一行。
MERGE JOIN计划在我的机器上要跑半分钟。

newkid · 发表于 2021-7-22 21:53

上面提到写成PL/SQL函数之后context switch造成性能下降，下面来测试一下几种减少context switch的方法。

先尝试把函数定义在WITH里面。先不执行INSERT, 用COUNT测试一下：
WITH
  function f_get_type (p_id in number) return number
as
  v_ret number;
begin
  select case when start_id<=p_id then type else null end as type
into v_ret
      from
      (select /*+ index(range_b idx_range_b_1) */ type,start_id
      from range_b
      where end_id>=p_id
      order by end_id
      ) where rownum=1;
  return v_ret;

end f_get_type;
select count(*)
from
(
select id,f_get_type(id) as type
from range_a a
) where type is not null
/

SP2-0642: SQL*Plus internal error state 2147, context 0:0:0
Unsafe to proceed
Elapsed: 00:00:00.02

上面这个错误是SQLPLUS报的，比较奇怪。改了几次之后发现是因为函数里面的SELECT有可能引发NO_DATA_FOUND异常。平常的PL/SQL函数在SQL里面调用时，NO_DATA_FOUND会被SQL引擎忽略，当作返回NULL处理。但是写在WITH里面就会出这个奇怪的错。为了避免NO_DATA_FOUND，在外面套个MAX函数：

WITH
  function f_get_type (p_id in number) return number
as
  v_ret number;
begin
  select max(case when start_id<=p_id then type else null end) as type
into v_ret
      from
      (select /*+ index(range_b idx_range_b_1) */ type,start_id
      from range_b
      where end_id>=p_id
      order by end_id
      ) where rownum=1;
  return v_ret;

end f_get_type;
select count(*)
from
(
select id,f_get_type(id) as type
from range_a a
) where type is not null
/

  COUNT(*)
----------
111824

Elapsed: 00:00:06.05

比起原来写法少了5秒，相当可观了。下面尝试INSERT:

insert into range_result(id,type)
WITH
  function f_get_type (p_id in number) return number
as
  v_ret number;
begin
  select max(case when start_id<=p_id then type else null end) as type
into v_ret
      from
      (select /*+ index(range_b idx_range_b_1) */ type,start_id
      from range_b
      where end_id>=p_id
      order by end_id
      ) where rownum=1;
  return v_ret;

end f_get_type;
select *
from
(
select id,f_get_type(id) as type
from range_a a
) where type is not null
/

WITH
*
ERROR at line 2:
ORA-32034: unsupported use of WITH clause

可惜INSERT...SELECT不支持WITH里面写函数。

下面再来尝试PRAGMA UDF:

create or replace function f_get_type (p_id in number) return number
as
  PRAGMA UDF;    ---------- 这一行是关键
  v_ret number;
begin
  select case when start_id<=p_id then type else null end as type
into v_ret
      from
      (select /*+ index(range_b idx_range_b_1) */ type,start_id
      from range_b
      where end_id>=p_id
      order by end_id
      ) where rownum=1;
  return v_ret;

end f_get_type;
/

insert into range_result(id,type)
select *
from
(
select id,f_get_type(id) as type
from range_a a
) where type is not null
/

111824 rows created.

Elapsed: 00:00:11.09

和不加PRAGMA并没有什么区别。

ZALBB · 发表于 2021-7-23 09:00

newkid 发表于 2021-7-22 21:53
上面提到写成PL/SQL函数之后context switch造成性能下降，下面来测试一下几种减少context switch的方法。先 ...

测试数据是你自己准备的？

jihuyao · 发表于 2021-7-23 12:39

I was thinking about the logical flow in procedure way for merge join. Clearly if a_id is PK and b_id is expanded and sorted without overlap or duplicate then the merge join is straight one loop from 1 to 200,000(max). But if Oracle can not assume it is the case, merge join can be still performed without expanding in different path. And to deal with possible cross join result there will be a lot overload check and small inner loop. So if I am right, in order to force Oracle to take unique merge join, both A and B must have unique index on ID (current design prevents B from doing so). But for testing purpose can we prove the difference, one case using two unique index on ID and one case using two normal index on ID. I remember there are some discussions on how each index entry is stored in DB blocks but this may go too far beyond present topic. We can simply assume using unique and normal index are equally efficient.

newkid · 发表于 2021-7-23 21:21

ZALBB 发表于 2021-7-23 09:00
测试数据是你自己准备的？

测试数据是刘兄设计的，生成数据的脚本在20楼。