对USE_NL这个hint的一些疑问

ZALBB · 发表于 2008-4-24 17:45

原帖由 iso90000 于 2008-4-24 10:00 发表
I always thought use_hash('table') means the 'table' is the build table, i.e. the table whose joined columns are hashed into memory (or spilled to temp segment if hash area is too small 我用个例子来表明我对这句话的理解

eg1：
select /*+use_hash(a)*/ * from a,b where a.id1=b.id1;
Execution Plan
----------------------------------------------------------
0    SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=409 Bytes=212
      68)
1 0 HASH JOIN (Cost=5 Card=409 Bytes=21268)
2 1    TABLE ACCESS (FULL) OF 'B' (Cost=2 Card=82 Bytes=2132)
3 1    TABLE ACCESS (FULL) OF 'A' (Cost=2 Card=409 Bytes=10634)

use_hash(a)表明将表a在内存中根据连接键建立hash table，扫描表b并探测刚建成的hash table，找出与hash table匹配
的行（然后扫描表b，每读到一条记录就来探测hash表一次，找出与hash表匹配的行）。

执行顺序：扫描表b（驱动表）,并通过hash算法探测刚建成的hash table（由表a建立的），找出匹配的行

eg2:
   如果想已表a在内存中建立的hash table为驱动表
SQL> select /*+ordered use_hash(a)*/ * from a,b where a.id1=b.id1;
Execution Plan
----------------------------------------------------------
0    SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=409 Bytes=212
      68)
1 0 HASH JOIN (Cost=5 Card=409 Bytes=21268)
2 1    TABLE ACCESS (FULL) OF 'A' (Cost=2 Card=409 Bytes=10634)
3 1    TABLE ACCESS (FULL) OF 'B' (Cost=2 Card=82 Bytes=2132)

这2种方式有区别吗？

ps:在eygle的9i Performance Tuning Guide 读书笔记一中有这样一句
  Execution Expain 中在上面的是先被扫描的小表，用作建立 hash table，在下的是大表。这好像与我想的eg1就有出入了，按照这个理解就是表b在内存中建立了hash table.

在USE_HASH(A) 里指定驱动表并无意义，ORACLE不理会此说明，而是根据统计信息自行决定，除非用ORDERED 或 LEADING.

iso90000 · 发表于 2008-4-24 18:49

问题1：
我在前面也说了use_hash(a)对驱动表没有意思，我只是想问在hash join中是将这个hint—use_hash(a)中指定的表a做为hash table，还是将oracle根据统计信息决定的驱动表做为hash table

[ 本帖最后由 iso90000 于 2008-4-24 20:07 编辑 ]

iso90000 · 发表于 2008-4-24 19:58

问题2：
又看了些itpub的资料发现了这个SQL> SELECT /*+ ordered use_hash(t_max t_middle) */ COUNT (*)
  2 FROM  t_max, t_middle,t_small
  3  WHERE t_small.object_id = t_middle.object_id
  4  AND t_middle.object_id = t_max.object_id
  5  /Execution Plan
----------------------------------------------------------
0    SELECT STATEMENT Optimizer=CHOOSE (Cost=228 Card=1 Bytes=12)
1 0 SORT (AGGREGATE)
2 1    HASH JOIN (Cost=228 Card=400 Bytes=4800)
3 2    TABLE ACCESS (FULL) OF 'T_SMALL' (Cost=2 Card=100 Bytes=400)
4 2    HASH JOIN (Cost=225 Card=113776 Bytes=910208)
5 4       TABLE ACCESS (FULL) OF 'T_MAX' (Cost=151 Card=113792 Bytes=455168)
6 4       TABLE ACCESS (FULL) OF 'T_MIDDLE' (Cost=39 Card=28447 Bytes=113788)用了ordered，那么应该使用from子句中的表的顺序进行表连接，但为什么这里确没使用t_max为驱动表？还有就是SWAP_JOIN_INPUTS是什么意思，看英文没看明白

希望各位大师帮我解决上面提出的2个问题

[ 本帖最后由 iso90000 于 2008-4-24 20:04 编辑 ]

remen · 发表于 2008-4-26 16:31

你不妨去掉+ ordered看看区别到底在哪里

iso90000 · 发表于 2008-4-26 20:20

SQL> select /*+use_hash(t_max t_middle) */ * from t_max,t_middle,t_min where t_middle.object_id = t_max.object_id  and t_max.object_id = t_min.object_id;

已选择40行。

Execution Plan
----------------------------------------------------------
0    SELECT STATEMENT Optimizer=CHOOSE (Cost=91 Card=33 Bytes=153
      45)

1 0 HASH JOIN (Cost=91 Card=33 Bytes=15345)
2 1    HASH JOIN (Cost=47 Card=42 Bytes=14154)
3 2    TABLE ACCESS (FULL) OF 'T_MIN' (Cost=2 Card=42 Bytes=6
      720)

4 2    TABLE ACCESS (FULL) OF 'T_MAX' (Cost=42 Card=27586 Byt
      es=4882722)

5 1    TABLE ACCESS (FULL) OF 'T_MIDDLE' (Cost=41 Card=21956 Bytes=281
      0368)

[ 本帖最后由 iso90000 于 2008-4-26 20:21 编辑 ]

remen · 发表于 2008-4-28 08:49

Hint好像没生效阿，你的/*+use_hash(t_max t_middle) */应该是/*+ use_hash(t_max t_middle) */，少了一个空格

Yong Huang · 发表于 2008-4-28 12:51

原帖由 iso90000 于 2008-4-24 05:58 发表
问题2：
又看了些itpub的资料发现了这个SQL> SELECT /*+ ordered use_hash(t_max t_middle) */ COUNT (*)
  2 FROM  t_max, t_middle,t_small
  3  WHERE t_small.object_id = t_middle.object_id
  4  AND t_middle.object_id = t_max.object_id
  5  /Execution Plan
----------------------------------------------------------
0    SELECT STATEMENT Optimizer=CHOOSE (Cost=228 Card=1 Bytes=12)
1 0 SORT (AGGREGATE)
2 1    HASH JOIN (Cost=228 Card=400 Bytes=4800)
3 2    TABLE ACCESS (FULL) OF 'T_SMALL' (Cost=2 Card=100 Bytes=400)
4 2    HASH JOIN (Cost=225 Card=113776 Bytes=910208)
5 4       TABLE ACCESS (FULL) OF 'T_MAX' (Cost=151 Card=113792 Bytes=455168)
6 4       TABLE ACCESS (FULL) OF 'T_MIDDLE' (Cost=39 Card=28447 Bytes=113788)用了ordered，那么应该使用from子句中的表的顺序进行表连接，但为什么这里确没使用t_max为驱动表？还有就是SWAP_JOIN_INPUTS是什么意思

I think I know now. I was wrong in my understanding. iso90000 and ZALBB are correct. This comment by Jonathan Lewis is also interesting:
http://jonathanlewis.wordpress.com/2007/01/16/full-hinting/
"the use_hash(tabX) hint tells the optimizer to use a hash join to join this table to the preceding result set. It doesn’t dictate the order at all. The swap_join_inputs(tabX) forces tabX to be the build (first) table in the join, the 10g hint no_swap_join_inputs(tabX) forces tabX to be the probe (second) table."

So why did you get the above plan when you have ordered and use_hash and tables are t_max, t_middle and t_small in that order? Here, t_max is first hashed into memory (build table), as dictated by the ordered hint. t_middle is the driving i.e. probe table (each row of it is taken out, compute its hash value, check the in-memory hash table built earlier for t_max to find a match). This result is used to do a hash join with t_small. Now here's the tricky part. Since t_small (line 3) is above, not below, line 4, it's the build table for this hash join (line 2). I think the normal reading order of the plan is not quite appropriate here. In reality, I'm not sure if Oracle already starts to build t_small's hash cluster in memory or not when it's working on lines 5 and 6.

What if you add swap_join_inputs(t_small)? If the plan stays the same as above, it just guarantees this plan's stability. If t_small drops to the bottom, I'll be surprised.

Yong Huang

Yong Huang · 发表于 2008-4-28 12:58

原帖由 iso90000 于 2008-4-26 06:20 发表
SQL> select /*+use_hash(t_max t_middle) */ * from t_max,t_middle,t_min where t_middle.object_id = t_max.object_id  and t_max.object_id = t_min.object_id;

已选择40行。

Execution Plan
----------------------------------------------------------
0    SELECT STATEMENT Optimizer=CHOOSE (Cost=91 Card=33 Bytes=15345)
1 0 HASH JOIN (Cost=91 Card=33 Bytes=15345)
2 1    HASH JOIN (Cost=47 Card=42 Bytes=14154)
3 2    TABLE ACCESS (FULL) OF 'T_MIN' (Cost=2 Card=42 Bytes=6720)
4 2    TABLE ACCESS (FULL) OF 'T_MAX' (Cost=42 Card=27586 Bytes=4882722)
5 1    TABLE ACCESS (FULL) OF 'T_MIDDLE' (Cost=41 Card=21956 Bytes=2810368)

It's always a good idea to leave a space between + and the first letter of the hint. It matters in PL/SQL but I think it's OK in straight SQL.

This SQL simply says whenever t_max or t_middle is in a join, do hash join (not any other type of join), regardless which of the joining tables is build or probe (earlier I was wrong on this). Since t_min (line 3) has much lower cardinality than t_max (line 4), t_min is chosen to be build. Then since the cardinality of the result (line 2) is much smaller than t_middle (line 5), line 2's result is build and line 5 probe.

Correct me if I'm wrong again.

Yong Huang

iso90000 · 发表于 2008-4-30 10:54

to remen :正如Yong Huang所说在PL/SQL块中的提示，在'+'后面必须跟一个空格。这是很重要的，否则提示可能会被忽略，因为一些版本的          PL/SQL在把查询传给SQL引擎的时候，会忽略掉'+'号后面的第一个字符。但是直接写的SQL应该不需要吧
to Yong Huang：我把你的话做了个总结
(1)首先是2张表的情况：
SQL> set autotrace traceonly
SQL> select /*+ use_hash(t_middle) */ * from t_middle,t_min where t_middle.object_id = t_min.object_id ;
已选择55行。

Execution Plan
----------------------------------------------------------
0    SELECT STATEMENT Optimizer=CHOOSE (Cost=44 Card=56 Bytes=161
      28)
1 0 HASH JOIN (Cost=44 Card=56 Bytes=16128)
2 1    TABLE ACCESS (FULL) OF 'T_MIN' (Cost=2 Card=56 Bytes=896
      0)
3 1    TABLE ACCESS (FULL) OF 'T_MIDDLE' (Cost=39 Card=24069 By
      tes=3080832)

Statistics
----------------------------------------------------------
      268  recursive calls
      0  db block gets
      445  consistent gets
      397  physical reads
      0  redo size
   8284  bytes sent via SQL*Net to client
      536  bytes received via SQL*Net from client
      5  SQL*Net roundtrips to/from client
      2  sorts (memory)
      0  sorts (disk)
      55  rows processed
SQL> select /*+ use_hash(t_middle) */ * from t_min,t_middle where t_middle.object_id = t_min.object_id ;
已选择55行。

Execution Plan
----------------------------------------------------------
0    SELECT STATEMENT Optimizer=CHOOSE (Cost=44 Card=56 Bytes=161
      28)
1 0 HASH JOIN (Cost=44 Card=56 Bytes=16128)
2 1    TABLE ACCESS (FULL) OF 'T_MIN' (Cost=2 Card=56 Bytes=896
      0)
3 1    TABLE ACCESS (FULL) OF 'T_MIDDLE' (Cost=39 Card=24069 By
      tes=3080832)

Statistics
----------------------------------------------------------
      6  recursive calls
      0  db block gets
      418  consistent gets
      299  physical reads
      0  redo size
   8284  bytes sent via SQL*Net to client
      536  bytes received via SQL*Net from client
      5  SQL*Net roundtrips to/from client
      2  sorts (memory)
      0  sorts (disk)
      55  rows processed
从这儿可以看出use_hash(t_middle)并没有将t_middle指定为驱动表，应该是根据你说的oracle由cardinality决定哪张表作为hash table,这里可以看出是将t_min做为hash table,然后扫描表t_middle,并通过hash算法探测刚建成的hash table（由表t_min建立的），找出匹配的行。
在这里我认为哪张表在内存中建立了hash table，那么它就是驱动表(driving table),在这个例子里就是t_min。

   这点好像与Yong Huang大师上面的一句话有出入“So why did you get the above plan when you have ordered and use_hash and tables are t_max, t_middle and t_small in that order? Here, t_max is first hashed into memory (build table), as dictated by the ordered hint. t_middle is the driving i.e. probe table (each row of it is taken out, compute its hash value, check the in-memory hash table built earlier for t_max to find a match). ”在大师引用的例子中，首先t_max表为hash table这个是肯定的，因为使用了ordered hint。但是大师好像认为t_middle为驱动表(t_middle is the driving),这个观点我不认同，t_middle只是探测表(probe table),驱动表(driving table)应该是t_max。不知我理解的对不对，希望大师帮助。

最后如果你想让t_max在内存中建里hash table(即让t_max为驱动表)，这是使用Hint ordered就可以了，因为ordered使用from子句中的表的顺序进行表连接。
SQL> select /*+ordered use_hash(t_max) */ * from t_max,t_min where t_max.object_id = t_min.object_id ;
已选择55行。

Execution Plan
----------------------------------------------------------
0    SELECT STATEMENT Optimizer=CHOOSE (Cost=127 Card=56 Bytes=18
      872)

1 0 HASH JOIN (Cost=127 Card=56 Bytes=18872)
2 1    TABLE ACCESS (FULL) OF 'T_MAX' (Cost=42 Card=17681 Bytes
      =3129537)

3 1    TABLE ACCESS (FULL) OF 'T_MIN' (Cost=2 Card=56 Bytes=896
      0)


Statistics
----------------------------------------------------------
      6  recursive calls
      0  db block gets
      441  consistent gets
      476  physical reads
      0  redo size
   9294  bytes sent via SQL*Net to client
      536  bytes received via SQL*Net from client
      5  SQL*Net roundtrips to/from client
      2  sorts (memory)
      0  sorts (disk)
      55  rows processed
ps:
Yong Huang:What if you add swap_join_inputs(t_small)? If the plan stays the same as above, it just guarantees this plan's stability. If t_small drops to the bottom, I'll be surprised.
使用swap_join_inputs(t_small)不会产生t_small drops to the bottom这种情况，因为 swap_join_inputs(tabX) forces tabX to be the build (first) table in the join，也就是说使用这个hint，会将t_small表作为内存中的hash table,而这与没有使用这个hint的原来的计划没有变化，所以不会出现t_small drops to the bottom这种情况。
但是如果执行计划中t_small本身就在最后(不是hash table)，那么使用这个hint会将t_small表变成hash table(驱动表)。

the end:
   超过2张表时，ordered hint好像会有问题，关于多张表的连接的详细测试，我稍后补上。

[ 本帖最后由 iso90000 于 2008-4-30 15:00 编辑 ]

Yong Huang · 发表于 2008-5-1 02:31

原帖由 iso90000 于 2008-4-29 20:54 发表
...
在这里我认为哪张表在内存中建立了hash table，那么它就是驱动表(driving table),在这个例子里就是t_min。

这点好像与Yong Huang大师上面的一句话有出入“So why did you get the above plan when you have ordered and use_hash and tables are t_max, t_middle and t_small in that order? Here, t_max is first hashed into memory (build table), as dictated by the ordered hint. t_middle is the driving i.e. probe table (each row of it is taken out, compute its hash value, check the in-memory hash table built earlier for t_max to find a match). ”在大师引用的例子中，首先t_max表为hash table这个是肯定的，因为使用了ordered hint。但是大师好像认为t_middle为驱动表(t_middle is the driving),这个观点我不认同，t_middle只是探测表(probe table),驱动表(driving table)应该是t_max。不知我理解的对不对，希望大师帮助。
...
Yong Huang:What if you add swap_join_inputs(t_small)? If the plan stays the same as above, it just guarantees this plan's stability. If t_small drops to the bottom, I'll be surprised.
使用swap_join_inputs(t_small)不会产生t_small drops to the bottom这种情况，因为 swap_join_inputs(tabX) forces tabX to be the build (first) table in the join，也就是说使用这个hint，会将t_small表作为内存中的hash table,而这与没有使用这个hint的原来的计划没有变化，所以不会出现t_small drops to the bottom这种情况。
但是如果执行计划中t_small本身就在最后(不是hash table)，那么使用这个hint会将t_small表变成hash table(驱动表)。

The word "driving" or 驱动 applies to the table from which you take one row at a time out (let's not consider table prefetching for now.) In a nested loop join, this plan

Nest Loop
  table1
  table2

shows that table1 is the driving table. But in a hash join

Hash Jion
  table1
  table2
the word driving should be applied to the probe table, because the upper one table1 is already hashed into memory. We need to fetch one row from table2 out, calculate its hash value, look up the value in the hash table and see if there's a match. Then fetch the second row from table2 and do the same.

I don't think there's ambiguity about this word. Words such as "inner" and "outer" may be confusing and mis-used even in Oracle documentation, especially in the context of hash joins (so never say "inner" or "outer" when you talk about hash joins). But I don't think there's confusion in literature about the word "driving".

I'm glad to see that swap_join_inputs(t_small) indeed does not change the plan. The name of this undocumented hint would better be called something like "hash_build_table", because "swap" implies it does position swapping even if the table is already the build table.

Yong Huang

[ 本帖最后由 Yong Huang 于 2008-5-1 12:31 编辑 ]

对USE_NL这个hint的一些疑问

浏览过的版块