|
原帖由 sharklove 于 2008-3-21 12:01 发表
刚刚做了几个实验,感觉hash table的建立并不是简单的根据hash key的值,
因为这个实验中,发现不同的hash key居然可以放在一个bucket中,
两张表,small_table 有1857条记录,big_table有418072条记录
SQL> select /*+leading(a) full(a) use_hash(a b)*/ count(b.q)
2 from small_table a,big_table b
3 where a.stcdt=b.stcdt;
COUNT(B.Q)
----------
275340
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=551 Card=1 Bytes=15)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (Cost=551 Card=100336 Bytes=1505040)
3 2 TABLE ACCESS (FULL) OF 'ST_RIVER_R' (Cost=188 Card=418072 Bytes=3762648)
4 2 TABLE ACCESS (FULL) OF 'ST_STINFO_B' (Cost=5 Card=1857 Bytes=11142)
在这个实验中,hash key是一个primary key,是唯一值。
SQL> select count(*) from small_table;
COUNT(*)
----------
1857
SQL> select count(distinct stcdt) from small_table;
COUNT(DISTINCTSTCDT)
--------------------
1857
如果按照lz的说法,每个bucket中是不是最多只有一条记录呢?
看看trace信息
############# 10104 trace info #############
*** (continued) HASH JOIN BUILD HASH TABLE (PHASE 1) ***
### Hash table ###
# NOTE: The calculated number of rows in non-empty buckets may be smaller
# than the true number.
Number of buckets with 0 rows: 2625
Number of buckets with 1 rows: 1152
Number of buckets with 2 rows: 260
Number of buckets with 3 rows: 51
Number of buckets with 4 rows: 8
Number of buckets with 5 rows: 0
Number of buckets with 6 rows: 0
Number of buckets with 7 rows: 0
Number of buckets with 8 rows: 0
Number of buckets with 9 rows: 0
Number of buckets with between 10 and 19 rows: 0
Number of buckets with between 20 and 29 rows: 0
Number of buckets with between 30 and 39 rows: 0
Number of buckets with between 40 and 49 rows: 0
Number of buckets with between 50 and 59 rows: 0
Number of buckets with between 60 and 69 rows: 0
Number of buckets with between 70 and 79 rows: 0
Number of buckets with between 80 and 89 rows: 0
Number of buckets with between 90 and 99 rows: 0
Number of buckets with 100 or more rows: 0
### Hash table overall statistics ###
Total buckets: 4096 Empty buckets: 2625 Non-empty buckets: 1471
Total number of rows: 1857
Maximum number of rows in a bucket: 4
Average number of rows in non-empty buckets: 1.262407
从trace信息中看到Maximum number of rows in a bucket: 4
你没看清我的回复啊:
“col_name值不同也可能出现在一个bucket里面,因为当col_name number of distinct value > number of buckets是,难免会出现一个篮子里面有多个鸡蛋的情况”
一个大集合到小集合的映射当然会出现N->1的情况 |
|