|
原帖由 anlinew 于 2008-5-10 21:04 发表 ![]()
请教为何hash join是不可能的了
后面可以看到实际查询结果400多万,原表都是500万,所以条件的选择性是很差的,索引及nested loop join效果应该会适得其反
能做的只有选择hash join 和 msjoin 二者更高效的join 方式(oracle这里是hash 更好),并对其进行优化,如增加排序使用的内存
优化全表访问,如并行等
Probably the most common type of join method is the nested loop join (NLJ). To perform a NLJ, a qualifying row is identified in the outer table, and then the inner table is scanned searching for a match. A qualifying row is one in which the predicates for columns in the table match. When the inner table scan is complete, another qualifying row in the outer table is identified. The inner table is scanned for a match again, and so on. The repeated scanning of the inner table is usually accomplished with an index to minimize I/O cost.
The second type of join method used by DB2 is the merge join (MJ). With the MJ, the tables to be joined need to be ordered by the join predicates. That means that each table must be accessed in order by the columns that specify the join criteria. This ordering can be the result of either a sort or indexed access. After ensuring that both the outer and inner tables are properly sequenced, each table is read sequentially, and the join columns are matched up. Neither table is read more than once during a merge scan join.
The third type of join depends on the platform on which you are running DB2. For DB2 for OS/390 and z/OS there is the hybrid join. The hybrid join combines data and pointers to access and combine the rows from the tables being joined. A complete discussion of this join type is beyond the scope of this article.
For DB2 for Linux, UNIX, and Windows, the third type of join is the hash join. Hash join requires one or more predicates of the form table1.ColX = table2.ColY, and for which the column types are the same. The inner table is scanned and the rows copied into memory buffers drawn from the sort heap allocation. The memory buffers are divided into partitions based on a "hash code" computed from the column(s) of the join predicate(s). If the size of the first table exceeds the available sort heap space, buffers from selected partitions are written to temporary tables. After processing the inner table, the outer table is scanned and its rows are matched to the inner table rows by comparing the "hash code." Hash joins can require a significant amount of memory. So, for the hash join to produce realistic performance benefits, you may need to change the value of the sortheap database configuration parameter, and the sheapthres database manager configuration parameter.
这里是满足hash join的条件的啊,以前测试过,inner table很大(肯定会产生loop hash ,当然可以增大sorthep 可以减少循环次数)的情况下DB2比oracle更高效
hash join 应该需要优化级别 5 及以上级别,DB2_HASH_JOIN=Y应该是默认值吧
如果连接字段distinct key 很少的话,merge join 可能更高效,楼主的显然不是
总之LZ试试 HASH JOIN 吧 |
|