How can this SQL be optimized?

wangzhonnew · posted 2008-5-10 23:23

Hash join is not necessarily better here: because sortheap is too small, many hash joins would overflow to temp.

askgyliu · posted 2008-5-11 00:56

Originally posted by anlinew on 2008-5-10 21:43

If the join column has very few distinct keys, merge join may be more efficient

Can you elaborate on this? I don't get the point:

1) What does "distinct key" mean? The number of columns in the join, or the number of distinct values in the join column(s)?
2) Why would MS join be more efficient when there are fewer distinct keys?

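unixnewbie · posted 2008-5-11 08:23

Originally posted by anlinew on 10/5/2008 23:04

May I ask why hash join is impossible?

As shown later, the query actually returns more than 4 million rows while the source tables each hold 5 million, so the predicates are poorly selective; indexes and nested loop join would probably be counterproductive

All that can be done is to pick whichever of hash join and MSJOIN is more efficient (in Oracle, hash is better here) and tune it, for example by increasing the memory available for sorting

Optimize the full table scans, for example with parallelism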

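The first explain shows that MSJOIN has already overflowed, and HASH JOIN needs more memory than MSJOIN, so the optimizer did not choose HSJOIN in this case.
Of course, the optimizer also bases its decisions on stats; if the stats are insufficient or out of date, actual execution can differ greatly from the explain output.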
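
A minimal sketch of refreshing the statistics before re-checking the plan. It assumes the table named elsewhere in this thread (CCP.CUST_BILLSUM_200712) is one of the joined tables; the option mix is just one common choice, not a recommendation specific to this system:

```sql
-- DB2 CLP command: recollect table, distribution, and index statistics
-- so that the optimizer's estimates reflect the current data.
RUNSTATS ON TABLE CCP.CUST_BILLSUM_200712
  WITH DISTRIBUTION AND DETAILED INDEXES ALL;
```

Running explain again afterwards shows whether the join method changes.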

[ Last edited by unixnewbie on 2008-5-12 20:23 ]

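myfriend2010 · posted 2008-5-11 14:43

Originally posted by anlinew on 2008-5-10 21:43

The conditions for hash join are met here. I tested this before: when the inner table is very large (hash loops will certainly occur, although increasing sortheap reduces the number of loops), DB2 was more efficient than Oracle

Hash join should require optimization class 5 or above, and DB2_HASH_JOIN=Y should be the default, I believe

If the join column has very few distinct keys, merge join may be more efficient; that is clearly not the OP's case

In short, the OP should give HASH JOIN a try

Good grief, DB2 is not Oracle. May I ask how I am supposed to "try HASH JOIN"? Don't tell me to use that featured post by 狼, the one about the execution plan file!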

wangzhonnew · posted 2008-5-11 21:32

I still feel hash join would not beat MSJOIN here, because first key cardinality = number of rows:
Schema: CCP
Name: IDX_CBSUM_C_0712
Type: Index
Number of rows: 1269369
Width of rows: -1
Number of buffer pool pages: 317505
Distinct row values: Yes
Tablespace name: CCP
Tablespace overhead: 12.670000
Tablespace transfer rate: 0.180000
Source for statistics: Single Node
Prefetch page count: 32
Container extent page count: 32
Index clustering statistic: 25.000000
Index leaf pages: 6717
Index tree levels: 3
Index full key cardinality: 1269369
Index first key cardinality: 1269369
Base Table Schema: CCP
Base Table Name: CUST_BILLSUM_200712
Columns in index:
CUST_ID
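
For anyone who wants to cross-check these numbers outside the explain output, here is a rough sketch that reads the same statistics from the catalog views. The schema and table names are taken from the listing above; the values are only as fresh as the last RUNSTATS and show -1 if statistics were never collected:

```sql
-- Index statistics: key cardinalities, leaf pages, tree levels
SELECT INDNAME, COLNAMES, FIRSTKEYCARD, FULLKEYCARD, NLEAF, NLEVELS
FROM SYSCAT.INDEXES
WHERE TABSCHEMA = 'CCP'
  AND TABNAME = 'CUST_BILLSUM_200712';

-- Table cardinality, for comparison with FIRSTKEYCARD
SELECT CARD
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'CCP'
  AND TABNAME = 'CUST_BILLSUM_200712';
```

If FIRSTKEYCARD equals CARD, as it does in the listing, every CUST_ID value is unique in this table.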

askgyliu · posted 2008-5-11 22:54

I don't see why hash join would be more efficient for this query, even in theory.

http://dsnowondb2.blogspot.com/2005/10/joining-in-db2-udb.html

But of course, the only proven way to establish the performance is an actual test, i.e. run the query.

DB2 for LUW lacks the ability to do a detailed trace of SQL execution like the one Oracle provides, and this makes performance tuning and learning a painful process. I would strongly suggest that the OP test the different scenarios and possible improvements; that would really help the community.

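anlinew · posted 2008-5-12 08:34

Originally posted by myfriend2010 on 2008-5-11 14:43

Good grief, DB2 is not Oracle. May I ask how I am supposed to "try HASH JOIN"? Don't tell me to use that featured post by 狼, the one about the execution plan file!

Set the optimization class higher and see whether DB2 chooses hash join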
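
A minimal sketch of that experiment for dynamic SQL in the current session. The value 7 is just an example of a class above the usual default of 5, and whether the plan actually switches to hash join is not guaranteed:

```sql
-- Raise the optimization class for this connection only
SET CURRENT QUERY OPTIMIZATION = 7;

-- Confirm the value now in effect
VALUES CURRENT QUERY OPTIMIZATION;
```

After this, explain the original query again in the same session and compare the join operators with the first plan.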

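anlinew · posted 2008-5-12 08:37

Originally posted by wangzhonnew on 2008-5-10 23:23
Hash join is not necessarily better here: because sortheap is too small, many hash joins would overflow to temp.

So sortheap also needs to be increased appropriately to reduce hash loops and improve join efficiency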
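
A rough sketch of that adjustment as DB2 CLP commands. MYDB and 8192 are placeholders, not values from this thread; sortheap is measured in 4 KB pages, and the instance-level sort heap threshold may also limit what a single sort or hash join gets:

```sql
-- Show the current database configuration, including SORTHEAP
GET DB CFG FOR MYDB;

-- Raise SORTHEAP (placeholder value, in 4 KB pages)
UPDATE DB CFG FOR MYDB USING SORTHEAP 8192;

-- After re-running the query, look at the hash loop and
-- hash join overflow counters in the snapshot output
GET SNAPSHOT FOR DATABASE ON MYDB;
```

Comparing the counters before and after the change is a more direct check than elapsed time alone.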

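myfriend2010 · posted 2008-5-12 08:53

This is exactly what I have been thinking about lately! See post #20.

Reply to post #20 by myfriend2010

Originally posted by anlinew on 2008-5-12 08:37

So sortheap also needs to be increased appropriately to reduce hash loops and improve join efficiency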

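anlinew · posted 2008-5-12 09:16

Originally posted by askgyliu on 2008-5-11 00:56

Can you elaborate on this? I don't get the point:

1) What does "distinct key" mean? The number of columns in the join, or the number of distinct values in the join column(s)?
2) Why would MS join be more efficient when there are fewer distinct keys?

1) select count(distinct col) from table
2) I put that a bit loosely. Which method is cheaper still comes down to comparing the cost of merge (the sorts plus merging the sorted result sets) with the cost of hash (computing hash values, building the hash table, then probing it with rows from the outer table). The data distribution of the join column affects both the cost of the sort and its result set.
Take a special case: select * from a,b where a.col=b.col
where col is 1 in every row of both a and b. The best join method for such SQL should be a Cartesian join, and a Cartesian join can be regarded as a special case of merge.
Just my personal opinion.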
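
Point 1 can be checked for the table discussed in this thread with a query like the following. This is only a sketch, and it assumes CUST_ID is the join column (it is the single column of the index IDX_CBSUM_C_0712 posted earlier):

```sql
-- Compare the number of distinct join keys with the row count
SELECT COUNT(DISTINCT CUST_ID) AS distinct_keys,
       COUNT(*)                AS total_rows
FROM CCP.CUST_BILLSUM_200712;
```

When distinct_keys is close to total_rows, the keys are nearly unique, which is the opposite of the all-ones special case above.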

[ Last edited by anlinew on 2008-5-12 09:19 ]
