这个sql怎么优化!

myfriend2010 · 发表于 2008-5-9 09:01

这个夸张了点吧,我还的建缓冲池!

原帖由 askgyliu 于 2008-5-8 21:26 发表
正常TABLE的话，可以再考虑把EXTENT SIZE调大到4M到64MB，看看会有什么不同没有。

anlinew · 发表于 2008-5-9 09:06

原帖由 askgyliu 于 2008-5-8 20:40 发表
要测试的话要么用个DMS TEMP TABLESPACE，或是正常TABLE但NOT LOGGED INITIALLY，才比较有可比性。

SMS 的temp tablespace快还是DMS的temp tablespace快？
有人建议使用SMS的temp tablespace
实际测试也的确很有效果

myfriend2010 · 发表于 2008-5-9 09:19

原帖由 anlinew 于 2008-5-9 09:06 发表

SMS 的temp tablespace快还是DMS的temp tablespace快？
有人建议使用SMS的temp tablespace
实际测试也的确很有效果

用DMS的temp tablespace目的是为了提前分配使用空间,避免在运行是不停进行分配空间操作,浪费CPU,也就是说为了测试而已!

当然在实际开发过程中,推荐使用SMS的temp tablespace,这样不浪费存储!

anlinew · 发表于 2008-5-9 10:57

原帖由 myfriend2010 于 2008-5-9 09:19 发表

用DMS的temp tablespace目的是为了提前分配使用空间,避免在运行是不停进行分配空间操作,浪费CPU,也就是说为了测试而已!

当然在实际开发过程中,推荐使用SMS的temp tablespace,这样不浪费存储!

我之前也是这样认为的。。。。
不是从浪费存储的角度考虑，是从效率的角度考虑，你可以试试哪种效率高
SMS，no file system caching，提供合适的bufferpool

[ 本帖最后由 anlinew 于 2008-5-9 10:58 编辑 ]

askgyliu · 发表于 2008-5-9 16:54

原帖由 myfriend2010 于 2008-5-9 09:01 发表
这个夸张了点吧,我还的建缓冲池!

extent size, not page size...

So, have you tested out the performance with DMS?

myfriend2010 · 发表于 2008-5-9 17:40

sorry,服务器被占用了！那边开始出数据！no hurry，我抓紧时间搞！

wangzhonnew · 发表于 2008-5-9 19:19

依然感觉主要瓶颈在查询，LZ为什么不对查询单独作benchmark？

anlinew · 发表于 2008-5-10 21:04

原帖由 unixnewbie 于 2008-5-6 20:50 发表
如果table CUST_BILLSUM_200802没有update的distribution stats，则先用
select count(*) from CUST_BILLSUM_200802 where USEE8_TIME is null with ur
看看满足条件的有多少。

我估计要不USEE8_TIME上没有关于“null”值的frequent value stats，要不就是满足null的Row仍然有很多。由于参加joining的三个都是超过million rows的table，从第一个explain output就知道就算merge join已经产生spilling，所以hash join是不可能的了。那么nljoin可能吗？假设由于满足null的Row仍然有很多或者根本没有null的统计值（DB2仍假设有很多满足null的rows），所以就算其作为outer table，nested loop join中的index scan次数仍然很多。且index scan后还要根据RID从Table中再读出ROWS内容（如果cluster ratio不高的话，产生的“小”IO更多）。MSJOIN虽然溢出，不过有可能还在bufferpool中并未产生physical IO，且outer and inter只需要分别scan 一次，还可以利用sequential scan prefeching。

正如askgyliu所说，关键是看符合null值的有多少。如果很少且db2已经知道（通过distribution stats），那么nljoin（须有cust_id index配合）会是个更好的选择。

请教为何hash join是不可能的了

后面可以看到实际查询结果400多万，原表都是500万，所以条件的选择性是很差的，索引及nested loop join效果应该会适得其反

能做的只有选择hash join 和 msjoin 二者更高效的join 方式（oracle这里是hash 更好），并对其进行优化，如增加排序使用的内存

优化全表访问，如并行等

[ 本帖最后由 anlinew 于 2008-5-10 21:16 编辑 ]

anlinew · 发表于 2008-5-10 21:05

原帖由 wangzhonnew 于 2008-5-9 19:19 发表
依然感觉主要瓶颈在查询，LZ为什么不对查询单独作benchmark？

同意主要成本耗费在查询上

anlinew · 发表于 2008-5-10 21:43

原帖由 anlinew 于 2008-5-10 21:04 发表

请教为何hash join是不可能的了

后面可以看到实际查询结果400多万，原表都是500万，所以条件的选择性是很差的，索引及nested loop join效果应该会适得其反

能做的只有选择hash join 和 msjoin 二者更高效的join 方式（oracle这里是hash 更好），并对其进行优化，如增加排序使用的内存

优化全表访问，如并行等

Probably the most common type of join method is the nested loop join (NLJ). To perform a NLJ, a qualifying row is identified in the outer table, and then the inner table is scanned searching for a match. A qualifying row is one in which the predicates for columns in the table match. When the inner table scan is complete, another qualifying row in the outer table is identified. The inner table is scanned for a match again, and so on. The repeated scanning of the inner table is usually accomplished with an index to minimize I/O cost.

The second type of join method used by DB2 is the merge join (MJ). With the MJ, the tables to be joined need to be ordered by the join predicates. That means that each table must be accessed in order by the columns that specify the join criteria. This ordering can be the result of either a sort or indexed access. After ensuring that both the outer and inner tables are properly sequenced, each table is read sequentially, and the join columns are matched up. Neither table is read more than once during a merge scan join.

The third type of join depends on the platform on which you are running DB2. For DB2 for OS/390 and z/OS there is the hybrid join. The hybrid join combines data and pointers to access and combine the rows from the tables being joined. A complete discussion of this join type is beyond the scope of this article.

For DB2 for Linux, UNIX, and Windows, the third type of join is the hash join. Hash join requires one or more predicates of the form table1.ColX = table2.ColY, and for which the column types are the same. The inner table is scanned and the rows copied into memory buffers drawn from the sort heap allocation. The memory buffers are divided into partitions based on a "hash code" computed from the column(s) of the join predicate(s). If the size of the first table exceeds the available sort heap space, buffers from selected partitions are written to temporary tables. After processing the inner table, the outer table is scanned and its rows are matched to the inner table rows by comparing the "hash code." Hash joins can require a significant amount of memory. So, for the hash join to produce realistic performance benefits, you may need to change the value of the sortheap database configuration parameter, and the sheapthres database manager configuration parameter.

这里是满足hash join的条件的啊，以前测试过，inner table很大（肯定会产生loop hash ，当然可以增大sorthep 可以减少循环次数）的情况下DB2比oracle更高效

hash join 应该需要优化级别 5 及以上级别,DB2_HASH_JOIN=Y应该是默认值吧

如果连接字段distinct key 很少的话，merge join 可能更高效，楼主的显然不是

总之LZ试试 HASH JOIN 吧

这个sql怎么优化!

浏览过的版块