这个sql怎么优化!

myfriend2010 · 发表于 2008-5-7 09:28

原帖由 unixnewbie 于 2008-5-6 20:50 发表
如果table CUST_BILLSUM_200802没有update的distribution stats，则先用
select count(*) from CUST_BILLSUM_200802 where USEE8_TIME is null with ur
看看满足条件的有多少。

我估计要不USEE8_TIME上没有关于“null”值的frequent value stats，要不就是满足null的Row仍然有很多。由于参加joining的三个都是超过million rows的table，从第一个explain output就知道就算merge join已经产生spilling，所以hash join是不可能的了。那么nljoin可能吗？假设由于满足null的Row仍然有很多或者根本没有null的统计值（DB2仍假设有很多满足null的rows），所以就算其作为outer table，nested loop join中的index scan次数仍然很多。且index scan后还要根据RID从Table中再读出ROWS内容（如果cluster ratio不高的话，产生的“小”IO更多）。MSJOIN虽然溢出，不过有可能还在bufferpool中并未产生physical IO，且outer and inter只需要分别scan 一次，还可以利用sequential scan prefeching。

正如askgyliu所说，关键是看符合null值的有多少。如果很少且db2已经知道（通过distribution stats），那么nljoin（须有cust_id index配合）会是个更好的选择。

1> "从第一个explain output就知道就算merge join已经产生spilling，所以hash join是不可能的",why

,我记得hash join如果没有足够的空间容纳表,那么会写入磁盘的临时表啊!这里能否详细解释下?
2> 我去掉a.USEE8_TIME is null 的过滤,执行计划也没有发生什么变化!也就是说执行计划和USEE8_TIME 的关系不大!

附上去掉a.USEE8_TIME is null 后的执行计划

myfriend2010 · 发表于 2008-5-7 09:34

深刻啊!还有一个1166257*1460的排序,这么说来即使我想方设法把sortheap增大也无济于事,因为需要的sortheap太大了!

原帖由 wangzhonnew 于 2008-5-6 19:22 发表
got questions, for your comment (2), why do you think MSJOIN is correct? i was thinking SORT may underestimated the cost because for a small SORTHEAP, 3 sorts in the plan may overflow to tempspace or even overflow to disk... (but didn't do manual calculation, just a brief idea)
i think you mean TBSCAN on CUST_BILLSUM_200 was correct, because most of the rows in the table are NOT NULL...

3) when optlvl<2 db2 uses greedy algorithm rather than dynamic programming, especially in opt=0, which will pickup index whenever possible, so that's a common way to test how's the runtime performance with using an index

4) yeah you are right, the whole row will be placed in sortheap, unless the plan uses FETCH on top, TBSCAN+SORT on outter side and table name on inner side...
ex:
      9) SORT  : (Sort)
            Cumulative Total Cost:                2.61297e+006
            Cumulative CPU Cost:                1.72482e+010
            Cumulative I/O Cost:                474282
            Cumulative Re-Total Cost:       0
            Cumulative Re-CPU Cost:       0
            Cumulative Re-I/O Cost:       159922
            Cumulative First Row Cost:       2.61297e+006
            Cumulative Comm Cost:             0
            Cumulative First Comm Cost:       0
            Estimated Bufferpool Buffers:       547612

            Arguments:
            ---------
            DUPLWARN: (Duplicates Warning flag)
                     FALSE
            NUMROWS : (Estimated number of rows)
                     1166257
            PARTCOLS: (Table partitioning columns)
                     1: Q3."CUST_ID"
            ROWWIDTH: (Estimated width of rows)
                     780
            SORTKEY : (Sort Key column)
                     1: Q3."CUST_ID"(A)
            SORTTYPE: (Intra-Partition parallelism sort type)
                     PARTITIONED
            SPILLED : (Pages spilled to bufferpool or disk)
                     233252
            TEMPSIZE: (Temporary Table Page Size)
                     4096
            UNIQUE  : (Uniqueness required flag)
                     FALSE

in this sort, please note that ROWWIDTH=780 and NUMROWS=1166257, so from calculation 780*1166257/1024/1024~=867MB.... that means at least 867MB of sortheap will be required for this particular sort.... and please note there are 3 sorts in the original plan....
3) when optlvl

myfriend2010 · 发表于 2008-5-7 09:40

原帖由 askgyliu 于 2008-5-6 17:51 发表
觉得关键是符合“USEE8_TIME is null”的有多少？

比如说，每个TABLE都是五百万，但若是符合条件的只有几万/几十万的话，我倒是觉得改一下SQL可能会有帮助：

with cust_id_tmp as
(select cust_id_tmp from CCP.CUST_BILLSUM_200802
where USEE8_TIME is null)
select ....
from CCP.CUST_BILLSUM_200802 a,
(select * from CCP.CUST_BILLSUM_200801 where cust_id in (select cust_id from cust_id_tmp)) b,
(select * from CCP.CUST_BILLSUM_200712 where cust_id in (select cust_id from cust_id_tmp)) c
where a.cust_id = b.cust_id and a.cust_id =c.cust_id and a.USEE8_TIME is null

Indexing on CUST_BILLSUM_200802 (USEE8_TIME, CUST_ID)
Indexing on CUST_BILLSUM_200801 (CUST_ID)
Indexing on CUST_BILLSUM_200712 (CUST_ID)

runstats with distribution on CUST_BILLSUM_200802

LZ可以把这个的SQL PLAN倒出来看看？

如果符合条件的很多的话(>30%)，那就没什么必要再想了。TABSCAN，DB2已经给答案了。

wangzhonnew · 发表于 2008-5-7 11:25

still don't know what is the runtime performance?
please show the execute time for MSJOIN and NLJOIN, don't forget clean package cache after perform each operation

myfriend2010 · 发表于 2008-5-7 11:26

恩，执行时间太长了,刚跑了一个,等下中午的时候重启再跑另一个

askgyliu · 发表于 2008-5-7 11:33

This doesn't seem to be any efficient filtering:

Filter Factor: 0.927966

So even with the filtering, more than 92% data are still accessed?

===snippet from the access plan======>
Predicates:
----------
7) Start Key Predicate
Relational Operator: Is Null
Subquery Input Required: No
Filter Factor: 0.927966

Predicate Text:
--------------
Q3."USEE8_TIME" IS NULL

7) Stop Key Predicate
Relational Operator: Is Null
Subquery Input Required: No
Filter Factor: 0.927966

Predicate Text:
--------------
Q3."USEE8_TIME" IS NULL
===End of snippet from the access plan======>

myfriend2010 · 发表于 2008-5-7 11:46

USEE8_TIME IS NULL 的行很多!

unixnewbie · 发表于 2008-5-7 12:24

原帖由 myfriend2010 于 7/5/2008 13:46 发表
USEE8_TIME IS NULL 的行很多!

If there is a great number of "USEE8_TIME IS NULL ", that's probably the reason why DB2 choose MSJOIN. And it's expected that DB2 is to keep this plan even after you take out the NULL condition in this case.

Hash join needs addition memory to build hash table and hashing operation on both outer & inner tables (more CPU cycles). If MSJOIN has already spilled out to disk, hash join would likely spill to disk as well. So DB2 prefers MSJOIN in this case.

unixnewbie · 发表于 2008-5-7 12:26

原帖由 myfriend2010 于 7/5/2008 11:28 发表

1> "从第一个explain output就知道就算merge join已经产生spilling，所以hash join是不可能的",why,我记得hash join如果没有足够的空间容纳表,那么会写入磁盘的临时表啊!这里能否详细解释下?
2> 我去掉a.USEE8_TIME is null 的过滤,执行计划也没有发生什么变化!也就是说执行计划和USEE8_TIME 的关系不大!

附上去掉a.USEE8_TIME is null 后的执行计划

I don't agree on your second statement. Actually, 'a.USEE8_TIME is null ' really means a lot in choosing an access plan.

unixnewbie · 发表于 2008-5-7 12:31

原帖由 myfriend2010 于 7/5/2008 11:34 发表
深刻啊!还有一个1166257*1460的排序,这么说来即使我想方设法把sortheap增大也无济于事,因为需要的sortheap太大了!

Don't forget you're running DPF. That 8xxMB is just a total across all the partitions.

这个sql怎么优化!

浏览过的版块