语句:
DELETE
FROM CTWEBTRACK
WHERE CTWEBTRACK_ID IN
(SELECT CTWEBTRACK_ID
FROM ctwebtrack
WHERE timecreated < (CURRENT TIMESTAMP - ? DAYS)
FETCH FIRST 5000 ROWS ONLY)
optlevel=1:
Original Statement:
------------------
DELETE
FROM CTWEBTRACK
WHERE CTWEBTRACK_ID IN
(SELECT CTWEBTRACK_ID
FROM ctwebtrack
WHERE timecreated < (CURRENT TIMESTAMP - ? DAYS)
FETCH FIRST 5000 ROWS ONLY)
Optimized Statement:
-------------------
DELETE
FROM DB2RTZ1.CTWEBTRACK AS Q1
WHERE $RID$ IN
(SELECT $RID$
FROM DB2RTZ1.CTWEBTRACK AS Q4
WHERE (Q4.CTWEBTRACK_ID <= $C1) AND (Q4.CTWEBTRACK_ID >= $C0) AND
Q4.CTWEBTRACK_ID = ANY
(SELECT DISTINCT Q2.CTWEBTRACK_ID, $C0, $C1
FROM DB2RTZ1.CTWEBTRACK AS Q2
WHERE (Q2.TIMECREATED < -(CURRENT TIMESTAMP, :?, 3))
ORDER BY Q2.CTWEBTRACK_ID))
Access Plan:
-----------
Total Cost: 219912
Query Degree: 1
optlvl=5:
Original Statement:
------------------
DELETE
FROM CTWEBTRACK
WHERE CTWEBTRACK_ID IN
(SELECT CTWEBTRACK_ID
FROM ctwebtrack
WHERE timecreated < (CURRENT TIMESTAMP - ? DAYS)
FETCH FIRST 5000 ROWS ONLY)
Optimized Statement:
-------------------
DELETE
FROM DB2RTZ1.CTWEBTRACK AS Q1
WHERE $RID$ IN
(SELECT $RID$
FROM
(SELECT Q2.CTWEBTRACK_ID
FROM DB2RTZ1.CTWEBTRACK AS Q2
WHERE (Q2.TIMECREATED < -(CURRENT TIMESTAMP, :?, 3))) AS Q3,
DB2RTZ1.CTWEBTRACK AS Q4
WHERE (Q4.CTWEBTRACK_ID = Q3.$C0))
Access Plan:
-----------
Total Cost: 437517
Query Degree: 1
这个是由于同样的语句被rewrite成不同查询。
当opt=1时,我们有:
Optimized Statement:
-------------------
DELETE
FROM DB2RTZ1.CTWEBTRACK AS Q1
WHERE $RID$ IN
(SELECT $RID$
FROM DB2RTZ1.CTWEBTRACK AS Q4
WHERE (Q4.CTWEBTRACK_ID <= $C1) AND (Q4.CTWEBTRACK_ID >= $C0) AND
Q4.CTWEBTRACK_ID = ANY
(SELECT DISTINCT Q2.CTWEBTRACK_ID, $C0, $C1
FROM DB2RTZ1.CTWEBTRACK AS Q2
WHERE (Q2.TIMECREATED < -(CURRENT TIMESTAMP, :?, 3))
ORDER BY Q2.CTWEBTRACK_ID))
当opt=5时则有:
DELETE
FROM DB2RTZ1.CTWEBTRACK AS Q1
WHERE $RID$ IN
(SELECT $RID$
FROM
(SELECT Q2.CTWEBTRACK_ID
FROM DB2RTZ1.CTWEBTRACK AS Q2
WHERE (Q2.TIMECREATED < -(CURRENT TIMESTAMP, :?, 3))) AS Q3,
DB2RTZ1.CTWEBTRACK AS Q4
WHERE (Q4.CTWEBTRACK_ID = Q3.$C0))
看一看两者之间有什么区别,问题会不会出现在这里?
在opt1里面,原语句的IN被改写为
Q4.CTWEBTRACK_ID = ANY
(SELECT DISTINCT Q2.CTWEBTRACK_ID, $C0, $C1
FROM DB2RTZ1.CTWEBTRACK AS Q2
WHERE (Q2.TIMECREATED < -(CURRENT TIMESTAMP, :?, 3))
ORDER BY Q2.CTWEBTRACK_ID)
而在opt5中则被写为JOIN。
Estimated number of rows: 11487.1
Number of columns: 1
Subquery predicate ID: Not Applicable
Column Names:
------------
+Q2.TIMECREATED(A)
Output Streams:
--------------
3) To Operator #9
Estimated number of rows: 11487.1
Number of columns: 1
Subquery predicate ID: Not Applicable
Column Names:
------------
+Q2.$RID$(A)
首先看到,其SORTKEY竟然是RID,而不是索引中的任何一列。这个到底是干什么用的?
谜底揭晓,这个SORT是LIST PREFETCH中的一部分。
大家应该早已理解什么是LIST PREFETCH,在这里不重复了。这里的SORT就是为了把按照TIMECREATED列排序的索引重新按照RID排序,然后使用LIST PREFETCH从磁盘中迅速取出数据。
由于预期的返回行数为一万左右,所以从estimate上来看,这里的SORT不会开销很大,然后LIST PREFETCH却可以节省我们很多的I/O开销。
看一看索引的统计数据:
UPDATE SYSSTAT.INDEXES
SET NLEAF=386196,
NLEVELS=4,
FIRSTKEYCARD=41576849,
FIRST2KEYCARD=41640039,
FIRST3KEYCARD=41646606,
FIRST4KEYCARD=-1,
FULLKEYCARD=41646606,
CLUSTERFACTOR=-1.000000,
CLUSTERRATIO=99,
SEQUENTIAL_PAGES=386152,
DENSITY=98,
AVERAGE_SEQUENCE_GAP=6.000000,
AVERAGE_SEQUENCE_FETCH_GAP=-1.000000,
AVERAGE_SEQUENCE_PAGES=18388.000000,
AVERAGE_SEQUENCE_FETCH_PAGES=-1.000000,
AVERAGE_RANDOM_PAGES=0.000000,
AVERAGE_RANDOM_FETCH_PAGES=-1.000000,
NUMRIDS=41758559,
NUMRIDS_DELETED=0,
NUM_EMPTY_LEAFS=863
WHERE INDNAME = 'I_CTWEBTRACK001' AND INDSCHEMA = 'DB2RTZ1 '
AND TABNAME = 'CTWEBTRACK' AND TABSCHEMA = 'DB2RTZ1 ';
呵呵,CLUSTER RATIO = 99%,说明绝大部分的页都是顺序的,这样就可以确认优化器是想使用list prefetch代替random access,希望能够提升性能。
如果大家对db2set变量熟悉的话应该可以想起来,有一个叫做DB2_MINIMIZE_LISTPREFETCH的变量。
从infocenter上我们看到:
DB2_MINIMIZE_LISTPREFETCH 3 All 3 Default=NO 3
Values: YES or NO
List prefetch is a special table access method that involves retrieving the qualifying RIDs from the index, sorting them by page number and then prefetching the data pages. Sometimes the optimizer does not have accurate information to determine if list prefetch is a good access method. This might occur when predicate selectivities contain parameter markers or host variables that prevent the optimizer from using catalog statistics to determine the selectivity.
This registry variable prevents the optimizer from considering list prefetch in such situations.
Note:
9 9
When either or both of the DB2 UDB SQL compiler variables DB2_MINIMIZE_LISTPREFETCH 9 and DB2_INLIST_TO_NLJN, are set to YES, they remain active even if REOPT(ONCE) is specified.