Problem description:
The customer has two systems containing identical data; one system has 8 partitions, the other 16.
The customer runs the following UPDATE statement:
UPDATE mf.POSITION t
SET start_dt = (SELECT d2.DATE_ID
FROM mf.DATE d1,
mf.DATE d2
WHERE d1.date_id = t.start_dt
AND d2.CALENDAR_DATE = d1.PRIOR_BUS_DAY_DATE)
where t.start_dt > 19010101
;
This statement completes in 24 minutes on the 8-partition system, but runs for more than 24 hours on the 16-partition system...
Environment:
16-partition environment:
Server is running MPP/EEE.
DB21085I Instance "db2apdw" uses "64" bits and DB2 code release "SQL09011"
with level identifier "01020107".
Informational tokens are "DB2 v9.1.0.1", "s061104", "U809676", and Fix Pack
"1".
Product is installed at "/opt/IBM/db2/V9.1_01".
The 8-partition environment is on v9 GA.
The DDL is identical on both systems:
------------------------------------------------
-- DDL Statements for table "MF "."DATE"
------------------------------------------------
CREATE TABLE "MF "."DATE" (
"DATE_ID" INTEGER NOT NULL ,
"CALENDAR_DATE" DATE ,
"DAY_OF_WEEK" SMALLINT ,
"DAY_NUM_IN_MONTH" SMALLINT ,
"DAY_NAME" VARCHAR(9) ,
"DAY_ABBREV" CHAR(3) ,
"WEEKDAY_FLAG" CHAR(1) ,
"WEEK_NUM_IN_YEAR" SMALLINT ,
"WEEK_BEGIN_DATE" DATE ,
"WEEK_ENDING_DATE" DATE ,
"MONTH_NUM" SMALLINT ,
"MONTH_NAME" VARCHAR(9) ,
"MONTH_ABBREV" CHAR(3) ,
"MONTH_END_DATE" DATE ,
"QUARTER_NUM" SMALLINT ,
"QUARTER_NAME" CHAR(2) ,
"YEAR_QUARTER_NAME" CHAR(7) ,
"YEAR_NUM" SMALLINT ,
"YEAR_MONTH_NUM" INTEGER ,
"YEAR_QUARTER_NUM" INTEGER ,
"FISCAL_WEEK" SMALLINT ,
"FISCAL_MONTH" SMALLINT ,
"FISCAL_QUARTER" SMALLINT ,
"FISCAL_YEAR" SMALLINT ,
"LAST_DAY_IN_MONTH_FLAG" CHAR(1) ,
"CURR_DAY_FLAG" CHAR(1) ,
"CURR_WEEK_FLAG" CHAR(1) ,
"CURR_MONTH_FLAG" CHAR(1) ,
"US_BANK_HOLIDAY_FLAG" CHAR(1) ,
"US_EXCH_HOLIDAY_FLAG" CHAR(1) ,
"BW_PRELIM_FLAG" CHAR(1) ,
"BW_MONTHEND_FLAG" CHAR(1) ,
"LAST_BUS_DAY_IN_MONTH_FLAG" CHAR(1) ,
"ETL_PROCESS_ID" INTEGER NOT NULL WITH DEFAULT 0 ,
"ETL_USER_ID" INTEGER NOT NULL WITH DEFAULT 0 ,
"ETL_CREATE_TS" TIMESTAMP NOT NULL WITH DEFAULT CURRENT TIMESTAMP ,
"ETL_UPDATE_TS" TIMESTAMP ,
"QUARTER_END_DATE" DATE ,
"YEAR_END_DATE" DATE ,
"PRIOR_BUS_DAY_DATE" DATE ,
"PRIOR_MONTH_END_DATE" DATE ,
"PRIOR_QUARTER_END_DATE" DATE ,
"PRIOR_YEAR_END_DATE" DATE ,
"TRAILING_TWELVE_MONTH_END_DATE" DATE )
DISTRIBUTE BY HASH("DATE_ID")
IN "ACR_DATE" INDEX IN "ACR_IDXS" ;
...
------------------------------------------------
-- DDL Statements for table "MF "."POSITION"
------------------------------------------------
CREATE TABLE "MF "."POSITION" (
"AS_OF_DT" INTEGER ,
"START_DT" INTEGER NOT NULL ,
"END_DT" INTEGER NOT NULL ,
"LMR_SOURCE_ID" INTEGER NOT NULL ,
"LMR_INSTR_ID" INTEGER NOT NULL ,
"LMR_ACCT_ID" INTEGER NOT NULL ,
"CD_POSITION_TYPE_ID" INTEGER ,
"LMR_POSITION_TYPE_ID" INTEGER ,
"CD_PERF_SECTOR_ID" INTEGER ,
"QTY" DECIMAL(19,4) ,
"ORIG_FACE" DECIMAL(17,2) ,
"BASE_MKT_VALUE" DECIMAL(17,2) ,
"BASE_AVG_COST" DECIMAL(17,2) ,
"BASE_AVG_PRICE" DECIMAL(17,2) ,
"BASE_ACCRUED_INCOME" DECIMAL(17,2) ,
"BASE_UNREALIZED_GL" DECIMAL(17,2) ,
"BASE_AMORTIZED_COST" DECIMAL(17,4) ,
"YTM" DECIMAL(17,4) ,
"BASE_COMMISSION" DECIMAL(17,2) ,
"LOCAL_MKT_VALUE" DECIMAL(17,2) ,
"LOCAL_AVG_COST" DECIMAL(17,2) ,
"LOCAL_AVG_PRICE" DECIMAL(17,2) ,
"LOCAL_ACCRUED_INCOME" DECIMAL(17,2) ,
"LOCAL_UNREALIZED_GL" DECIMAL(17,2) ,
"LOCAL_COMMISSION" DECIMAL(17,2) ,
"FX_RATE" DECIMAL(20,5) ,
"LMR_BASE_CURRENCY_ID" INTEGER ,
"LMR_LOCAL_CURRENCY_ID" INTEGER ,
"ETL_PROCESS_ID" INTEGER NOT NULL WITH DEFAULT 0 ,
"ETL_USER_ID" INTEGER NOT NULL WITH DEFAULT 0 ,
"ETL_CREATE_TS" TIMESTAMP NOT NULL WITH DEFAULT CURRENT TIMESTAMP ,
"ETL_UPDATE_TS" TIMESTAMP )
DISTRIBUTE BY HASH("LMR_ACCT_ID")
IN "ACR_POSITION" INDEX IN "ACR_IDXS" ;
The tablespaces of both tables span all partitions; the full tablespace DDL is omitted here.
Access plan on the 16-partition system:
- Access Plan:
- -----------
- Total Cost: 3.97053e+07
- Query Degree: 1
- Rows
- RETURN
- ( 1)
- Cost
- I/O
- |
- 465308
- UPDATE
- ( 2)
- 3.97053e+07
- 472862
- /----+---\
- 465308 465308
- NLJOIN TABLE: MF
- ( 3) POSITION
- 3.6185e+07
- 7554
- /---+---\
- 465308 1
- TBSCAN BTQ
- ( 4) ( 5)
- 7808.78 117.907
- 7507 33.0625
- | |
- 465308 0.0625
- TABLE: MF MSJOIN
- POSITION ( 6)
- 117.734
- 33.0625
- /------+------\
- 39664 1.57574e-06
- TBSCAN FILTER
- ( 7) ( 13)
- 102.187 8.04381
- 32 1.0625
- | |
- 39664 0.0625
- TEMP TBSCAN
- ( 8) ( 14)
- 66.4508 8.04381
- 32 1.0625
- | |
- 39664 0.0625
- MBTQ SORT
- ( 9) ( 15)
- 52.0979 8.04317
- 32 1.0625
- | |
- 2479 0.0625
- TBSCAN FETCH
- ( 10) ( 16)
- 35.7837 8.04202
- 32 1.0625
- | /---+---\
- 2479 0.0625 2479
- SORT IXSCAN TABLE: MF
- ( 11) ( 17) DATE
- 34.6639 7.56908
- 32 1
- | |
- 2479 2479
- TBSCAN INDEX: MF
- ( 12) DATE_IDX1
- 32.6103
- 32
- |
- 2479
- TABLE: MF
- DATE
On the 8-partition system:
- Total Cost: 9.66272e+06
- Query Degree: 1
- Rows
- RETURN
- ( 1)
- Cost
- I/O
- |
- 1.05788e+06
- UPDATE
- ( 2)
- 9.66272e+06
- 1.07495e+06
- /----+----\
- 1.05788e+06 1.05788e+06
- NLJOIN TABLE: MF
- ( 3) POSITION
- 1.6609e+06
- 17069
- /---+---\
- 1.05788e+06 1
- TBSCAN BTQ
- ( 4) ( 5)
- 17765.2 142.679
- 17066 17
- | |
- 1.05788e+06 0.125
- TABLE: MF NLJOIN
- POSITION ( 6)
- 17.6203
- 2.125
- /---+---\
- 1 0.125
- BTQ IXSCAN
- ( 7) ( 10)
- 10.0424 7.57788
- 1.125 1
- | |
- 0.125 5031
- FETCH INDEX: MF
- ( 8) DATE_IDX2
- 8.52351
- 1.125
- /---+---\
- 0.125 5031
- IXSCAN TABLE: MF
- ( 9) DATE
- 7.57769
- 1
- |
- 5031
- INDEX: MF
- DATE_IDX1
Analysis:
When we first look at this problem, what is the first idea that springs to mind?
Mine was an access plan problem: the plan on the 8-partition system differs from the one on the 16-partition system, and the estimated cost is also lower on the 8-partition system, so the first suspect is naturally the access plan.
So we need to work out what is causing the difference.
In the 16-partition plan we can see an MSJOIN whose left subtree contains a table queue:
39664
MBTQ
( 9)
52.0979
32
Each TQ here is expected to output 39664 rows, whereas the TQ under the NLJN in the 8-partition plan is expected to output 1 row; yet the cost of the TQ above the MSJOIN is even lower than the cost of the TQ above the NLJN. A first guess, then, is that the network cost heuristic is going wrong.
So what controls the network cost? The answer is comm_speed.
On the 8-partition system we have:
Parallelism: Inter-Partition Parallelism
CPU Speed: 3.070237e-07
Comm Speed: 1
Buffer Pool size: 66280
Sort Heap size: 8192
Database Heap size: 1500
Lock List size: 50000
Maximum Lock List: 10
Average Applications: 1
Locks Available: 320000
And on the 16-partition system:
Parallelism: Inter-Partition Parallelism
CPU Speed: 4.251098e-07
Comm Speed: 100
Buffer Pool size: 66280
Sort Heap size: 8192
Database Heap size: 10000
Lock List size: 50000
Maximum Lock List: 10
Average Applications: 1
Locks Available: 320000
Here we can see that comm_speed is set far too high on the 16-partition system, which makes the heuristic treat table-queue traffic as nearly free and produce a wrong result.
After adjusting this value to 1, we get the same access plan on the 16-partition system as well.
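The Comm Speed value in the plan header is derived from the comm_bandwidth database manager configuration parameter (in MB/sec). A minimal sketch of the adjustment, assuming a DB2 v9 CLP session (verify the parameter name with `db2 get dbm cfg` on your own instance):

```shell
# Show the current communications bandwidth setting:
db2 get dbm cfg | grep -i bandwidth

# Lower it so the optimizer no longer treats inter-partition
# traffic as nearly free; the change affects newly compiled
# plans only, so rebind static packages afterwards:
db2 update dbm cfg using COMM_BANDWIDTH 1
```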
However, even with identical access plans, the statement still took more than 6 hours. What is going on?
Let us look at the access plan again. It contains two table queues, and both broadcast to all partitions. Going back to the SQL, the join pattern is A join B and B join B; according to the plan, for every row of A (the POSITION table on the left of the NLJN), the right subtree is evaluated again.
That right subtree contains another NLJN joining B with B, sending every partition's DATE rows to all the other partitions.
As you can imagine, this correlated access puts enormous pressure on the network, and the pressure grows steeply with the number of partitions. A db2trc -pcf trace shows that table-queue operations consume most of the time, with an average wait of roughly 30 seconds per row (fortunately these are not blocking operations, or the several hundred million rows might have taken a year), and table-queue reads occurring more than 2,400 times per second...
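For reference, the performance-counter trace mentioned above can be captured roughly as follows. This is a sketch of the db2trc workflow on v9; check the `db2trc` usage output for the exact flag spellings on your fix-pack level:

```shell
db2trc on -perfcount -t        # start tracing with per-function counters
# ... let the UPDATE run for a representative interval ...
db2trc dmp tq.dmp              # dump the in-memory trace buffer to a file
db2trc off
db2trc perffmt tq.dmp tq.fmt   # format the counters; here the table-queue
                               # functions dominated the elapsed-time column
```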
So, is the conclusion clear by now? Consolidate the DATE table onto a single partition and run the same operation: 13 minutes, fight over!
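A minimal sketch of that fix. The names SDPG and TS_DATE are made up for illustration, the storage clauses are elided, and on v9 moving a table between tablespaces means recreating and reloading it:

```sql
-- Database partition group containing only one partition:
CREATE DATABASE PARTITION GROUP SDPG ON DBPARTITIONNUM (0);

-- Tablespace placed in that single-partition group:
CREATE TABLESPACE TS_DATE IN DATABASE PARTITION GROUP SDPG
    MANAGED BY AUTOMATIC STORAGE;

-- Recreate MF.DATE in TS_DATE (same column list as above) and reload it.
-- With DATE resident on one partition, the B-to-B broadcast table
-- queues disappear from the plan and the correlated NLJN probes
-- become cheap local lookups.
```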