Index Range Scan成本 Histogram 和 10053

foreverlee · 发表于 2006-6-30 14:21

[PHP]

试验环境:
Linux AS4 + 10.2.0.1.0 - 64bit Production

为了模拟数据库对象占用多个block,故设置较高pctfree.

SQL> create table irs pctfree 80 tablespace users as
  2  select rownum id,a.* from all_objects a;

Table created.

SQL> alter table irs add constraint irs_id_pk primary key(id);

Table altered.

SQL> create index irs_owner_idx on irs(owner) tablespace indx pctfree 80;

Index created.

SQL> show parameter optimizer_index

NAME                               TYPE       VALUE
------------------------------------ ----------- ------------------------------
optimizer_index_caching             integer    0
optimizer_index_cost_adj          integer    100

获取表和索引的统计信息
SQL> analyze table irs compute statistics for table for all indexes;

Table analyzed.

获取列的histogram信息
SQL> analyze table irs compute statistics for all indexed columns;

Table analyzed.

SQL> select owner,count(*) from irs
  2  group by owner;

OWNER                         COUNT(*)
------------------------------ ----------
MDSYS                               859
DMSYS                               189
TSMSYS                               2
CTXSYS                               338
OLAPSYS                            718
OUTLN                                  7
PUBLIC                            20079
EXFSYS                               279
SCOTT                                  6
SYSTEM                               425
DBSNMP                               46

OWNER                         COUNT(*)
------------------------------ ----------
LIYONG                               2
ORDPLUGINS                            10
ORDSYS                            1669
SYSMAN                            1291
XDB                                  346
CWT                                  310
PERFSTAT                            139
SI_INFORMTN_SCHEMA                   8
SYS                               23134
WMSYS                               232

21 rows selected.

SQL> select NUM_ROWS,BLOCKS,EMPTY_BLOCKS from dba_tables
  2 where table_name='IRS';

  NUM_ROWS    BLOCKS EMPTY_BLOCKS
---------- ---------- ------------
   50089    3423          33

SQL> select BLEVEL,LEAF_BLOCKS,DISTINCT_KEYS,CLUSTERING_FACTOR from dba_indexes
  2 where INDEX_NAME='IRS_OWNER_IDX';

BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
---------- ----------- ------------- -----------------
      2       570          21             5890

SQL> select DENSITY,NUM_DISTINCT,HISTOGRAM from dba_tab_columns
  2 where OWNER='LIYONG'
  3 and TABLE_NAME='IRS'
  4 and COLUMN_NAME='OWNER';

DENSITY NUM_DISTINCT HISTOGRAM
---------- ------------ ---------------
9.9822E-06          21 FREQUENCY

SQL> alter session set events '10053 trace name context forever ,level 2';

Session altered.

SQL> select * from irs where owner='SYSMAN';

1291 rows selected.

SQL> alter session set events '10053 trace name context off';

10053内容截取
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: IRS  Alias: IRS
#Rows: 50089  #Blks:  3423  AvgRowLen:  102.00
Index Stats::
  Index: IRS_ID_PK  Col#: 1
LVLS: 1  #LB: 104  #DK: 50089  LB/K: 1.00  DB/K: 1.00  CLUF: 3361.00
  Index: IRS_OWNER_IDX  Col#: 2
LVLS: 2  #LB: 570  #DK: 21  LB/K: 27.00  DB/K: 280.00  CLUF: 5890.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#2): OWNER(VARCHAR2)
AvgLen: 5.00 NDV: 21 Nulls: 0 Density: 9.9822e-06
这里看到OWNER列的Density: 9.9822e-06
Histogram: Freq  #Bkts: 21  UncompBkts: 50089  EndPtVals: 21
  Table: IRS  Alias: IRS
Card: Original: 50089  Rounded: 1291  Computed: 1291.00  Non Adjusted: 1291.00 --这里可以看到CBO根据列的histogram信息统计出Card集的相关信息,
SQL> select 1291/50089 from dual;

1291/50089
----------
.025774122
这样可以准确算出effective index selectivity为0.025774122

  Access Path: TableScan --全表扫描的代价为750
Cost:  754.17  Resp: 754.17  Degree: 0
   Cost_io: 750.00  Cost_cpu: 35706109
   Resp_io: 750.00  Resp_cpu: 35706109
  Access Path: index (AllEqRange)
Index: IRS_OWNER_IDX
resc_io: 169.00  resc_cpu: 1991033
ix_sel: 0.025774  ix_sel_with_filters: 0.025774
Cost: 169.23  Resp: 169.23  Degree: 1
  Best:: AccessPath: IndexRange  Index: IRS_OWNER_IDX
      Cost: 169.23  Degree: 1  Resp: 169.23  Card: 1291.00  Bytes: 0

根据索引扫描成本计算公式
cost = (cost(INDEX RANGE SCAN)+cost(TABLE ACCESS BY INDEX ROWID)) * optimizer_index_cost_adj%
   = blevel +ceiling(leaf_blocks * effective index selectivity) +  -- cost(INDEX RANGE SCAN)
            ceiling(clustering_factor * effective table selectivity) --cost(TABLE ACCESS BY INDEX ROWID)
   = 2 + ceil(570*0.025774) + ceil(5890*0.025774)
   = 2 + 15 (17) -- cost(INDEX RANGE SCAN)
      + 152 --cost(TABLE ACCESS BY INDEX ROWID)
   = 169

----------------------------------------------------+-----------------------------------+
| Id  | Operation                   | Name       | Rows  | Bytes | Cost  | Time    |
----------------------------------------------------+-----------------------------------+
| 0 | SELECT STATEMENT          |             |    |    | 169 |          |
| 1 |  TABLE ACCESS BY INDEX ROWID | IRS       |  1291 |  129K | 169 |  00:00:03 |
| 2 | INDEX RANGE SCAN          | IRS_OWNER_IDX|  1291 |    | 17 |  00:00:01 |
----------------------------------------------------+-----------------------------------+
解释一下整个sql执行过程：
1 首先根据谓词 table_name='IRS'做INDEX RANGE SCAN，找到1291个Index Entrys,整个这个过程的代价为17.
2 Oracle根据Index Entrys中Rowid扫描原表中的blocks,获取到1291条记录. 整个TABLE ACCESS BY INDEX ROWID的代价为169-17=152.
3 TABLE ACCESS BY INDEX ROWID过程中,表的blocks被Server process读入到SGA的DB Cache中,读入的数据量近似的等于129K.(Oracle读取一行记录会读整个block;命中率等因素)

SQL> select AVG_ROW_LEN,BLOCKS from dba_tables
  2 where table_name='IRS';

AVG_ROW_LEN    BLOCKS
----------- ----------
      102    3423

SQL> select 102*1291/1024 from dual;

102*1291/1024
-------------
128.595703

关于索引的索引聚簇因子CLUF算法下次再作探讨.

索引扫描成本还和另一个参数有很大关系.先前eygle老大已经发表过专门的帖子.http://www.eygle.com/sql/OPTIMIZER_INDEX_COST_ADJ.htm
我这里拿来主义.

SQL> alter session set optimizer_index_cost_adj=40;

Session altered.

SQL> set autotrace traceonly

SQL> set linesize 150

SQL> select * from irs where owner='SYSMAN';

1291 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4071038474

---------------------------------------------------------------------------------------------
| Id  | Operation                | Name       | Rows  | Bytes | Cost (%CPU)| Time    |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT          |             |  1291 | 128K| 68 (0)| 00:00:01 |
| 1 |  TABLE ACCESS BY INDEX ROWID| IRS          |  1291 | 128K| 68 (0)| 00:00:01 |
|*  2 | INDEX RANGE SCAN       | IRS_OWNER_IDX |  1291 |    |    7 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
可以看到 Cost=68=ceil(169*0.4)

这篇文章只是一个引子,希望更多是希望能举一反三.
[/PHP]

附:
在没有列的histogram情况下的成本计算,我们看到偏差很大.CBO无法准确获取到Card集的信息，所以只能估算.

10053内容截取
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: IRS  Alias: IRS
#Rows: 50089  #Blks:  3423  AvgRowLen:  102.00
Index Stats::
  Index: IRS_ID_PK  Col#: 1
LVLS: 1  #LB: 104  #DK: 50089  LB/K: 1.00  DB/K: 1.00  CLUF: 3361.00
  Index: IRS_OWNER_IDX  Col#: 2
LVLS: 2  #LB: 570 (索引LEAF_BLOCKS个数)  #DK: 21 (索引distinct key个数)  LB/K: 27.00  DB/K: 280.00  CLUF: 5890.00 (索引聚簇因子,稍后会有详细介绍)
***************************************
SINGLE TABLE ACCESS PATH
  Column (#2): OWNER(VARCHAR2)  NO STATISTICS (using defaults)
AvgLen: 30.00 NDV: 1565 Nulls: 0 Density: 6.3886e-04
  Table: IRS  Alias: IRS
Card: Original: 50089  Rounded: 501  Computed: 500.89  Non Adjusted: 500.89
  Access Path: TableScan
Cost:  754.15  Resp: 754.15  Degree: 0
   Cost_io: 750.00  Cost_cpu: 35516509
   Resp_io: 750.00  Resp_cpu: 35516509
  Access Path: index (AllEqGuess) -- 我们注意这里索引的访问路径为index (AllEqGuess),我猜想是由于没有OWNER列的Histogram,Oracle无法计算OWNER列的selectivity造成的
Index: IRS_OWNER_IDX
resc_io: 53.00  resc_cpu: 500046
ix_sel: 0.004  ix_sel_with_filters: 0.004
Cost: 53.06  Resp: 53.06  Degree: 1
  Best:: AccessPath: IndexRange  Index: IRS_OWNER_IDX
      Cost: 53.06  Degree: 1  Resp: 53.06  Card: 500.89  Bytes: 0
根据10053 Oracle评估出:
cost(TableScan)=750
cost(index)=53.06
所以选择索引扫描为执行计划.
----------------------------------------------------+-----------------------------------+
| Id  | Operation                   | Name       | Rows  | Bytes | Cost  | Time    |
----------------------------------------------------+-----------------------------------+
| 0 | SELECT STATEMENT          |             |    |    | 53 |          |
| 1 |  TABLE ACCESS BY INDEX ROWID | IRS       | 501 | 50K | 53 |  00:00:01 |
| 2 | INDEX RANGE SCAN          | IRS_OWNER_IDX| 200 |    | 29 |  00:00:01 |
----------------------------------------------------+-----------------------------------+

根据索引扫描成本计算公式
cost = (cost(INDEX RANGE SCAN)+cost(TABLE ACCESS BY INDEX ROWID)) * optimizer_index_cost_adj%
   = blevel +ceiling(leaf_blocks * effective index selectivity) +  -- cost(INDEX RANGE SCAN)
            ceiling(clustering_factor * effective table selectivity) --cost(TABLE ACCESS BY INDEX ROWID)
   = 2 + ceil(570*0.004) + ceil(5890*0.004)
   = 2 + 3 + 24
   = 29

foreverlee · 发表于 2006-7-4 23:37

Jonathan Lewis关于clustering factor的形象描述.

Figure 4-1. Calculating the clustering_factor

In Figure 4-1, we have a table with four blocks and 20 rows, and an index on the column V1,
whose values are shown. If you start to walk across the bottom of the index, the first rowid
points to the third row in the first block. We haven’t visited any blocks yet, so this is a new
block, so we count 1. Take one step along the index, and the rowid points to the fourth row of
the second block—we’ve changed block, so increment the count. Take one step along the
index, and the rowid points to the second row of the first block—we’ve changed block again, so
increment the count again. Take one step along the index, and the rowid points to the fifth row
of the first block—we haven’t changed blocks, so don’t increment the count.
In the diagram, I have put a number against each row of the table—this is to show the
value of the counter as the walk gets to that row. By the time we get to the end of the index, we
have changed table blocks ten times, so the clustering factor is 10.
Notice how small clumps of data stop the clustering_factor from growing—look at block 2
where the value 8 appears four times because four consecutive entries in the index point to the
same block; the same effect shows up in block 3 to give three rows the value 6.
The table doesn’t have to be completely sorted for this type of thing to happen; it only
needs to have little clumps (or clusters) of rows that are nearly sorted—hence the term
clustering_factor, rather than sort_factor.
Given the way the clustering_factor is calculated, you will appreciate that the smallest
possible value has to be the same as the number of blocks in the table, and the largest possible
value has to be the same as the number of rows in the table—provided you have computed
statistics.
If there are lots of blocks like block 2 in the table, the clustering_factor will turn out to be
quite close to the number of blocks in the table, but if the data in the table is randomly scattered,
the clustering_factor will tend to come out close to the number of rows in the table.

Weeping Willow · 发表于 2006-10-11 19:54

楼主辛苦了。
收藏慢慢研究一下。

foreverlee · 发表于 2006-10-12 10:34

其实也没必要深入研究这些了解原理即可

hanjs · 发表于 2007-11-29 10:28

晕了，晚上做个测试看下！

[原创] Index Range Scan成本 Histogram 和 10053

浏览过的版块