从底向上第三篇--了解index的compress

sundog315 · 发表于 2010-3-28 17:42

从底向上第一篇--了解DML操作
从底向上第二篇--了解行迁移
从底向上第三篇--了解index的compress
从底向上第四篇--了解行链接
从底向上第五篇--了解表的压缩属性
从底向上第六篇--compress for oltp真正压缩的阈值触发条件
从底向上第七篇--超255列表的存储
从底向上第八篇--伪造基于ASSM表空间的数据块

对于创建index时，是否选择compress功能概念一直比较模糊，什么情况下适用compress，什么情况下应该避免使用，都不太清楚，于是，做了如下的实验。
首先，交代环境。

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE 10.2.0.1.0    Production
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 – Production

由于本例中，初始化参数影响不大，这里就不在列出了。

下面，开始真正的测试。

SQL> conn / as sysdba
已连接。
SQL> create table t as select * from dba_objects;

表已创建。

SQL> insert into t select * from t;

已创建48149行。

SQL> insert into t select * from t;

已创建96298行。

SQL> insert into t select * from t;

已创建192596行。

SQL> insert into t select * from t;

已创建385192行。

SQL> insert into t select * from t;

已创建770384行。

SQL> insert into t select * from t;

已创建1540768行。

SQL> insert into t select * from t;

已创建3081536行。

SQL> commit;

提交完成。

SQL> select count(*) from t;

  COUNT(*)
----------
6163072

SQL> select t.num_distinct
  2 from dba_tab_col_statistics t
  3 where t.table_name='T'
  4    and t.column_name='OBJECT_TYPE';

NUM_DISTINCT
------------
      38

SQL> set autot on
SQL> select count(*) from t where object_type='INDEX';

  COUNT(*)
----------
595456

执行计划
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation       | Name | Rows  | Bytes | Cost (%CPU)| Time    |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT |    |    1 |    8 | 18378 (1)| 00:03:41 |
| 1 |  SORT AGGREGATE |    |    1 |    8 |          |       |
|*  2 | TABLE ACCESS FULL| T | 691K|  5400K| 18378 (1)| 00:03:41 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("OBJECT_TYPE"='INDEX')

Note
-----
- dynamic sampling used for this statement

统计信息
----------------------------------------------------------
      5  recursive calls
      0  db block gets
   83694  consistent gets
   63336  physical reads
      0  redo size
      410  bytes sent via SQL*Net to client
      385  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

当前由于没有索引，CBO使用Table Full Scan来访问数据，consistent gets非常大。
建立普通索引：

SQL> create index T_IDX on t(object_type);

索引已创建。

SQL> select t.header_file,t.header_block from dba_segments t
  2 where t.segment_name='T_IDX';

HEADER_FILE HEADER_BLOCK
----------- ------------
1 50729

看看branch的情况

SQL> alter system dump datafile 1 block 50730;

系统已更改。

Branch block dump
=================
header address 135471684=0x8132244
kdxcolev 2
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 24
kdxcofbo 76=0x4c
kdxcofeo 7753=0x1e49
kdxcoavs 7677
kdxbrlmc 4329550=0x42104e
kdxbrsno 0
kdxbrbksz 8060
kdxbr2urrc 0
row#0[8043] dba: 4286443=0x4167eb
col 0; len 5; (5):  49 4e 44 45 58
col 1; len 6; (6):  00 40 e4 2e 00 4a
row#1[8026] dba: 4286444=0x4167ec
col 0; len 5; (5):  49 4e 44 45 58
col 1; len 6; (6):  00 40 e4 71 00 43
row#2[8009] dba: 4286445=0x4167ed
col 0; len 5; (5):  49 4e 44 45 58
col 1; len 6; (6):  00 40 e5 18 00 40
row#3[7992] dba: 4286446=0x4167ee
col 0; len 5; (5):  49 4e 44 45 58
col 1; len 6; (6):  00 40 e7 9f 00 05
...

Leaf情况
SQL> alter system dump datafile 1 block 50731;

系统已更改。
Block header dump:  0x0040c62a
Object id on Block? Y
seg/obj: 0xbf0a  csc: 0x00.cdd14  itc: 1  flg: -  typ: 2 - INDEX
   fsl: 0  fnx: 0x0 ver: 0x01

Itl          Xid                Uba       Flag  Lck       Scn/Fsc
0x01 0xffff.000.00000000  0x00000000.0000.00  C--- 0  scn 0x0000.000cdd14

Branch block dump
=================
header address 135471684=0x8132244
kdxcolev 2
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 24
kdxcofbo 76=0x4c
kdxcofeo 7753=0x1e49
kdxcoavs 7677
kdxbrlmc 4329550=0x42104e
kdxbrsno 0
kdxbrbksz 8060
kdxbr2urrc 0
row#0[8019] flag: ------, lock: 0, len=17
col 0; len 7; (7):  43 4c 55 53 54 45 52
col 1; len 6; (6):  00 40 e4 29 00 04
row#1[8002] flag: ------, lock: 0, len=17
col 0; len 7; (7):  43 4c 55 53 54 45 52
col 1; len 6; (6):  00 40 e4 29 00 15
row#2[7985] flag: ------, lock: 0, len=17
col 0; len 7; (7):  43 4c 55 53 54 45 52
col 1; len 6; (6):  00 40 e4 29 00 24
row#3[7968] flag: ------, lock: 0, len=17
col 0; len 7; (7):  43 4c 55 53 54 45 52
col 1; len 6; (6):  00 40 e4 29 00 25
row#4[7951] flag: ------, lock: 0, len=17
col 0; len 7; (7):  43 4c 55 53 54 45 52
col 1; len 6; (6):  00 40 e4 29 00 30
…

一切都比较正常，Leaf存储了键值及链接，看看这时的查询情况：

SQL> select t.blevel,t.leaf_blocks,t.distinct_keys,t.avg_leaf_blocks_per_key,t.a
vg_data_blocks_per_key,
  2       t.clustering_factor
  3 from user_indexes t
  4 where t.index_name='T_IDX';

BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY
---------- ----------- ------------- -----------------------
AVG_DATA_BLOCKS_PER_KEY CLUSTERING_FACTOR
----------------------- -----------------
      2    16729          38                   440
               6227          236654

SQL> select count(*) from t where object_type='INDEX';

  COUNT(*)
----------
595456

执行计划
----------------------------------------------------------
Plan hash value: 293504097

--------------------------------------------------------------------------
| Id  | Operation       | Name  | Rows  | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT  |    |    1 | 11 |  1735 (1)| 00:00:21
| 1 |  SORT AGGREGATE |    |    1 | 11 |          |
|*  2 | INDEX RANGE SCAN| T_IDX | 691K|  7425K|  1735 (1)| 00:00:21
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OBJECT_TYPE"='INDEX')

Note
-----
- dynamic sampling used for this statement

统计信息
----------------------------------------------------------
      0  recursive calls
      0  db block gets
   1414  consistent gets
      0  physical reads
      0  redo size
      410  bytes sent via SQL*Net to client
      385  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

索引可以利用上了，consistent gets由全表扫描的83694降低至1414，降为0.0169。对于一般的优化而言，效果已经很明显了。

但是，有没有更好的降低consistent gets的方法呢？

SQL> drop index t_idx;

索引已删除。

SQL> create index t_idx on t(object_type) compress;

索引已创建。

SQL> select t.header_file,t.header_block from dba_segments t
  2 where t.segment_name='T_IDX';

HEADER_FILE HEADER_BLOCK
----------- ------------
1 134705

SQL> alter system dump datafile 1 block 134705;

系统已更改。

  Extent Control Header
  -----------------------------------------------------------------
  Extent Header:: spare1: 0    spare2: 0    #extents: 81    #blocks: 10239
               last map  0x00000000  #maps: 0    offset: 4128
   Highwater::  0x00423321  ext#: 80    blk#: 280 ext size: 1024
  #blocks in seg. hdr's freelists: 0
  #blocks below: 9495
  mapblk  0x00000000  offset: 80
               Unlocked
   Map Header:: next  0x00000000  #extents: 81 obj#: 48909  flag: 0x40000000

SQL> alter system dump datafile 1 block 134707;

系统已更改。
81C41A0 2100AAC5 00060000 00AAC540 06000014  [...!....@.......]
81C41B0 AAC54000 00080400 554C4307 52455453  [.@.......CLUSTER] --都是CLUSTER
81C41C0 00000000 00000000 00000000 00000000  [................]
      Repeat 2 times
81C41F0 00000000 00000000 00000000 30EB0601  [...............0]
Block header dump:  0x00420e33
Object id on Block? Y
seg/obj: 0xbf0d  csc: 0x00.d30e8  itc: 2  flg: -  typ: 2 - INDEX
   fsl: 0  fnx: 0x0 ver: 0x01

Itl          Xid                Uba       Flag  Lck       Scn/Fsc
0x01 0x0000.000.00000000  0x00000000.0000.00  ---- 0  fsc 0x0000.00000000
0x02 0xffff.000.00000000  0x00000000.0000.00  C--- 0  scn 0x0000.000d30e8

Leaf block dump
===============
header address 136061532=0x81c225c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0xa0: opcode=0: iot flags=-C- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 651
kdxcofbo 1346=0x542
kdxcofeo 2167=0x877
kdxcoavs 821
kdxlespl 0
kdxlende 0
kdxlenxt 4329012=0x420e34
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
kdxlepnro 1
kdxlepnco 1
prefix row#0[8026] flag: -P----, lock: 0, len=10 --prefix row 0
col 0; len 7; (7):  43 4c 55 53 54 45 52
prc 651 --651个
row#0[8017] flag: ------, lock: 0, len=9
col 0; len 6; (6):  00 40 c5 aa 00 04
psno 0                   --prefix row number，这里是0,对应上面的prefix row#0 CLUSTER，如果一个块有多个prefix，则这里就是对应的number。
row#1[8008] flag: ------, lock: 0, len=9
col 0; len 6; (6):  00 40 c5 aa 00 14
psno 0
…一共651条

可以看到，compress的意义就是把关键字提取出来，作为prefix。仅仅是rowid占用绝大部分Block空间。降低了很多冗余。

看看这时的效率：

SQL> select t.blevel,t.leaf_blocks,t.distinct_keys,t.avg_leaf_blocks_per_key,t.a
vg_data_blocks_per_key,
  2       t.clustering_factor
  3 from user_indexes t
  4 where t.index_name='T_IDX';

BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY
---------- ----------- ------------- -----------------------
AVG_DATA_BLOCKS_PER_KEY CLUSTERING_FACTOR
----------------------- -----------------
      2       9468          38                   249
               6227          236654

BLOCKS从16729降至9468，AVG_LEAF_BLOCKS_PER_KEY从440降至249。Compress后，block仅占用没有Compress时的57.67%空间。这57.69%还会带来什么？

SQL> select count(*) from t where object_type='INDEX';

  COUNT(*)
----------
595456

执行计划
----------------------------------------------------------
Plan hash value: 293504097

---------------------------------------------------------------------------
| Id  | Operation       | Name  | Rows  | Bytes | Cost (%CPU)| Time    |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT  |    |    1 | 11 | 985 (1)| 00:00:12 |
| 1 |  SORT AGGREGATE |    |    1 | 11 |          |       |
|*  2 | INDEX RANGE SCAN| T_IDX | 691K|  7425K| 985 (1)| 00:00:12 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OBJECT_TYPE"='INDEX')

Note
-----
- dynamic sampling used for this statement

统计信息
----------------------------------------------------------
      0  recursive calls
      0  db block gets
      918  consistent gets
      0  physical reads
      0  redo size
      410  bytes sent via SQL*Net to client
      385  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

COST: 1735->985，56.77%
CONTISTENT GETS: 1414->918，64.92%

综上，可以了解index compress的原理，而从这些原理，可以推出，如果index的选择度越低，也就是DISTINCT KEY越少，在传统意义上来讲，在这些列上建立索引时没有太大意义的，但是，业务中却往往需要这样的索引，由于此时index的效率并不高（选择度低），往往需要读取大量的block，只能通过尽可能的降低block数量来提交效率，此时，compress属性就有了发挥的空间。
当distinct key越少，compress节省的空间也就越多，但是，每次需要读取的block也就越多。

当然，在真正的索引扫描时，Oracle还需要将compress的数据decompress成普通的索引结构，这将增加一部分CPU使用。但是，由于BLOCK的减少，往往可以抵消增加使用的CPU资源。

那么，到底什么情况下应该使用？什么情况下不应该使用呢？摘抄Oracle 10g Concepts里的一段话：
Performance and Storage Considerations
Key compression can lead to a huge saving in space, letting you store more keys in each index block, which can lead to less I/O and better performance.
Although key compression reduces the storage requirements of an index, it can increase the CPU time required to reconstruct the key column values during an index scan. It also incurs some additional storage overhead, because every prefix entry has an overhead of 4 bytes associated with it.

[ 本帖最后由 sundog315 于 2010-6-25 14:51 编辑 ]

Yong Huang · 发表于 2010-3-30 04:42

Your 50729 is the segment header block ID. 50730 is the root block. kdxcolev for a leaf block is 0. Value 2 is for a branch or root (root is a special branch). For a leaf block, you should see the words "Leaf block dump".

I suggest you place the words "看看branch的情况" and "Leaf情况" in correct places.

Yong Huang

[ 本帖最后由 Yong Huang 于 2010-3-30 13:45 编辑 ]

sundog315 · 发表于 2010-3-30 19:30

原帖由 Yong Huang 于 2010-3-30 04:42 发表
Your 50729 is the segment header block ID. 50730 is the root block. kdxcolev for a leaf block is 0. Value 2 is for a branch or root (root is a special branch). For a leaf block, you should the words "Leaf block dump".

I suggest you place the words "看看branch的情况" and "Leaf情况" in correct places.

Yong Huang

修正了，您看的真仔细。

原文档都被我给搞乱了，又重新来了一遍。：（

huanhuanlove · 发表于 2010-3-31 21:09

不错的文章。。。

hotdog04 · 发表于 2010-3-31 23:59

不懂。。。。。。

lovenankai · 发表于 2014-10-3 16:43

所的很详细。多谢

zw176974 · 发表于 2014-10-8 20:59

不明觉厉啊~~！

kakamylove000 · 发表于 2014-11-23 16:49

这时索引维护的开销会增加多少？

RegistIn · 发表于 2014-11-27 13:42

本帖最后由 RegistIn 于 2014-11-27 13:42 编辑

我想问下针对大约30G 的索引，来使用如下语句compress ：alter index index_name rebuild compress;
有哪些需要注意的地方，比如：Tablespace/Cpu/Memory 有啥要注意的吗？
非常感谢！

RegistIn · 发表于 2014-11-27 14:29

RegistIn 发表于 2014-11-27 13:42
我想问下针对大约30G 的索引，来使用如下语句compress ：alter index index_name rebuild compress;
有 ...

为什么这种压缩方式没效果？索引的大小并未变小

[笔记] 从底向上第三篇--了解index的compress

浏览过的版块