为什么加上索引速度奇慢无比？

yyy · 发表于 2002-8-2 10:43

刚开始，我很不理解，这个语句怎么可能有很好的性能，后来仔细想来得出下面的分析：

上面产生的表b，是取出较少字段的一个“较小”的结果集，而且对于他现在的情况where的三个条件是通过索引还是全表扫描没什么差别，整个语句在两个order by 中都用到索引。所以，在整个语句中，两个关联表（a、b）中的一个表（b）的记录的数目很少而且每条纪录的大小很小才是性能的关键（当然关联条件是rowid也是关键因素）。

不知我的分析对不对？

jimmylee · 发表于 2002-8-2 11:59

最初由 yyy 发布
[B]刚开始，我很不理解，这个语句怎么可能有很好的性能，后来仔细想来得出下面的分析：

上面产生的表b，是取出较少字段的一个“较小”的结果集，而且对于他现在的情况where的三个条件是通过索引还是全表扫描没什么差别，整个语句在两个order by 中都用到索引。所以，在整个语句中，两个关联表（a、b）中的一个表（b）的记录的数目很少而且每条纪录的大小很小才是性能的关键（当然关联条件是rowid也是关键因素）。

不知我的分析对不对？ [/B]

而且用到了rowid索引。

只看该作者 · 发表于 2002-8-2 15:28

其实这个用ROWID并不是个好主意。参见我的贴子，和我的建议做法。这种情况下应该是用count(stopkey)最棒.

yyy · 发表于 2002-8-2 15:48

你指的是这个吗？：
drop index index1;
drop index index2;

create index index1 on cstmonilog(moni_time) nologing;

set timing on
set autot traceonly

select * from (
select rownum rno, a.*
from
(select MONI_TIME,MONI_TYPE,MONI_APERRCODE,MONI_CONSTANT,SubStr(MONI_USERINFO,1,50) MONI_USERINFO
from cstmonilog
where MONI_AREA = :MoniAreaNo and MONI_TYPEID = :MoniTypeID and MONI_FLAG = '1'
order by MONI_TIME) a where rownum <= 20
) where rno >= 5
/

我觉得上面语句最大的问题是最外层select的源可能是一个很大的结果集（比如rownum<=900000）。

还有一点很关键：那个substr运算很影响效率，如果放在最里面，在这种情况下（三个where条件几乎没用），几乎要面对整个表，我想很多时间都用在它上面了。这跟jimmylee的里层只取rowid有很大的差别。

还是请有实验环境的jimmylee说说实验结果（时间）吧。毕竟实践才是检验真理的唯一标准嘛。
顺便请jimmylee实验一下，下面的语句会不会更优化一点：（对你的语句稍微改动了一点点）
create index ind_cstmonilog_monitime on cstmonilog(moni_time);
select a.rowid , a.MONI_TIME,a.MONI_TYPE,a.MONI_APERRCODE,a.MONI_CONSTANT,
SubStr(a.MONI_USERINFO,1,50) MONI_USERINFO
from cstmonilog a ,
(
select rid from
(
select rowid rid
from cstmonilog
where MONI_AREA = '9' and MONI_TYPEID = '1' and MONI_FLAG = '1'
order by MONI_TIME
)
where rownum between 900000 and 900035
) b
where a.rowid=b.rid
order by a.moni_time;

另外你说的count(stopkey)是什么？

jimmylee · 发表于 2002-8-2 16:57

最初由 yyy 发布
[B]你指的是这个吗？：
where rownum between 900000 and 900035
[/B]

rownum是不能这样使用的，rownum只支持如下三种情况：
1、rownum=1
2、rownum<number
3、rownum<=number

yyy · 发表于 2002-8-2 17:28

哎呀，低级错误

对了，你对hahaer的语句测试了没有？

只看该作者 · 发表于 2002-8-2 18:04

是的，我也说了，search more than 5000 （do not ask me why 5000）rows 以上可能会有问题，而且越到后面会越慢，要是那样，BITY给出了很好的建议，新建一表保存中间结果。等回我有空时会对以上问题作个总结。

只看该作者 · 发表于 2002-8-3 00:57

I'll emulate the problem discussed here by creating a table T. and the data is from all_objects. Just assuming we will browing TABLE T by pages sorted by object_id, while rows retrieved out should meet the condition 'object_name < 'Z', which will ask us to retrieved most rows in table t:

rudolf@TEST920.WORLD>create table t pctfree 0 nologging
   2 as select * from all_objects;

Table created.

rudolf@TEST920.WORLD>insert /*+append*/ into t select * from t;

6619 rows created.

rudolf@TEST920.WORLD>commit;

Commit complete.

I repeated the last 2 stats serveral times, so that make the table T large enough:

rudolf@TEST920.WORLD>  select count(*) from t;

   COUNT(*)
----------
      52952

Now, I'll create index on column object_id, and make a query asking for a page from 20 through 30:

rudolf@TEST920.WORLD>create index t_oid_ind on t ( object_id ) pctfree 0 nologging;

Index created

rudolf@TEST920.WORLD>col object_name format a30
rudolf@TEST920.WORLD>col owner format a30
rudolf@TEST920.WORLD>set autot on
rudolf@TEST920.WORLD>analyze table t compute satistics for table for all indexes for all indexed columns;

Table analyzed.
rudolf@TEST920.WORLD>l
   1  select owner,object_name,object_id
   2 from ( select rownum rno,a.*
   3          from (select /*+first_rows*/ * from t where object_name < 'Z' order by object_id ) a
   4          where rownum <= 30
   5       )
   6* where rno >= 20
rudolf@TEST920.WORLD> /

OWNER                         OBJECT_NAME                   OBJECT_ID
------------------------------ ------------------------------ ----------
SYS                         CLU$                                  5
SYS                         CLU$                                  5
SYS                         CLU$                                  5
SYS                         CLU$                                  5
SYS                         CLU$                                  5
SYS                         I_TS#                                  7
SYS                         I_TS#                                  7
SYS                         I_TS#                                  7
SYS                         I_TS#                                  7
SYS                         I_TS#                                  7
SYS                         I_TS#                                  7

11 rows selected.

Execution Plan
----------------------------------------------------------
0    SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=53053 Card=30 Bytes=1800)
1 0 VIEW (Cost=53053 Card=30 Bytes=1800)
2 1    COUNT (STOPKEY)
3 2    VIEW (Cost=53053 Card=52483 Bytes=2466701)
4 3       TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=53053 Card=52483 Bytes=4513538)
5 4          INDEX (FULL SCAN) OF 'T_OID_IND' (NON-UNIQUE) (Cost=101 Card=52952)

Statistics
----------------------------------------------------------
            0  recursive calls
            0  db block gets
         33  consistent gets                                  ***********************
            0  physical reads
            0  redo size
         587  bytes sent via SQL*Net to client
         503  bytes received via SQL*Net from client
            2  SQL*Net roundtrips to/from client
            0  sorts (memory)
            0  sorts (disk)
         11  rows processed

Please notice the line I marked with '**********'.  Now let's create index on (object_name,object_id), and query in your way:

rudolf@TEST920.WORLD> analyze table t delete statistics;

Table analyzed.
rudolf@TEST920.WORLD>l
   1* create index t_oname_oid_ind on t (object_name,object_id) nologging pctfree 0
rudolf@TEST920.WORLD>/

Index created.

rudolf@TEST920.WORLD>l
   1  select owner,object_name,object_id
   2  from t a,
   3  ( select rid from
   4    ( select rownum rno,rowid rid from
   5       ( select rowid from t
   6          where object_name < 'Z'
   7          order by object_id
   8       )
   9    ) where rNo between 20 and 30
   10  ) b
   11  where a.rowid=b.rid
   12* order by a.object_id
   rudolf@TEST920.WORLD> /

OWNER                         OBJECT_NAME                   OBJECT_ID
------------------------------ ------------------------------ ----------
SYS                         CLU$                                  5
SYS                         CLU$                                  5
SYS                         CLU$                                  5
SYS                         CLU$                                  5
SYS                         CLU$                                  5
SYS                         I_TS#                                  7
SYS                         I_TS#                                  7
SYS                         I_TS#                                  7
SYS                         I_TS#                                  7
SYS                         I_TS#                                  7
SYS                         I_TS#                                  7

11 rows selected.

Execution Plan
----------------------------------------------------------
   0    SELECT STATEMENT Optimizer=CHOOSE
   1 0 SORT (ORDER BY)
   2 1    NESTED LOOPS
   3 2    VIEW
   4 3       COUNT                                  **************
   5 4          VIEW
   6 5          SORT (ORDER BY)
   7 6             INDEX (RANGE SCAN) OF 'T_ONAME_OID_IND' (NON-UNIQUE)
   8 2    TABLE ACCESS (BY USER ROWID) OF 'T'

Statistics
----------------------------------------------------------
            0  recursive calls
            0  db block gets
         220  consistent gets                               *****************
         208  physical reads                               *****************
            0  redo size
         587  bytes sent via SQL*Net to client
         503  bytes received via SQL*Net from client
            2  SQL*Net roundtrips to/from client
            2  sorts (memory)
            0  sorts (disk)
         11  rows processed

So it's quite clear the count (stopkey) (Stopkey is a new method introduced in 8i and later version. it's implemented mostly for getting TOP N records. )is more efficient in this cirmstance. But as you browse page by page, it will become less efficient. Any way it won't be slower than count operation. As Bitty said, few people will browse the whole resultset page by page. If most of browsing are focusing at the 1st 500 rows, using count(stopkey), if more of browsing are focusing at the last 500rows, still suggest to use count(stopkey), but index ur key columns in desc order. But if the user's action is ramdom, Jimmy's may serve the purpose. However, the people using this one must consider the CBO's negative affection:

rudolf@TEST920.WORLD> set autot traceo
rudolf@TEST920.WORLD> analyze table t compute statistics for table for all indexes for all indexed columns;

Table analyzed.

rudolf@TEST920.WORLD>
rudolf@TEST920.WORLD>
rudolf@TEST920.WORLD> 1  select owner,object_name,object_id
   2  from t a,
   3  ( select rid from
   4    ( select rownum rno,rowid rid from
   5       ( select rowid from t
   6          where object_name < 'Z'
   7          order by object_id
   8       )
   9    ) where rNo between 20 and 30
   10  ) b
   11  where a.rowid=b.rid
   12* order by a.object_id
   /

Execution Plan
----------------------------------------------------------
   0    SELECT STATEMENT Optimizer=CHOOSE (Cost=959 Card=52483 Bytes=3306429)
   1 0 SORT (ORDER BY) (Cost=959 Card=52483 Bytes=3306429)
   2 1    HASH JOIN (Cost=416 Card=52483 Bytes=3306429)
   3 2    VIEW (Cost=287 Card=52483 Bytes=1049660)
   4 3       COUNT
   5 4          VIEW (Cost=287 Card=52483 Bytes=367381)
   6 5          SORT (ORDER BY) (Cost=287 Card=52483 Bytes=1364558)
   7 6             INDEX (FAST FULL SCAN) OF 'T_ONAME_OID_IND' (NON-UNIQUE) (Cost=22 Card=52483 bytes=1364558)
   8 2    TABLE ACCESS (FULL) OF 'T' (Cost=58 Card=52952 Bytes=2276936)

Statistics
----------------------------------------------------------
         42  recursive calls
            0  db block gets
         805  consistent gets                               ****************
            0  physical reads
            0  redo size
         587  bytes sent via SQL*Net to client
         503  bytes received via SQL*Net from client
            2  SQL*Net roundtrips to/from client
            2  sorts (memory)
            0  sorts (disk)
         11  rows processed

In fact, the total blocks of table t is only 583 blocks. And one more, this method will also lose efficiency its competitiveness when the avg_row_len is small. So, please remember to include the /*+rule*/ hint in your statement if you want to use this method.

只看该作者 · 发表于 2002-8-3 01:18

Now let's moves Bitty's method, mid table(or hit table

). The following is from asktom.oracle.com, it's very clear and understandable.

o Lets say the query is like the above however it takes a 'long' time to
execute.  I would set up a 'hits' table like the following:

create table hits( sess_id number, seqno number, rid rowid, primary
key(sess_id,seqno) );

Then, when I wanted to search I would code something like:

cnt := 0;
for y in ( select t.rowid rid from t where x = :host_variable order by y )
loop
   cnt := cnt + 1;
   insert into hits values ( userenv('sessionid'), cnt, y.rid );
end loop;

You have to pass that userenv('sessionid') from page to page as a hidden
variable then to identify the result set.

And to display results from this i would query:

select * from t, hits
where hits.seqno between :lowBound and :highBound
   and hits.sess_id = :theSessionId
order by hits.sess_id, hits.seqno

Normally I keep another 'parent' table to go along with hits and I store the
session id, the TIME of the query and such in there as well.  I put a foreign
key from hits back to this parent with ON DELETE CASCADE and run a job
(scheduled via dbms_jobs) to issue "delete from parent_table where time <
sysdate-1/24" -- that cleans out queries older then 1 hour.  Occasionally, i
truncate the table to lower the high water mark as well.

At fast glance, it seems like similar with the Jimmy's. In fact, they are different. the hit table is created one time, while use many, howevery, Jimmy's will ffs index everytime when paging. It also eliminates the small avg_row_len's affection at most.

jimmylee · 发表于 2002-8-4 14:51

受益非浅，颇有tom kyte风范！如果提问或回答问题都象这样一步步有条理，问题才能讲清楚。佩服，佩服，值得学习！

[精华] 为什么加上索引速度奇慢无比？

浏览过的版块