求助 oracle like%..%模糊查询优化

sqysl · 发表于 2014-10-10 10:19

全文索引有时作用明显，但也有受限的场景，但我感觉，楼主目前采用常规优化措施，虽然说都做了，但很可能在某些方面还存在问题。

vfast21 · 发表于 2014-10-10 10:30

sqysl 发表于 2014-10-9 16:24
能否贴全具体的执行计划？（你上面只是贴出了谓词部分的信息），如果你觉得不方便粘贴，你可以QQ我，因为 ...

select count(*)
  from kakou_clxx
  where  jgsj >= to_date('2014-05-15 00:00:00','yyyy/mm/dd HH24:MI:SS')
   and jgsj <= to_date('2014-06-15 23:59:59','yyyy/mm/dd HH24:MI:SS');
--->数据量35888965

---->hphm like %EC167%的执行计划
SQL> select t.rowid
  2  from
  3 (select rowid
  4    from kakou_clxx
  5    where  jgsj >= to_date('2014-05-15 00:00:00','yyyy/mm/dd HH24:MI:SS')
  6       and jgsj <= to_date('2014-06-15 23:59:59','yyyy/mm/dd HH24:MI:SS')
  7       and hphm like '%EC167%' order by jgsj  desc  ) t
  8  where rownum <= 301;

9 rows selected.

Elapsed: 00:00:18.81

Execution Plan
----------------------------------------------------------
Plan hash value: 1397666774

----------------------------------------------------------------------------------------------
| Id  | Operation       | Name                   | Rows  | Bytes | Cost (%CPU)| Time    |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT |                      | 301 |  3612 |    4 (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY    |                      |    |    |          |       |
| 2 | VIEW          |                      | 302 |  3624 |    4 (0)| 00:00:01 |
|*  3 | INDEX RANGE SCAN| IDX_CLXX_JGSJ_DESC_HPHM |  1769K| 45M|    4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<=301)
3 - access(SYS_OP_DESCEND("JGSJ")>=HEXTORAW('878DF9F0E7C3C3FF')  AND
            SYS_OP_DESCEND("JGSJ")<=HEXTORAW('878DFAF0FEF8FEFAFF') )
   filter(SYS_OP_UNDESCEND(SYS_OP_DESCEND("JGSJ"))<=TO_DATE(' 2014-06-15
            23:59:59', 'syyyy-mm-dd hh24:mi:ss') AND "HPHM" LIKE '%EC167%' AND
            SYS_OP_UNDESCEND(SYS_OP_DESCEND("JGSJ"))>=TO_DATE(' 2014-05-15 00:00:00',
            'syyyy-mm-dd hh24:mi:ss') AND "HPHM" IS NOT NULL)

Note
-----
- 'PLAN_TABLE' is old version

Statistics
----------------------------------------------------------
      1  recursive calls
      0  db block gets
   165480  consistent gets
   165479  physical reads
      0  redo size
      759  bytes sent via SQL*Net to client
      519  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      9  rows processed
还有个疑惑，执行计划为什么没有走分区？？

sqysl · 发表于 2014-10-10 10:59

执行计划没看出问题，而且也没回表，所以，你也看不到走分区。
估计你这个索引也是local index，那这样，根据你的条件，扫描了local index的两个分区。扫描完两个分区后，最后捞出了18条记录，耗时18s，两个分区占用1G多，按照你的数据量，也不算离谱，按照非全文索引方式，也就这样了，可以试试以下措施：
1、通过index_ffs引导CBO走ffs；
2、缩小条件中时间范围；
这些措施，或许可以对目前的性能有所提高，预计能提高一倍或更高些，但不会有数量级的提升。另外，autot也会对性能有所影响，可以关闭后再试试。
谢谢你的反馈。

vfast21 · 发表于 2014-10-10 11:01

Yong Huang 发表于 2014-10-9 22:38
> 我建了全文索引。但是像“皖DSYQOW”这样的车牌号怎么分词？

Somebody with text indexes on the Chin ...

感谢版主！
oracle全文检索的分词器不能把“皖DSYQOW”的字符分开，
还有如果分开的话那也组合出好多tokens，我感觉全文检索这里不能用？？？
我没有用过全文检索，就是最近查询的资料可能有点片面，如果那里说的不对还请版主纠正！
谢谢版主。

vfast21 · 发表于 2014-10-10 11:30

sqysl 发表于 2014-10-10 10:59
执行计划没看出问题，而且也没回表，所以，你也看不到走分区。
估计你这个索引也是local index，那这样，根 ...

1、使用index_ffs的活，如果like子查询数据大的话也是一个灾难吧？

2、现在经理逼着我不管时间范围，查询数据量一亿，还要快速出结果。

3、索引是全局索引，CREATE INDEX idx_clxx_jgsj_desc_hphm ON clxx(jgsj desc, hphm) ;
4、全文索引分词的时候不太明白怎么分，oracle分词器不能把车牌号分开，只能分汉字、单词、符号，难道我写好多过滤词组？？？
5、谢谢你的解答。

sqysl · 发表于 2014-10-10 11:42

vfast21 发表于 2014-10-10 11:30
1、使用index_ffs的活，如果like子查询数据大的话也是一个灾难吧？
2、现在经理逼着我不管时间范围 ...

如果是global index，那么，缩短时间范围肯定会有性能提升，但可能最后捞不出结果。
ffs，那得看具体情况，如果数据量几年甚至更久，你的条件范围是一个月，那就免了，如果只保留几个月数据量，你的时间范围是1个月，那么还是可以试试的。
你们这业务估计不会保留太久数据，所以，这样，可以试试。
full text索引，我当时看了下，也没用，感觉可操作性和灵活性不太好。你的问题和我当时的问题差别就是我们的结果比较密集，扫描几个索引块，可能就出结果了，你这个结果比较稀疏，要扫描好多数据才能捞出个结果，如果这样，倒不如试试ffs，以并行换效率，反正结果集不会太多，多个order也没关系。

vfast21 · 发表于 2014-10-10 17:22

Yong Huang 发表于 2014-10-9 22:38
> 我建了全文索引。但是像“皖DSYQOW”这样的车牌号怎么分词？

Somebody with text indexes on the Chin ...

谢谢您的回复！
1、lexer分词器不能分像“皖DSYQOW”的字符？？
2、即使可以分，但是像车牌号这样数字和字符组合的得到多少tokens，全文检索在这里是否不适用？
之前没有用过全文检索，最近查些资料，那里说的不对希望指正。

Yong Huang · 发表于 2014-10-10 22:25

> lexer分词器不能分像“皖DSYQOW”的字符？？即使可以分，但是像车牌号这样数字和字符组合的得到多少tokens，全文检索在这里是否不适用？

I wish somebody working at Baidu or Google can jump in. But I think your understanding is correct. Each token will be 像“皖DSYQOW”的字符, or maybe just the Latin spelling part “DSYQOW”. In the latter case, a search for '皖DSYQOW' is probably a search for 'DSYQOW' first and in the result set, filter on those preceded by '皖'.

If we don't consider text indexes, I can think of one strategy: manual "indexing" of your license plate numbers (车牌号). In the column that contains license plate number plus other text (i.e. your column hphm), you only want to search for the license plates, not any other text. Correct? I mean, if the column has value 'some text 皖DSYQOW some other text', you always search for '皖DSYQOW' and don't care the texts before and after it. Correct? If so, create a separate column, that *only* contains the license plate number '皖DSYQOW', and index that column. Then your query will be "... where license_plate = '皖DSYQOW'". But your select-list can include the free text column hphm. Just don't index hphm and don't use that as fuzzy search.

That basically breaks the requirement that your search must be a fuzzy search. For example, if your free text column hphm has this value '皖DSYQOW crashed into 皖ABCDEF on a Sunday morning', and you want to find this row with either '皖DSYQOW' or '皖ABCDEF' in the where-clause, what I suggested won't work.

Yong Huang · 发表于 2014-10-10 22:33

By the way, why is your index not partitioned?

Your 40-second response time (see msg #1) must be mostly due to physical read. If you run the same query the second time, "physical reads" should be 0. What is the response time? If it meets the 3-second requirement, just add plenty of memory to the server and configure a bigger buffer cache. How much memory does the server have and how big is buffer cache?

vfast21 · 发表于 2014-10-11 10:09

Yong Huang 发表于 2014-10-10 22:25
> lexer分词器不能分像“皖DSYQOW”的字符？？即使可以分，但是像车牌号这样数字和字符组合的得到多少token ...

谢谢版主！
1、使用全文检索还有一个问题就是，tokens 是“皖”和“DSYQOW”，当我查“DS"的时候的不到结果，漏查的可能。
2、我数据库中的表车牌号码只存车牌号没有别的字符。例如：皖ABC888
3、因为业务是违法抓拍，如果想要查询一个车辆，但只知道部分号码必须使用模糊查询。

[讨论] 求助 oracle like%..%模糊查询优化

浏览过的版块