|
RAC中一次混乱的性能诊断过程
众所周知在RAC中,问题很可能来自于CACHE FUSION(内存融合)的机制,简单的说就是CACHE BUFFER中的块在内存融合的机制下通过LMD进程进行传递,比如我节点1需要访问数据块A,通过SEND MESSAGE到MASTER节点询问块A所在的节点,MASTER告诉节点1 A块在节点2的BUFFER CACHE中,这个时候又可能是当前读和一致性读,如果块正在被修改就会构架PI来进行一致性读,传递块给节点1,这个时候PCM锁是X模式,如果当前块A没有被修改,只是被节点2SELECT,这个时候就是当前读,传递当前块给节点1,这个时候PCM锁是S模式,这个大概就是内存融合的机制。
最近遇到一个问题我们数据库服务器在高负载的情况下,CPU降低到40%左右,我试图找到问题,因为数据库服务器是RAC,并且AWRRPT的TOP5 EVENT如下:
| | | | | | | | | | | | gc cr multi block request | | | | | | | | | | | | | | | | | | | | | | | |
首先可以肯定的是这个问题绝对是处在块传递上,解决这样的问题有2个方法
1、
用SERVICE 来分离业务,通过分离业务让特定的数据块只在指定的节点里进行访问,避免传递。
2、
尽可能的减少逻辑读,逻辑读少了自然传递的可能性就小了,也就是这一点让我前期的诊断陷入了错误的方向。
分离业务是不能的,因为牵涉太大,所以我自然而然的想到了减少逻辑读,带着这个理念我进入了查看SQL的阶段,首先当然是查看最耗时的SQL
SQL ordered by Elapsed Time - Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
- % Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
- Total DB Time (s): 5,870
- Captured SQL account for 16.7% of Total
| | | | | | | | | | | | | | | SELECT * FROM ( Select RowNum ... | | | | | | | | SELECT COUNT(*) FROM TEST1... | | | | | | | | SELECT * FROM ( Select RowNum ... | | | | | | | | SELECT COUNT(*) FROM TEST1 ... | | | | | | | | SELECT * FROM ( Select RowNum ... | | | | | | | | select ti.serialno itemNo, ti.... | | | | | | | | SELECT * FROM ( Select RowNum ... | | | | | | | | SELECT COUNT(*) FROM TEST1 ... | | | | | | | | SELECT DISTINCT Wflog.* FROM W... | | | | | | | | SELECT COUNT(*) FROM TEST1 ... |
结果可以看到基本没有特别耗时的SQL,所以想通过优化单个SQL来大幅度提高性能的想法是不能实现的,然后进入查看最多逻辑读取的SQL
- Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
- Total Buffer Gets: 584,021,980
- Captured SQL account for 32.1% of Total
| | | | | | | | | | | | | | | | | SELECT * FROM ( Select RowNum ... | | | | | | | | | SELECT COUNT(*) FROM TEST1... | | | | | | | | | SELECT * FROM ( Select RowNum ... | | | | | | | | | SELECT COUNT(*) FROM TEST1... | | | | | | | | | SELECT COUNT(*) FROM TEST1... | | | | | | | | | SELECT * FROM ( Select RowNum ... | | | | | | | | | SELECT COUNT(*) FROM TEST1... | | | | | | | | | SELECT * FROM ( Select RowNum ... | | | | | | | | | SELECT * FROM ( Select RowNum ... | | | | | | | | | SELECT COUNT(*) FROM
TEST1... | | | | | | | | | SELECT * FROM ( Select RowNum ... |
通过分析这里可能出现问题,因为SQL都是差不多或者说引用的对象都差不多,我最开始想如果能够找到方法减少这里的逻辑读,那问题一定能够改善,接着直接查看逻辑读最多的SEGMENT
Segments by Logical Reads - Total Logical Reads: 584,021,980
- Captured Segments account for 99.1% of Total
分析后这些SEGMENT都是以上语句涉及到的,因为语句基本同形,我就开始了SQL调优。 |
|