How to resolve library cache latch contention?

zjxs · posted 2005-1-21 11:08

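Thanks, everyone. Some additional information:
1. The shared pool is 208 MB. Because we use very little dynamic SQL, actual shared pool usage is low; x$ksmsp shows more than 170 MB free.

2. The load is generated by a simulation tool. Transaction volume is high and the proportion of modifications is also fairly high; that is dictated by the business logic, so it can't be changed. The environment basically matches production. Total redo volume shouldn't be high: the transactions are all small, but very frequent. It's a real headache.

3. Storage is 8 disks in groups of 4, configured as RAID 0+1.

4. log file sync is indeed high, but device and application-logic constraints mean it can't be tuned for now. With 10 clients, log file sync is the top wait event; when we go up to 20 clients, latch free moves to first place, while overall throughput (TPS) barely changes.

5. The log buffer is 32 MB, and there are no log buffer space waits. This value seems too large.

I plan to follow hanson's suggestion, adjust _kgl_latch_count, and rerun the test.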

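Question: while a process is waiting on log file sync, it may hold a library cache lock or library cache pin, but it shouldn't be holding the library cache latch or the library cache pin latch, right? My understanding is that these latches are only used before SQL execution, to acquire the library cache lock and library cache pin, or after execution, to release them.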

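zjxs · posted 2005-1-21 11:09

Each client uses one connection.

zjxs · posted 2005-1-21 11:16

One more thing puzzles me:
sar shows CPU idle at about 35%, yet at the same time it shows a run queue of 2.5 to 3 per CPU. With that many processes waiting to run, how can the CPUs be idle?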

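biti_rainy · posted 2005-1-21 11:30

They are waiting for earlier transactions to finish.

The bottleneck isn't the shared pool; it's transactions and redo. There are so many transactions that redo can't be written fast enough, which affects inter-process communication. The DML-related latches also play a part, and latch hold times may well increase too.

Your parse rate per second is not high. The latch free waits are caused by transactions, not by SQL.

To be more precise, check v$latch and v$latch_children to see which latches really account for the most.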

redo allocation 115350794 -------  redo generation
library cache pin 135497355
dml lock allocation 183798454 ------  transactions
enqueue hash chains 253989608 ------  locks (transactions)
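One quick way to see biti's point is to rank the four counts he quoted and look at each one's share of their total. A minimal Python sketch; the post does not say which v$latch column these numbers come from (they look like cumulative gets rather than sleeps), and `LATCH_COUNTS` and `rank_latches` are illustrative names, not anything Oracle provides.

```python
# Counts quoted in the reply above. Which v$latch column they come from is
# not stated in the post; they look like cumulative gets.
LATCH_COUNTS = {
    "redo allocation": 115_350_794,
    "library cache pin": 135_497_355,
    "dml lock allocation": 183_798_454,
    "enqueue hash chains": 253_989_608,
}


def rank_latches(counts: dict) -> list:
    """Return (name, count, share_of_listed_total) tuples, largest first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, n, n / total) for name, n in ranked]


if __name__ == "__main__":
    for name, n, share in rank_latches(LATCH_COUNTS):
        print(f"{name:<22}{n:>13,}  {share:6.1%}")
```

With these four numbers, enqueue hash chains and dml lock allocation together make up about 64% of the listed total and library cache pin comes third, which fits the view that the pressure comes from transactions rather than parsing. Ranking by the sleeps column would be the more direct measure of contention.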

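victor666666 · posted 2005-1-21 12:24

I suspect a concurrency problem, i.e. the program is locking records it shouldn't, or locking too many of them.

Also, does the shared pool have anything to do with dynamic SQL? Aren't you using lots of bind variables? How can the shared pool be so lightly used?

Sigh, this is like scratching an itch through a boot.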

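zjxs · posted 2005-1-24 12:56

(The formatting is rather ugly; you can download the attachment and read it directly, or could one of the experts explain how to format text here?)
I reran the test. Because the environment had been borrowed by others for a while, the database was rebuilt. This is the report for a run with the previous environment settings: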

Load Profile
~~~~~~~~~~~~                         Per Second    Per Transaction
                                 ---------------    ---------------
               Redo size:       1,876,202.75             1,003.83
            Logical reads:          39,885.03                21.34
            Block changes:          12,530.13                6.70
         Physical reads:                0.67                0.00
         Physical writes:             638.13                0.34
               User calls:             2,998.87                1.60
                  Parses:             157.07                0.08
            Hard parses:                0.11                0.00
                  Sorts:             888.65                0.48
                  Logons:                0.00                0.00
               Executes:          13,308.98                7.12
            Transactions:             1,869.04

  % Blocks changed per Read: 31.42 Recursive Call %: 85.99
Rollback per transaction %: 0.00    Rows per Sort:    0.17

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         Buffer Nowait %: 99.98    Redo NoWait %: 99.99
         Buffer  Hit %:  100.00 In-memory Sort %:  100.00
         Library Hit %: 99.98       Soft Parse %: 99.93
      Execute to Parse %: 98.82       Latch Hit %: 99.30
Parse CPU to Parse Elapsd %: 30.73    % Non-Parse CPU: 99.47

Shared Pool Statistics       Begin End
                           ------  ------
         Memory Usage %: 20.23 20.74
% SQL with executions>1: 73.37 80.78
  % Memory for SQL w/exec>1: 71.44 82.99

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
latch free                                         90,143         722    50.11
log file sync                                     240,688         355    24.61
CPU time                                                          336    23.32
log file parallel write                           162,024          10      .66
LGWR wait for redo copy                            55,611           8      .54
      -------------------------------------------------------------
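Dividing each event's time by its number of waits gives the average duration of a single wait, which says more than the raw totals. A minimal sketch over the first report's figures; the report rounds times to whole seconds, so the averages for the small events are only rough, and the names here are illustrative.

```python
# Top 5 Timed Events from the first report: (event, waits, time in seconds).
# CPU time is omitted because it has no wait count.
TOP5_RUN1 = [
    ("latch free", 90_143, 722),
    ("log file sync", 240_688, 355),
    ("log file parallel write", 162_024, 10),
    ("LGWR wait for redo copy", 55_611, 8),
]


def avg_wait_ms(waits: int, seconds: float) -> float:
    """Average time per wait, in milliseconds."""
    return seconds * 1000.0 / waits


if __name__ == "__main__":
    for event, waits, secs in TOP5_RUN1:
        print(f"{event:<26}{avg_wait_ms(waits, secs):8.2f} ms")
```

For this run a latch free wait averages about 8 ms, while a log file sync wait averages about 1.5 ms.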

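First I set _wait_for_sync to false to eliminate the log file sync waits. Performance improved somewhat (one less burden), but the latch contention was not resolved; it accounted for more than 90% of all wait time: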

Load Profile
~~~~~~~~~~~~                         Per Second    Per Transaction
                                 ---------------    ---------------
               Redo size:       1,893,089.81             941.95
            Logical reads:          45,715.06                22.75
            Block changes:          12,602.65                6.27
         Physical reads:                1.47                0.00
         Physical writes:             653.24                0.33
               User calls:             3,221.75                1.60
                  Parses:             227.76                0.11
            Hard parses:                0.08                0.00
                  Sorts:             954.04                0.47
                  Logons:                0.00                0.00
               Executes:          14,984.16                7.46
            Transactions:             2,009.75

  % Blocks changed per Read: 27.57 Recursive Call %: 86.46
Rollback per transaction %: 0.00    Rows per Sort:    0.26

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         Buffer Nowait %:  100.00    Redo NoWait %: 99.99
         Buffer  Hit %:  100.00 In-memory Sort %:  100.00
         Library Hit %: 99.99       Soft Parse %: 99.97
      Execute to Parse %: 98.48       Latch Hit %: 99.36
Parse CPU to Parse Elapsd %: 25.56    % Non-Parse CPU: 99.25

Shared Pool Statistics       Begin End
                           ------  ------
         Memory Usage %: 19.79 20.33
% SQL with executions>1: 69.82 78.01
  % Memory for SQL w/exec>1: 69.89 81.79

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
latch free                                        154,295       1,258    66.65
CPU time                                                          577    30.53
LGWR wait for redo copy                            81,648          22     1.18
log file parallel write                           261,890           8      .40
log file switch completion                            180           7      .37
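The "over 90%" figure can be checked against the % Total Ela Time column. A small sketch, assuming (as the CPU time row suggests) that the percentages are relative to CPU time plus wait time, so an event's share of wait time alone is its percentage divided by 100 minus the CPU percentage. The names are illustrative.

```python
# "% Total Ela Time" column of the second report (_wait_for_sync = false).
PCT_RUN2 = {
    "latch free": 66.65,
    "CPU time": 30.53,
    "LGWR wait for redo copy": 1.18,
    "log file parallel write": 0.40,
    "log file switch completion": 0.37,
}


def share_of_wait_time(pcts: dict, event: str) -> float:
    """Fraction of non-CPU elapsed time spent in `event`.

    Assumes the percentages are relative to CPU time plus wait time,
    so everything other than CPU time (listed or not) is wait time.
    """
    wait_pct = 100.0 - pcts["CPU time"]
    return pcts[event] / wait_pct


if __name__ == "__main__":
    print(f"latch free share of wait time: {share_of_wait_time(PCT_RUN2, 'latch free'):.1%}")
```

This gives roughly 95.9%, consistent with the "more than 90%" stated above.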

Then I removed the _wait_for_sync setting, set _kgl_latch_count to 11, and tested again (compared with the first test, there is no visible change):
Load Profile
~~~~~~~~~~~~                         Per Second    Per Transaction
                                 ---------------    ---------------
               Redo size:       1,772,567.03             948.93
            Logical reads:          42,779.00                22.90
            Block changes:          11,725.92                6.28
         Physical reads:                1.40                0.00
         Physical writes:             629.41                0.34
               User calls:             2,988.61                1.60
                  Parses:             217.69                0.12
            Hard parses:                0.07                0.00
                  Sorts:             884.79                0.47
                  Logons:                0.00                0.00
               Executes:          14,013.93                7.50
            Transactions:             1,867.96

  % Blocks changed per Read: 27.41 Recursive Call %: 86.53
Rollback per transaction %: 0.00    Rows per Sort:    0.13

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         Buffer Nowait %:  100.00    Redo NoWait %: 99.99
         Buffer  Hit %:  100.00 In-memory Sort %:  100.00
         Library Hit %: 99.99       Soft Parse %: 99.97
      Execute to Parse %: 98.45       Latch Hit %: 99.43
Parse CPU to Parse Elapsd %: 27.16    % Non-Parse CPU: 99.26

Shared Pool Statistics       Begin End
                           ------  ------
         Memory Usage %: 19.74 20.26
% SQL with executions>1: 69.11 76.90
  % Memory for SQL w/exec>1: 68.11 80.15

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
latch free                                        134,963       1,114    49.58
CPU time                                                          550    24.50
log file sync                                     384,795         540    24.04
log file parallel write                           261,858          15      .65
LGWR wait for redo copy                            82,348          12      .52
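Comparing transactions per second across the three Load Profile sections puts numbers on "improved somewhat" and "no visible change". A minimal sketch; the run labels are just names for the three reports above.

```python
# Transactions per second from the three Load Profile sections above.
TPS = {
    "baseline": 1_869.04,
    "_wait_for_sync=false": 2_009.75,
    "_kgl_latch_count=11": 1_867.96,
}


def pct_change(new: float, base: float) -> float:
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100.0


if __name__ == "__main__":
    base = TPS["baseline"]
    for run, tps in TPS.items():
        print(f"{run:<22}{tps:10.2f} tps  {pct_change(tps, base):+6.2f}%")
```

The _wait_for_sync=false run is about 7.5% above the baseline, and the _kgl_latch_count=11 run is within 0.1% of it.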

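My own take:
1. The latch contention is unrelated to the log file sync waits: with log file sync removed, the contention is still very high.
2. biti is right; it should have nothing to do with the shared pool. In fact, contention on the library cache, enqueue, dml lock and related latches is all high; the library cache latch just happens to rank first. So it is not surprising that adjusting _kgl_latch_count had no effect.
3. It looks like a CPU bottleneck: 4 CPUs simply can't keep up with 20 processes applying continuous load. sar and vmstat show a run queue of about 3 per CPU; what I can't understand is that system idle is still around 30%.
4. Shared SQL avoids the cost of parsing statements repeatedly, but it also raises the chance of contention, because executing a SQL statement requires taking locks. This is especially noticeable when there are many sessions and relatively few distinct SQL statements. I wanted to add connection-specific information to each process's SQL text to prevent sharing and thereby test this idea, but the test tool doesn't support it, so I had to give up.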
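Point 4 describes making each connection's SQL text unique so that sessions stop sharing one cursor. If the test tool allowed it, the rewrite could be as simple as the hypothetical helper below, which inserts a per-session comment after the leading keyword. The helper name, the `sess_N` tag and the sample `acct` statement are made up for illustration; the idea relies on Oracle matching shared cursors on the exact statement text, so a plain comment (not a `/*+ */` hint) is enough to produce a separate cursor.

```python
def tag_sql_per_session(sql: str, session_id: int) -> str:
    """Return `sql` with a per-session comment after its first keyword.

    Hypothetical helper: any textual difference makes Oracle treat the
    statement as a different cursor. Use a plain comment (/* ... */),
    not a hint (/*+ ... */).
    """
    keyword, _, rest = sql.strip().partition(" ")
    return f"{keyword} /* sess_{session_id} */ {rest}"


if __name__ == "__main__":
    stmt = "UPDATE acct SET bal = bal - :1 WHERE id = :2"  # hypothetical statement
    texts = {tag_sql_per_session(stmt, sid) for sid in range(1, 21)}
    print(len(texts), "distinct statement texts for 20 sessions")
    print(tag_sql_per_session(stmt, 3))
```

Each of the 20 sessions would then get its own parent cursor, at the cost of more cursors in the shared pool.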

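hanson · posted 2005-1-24 14:14

I agree with biti: the main problem is the redo log. Even with _wait_for_sync=false, you only stop the log file sync wait event from appearing; LGWR wait for redo copy takes its place.
Does your system commit or roll back too frequently? Do you often commit inside a loop? That is an application-level issue; if you can't change the application, the only option left is to put the redo log files on faster disks.
Also, isn't the log buffer too large? Try changing it to 1 MB?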

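zjxs · posted 2005-1-24 19:34

After setting _wait_for_sync to false, LGWR wait for redo copy is only 1.18%, which can't have that much impact, since latch free is 66% and CPU time 30%. Moreover, LGWR wait for redo copy is a wait by the LGWR process (LGWR shouldn't care about the library cache, enqueue and similar latches, right?), not by the server processes. As long as there are no log buffer space waits, it shouldn't affect the server processes' throughput.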

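biti_rainy · posted 2005-1-24 20:05

_wait_for_sync = false

As the name suggests, I take it to mean that after a user commits, the session can go on to the next transaction without waiting for the redo to be written to the log file. This may improve transaction concurrency (the risk being that committed data can be lost in a crash), but it doesn't mean the redo and transaction-related pressure is any lower.

You should query the sleeps column in v$latch and v$latch_children to see which latches actually have the most sleeps.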
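biti's reading of _wait_for_sync can be illustrated with a toy model in which a commit either waits for its redo to be flushed or returns immediately and leaves the flush to a background writer. This is a simplified Python sketch of the idea only, not a model of Oracle internals; `ToyRedoLog` and its methods are hypothetical.

```python
class ToyRedoLog:
    """Toy model (not Oracle internals): commits append redo to an
    in-memory buffer; only flushed entries survive a crash."""

    def __init__(self, wait_for_sync: bool):
        self.wait_for_sync = wait_for_sync
        self.buffer = []   # redo generated but not yet written
        self.on_disk = []  # redo that would survive a crash

    def flush(self):
        """Write all buffered redo (the background writer's job)."""
        self.on_disk.extend(self.buffer)
        self.buffer.clear()

    def commit(self, txn_id):
        self.buffer.append(txn_id)
        if self.wait_for_sync:
            # Synchronous commit: the session waits until its redo is
            # written, which is what shows up as 'log file sync'.
            self.flush()
        # Asynchronous commit: return at once; the flush happens later.

    def crash(self):
        """Simulate a crash and return the committed transactions lost."""
        lost = list(self.buffer)
        self.buffer.clear()
        return lost


if __name__ == "__main__":
    sync_log = ToyRedoLog(wait_for_sync=True)
    lazy_log = ToyRedoLog(wait_for_sync=False)
    for txn in range(3):
        sync_log.commit(txn)
        lazy_log.commit(txn)
    print("lost with synchronous commit: ", sync_log.crash())
    print("lost with asynchronous commit:", lazy_log.crash())
```

In both modes every commit produces the same redo; only whether the committing session waits for it changes. Nothing is lost with synchronous commits, while asynchronous commits lose whatever is still buffered, which is the trade-off biti describes.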

wing hong · posted 2005-1-25 12:53

And if you are sure library cache contention is very high, try setting "session_cached_cursors" from the default of 0 to 200, 400 or more (keep testing it; since you have a lot of free memory, you should use some of it in the PGA).

The reason is that even if you use bind variables and all your parses are soft parses, a soft parse is still a parse: Oracle still needs to check the syntax and semantics of the SQL, and this check needs the library cache.

So if you cache cursors in your PGA by setting session_cached_cursors to a high number, you reduce the burden on the library cache by avoiding much of the soft-parse work.

HTH
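The effect wing hong describes can be seen in a toy model of a per-session LRU cache sitting in front of a shared structure: every parse that misses the session cache has to search the shared library cache. A minimal Python sketch; it ignores the real session cursor cache's own admission and eviction rules, and `count_shared_lookups` is a hypothetical name.

```python
from collections import OrderedDict


def count_shared_lookups(statements, cache_size):
    """Toy model: count parse calls that must search the shared library cache.

    Each parse first checks a per-session LRU cache holding up to
    `cache_size` statements; cache_size=0 stands in for
    session_cached_cursors=0. The real session cursor cache has its own
    admission and eviction rules, which this sketch ignores.
    """
    cache = OrderedDict()
    lookups = 0
    for sql in statements:
        if sql in cache:
            cache.move_to_end(sql)  # session cache hit: no shared lookup
            continue
        lookups += 1  # miss: the soft parse searches the shared library cache
        if cache_size > 0:
            cache[sql] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used entry
    return lookups


if __name__ == "__main__":
    # 10 distinct statements, each parsed 100 times in round-robin order.
    workload = [f"stmt_{i % 10}" for i in range(1000)]
    for size in (0, 5, 20):
        print(size, count_shared_lookups(workload, size))
```

With this workload a cache of 0 gives 1,000 shared lookups and a cache of 20 gives only 10, but a cache of 5 still gives 1,000, because LRU eviction throws each statement out just before it is reused. That is one reason to "keep testing it" with larger values rather than stopping at a small one.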