How do I change RAC from session balancing to load balancing?

ToddBao · posted 2009-12-7 11:37

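Hi Yong,

You can observe it indirectly through v$servicemetric.

For what the LBA (Load Balancing Advisory) is and what it does, everyone should read the documentation first:
http://download.oracle.com/docs/cd/B19306_01/rac.102/b14197/hafeats.htm#BABICAJC

After reading it, take a look at v$servicemetric.

If you set the goal to NONE, the LBA mainly watches the growth of v$servicemetric.CPUPERCALL.
If you set the goal to SERVICE_TIME, the LBA mainly watches the growth of v$servicemetric.DBTIMEPERSEC/v$servicemetric.CALLSPERSEC.
If you set the goal to THROUGHPUT, the LBA mainly watches the growth of v$servicemetric.DBTIMEPERCALL.

v$servicemetric.GOODNESS indicates quality; the higher the value, the worse the quality.
v$servicemetric.FLAGS indicates the LBA's stance toward the service (whether it accepts new sessions), based on the service quality you chose.
v$servicemetric.DELTA is the LBA's estimate of how the next session will affect quality.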

I'm busy with work too. Yong, you could use your own resources to verify this, or get everyone to help. =)

Todd

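Yong Huang · posted 2009-12-8 00:02

> If you set the goal to NONE, the LBA mainly watches the growth of v$servicemetric.CPUPERCALL.
> If you set the goal to SERVICE_TIME, the LBA mainly watches the growth of v$servicemetric.DBTIMEPERSEC/v$servicemetric.CALLSPERSEC.
> If you set the goal to THROUGHPUT, the LBA mainly watches the growth of v$servicemetric.DBTIMEPERCALL.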

That sounds correct. I made a mistake in my last message in saying the goal set to service_time probably checks cpu_time/executions in v$sql; it should be elapsed_time/executions. (Corrected it in msg #10)

Alternatively or even better, v$sysmetric has a bunch of metrics that could be used by LBA, such as "Host CPU Utilization (%)", "Database Time Per Sec", "CPU Usage Per Sec", etc. With this view, you don't even need to aggregate over all services from v$servicemetric.

Yong Huang

[ Last edited by Yong Huang on 2011-6-13 13:02 ]

Yong Huang · posted 2009-12-19 07:09

Intermediate test result, Oracle 10.2.0.4 on RHEL 4:

On server 1, start 4 tight loop bash processes:

while true; do :; done &
while true; do :; done &
while true; do :; done &
while true; do :; done &

col metric_name for a25
select inst_id, round(value,1) val, metric_name, intsize_csec int from gv$sysmetric where metric_name in ('Host CPU Utilization (%)', 'Database Time Per Sec', 'CPU Usage Per Sec') order by metric_name, intsize_csec, inst_id;

INST_ID        VAL METRIC_NAME                      INT
------- ---------- ------------------------- ----------
      1         .4 CPU Usage Per Sec               6010
      2        1.5 CPU Usage Per Sec               6010
      1        8.5 Database Time Per Sec           1502
      2         .8 Database Time Per Sec           1503
      1          1 Database Time Per Sec           6010
      2        2.1 Database Time Per Sec           6010
      1       99.9 Host CPU Utilization (%)        1502
      2       18.1 Host CPU Utilization (%)        1503
      1       98.6 Host CPU Utilization (%)        6010
      2       17.6 Host CPU Utilization (%)        6010

A new sqlplus session using a service with service_time as its goal does not always go to node 2, so redirection is not based on "Host CPU Utilization (%)". It seems to correspond to "CPU Usage Per Sec", but more testing is needed to confirm. This "CPU Usage Per Sec" is obviously a DB-specific metric, not related to OS-level CPU load or run queue.

Will post again once data is obtained.

Yong Huang

Actually, a better query to check the correlation between load balance redirection and system metric is:

select inst_id, round(value,1) val, metric_name, round(intsize_csec,-3) int
from gv$sysmetric
where metric_name in ('Host CPU Utilization (%)', 'Database Time Per Sec', 'CPU Usage Per Sec', 'Executions Per Sec', 'Redo Generated Per Sec', 'SQL Service Response Time')
order by metric_name, round(intsize_csec,-3), inst_id;

I added a few more metrics that look like possible metrics to control redirection. The name 'SQL Service Response Time' sounds very promising to correlate with service_time goal for a service. The value for this metric is very small so round() should be removed.
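
Why round(intsize_csec,-3) helps: in the earlier output the two instances report slightly different interval lengths (1502 and 1503 centiseconds), so rounding to the nearest thousand puts the roughly 15-second and roughly 60-second samples into two clean groups. A minimal Python sketch of that bucketing, using the interval values posted above. The helper name round_to_thousand is made up for illustration, and it uses integer arithmetic to mimic Oracle's ROUND for non-negative whole numbers:

```python
def round_to_thousand(csec):
    """Mimic Oracle ROUND(n, -3) for non-negative integers (halves round up)."""
    return (csec + 500) // 1000 * 1000

# intsize_csec values copied from the gv$sysmetric output earlier in this post
intervals = [1502, 1503, 6010]
buckets = {c: round_to_thousand(c) for c in intervals}
print(buckets)  # {1502: 2000, 1503: 2000, 6010: 6000}
```

With the raw values, a one-centisecond difference between instances is enough to sort their rows into different groups; with the rounded values the two instances always sit next to each other for the same metric.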

Yong Huang

[ Last edited by Yong Huang on 2009-12-18 22:22 ]

ToddBao · posted 2009-12-19 22:58

“The name 'SQL Service Response Time' sounds very promising to correlate with service_time goal for a service. ”

Yes, as I said in the second post:
“If you want to balance on the service's service time, which is roughly the response time:
exec dbms_service.modify_service(...,GOAL=>dbms_service.GOAL_SERVICE_TIME,CLB_GOAL=>dbms_service.CLB_GOAL_SHORT)”

If there is time, we can go on to verify the remaining ones. RAC Connect-Time Load Balancing has four algorithms in total.

Todd

Yong Huang · posted 2009-12-22 04:57

The challenge in designing a good experiment to identify the metrics responsible for service_time or throughput, and long and short connection load balancing goal, is that there's no easy way to independently change one metric without changing the others. You can easily change host CPU usage without changing a sysmetric inside Oracle (other than 'Host CPU Utilization (%)'). But beyond that, you can't easily increase, for instance, 'Database Time Per Sec' while you maintain about the same 'CPU Usage Per Sec' or 'Executions Per Sec' or other metrics.

Yong Huang

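warehouse · posted 2011-6-9 01:35

Originally posted by ToddBao on 2009-12-7 11:37:
> Hi Yong,
>
> You can observe it indirectly through v$servicemetric.
>
> For what the LBA (Load Balancing Advisory) is and what it does, everyone should read the documentation first:
> http://download.oracle.com/docs/ ... afeats.htm#BABICAJC
>
> After reading it, take a look at v$servicemetric.
>
> If you set the goal to NONE, the LBA mainly watches the growth of v$servicemetric.CPUPERCALL.
> If you set the goal to SERVICE_TIME, the LBA mainly watches the growth of v$servicemetric.DBTIMEPERSEC/v$servicemetric.CALLSPERSEC.
> If you set the goal to THROUGHPUT, the LBA mainly watches the growth of v$servicemetric.DBTIMEPERCALL.
>
> v$servicemetric.GOODNESS indicates quality; the higher the value, the worse the quality.
> v$servicemetric.FLAGS indicates the LBA's stance toward the service (whether it accepts new sessions), based on the service quality you chose.
> v$servicemetric.DELTA is the LBA's estimate of how the next session will affect quality.
>
> I'm busy with work too. Yong, you could use your own resources to verify this, or get everyone to help. =)
>
> Todd

I stumbled on this thread by accident. It seems Oracle's description of server-side load balancing, also called runtime load balancing, is not very clear.

[ Last edited by warehouse on 2011-6-9 01:37 ]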

yyp2009 · posted 2011-6-12 21:54

Yong Huang · posted 2011-6-14 05:23

This old thread caught my interest again. ToddBao's message #11 says

> If you set the goal to NONE, the LBA mainly watches the growth of v$servicemetric.CPUPERCALL.
> If you set the goal to SERVICE_TIME, the LBA mainly watches the growth of v$servicemetric.DBTIMEPERSEC/v$servicemetric.CALLSPERSEC.
> If you set the goal to THROUGHPUT, the LBA mainly watches the growth of v$servicemetric.DBTIMEPERCALL.
>
> v$servicemetric.GOODNESS indicates quality; the higher the value, the worse the quality.

My test on an 8-node cluster running Oracle 10.2.0.4 (RHEL4) shows that the lowest goodness indeed indicates the instance my connection goes to. Very good. But the claim for the goal value service_time (the second line above) doesn't hold true for me. I have multiple services. All have goal set to service_time (most with clb_goal set to long and two set to short, but that's irrelevant). According to ToddBao's message, we should see the lowest goodness for the lowest v$servicemetric.DBTIMEPERSEC/v$servicemetric.CALLSPERSEC. Do I understand that correctly? In fact, I checked more ratios, with the SQL below:

col inst_id for 9
col service_name for a25
select inst_id, service_name, ELAPSEDPERCALL, CPUPERCALL, DBTIMEPERCALL, DBTIMEPERSEC, CALLSPERSEC, dbtimepercall/callspersec dbc_cs, dbtimepersec/callspersec dbs_cs, cpupercall/callspersec cpuc_cs, elapsedpercall/callspersec ec_cs, GOODNESS, DELTA, FLAGS
from gv$servicemetric
where CALLSPERSEC != 0 or DBTIMEPERSEC != 0 order by service_name, goodness;

(Rows with non-zero flags should probably be ignored.) I watched for a while. Unfortunately, the rows with the lowest goodness for a given service do not necessarily have the lowest value of any of the ratios I computed.
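
The comparison described above can be sketched in Python for anyone who wants to repeat it on spooled query output. This is only an illustration: the rows below are hypothetical sample values, not measurements from my cluster, the function name is invented, and skipping rows with non-zero flags is the same guess made above rather than documented behavior.

```python
from collections import defaultdict

# Hypothetical sample rows, NOT real measurements:
# (inst_id, service_name, dbtimepersec, callspersec, goodness, flags)
rows = [
    (1, "svc_a", 4.0, 50.0, 120, 0),
    (2, "svc_a", 1.0, 40.0, 300, 0),
    (3, "svc_a", 9.0, 30.0, 80, 4),  # non-zero flags: skipped (assumption)
]

def lowest_goodness_matches_lowest_ratio(rows):
    """For each service, report whether the instance with the lowest GOODNESS
    is also the one with the lowest DBTIMEPERSEC/CALLSPERSEC."""
    by_service = defaultdict(list)
    for inst, svc, dbtime, calls, goodness, flags in rows:
        if flags == 0 and calls != 0:  # also avoids division by zero
            by_service[svc].append((inst, dbtime / calls, goodness))
    result = {}
    for svc, candidates in by_service.items():
        best_by_goodness = min(candidates, key=lambda r: r[2])[0]
        best_by_ratio = min(candidates, key=lambda r: r[1])[0]
        result[svc] = best_by_goodness == best_by_ratio
    return result

print(lowest_goodness_matches_lowest_ratio(rows))  # {'svc_a': False}
```

In this made-up sample, instance 1 has the lowest goodness but instance 2 has the lowest ratio, which is the kind of mismatch I kept seeing.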

Yong Huang

Yong Huang · 发表于 2011-6-14 05:38

Followup to my last message #18. Think about v$servicemetric.DBTIMEPERSEC/v$servicemetric.CALLSPERSEC, and then the column v$servicemetric.DBTIMEPERCALL. They are exactly the same! DB time per sec divided by calls per sec. Isn't that the same as DB time per call? In fact, you can verify that in the output of my query. Compare the value under DBTIMEPERCALL and DBS_CS (the column alias I used). This looks cleaner:

SQL> select inst_id, DBTIMEPERCALL, dbtimepersec/callspersec dbs_cs from gv$servicemetric where (CALLSPERSEC != 0 or DBTIMEPERSEC != 0) and service_name = 'myservicename';

INST_ID DBTIMEPERCALL       DBS_CS
------- ------------- ------------
      4  828.27842566 .08282784257
      3   1368.845339  .1368845339
      3  816.25373134 .08162537313
      2  840.01576522 .08400157652
      2  14614.470588 1.4614470588
      5  653.81259943 .06538125994
      5  38572.179104 3.8572179104

You see the same numbers on each row (except for different units being used).
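
To make the unit difference concrete, here is a small Python check over the rows above. It divides DBTIMEPERCALL by DBS_CS on every row and tests whether the result is the same constant. The factor of 10,000 would fit per-call values in microseconds and per-second values in centiseconds, but that reading of the units is an assumption; this thread does not establish them.

```python
import math

# (inst_id, DBTIMEPERCALL, DBS_CS) copied from the query output above
rows = [
    (4, 828.27842566, 0.08282784257),
    (3, 1368.845339, 0.1368845339),
    (3, 816.25373134, 0.08162537313),
    (2, 840.01576522, 0.08400157652),
    (2, 14614.470588, 1.4614470588),
    (5, 653.81259943, 0.06538125994),
    (5, 38572.179104, 3.8572179104),
]

# Ratio of the two columns on each row
ratios = [per_call / dbs_cs for _, per_call, dbs_cs in rows]

# True if every row differs by the same factor of 10,000 (within rounding)
all_same = all(math.isclose(r, 10_000, rel_tol=1e-6) for r in ratios)
print(all_same)  # True
```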

The difference between "goal set to SERVICE_TIME" and "goal set to THROUGHPUT" is still a mystery.

Yong Huang

[ Last edited by Yong Huang on 2011-6-16 15:28 ]
