RAC increases points of failure

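jieyancai · posted on 2014-3-7 14:35

wolfop posted on 2014-3-7 11:11
Let me stress one more point: extended RAC tightly couples the databases at the two sites, so it simply cannot be used for disaster recovery.
So-called active-active is really dead-dead.

What do you think of VPLEX?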

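wolfop · posted on 2014-3-7 22:34

jieyancai posted on 2014-3-7 14:35
What do you think of VPLEX?

Go look up EMC's white paper. It originally stated very clearly that you need three storage arrays, with the voting disks not managed through VPLEX, placed at three sites, in three copies (ASM normal redundancy). Only then is RTO=0 guaranteed for a plain site failure or storage failure. When it came time to sell it, that changed: to save cost, everything is managed through VPLEX, and on a single storage failure, a site failure, and similar cases there is a high probability that the cluster restarts, so RTO != 0.
Worse still, it does not solve the extended RAC dead-dead problem I mentioned: if the database suffers a logical failure and the SCNs are inconsistent so that instance recovery is impossible, everything is gone.
If you really must do extended RAC, ASM normal redundancy is enough, but I still think extended RAC is not a DR solution. It is more of a gimmick. One very large mobile operator has twice had critical data in a core system corrupted beyond instance recovery, and recently the same thing happened in a medium-sized province. DR based on storage replication was completely helpless.
Just try it yourself. To validate your DR, simulate the following and see whether your DR setup can cope: dd if=/dev/zero of=<the storage used by the SYSTEM tablespace, or undo, or the active redo log>.
Don't tell me this cannot happen. Can you guarantee that what gets written to disk is 100% correct when a CPU or memory fails?
Examples of systems dying this way are everywhere.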
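
Before pointing dd at real storage, even in a test environment, the effect of the command can be rehearsed on a throwaway file. The following is a minimal Python sketch under stated assumptions: the 8192-byte block size, the helper names `make_scratch_file` and `zero_blocks`, and the scratch file itself are all hypothetical and only stand in for a datafile; nothing here touches a database.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed block size, for illustration only


def make_scratch_file(blocks: int = 4) -> str:
    """Create a temp file whose block i is filled with the byte value i + 1."""
    fd, path = tempfile.mkstemp(suffix=".scratch")
    with os.fdopen(fd, "wb") as f:
        for i in range(blocks):
            f.write(bytes([i + 1]) * BLOCK_SIZE)
    return path


def zero_blocks(path: str, first_block: int, count: int) -> None:
    """Overwrite `count` blocks in place with zeros (similar in effect to
    dd if=/dev/zero of=path bs=8192 seek=first_block count=count conv=notrunc)."""
    with open(path, "r+b") as f:
        f.seek(first_block * BLOCK_SIZE)
        f.write(b"\x00" * (count * BLOCK_SIZE))
        f.flush()
        os.fsync(f.fileno())


# Rehearse the corruption on the scratch file only.
path = make_scratch_file()
zero_blocks(path, first_block=0, count=1)
with open(path, "rb") as f:
    data = f.read()
os.remove(path)

header_is_zeroed = data[:BLOCK_SIZE] == b"\x00" * BLOCK_SIZE
rest_is_intact = data[BLOCK_SIZE] == 2
```

The point of the rehearsal is only that the overwrite is silent: the file keeps its size and the remaining blocks look healthy, which is why a layer below the database has no reason to refuse or flag the write.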

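jieyancai · posted on 2014-3-7 23:07

wolfop posted on 2014-3-7 22:34
Go look up EMC's white paper. It originally stated very clearly that you need three storage arrays, with the voting disks not managed through VPLEX, placed at three sites, in three copies (a ...

Here is an official diagram. Where does the problem lie?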

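wolfop · posted on 2014-3-8 12:01

Two problems
1) The worst case: something like VPLEX does not understand the database. Whatever is written from above, it writes to disk. If a serious error corrupts the redo log, the SYSTEM tablespace, or undo, both sides are corrupted: dead-dead. You then have to restore from backup; RTO will certainly be long, and RPO is unlikely to be 0 either. Test it yourself with the method I described above. In essence this is the same problem as database DR based on storage replication: what SRDF/S, synchronous PPRC, and synchronous TrueCopy cannot solve, VPLEX cannot solve either. DG does not have this problem, because it is not tightly coupled.
2) If one site dies, or the storage at one site dies, or the VPLEX at one site dies, writes must retry until they time out before giving up. During that window writes may time out. Data files can tolerate this, but the control file certainly cannot, and access to the voting disks may time out as well, so CRS will reboot nodes, which means RTO != 0.
Go find this sentence in their white paper: “In the case of VPLEX interconnect partitioning (or a true site failure) VPLEX immediately suspends IOs at both clusters while VPLEX Witness resolves which of them should continue to service IOs (based on detach rules and site availability). The Oracle cluster nodes will therefore have access to voting disks only where VPLEX resumes IO, and Oracle clusterware will therefore reconfigure the cluster in accordance.”
Then ask them to explain what exactly happens in "The Oracle cluster nodes will therefore have access to voting disks only where VPLEX resumes IO, and Oracle clusterware will therefore reconfigure the cluster in accordance."
3) Cross-site cache fusion: VPLEX clearly leaves this problem to Oracle. When something goes wrong it takes no responsibility, and one could also say it is none of its business.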
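
The coupling argument in point 1 can be illustrated with a toy model. This is a hedged sketch, not a description of how VPLEX or Data Guard are actually implemented: the classes `MirroredVolume` and `ValidatingStandby`, the CRC32 check, and the block contents are all assumptions chosen for illustration. The mirror copies whatever bytes it is handed to both sites, while the standby applies a change only if it passes a checksum computed before the corruption occurred.

```python
import zlib


def checksum(block: bytes) -> int:
    """CRC32 stands in for whatever integrity check the database layer uses."""
    return zlib.crc32(block)


class MirroredVolume:
    """Toy block-level synchronous mirror: it has no idea what the bytes mean."""

    def __init__(self) -> None:
        self.site_a: dict[int, bytes] = {}
        self.site_b: dict[int, bytes] = {}

    def write(self, block_no: int, data: bytes) -> None:
        # The same bytes land on both sites, good or bad.
        self.site_a[block_no] = data
        self.site_b[block_no] = data


class ValidatingStandby:
    """Toy loosely coupled standby: applies a change only if it verifies."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self.rejected: list[int] = []

    def apply(self, block_no: int, data: bytes, expected_crc: int) -> bool:
        if checksum(data) != expected_crc:
            self.rejected.append(block_no)
            return False
        self.blocks[block_no] = data
        return True


good = b"redo record 1"
good_crc = checksum(good)
corrupted = b"\x00" * len(good)  # e.g. a fault zeroes the buffer before the write

mirror = MirroredVolume()
standby = ValidatingStandby()

# A healthy write reaches both the mirror and the standby.
mirror.write(7, good)
standby.apply(7, good, good_crc)

# The corrupted write overwrites both mirror sites; the standby refuses it.
mirror.write(7, corrupted)
accepted = standby.apply(7, corrupted, good_crc)
```

After the run both mirror sites hold the zeroed block, while the standby still holds the last good copy. The sketch leaves out everything else that matters in practice (ordering, lag, failover), so it only shows why a layer that validates content fails differently from one that copies bytes.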

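jieyancai · posted on 2014-3-8 19:46

wolfop posted on 2014-3-8 12:01
Two problems
1) The worst case: something like VPLEX does not understand the database. Whatever is written from above, it writes to disk. If a serious error ...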

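Yong Huang · posted on 2014-3-8 23:33

wolfop posted on 2014-3-6 21:11
Let me stress one more point: extended RAC tightly couples the databases at the two sites, so it simply cannot be used for disaster recovery.
So-called active-active is really dead-dead.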

I don't quite understand why a 紧耦合 (some kind of tight coupling?) cannot be used for DR. I understand DR to be the case where one data center is destroyed and all users are switched to the other data center. Why do you think extended RAC can't do that?

My posting here is about users experiencing temporary disconnections when some RAC instances crash. You brought up a discussion of DR, which is a separate issue. Maybe we need to open a new thread for that topic.

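wolfop · posted on 2014-3-9 12:22

Kamus posted on 2014-3-8 23:30
1. I understand that by dead-dead you mean logical corruption or human error. That is certainly true; extended RAC is, after all, still RAC, and RAC itself ...

Synchronous DG certainly places some requirements on the network. The main issue today is that the dark fiber or DWDM used for FC tends to be of better quality than a typical IP network.
If you build a dedicated IP network for DG to the same standard as the FC network, reduce the number of routers and switches in the path to keep latency low (<5ms), and provide enough bandwidth, for example 10GE, there is no problem at all. In a recent test in maximum availability mode over 10GE with 1ms latency, log file sync only rose from 2ms to 8-9ms. By comparison, with extended RAC the write to the remote storage must also succeed, and VPLEX in particular uses a write-through cache, so the increase in log file sync latency may be even more alarming. I have no direct figures for the log file sync increase of extended RAC on VPLEX, but I do have a system that uses SRDF/S for synchronous DR, and its log file sync averages 20-30ms.
Also, if your network bandwidth and latency are both adequate and DG still shows fairly serious log file sync waits, you should consider a version-specific bug for which a patch exists.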

wolfop · posted on 2014-3-9 12:22

Yong Huang posted on 2014-3-8 23:33
I don't quite understand why a 紧耦合 (some kind of tight coupling?) cannot be used for DR. I unde ...

Yes, it really is another topic. My point is only about extended RAC: I will never use it for DR.

Yong Huang · posted on 2014-3-11 00:04

What you described is high latency causing long log file sync (or other wait events) in case of extended RAC. That has nothing to do with DR (and also not related to the original topic of this discussion thread). But your point makes sense: extended RAC may have log file sync longer than that in a local non-extended RAC or single instance database.

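mike79 · posted on 2014-3-22 02:55

Last edited by mike79 on 2014-3-22 02:59

wolfop posted on 2014-3-7 22:34
Go look up EMC's white paper. It originally stated very clearly that you need three storage arrays, with the voting disks not managed through VPLEX, placed at three sites, in three copies (a ...

This is not necessarily a sales trick. With VPLEX in place, still putting the three voting disks at three sites may itself be a problem: once the inter-site link is cut, VPLEX and RAC may reach different arbitration decisions. For example, VPLEX suspends the storage at site A while RAC evicts the node at site B. Then the whole RAC goes down.
“An acknowledgement of the write I/O needs to be received in 200 seconds under normal operations (long disk timeout) and 27 seconds during a reconfiguration in the cluster (short disk timeout).”
“In other words, the connectivity to the third location should ensure that the Voting File write I/O can be acknowledged in 27/2 seconds (approx. 14 seconds)”
14 seconds should generally be enough time for VPLEX to complete its arbitration.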
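
The arithmetic behind the 14-second figure can be written out as a small sketch. The 200-second and 27-second values come from the passages quoted above; the function names and the sample VPLEX arbitration times are hypothetical inputs chosen for illustration, not measured values.

```python
import math

LONG_DISK_TIMEOUT_S = 200   # normal operation, from the quoted text
SHORT_DISK_TIMEOUT_S = 27   # during cluster reconfiguration, from the quoted text


def voting_ack_budget(short_timeout_s: float = SHORT_DISK_TIMEOUT_S) -> float:
    """Time in which a voting file write must be acknowledged: half the short timeout."""
    return short_timeout_s / 2


def arbitration_fits(arbitration_s: float) -> bool:
    """True if a (hypothetical) storage arbitration time fits inside the budget."""
    return arbitration_s <= voting_ack_budget()


budget = voting_ack_budget()          # 13.5
rounded = math.ceil(budget)           # 14, the "approx. 14 seconds" in the quote
fast_enough = arbitration_fits(5.0)   # assumed arbitration time of 5 s
too_slow = arbitration_fits(20.0)     # assumed arbitration time of 20 s
```

Whether a real VPLEX Witness decision lands inside that window is exactly the open question in this thread; the sketch only makes the budget explicit.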