Technical topic summary: Standby Database (unpinned)

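redhill · posted 2003-8-20 01:09

For a database of roughly 150GB, about how much downtime does the system need when activating the standby database, and how much when rebuilding the primary database on the primary node?

Does Advanced Replication really need no downtime?

Please give an example using any common hardware model. Thanks.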

Originally posted by snowhite2000
[B]

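Unlike OPS and Advanced Replication, a standby database requires downtime both when activating the standby database and when rebuilding the primary database on the primary node. How long depends on the state of the system. If the standby database can be built on the primary node (that is, if the two servers have the same hardware configuration; usually the standby node is somewhat weaker to save cost), the downtime can be reduced.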

[/B]

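snowhite2000 · posted 2003-8-20 03:12

The actual commands to activate the standby database take only about 2 minutes, provided you skip the cold backup.

Rebuilding the primary DB: if you know the whole procedure, running the commands takes no more than 10 minutes. Most of the time goes into copying the data files, in other words the cold backup. There are workarounds, though, depending on the specifics of your system.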
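
As a rough illustration of why the file copy dominates, here is a back-of-the-envelope estimate for the 150GB database from the question. The 30 MB/s sustained copy rate is purely an assumed figure, not a measurement from this thread, and `copy_minutes` is a made-up helper name; substitute whatever your own disks or network actually deliver.

```python
def copy_minutes(size_gb: float, rate_mb_per_s: float) -> float:
    """Estimate the minutes needed to copy size_gb at a sustained MB/s rate."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = size_gb * 1024 / rate_mb_per_s
    return seconds / 60


# 150 GB comes from the question above; 30 MB/s is a hypothetical rate.
print(f"{copy_minutes(150, 30):.0f} minutes")
```

Whatever rate you plug in, the copy time is likely to be far larger than the 2 to 10 minutes spent on the commands themselves.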

I have never understood how AR can be used for disaster recovery. Using it for HA I can understand.

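husthxd · posted 2003-8-20 08:57

fals · posted 2003-8-20 20:14

I have been studying standby for the past few days and it has been going well. I was happily planning to write my own summary once I finished, but you are all ahead of me.

I am still looking into a few parameter settings, and which setting suits which situation. More on that once I am done.

piner · posted 2003-8-20 20:27

Data Guard in 9i has improved a lot.
In fact 8i can also do a clean switchover, i.e. one that does not require rebuilding the primary database.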

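andy9718 · posted 2003-9-2 13:34

Thanks, thanks!!

2zy · posted 2003-9-2 19:23

Many thanks.

li2 · posted 2004-2-14 11:04

The article is very well written. Things I had never dealt with before are now somewhat clearer to me.
I tidied it up into a Word document and changed a few words. I hope the original poster does not mind.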

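dodola · posted 2004-2-14 13:39

Problem description:
1. On 8.117, the system, hmn, and cep tablespaces are all over 90% used, so data files need to be added.
2. A standby arrangement is in place between 8.117 and 7.117.
3. How can data files be added without shutting down the primary?
Analysis:
1. Current OS disk space: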
df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/cciss/c0d0p3     2.9G  1.7G  1.1G  60% /
/dev/cciss/c0d0p10    8.0G  645M  6.9G   9% /arch
/dev/cciss/c0d0p1      96M  9.1M   82M  10% /boot
/dev/cciss/c0d0p8     5.8G  2.8G  2.7G  51% /db1
/dev/cciss/c0d0p5      19G   13G  5.9G  68% /db2
/dev/cciss/c0d0p6      17G   15G  1.9G  89% /db3
/dev/cciss/c0d0p7     9.6G  3.5G  5.6G  39% /exp
none                 1008M     0 1008M   0% /dev/shm
/dev/cciss/c0d0p2     2.9G  2.7G   55M  99% /u
//10.151.7.2/samba    4.8G  478M  4.3G  10% /u/oracle/7.2mount
2. Planned additions (based on expected data growth and available space)
system tablespace  /db1/fox/system02.dbf   500M
hmn tablespace     /db2/fox/hmn02.dbf     1000M
cep tablespace     /db2/fox/cep03.dbf     1000M
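
The plan can be checked against the free space reported by df. The following is only a small Python sketch: the helper names `to_mb` and `check_plan` are invented for illustration, and it assumes each mount point is the first directory of the file path (true for /db1 and /db2 here, not in general).

```python
import re

UNITS = {"K": 1 / 1024, "M": 1, "G": 1024}


def to_mb(text: str) -> float:
    """Convert a df -h style size such as '2.7G' or '500M' to megabytes."""
    match = re.fullmatch(r"([\d.]+)([KMG])", text)
    if not match:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def check_plan(avail, plan):
    """Return {mount: (needed_mb, avail_mb, fits)} for every mount in the plan."""
    needed = {}
    for path, size in plan:
        mount = "/" + path.split("/")[1]   # '/db1/fox/x.dbf' -> '/db1' (assumption)
        needed[mount] = needed.get(mount, 0) + to_mb(size)
    return {m: (n, to_mb(avail[m]), n <= to_mb(avail[m])) for m, n in needed.items()}


# Avail column copied from the df -h output above.
avail = {"/db1": "2.7G", "/db2": "5.9G"}
# Files and sizes copied from the plan above.
plan = [
    ("/db1/fox/system02.dbf", "500M"),
    ("/db2/fox/hmn02.dbf", "1000M"),
    ("/db2/fox/cep03.dbf", "1000M"),
]
for mount, (need, free, ok) in check_plan(avail, plan).items():
    print(f"{mount}: need {need:.0f}M, free {free:.0f}M, fits={ok}")
```

Remember that the standby host needs the same space in the same locations, since the files are created there with identical paths.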

Detailed resolution steps:
1) Confirm that the primary and the standby are at the same log sequence
svrmgrl>select max(sequence#) from v$log_history;
2) On the primary, add the new data file to the tablespace.
svrmgrl>alter tablespace system add datafile '/db1/fox/system02.dbf' size 100M;
3) On the primary, switch the log file
svrmgrl>alter system archive log current;
or
svrmgrl>alter system switch logfile;
4) On the standby, apply this log file
svrmgrl>recover automatic standby database;
This SQL statement causes Oracle to stop applying archived redo logs because a datafile on the primary site does not exist on the standby site. Messages similar to the following are displayed when you try to archive the redo logs:
ORA-00283: recovery session canceled due to errors
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: '/vobs/oracle/dbs/stdby/tbs_4.f'
The error messages indicate that the datafile has been added to the standby database control file, but the datafile has not been created yet.
5) On the standby, create the data file.
svrmgrl>alter database create datafile '/db1/fox/system02.dbf' as '/db1/fox/system02.dbf';
6) On the standby, run recovery again
svrmgrl>recover automatic standby database;
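
If several files were added on the primary, it can help to pull the missing file names out of the recovery errors instead of retyping them. A minimal Python sketch, assuming the ORA-01110 message has exactly the shape quoted in step 4; the function name is hypothetical, and the generated statement simply mirrors step 5 with the same path on both sides.

```python
import re

ORA_01110 = re.compile(r"ORA-01110: data file (\d+): '([^']+)'")


def create_datafile_statements(recovery_output: str) -> list:
    """Build one 'create datafile' statement per file named in ORA-01110."""
    statements = []
    for _file_no, path in ORA_01110.findall(recovery_output):
        # Same path on both sides, as in step 5; adjust this if the
        # standby uses a different directory layout.
        statements.append(f"alter database create datafile '{path}' as '{path}';")
    return statements


sample = """ORA-00283: recovery session canceled due to errors
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: '/vobs/oracle/dbs/stdby/tbs_4.f'"""
for stmt in create_datafile_statements(sample):
    print(stmt)
```

Review the generated statements before running them on the standby; the script only reformats what the error message says.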

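dodola · posted 2004-2-14 13:42

Problem description:
1. When the database was planned, the online redo logs of an OLTP system were set up as three groups with one member each, every member 500K in size. The database runs in a standby configuration.
2. Log switches are very frequent, and the alert log contains many "Checkpoint not complete" entries, for example: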
Mon Aug  4 00:50:37 2003
Thread 1 advanced to log sequence 2
  Current log# 2 seq# 2 mem# 0: /db2/fox/redo02.log
Thread 1 advanced to log sequence 3
  Current log# 3 seq# 3 mem# 0: /db3/fox/redo03.log
Thread 1 cannot allocate new log, sequence 4
Checkpoint not complete
  Current log# 3 seq# 3 mem# 0: /db3/fox/redo03.log
Thread 1 advanced to log sequence 4
  Current log# 1 seq# 4 mem# 0: /db1/fox/redo01.log
Thread 1 advanced to log sequence 5
  Current log# 2 seq# 5 mem# 0: /db2/fox/redo02.log
Thread 1 cannot allocate new log, sequence 6
Checkpoint not complete
  Current log# 2 seq# 5 mem# 0: /db2/fox/redo02.log
3. The system crashed during a hot backup on 2004/1/26.
Alert log entries at the time of the crash:
Sun Jan 25 12:01:13 2004
ARC4: Beginning to archive log# 1 seq# 20520
ARC4: Completed archiving log# 1 seq# 20520
Sun Jan 25 14:10:55 2004
Thread 1 advanced to log sequence 20522
Current log# 3 seq# 20522 mem# 0: /db3/fox/redo03.log
Sun Jan 25 14:10:55 2004
ARC1: Beginning to archive log# 2 seq# 20521
ARC1: Completed archiving log# 2 seq# 20521
Sun Jan 25 14:29:26 2004
Thread 1 advanced to log sequence 20523
Current log# 1 seq# 20523 mem# 0: /db1/fox/redo01.log
Sun Jan 25 14:29:26 2004
ARC3: Beginning to archive log# 3 seq# 20522
ARC3: Completed archiving log# 3 seq# 20522
Sun Jan 25 15:32:28 2004
Thread 1 advanced to log sequence 20524
Sun Jan 25 15:32:28 2004
Current log# 2 seq# 20524 mem# 0: /db2/fox/redo02.log
Sun Jan 25 15:32:28 2004
ARC0: Beginning to archive log# 1 seq# 20523
Sun Jan 25 15:32:29 2004
Current log# 2 seq# 20524 mem# 0: /db2/fox/redo02.log
Sun Jan 25 16:40:42 2004
Thread 1 advanced to log sequence 20525
Current log# 3 seq# 20525 mem# 0: /db3/fox/redo03.log
Sun Jan 25 16:40:42 2004
ARC2: Beginning to archive log# 2 seq# 20524
ARC2: Completed archiving log# 2 seq# 20524
Sun Jan 25 16:40:46 2004
Thread 1 cannot allocate new log, sequence 20526
Checkpoint not complete
Current log# 3 seq# 20525 mem# 0: /db3/fox/redo03.log
Sun Jan 25 20:00:06 2004
alter tablespace SYSTEM begin backup
Sun Jan 25 20:15:28 2004
Errors in file /u/product/admin/fox/bdump/lgwr_857.trc:
ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
LGWR: terminating instance due to error 600
Instance terminated by LGWR, pid = 857

Contents of lgwr_857.trc:
/u/product/admin/fox/bdump/lgwr_857.trc
Oracle8i Enterprise Edition Release 8.1.7.0.1 - Production
With the Partitioning option
JServer Release 8.1.7.0.1 - Production
ORACLE_HOME = /u/product/oracle817
System name: Linux
Node name: hx-db
Release: 2.4.18-14
Version: #1 Wed Sep 4 13:35:50 EDT 2002
Machine: i686
Instance name: fox
Redo thread mounted by this instance: 1
Oracle process number: 4
Unix process pid: 857, image: oracle@hx-db (LGWR)

*** 2004-01-25 20:15:28.255
*** SESSION ID(3.1) 2004-01-25 20:15:28.198
TIMEOUT ON CONTROL FILE ENQUEUE
mode=X, type=0, wait=1, eqt=900
===================================================
SYSTEM STATE
------------
System global information:
System global information:
Number of NUMA instances : 1
processes: base 77193788, size 415, cleanup 7719443c
allocation: free sessions(0) 7720bcbc, free calls(0) 1
control alloc errors: 0 (process), 0 (session), 0 (call)
system statistics:
0 158767 logons cumulative
0 64 logons current
0 1016361 opened cursors cumulative
0 70 opened cursors current
0 22053 user commits
0 265 user rollbacks
0 1220805 user calls
0 8165864 recursive calls
0 0 recursive cpu usage
0 528695276 session logical reads
0 0 session stored procedure space
0 0 CPU used when call started
0 0 CPU used by this session
0 0 session connect time
0 0 process last non-idle time
0 910948664 session uga memory
1 585456868 session uga memory max
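......
Analysis:
1. According to the alert log, there are too few redo log groups or the files are too small, so log switches happen very frequently.
2. Checkpoints do not complete.
3. During the hot backup, the database could not obtain the control file enqueue.
4. After waiting about 15 minutes (which matches eqt=900 in the trace if that value is in seconds) the request timed out and ORA-600 was raised, as recorded in the alert log: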
Sun Jan 25 20:00:06 2004
alter tablespace SYSTEM begin backup
Sun Jan 25 20:15:28 2004
Errors in file /u/product/admin/fox/bdump/lgwr_857.trc:
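
The 15-minute figure can be checked from the two timestamps above. A small Python sketch; reading eqt=900 as a timeout in seconds is an assumption on my part, and the `%a`/`%b` parsing assumes an English (C) locale.

```python
from datetime import datetime

ALERT_TS = "%a %b %d %H:%M:%S %Y"   # e.g. 'Sun Jan 25 20:00:06 2004'


def seconds_between(start: str, end: str) -> float:
    """Seconds elapsed between two alert-log timestamp lines."""
    t0 = datetime.strptime(start, ALERT_TS)
    t1 = datetime.strptime(end, ALERT_TS)
    return (t1 - t0).total_seconds()


waited = seconds_between("Sun Jan 25 20:00:06 2004", "Sun Jan 25 20:15:28 2004")
print(waited)          # seconds from 'begin backup' to the LGWR error
print(waited >= 900)   # compare with eqt=900 from the trace file
```

The gap comes out at 922 seconds, i.e. 15 minutes 22 seconds, slightly over 900.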

Resolution:
1. On the primary:
add four online redo log groups, two members each, 50M per file.
2. On the primary:
switch logs until the original online redo logs are in INACTIVE status.
3. On the primary:
drop the original three online redo log groups.
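
One way to judge whether the larger groups help is to count log switches and "Checkpoint not complete" stalls in the alert log before and after the change. A minimal Python sketch; the function name is made up for illustration, and the matching is plain text search on the two messages quoted in this post.

```python
def summarize_alert_log(text: str) -> dict:
    """Count log switches and 'Checkpoint not complete' stalls in alert-log text."""
    switches = stalls = 0
    for line in text.splitlines():
        if "advanced to log sequence" in line:
            switches += 1
        elif line.strip() == "Checkpoint not complete":
            stalls += 1
    return {"switches": switches, "stalls": stalls}


# Excerpt shortened from the alert log quoted in the problem description.
excerpt = """Thread 1 advanced to log sequence 2
Thread 1 advanced to log sequence 3
Thread 1 cannot allocate new log, sequence 4
Checkpoint not complete
Thread 1 advanced to log sequence 4
Thread 1 advanced to log sequence 5
Thread 1 cannot allocate new log, sequence 6
Checkpoint not complete"""
print(summarize_alert_log(excerpt))
```

Run it over comparable time windows of the real alert log; if the stall count does not drop after the change, the groups may still be too small or too few.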

Detailed resolution steps:
1) On the primary, add four groups, two members each, 50M per file
svrmgrl>alter database add logfile group 4
('/db1/redo041.log',
'/db2/redo042.log')
size 50M reuse;
svrmgrl>alter database add logfile group 5
('/db2/redo051.log',
'/db3/redo052.log')
size 50M reuse;
svrmgrl>alter database add logfile group 6
('/db3/redo061.log',
'/exp/redo062.log')
size 50M reuse;
svrmgrl>alter database add logfile group 7
('/exp/redo071.log',
'/db1/redo072.log')
size 50M reuse;
2) On the primary, find out which log is current, then switch log files until the old online redo logs are INACTIVE.
svrmgrl>select t1.group#, t1.members, t1.status, t1.archived, t2.member
from v$log t1, v$logfile t2
where t1.group#=t2.group#;
If every old redo log is INACTIVE, the old online redo logs can be dropped. Otherwise, switch the log file again.
svrmgrl>alter system switch logfile;
3) On the primary, drop the old online redo logs.
svrmgrl>alter database drop logfile group 1;
svrmgrl>alter database drop logfile group 2;
svrmgrl>alter database drop logfile group 3;
4) On the primary, create a standby control file
svrmgrl>alter database create standby controlfile as '/u/oracle/control.ctl.stdby';
5) Bring the standby to the same sequence as the primary
svrmgrl>recover automatic standby database;
svrmgrl>select max(sequence#) from v$log_history;
The query should return the same result on the primary and the standby.
6) Shut down the standby cleanly.
svrmgrl>shutdown immediate;
7) Copy the standby control file from the primary to the standby node host
primary host>$ rcp /u/oracle/control.ctl.stdby stdbyhost:/db1/control01.ctl
stdby host>$ cp /db1/control01.ctl /db2/control02.ctl
stdby host>$ cp /db1/control01.ctl /db3/control03.ctl
8) Start the standby node host with the new control file
svrmgrl>startup nomount
svrmgrl>alter database mount standby database;
9) Check that the following query returns the same result on the primary and the standby.
svrmgrl>select t1.group#, t1.members, t1.status, t1.archived, t2.member
from v$log t1, v$logfile t2
where t1.group#=t2.group#;
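
For step 9, comparing the two result sets by eye gets tedious with many members. Below is a Python sketch of the comparison, with invented helper names and hypothetical sample rows. It ignores the STATUS and ARCHIVED columns on the assumption that those can legitimately differ between the two sides at any given moment; only group number, member count, and member path are compared.

```python
def layout(rows):
    """Reduce (group#, members, status, archived, member) rows to a comparable set."""
    # STATUS and ARCHIVED are dropped: they are assumed to vary over time.
    return {(group, members, member)
            for group, members, _status, _archived, member in rows}


def layout_diff(primary_rows, standby_rows):
    """Return (only_on_primary, only_on_standby) as sorted lists."""
    p, s = layout(primary_rows), layout(standby_rows)
    return sorted(p - s), sorted(s - p)


# Hypothetical query results; the paths follow the groups added in step 1.
primary = [
    (4, 2, "CURRENT", "NO", "/db1/redo041.log"),
    (4, 2, "CURRENT", "NO", "/db2/redo042.log"),
    (5, 2, "INACTIVE", "YES", "/db2/redo051.log"),
    (5, 2, "INACTIVE", "YES", "/db3/redo052.log"),
]
standby = primary[:2] + [(5, 2, "UNUSED", "YES", "/db2/redo051.log")]
print(layout_diff(primary, standby))
```

Two empty lists mean the layouts match; anything else names the members that exist on one side only.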

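Note: when you activate your standby, be sure to copy the online redo log files from the primary to the standby.
Alternatively, take a cold backup of the primary later when you have time. This is important: if the primary or the standby ever has to be recovered from archive logs, it cannot be recovered to a point before this change. After the cold backup of the primary, copy the online redo log files from the primary node host to the corresponding directories on the standby node host so that the primary and the standby keep the same structure. You can also skip steps 4) through 9), but then the online redo log structure differs between the primary and the standby. If the primary becomes unusable and you activate the standby, any transactions that are in the online redo logs but not yet archived are lost. Standby performance also suffers, because the standby now faces the same problem the primary used to have: frequent online redo log switches!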
