|
在primary主機上增加online redo log組數及大小的做法﹐請指教﹗
問題描述﹕
1. 在做數據庫規划時將一個OLTP的系統的online redo log file 的大小做成了三組每組﹐一個成員﹐每個組員的大小為500K,數據庫處于standby 的模式。
2. 系統的switch log頻繁﹐在alert log 文件中﹐在很多checkpoint 未完成的記錄。如下所示﹕
Mon Aug 4 00:50:37 2003
Thread 1 advanced to log sequence 2
Current log# 2 seq# 2 mem# 0: /db2/fox/redo02.log
Thread 1 advanced to log sequence 3
Current log# 3 seq# 3 mem# 0: /db3/fox/redo03.log
Thread 1 cannot allocate new log, sequence 4
Checkpoint not complete
Current log# 3 seq# 3 mem# 0: /db3/fox/redo03.log
Thread 1 advanced to log sequence 4
Current log# 1 seq# 4 mem# 0: /db1/fox/redo01.log
Thread 1 advanced to log sequence 5
Current log# 2 seq# 5 mem# 0: /db2/fox/redo02.log
Thread 1 cannot allocate new log, sequence 6
Checkpoint not complete
Current log# 2 seq# 5 mem# 0: /db2/fox/redo02.log
3. 系統于2004/1/26號做熱備時當機。
當機時alert 文件中的記錄如下﹕
Sun Jan 25 12:01:13 2004
ARC4: Beginning to archive log# 1 seq# 20520
ARC4: Completed archiving log# 1 seq# 20520
Sun Jan 25 14:10:55 2004
Thread 1 advanced to log sequence 20522
Current log# 3 seq# 20522 mem# 0: /db3/fox/redo03.log
Sun Jan 25 14:10:55 2004
ARC1: Beginning to archive log# 2 seq# 20521
ARC1: Completed archiving log# 2 seq# 20521
Sun Jan 25 14:29:26 2004
Thread 1 advanced to log sequence 20523
Current log# 1 seq# 20523 mem# 0: /db1/fox/redo01.log
Sun Jan 25 14:29:26 2004
ARC3: Beginning to archive log# 3 seq# 20522
ARC3: Completed archiving log# 3 seq# 20522
Sun Jan 25 15:32:28 2004
Thread 1 advanced to log sequence 20524
Sun Jan 25 15:32:28 2004
Current log# 2 seq# 20524 mem# 0: /db2/fox/redo02.log
Sun Jan 25 15:32:28 2004
ARC0: Beginning to archive log# 1 seq# 20523
Sun Jan 25 15:32:29 2004
Current log# 2 seq# 20524 mem# 0: /db2/fox/redo02.log
Sun Jan 25 16:40:42 2004
Thread 1 advanced to log sequence 20525
Current log# 3 seq# 20525 mem# 0: /db3/fox/redo03.log
Sun Jan 25 16:40:42 2004
ARC2: Beginning to archive log# 2 seq# 20524
ARC2: Completed archiving log# 2 seq# 20524
Sun Jan 25 16:40:46 2004
Thread 1 cannot allocate new log, sequence 20526
Checkpoint not complete
Current log# 3 seq# 20525 mem# 0: /db3/fox/redo03.log
Sun Jan 25 20:00:06 2004
alter tablespace SYSTEM begin backup。
Sun Jan 25 20:15:28 2004
Errors in file /u/product/admin/fox/bdump/lgwr_857.trc:
ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
LGWR: terminating instance due to error 600
Instance terminated by LGWR, pid = 857
lgwr_857.trc文件中記錄的內容﹕
/u/product/admin/fox/bdump/lgwr_857.trc
Oracle8i Enterprise Edition Release 8.1.7.0.1 - Production
With the Partitioning option
JServer Release 8.1.7.0.1 - Production
ORACLE_HOME = /u/product/oracle817
System name: Linux
Node name: hx-db
Release: 2.4.18-14
Version: #1 Wed Sep 4 13:35:50 EDT 2002
Machine: i686
Instance name: fox
Redo thread mounted by this instance: 1
Oracle process number: 4
Unix process pid: 857, image: oracle@hx-db (LGWR)
*** 2004-01-25 20:15:28.255
*** SESSION ID(3.1) 2004-01-25 20:15:28.198
TIMEOUT ON CONTROL FILE ENQUEUE
mode=X, type=0, wait=1, eqt=900
===================================================
SYSTEM STATE
------------
System global information:
System global information:
Number of NUMA instances : 1
processes: base 77193788, size 415, cleanup 7719443c
allocation: free sessions(0) 7720bcbc, free calls(0) 1
control alloc errors: 0 (process), 0 (session), 0 (call)
system statistics:
0 158767 logons cumulative
0 64 logons current
0 1016361 opened cursors cumulative
0 70 opened cursors current
0 22053 user commits
0 265 user rollbacks
0 1220805 user calls
0 8165864 recursive calls
0 0 recursive cpu usage
0 528695276 session logical reads
0 0 session stored procedure space
0 0 CPU used when call started
0 0 CPU used by this session
0 0 session connect time
0 0 process last non-idle time
0 910948664 session uga memory
1 585456868 session uga memory max
。。。。。。。
問題分析﹕
1. 根據alert file 中的提示為redo log 文件的組數太少或是大小不夠大﹐導致日志文件頻繁switch.
2. 檢查點未完成。
3. 熱備份時﹐數據庫請求控制文件隊列無法獲得。
4. 等待15分钟后超时,出现ORA-600错误﹐如alert 文件中記錄的﹕
Sun Jan 25 20:00:06 2004
alter tablespace SYSTEM begin backup
Sun Jan 25 20:15:28 2004
Errors in file /u/product/admin/fox/bdump/lgwr_857.trc:
問題的解決辦法﹕
1. 在主庫﹕
增加online redo log file五組﹐每組二個成員﹐每個文件50M。
2. 在主庫﹕
switch log 使得原來的online redo log file 處于inactive status.
3. 在主庫﹕
drop online redo log file原來的三組online redo log file。
問題具體的解決步驟描述:
1) 在主庫上增加四組﹐每組二個成員﹐每個文件20M
svrmgrl>alter database add logfile goup 4
('/db1/redo041.log',
'/db2/redo042.log’)
size 50M reuse;
svrmgrl>alter database add logfile group 5
('/db2/redo051.log',
'/db3/redo052.log')
size 50M reuse;
svrmgrl>alter database add logfile group 6
('/db3/redo061.log',
‘/exp/redo062.log’)
size 50M reuse;
svrmgrl>alter database add logfile group 7
(‘/exp/redo071.log’,
‘/db1/redo072.log’)
size 50M reuse;
2) 在主庫上,查詢當前的log是哪個﹐switch log file ,使得舊的online redo log file 處于inactive 狀態。
Svrmgrl> select t1.group#, t1.members, t1.status,t1.archived,t2.member
from v$log t1,v$logfile t2
where t1.group#=t2.group#
如果所有舊的redo log file 均處于inactive狀態﹐那么可以drop 舊的online redo log file .否則進行switch log file .
svrmgrl> alter system switch logfile;
3) 在主庫上drop 舊的online redo log file .
svrmgrl> alter database drop logfile group 1;
svrmgrl> alter database drop logfile group 2;
svrmgrl> alter database drop logfile group 3;
4) 在主庫上做一個standby control file
svrmgrl>alter database create standby controlfile as ‘/u/oracle/control.stl.stdby’
5) 將standby 庫與主庫的sequence保持同步
svrmgrl>recover automatic standby database;
svrmgrl>select max(sequence#) from v$log_history;主庫和備庫的查詢結果一致
6) standby庫正常當機。
Svrmgrl> shutdown immediate;
7) 將主庫上的standby control file copy to standby node host
primary host>$ rcp /u/oracle/control.ctl.stdby stdbyhost:/db1/control01.ctl
stdby host>$ cp /db1/control01.ctl /db2/control02.ctl
stdby host>$ cp /db1/cotnrol01.ctl /db3/control03.ctl
8) 使用新的控制文件將standby node host startup
svrmgrl>startup nomount
svrmgrl>alter database mount standby database;
9)比較主備庫兩邊下列SQL的查詢結果是否一致。
Svrmgrl> select t1.group#, t1.members, t1.status,t1.archived,t2.member
from v$log t1,v$logfile t2
where t1.group#=t2.group#
注意﹕當您active 您的備庫時﹐一定記得考主庫上的online redo log file 文件至備庫。
或是稍后有時間時給主庫做一個冷備﹐這很重要﹐萬一主庫備庫需要用archive log 恢愎時﹐不能恢愎到結果變更以前。對主庫做完冷備后﹐將primary node host 的online redo log file 考至standby node host 對應的目錄﹐使用主機和備庫的結構保持一致。您也可不做4)5)6)7)8)9)步﹐這樣您的主備庫的online redo log 的結構不一致﹐當您因主庫原因不能用﹐active 備庫時﹐online redo log 中記錄的沒有archive 的事務記錄將丟失﹐同時也影響你備庫的性能﹐因為備庫現在又面監主庫以前同樣的問題﹐online redo log file swtich 頻繁﹗ |
|