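After the customer finished the restore and ran rollforward to end of logs and complete, the rollforward of a very small database had still not finished after 3 hours.
So we ran db2 list utilities show detail and found that the rollforward was stuck in the backward phase: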
db2fsprd scurs10>db2 list utilities show detail
ID = 74887
Type = ROLLFORWARD RECOVERY
Database Name = FSDEV89
Partition Number = 0
Description = Database Rollforward Recovery
Start Time = 05/05/2006 07:14:00.497923
Progress Monitoring:
Estimated Percentage Complete = 100
Phase Number = 1
Description = Forward
Total Work = 3611153565 bytes
Completed Work = 3611153565 bytes
Start Time = 05/05/2006 07:14:00.497927
Phase Number [Current] = 2
Description = Backward
Total Work = 47 bytes
Completed Work = 0 bytes
Start Time = 05/05/2006 07:24:18.934889
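The progress block above can also be read mechanically. The following is only a minimal sketch, assuming the output keeps the "key = value" layout shown in this post; parse_phases is a hypothetical helper name, not part of DB2, and the sample text is copied from the output above.

```python
# Sketch: pull the per-phase progress out of "db2 list utilities show detail"
# output. Assumes the "key = value" layout shown in this post; parse_phases
# is a hypothetical helper, not a DB2 API.

SAMPLE = """\
Description = Database Rollforward Recovery
Progress Monitoring:
Estimated Percentage Complete = 100
Phase Number = 1
Description = Forward
Total Work = 3611153565 bytes
Completed Work = 3611153565 bytes
Phase Number [Current] = 2
Description = Backward
Total Work = 47 bytes
Completed Work = 0 bytes
"""

def parse_phases(text):
    """Return one dict per 'Phase Number' block, in the order they appear."""
    phases = []
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.startswith("Phase Number"):
            phases.append({"number": int(value), "current": "[Current]" in key})
        elif phases and key == "Description":
            phases[-1]["description"] = value
        elif phases and key in ("Total Work", "Completed Work"):
            # Values look like "47 bytes"; keep only the number.
            phases[-1][key] = int(value.split()[0])
    return phases

phases = parse_phases(SAMPLE)
for phase in phases:
    if phase["current"]:
        print(phase["description"], phase["Completed Work"], "of", phase["Total Work"], "bytes")
```

Running the same check a few minutes apart and seeing Completed Work unchanged for the current phase is what suggests a hang rather than slow progress.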
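To see what was actually happening, the next step was to send kill -36 to the process so that it would dump its stack.
However, db2 list applications for <database> showed no connections; from this we could conclude that the program really was stuck somewhere.
Checking db2diag.log then showed that process 827428 had started the rollforward and never finished, so that was the process to signal: kill -36 827428.
We ran kill -36 three times and got three trap files. The stack traces are below: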
stack1:
*** Start stack traceback ***
0xD033591C read + 0x1A8
0xD2A5E34C sqloread + 0x84
0xD19A49DC
sqluhReadEntry__F12SQLO_FHANDLEP11SQLUH_ENTRYP14SQLUH_WORKAREAPcN34PUl +
0x534
0xD19A7DA0 sqluhUpdate + 0x75C
0xD19A0048 sqlpCleanHistFile__FP9SQLP_DBCBUl + 0x784
0xD19B188C sqlpForwardRecovery__FP20sqle_agent_privatecbUsPcPiPlT5PUlT2
+ 0x60C
0xD19ABF18
sqlufrol__FP20sqle_agent_privatecbUsT2PcP11sqlurf_infoT2PiT4P17SQLB_POOL
_ID_LISTPUlN34P5sqlca + 0x8BC
0xD233D1F4
db2RollforwardRouteIn__FPcUlT2T1P26sqlu_tablespace_bkrst_listlP17SQLB_PO
OL_ID_LISTN22P11sqlurf_infoPUcT1N22T1T2T1T2P17sqlurf_newlogpathT1PlT21_T
1P13sqle_agent_cbP5sqlca + 0xBC0
0xD233D8DC db2RollforwardDRDARouteIn__FP5sqldaT1P13sqle_agent_cbP5sqlca
+ 0x41C
0xD2E69248
sqlerKnownProcedure__FlPcPlP5sqldaT4P13sqlerFmpTableP13sqle_agent_cbP5sq
lca + 0xA20
0xD2E6ED5C sqlerCallDL__FP7UCintfcP9UCstpInfo + 0x848
0xD28C6024 sqljs_ddm_excsqlstt__FP7UCintfcP14sqljsDDMObject + 0x470
0xD28BA63C
sqljsParseRdbAccessed__FP13sqljsDrdaAsCbP14sqljsDDMObjectP7UCintfc +
0x58
0xD28BA4DC sqljsParse__FP13sqljsDrdaAsCbP7UCintfc + 0x27C
0xD2AE0038 sqljsSqlam__FP7UCintfcP13sqle_agent_cbb + 0x138
0xD2AE05D4 sqljsDriveRequests__FP13sqle_agent_cbP11UCconHandle + 0x88
0xD2AE0478 sqljsDrdaAsInnerDriver__FP17sqlcc_init_structb + 0xB0
0xD2AE024C sqljsDrdaAsDriver__FP17sqlcc_init_struct + 0x84
0xD29F7284 sqleRunAgent__FPcUl + 0x2D4
0xD2CD8BA0 sqloCreateEDU__FPFPcUl_vPcUlP13SQLO_EDU_INFOPl + 0x198
0xD2CD87D4 sqloRunGDS__Fv + 0xA0
0xD2CD91F8 sqloInitEDUServices + 0x144
0xD2CC7F58 sqloRunInstance + 0x47C
0x100028EC DB2main + 0x8A4
0x10003470 main + 0xC
stack2:
*** Start stack traceback ***
0xD03331F0 lseek64 + 0x14
0xD2A5F29C sqloseek + 0x184
0xD19A45B8
sqluhReadEntry__F12SQLO_FHANDLEP11SQLUH_ENTRYP14SQLUH_WORKAREAPcN34PUl +
0x110
0xD19A7DA0 sqluhUpdate + 0x75C
0xD19A0048 sqlpCleanHistFile__FP9SQLP_DBCBUl + 0x784
0xD19B188C sqlpForwardRecovery__FP20sqle_agent_privatecbUsPcPiPlT5PUlT2
+ 0x60C
0xD19ABF18
sqlufrol__FP20sqle_agent_privatecbUsT2PcP11sqlurf_infoT2PiT4P17SQLB_POOL
_ID_LISTPUlN34P5sqlca + 0x8BC
0xD233D1F4
db2RollforwardRouteIn__FPcUlT2T1P26sqlu_tablespace_bkrst_listlP17SQLB_PO
OL_ID_LISTN22P11sqlurf_infoPUcT1N22T1T2T1T2P17sqlurf_newlogpathT1PlT21_T
1P13sqle_agent_cbP5sqlca + 0xBC0
0xD233D8DC db2RollforwardDRDARouteIn__FP5sqldaT1P13sqle_agent_cbP5sqlca
+ 0x41C
0xD2E69248
sqlerKnownProcedure__FlPcPlP5sqldaT4P13sqlerFmpTableP13sqle_agent_cbP5sq
lca + 0xA20
0xD2E6ED5C sqlerCallDL__FP7UCintfcP9UCstpInfo + 0x848
0xD28C6024 sqljs_ddm_excsqlstt__FP7UCintfcP14sqljsDDMObject + 0x470
0xD28BA63C
sqljsParseRdbAccessed__FP13sqljsDrdaAsCbP14sqljsDDMObjectP7UCintfc +
0x58
0xD28BA4DC sqljsParse__FP13sqljsDrdaAsCbP7UCintfc + 0x27C
0xD2AE0038 sqljsSqlam__FP7UCintfcP13sqle_agent_cbb + 0x138
0xD2AE05D4 sqljsDriveRequests__FP13sqle_agent_cbP11UCconHandle + 0x88
0xD2AE0478 sqljsDrdaAsInnerDriver__FP17sqlcc_init_structb + 0xB0
0xD2AE024C sqljsDrdaAsDriver__FP17sqlcc_init_struct + 0x84
0xD29F7284 sqleRunAgent__FPcUl + 0x2D4
0xD2CD8BA0 sqloCreateEDU__FPFPcUl_vPcUlP13SQLO_EDU_INFOPl + 0x198
0xD2CD87D4 sqloRunGDS__Fv + 0xA0
0xD2CD91F8 sqloInitEDUServices + 0x144
0xD2CC7F58 sqloRunInstance + 0x47C
0x100028EC DB2main + 0x8A4
0x10003470 main + 0xC
*** End stack traceback ***
stack3:
*** Start stack traceback ***
0xD03331F0 lseek64 + 0x14
0xD2A5F29C sqloseek + 0x184
0xD19A45B8
sqluhReadEntry__F12SQLO_FHANDLEP11SQLUH_ENTRYP14SQLUH_WORKAREAPcN34PUl +
0x110
0xD19A7DA0 sqluhUpdate + 0x75C
0xD19A0048 sqlpCleanHistFile__FP9SQLP_DBCBUl + 0x784
0xD19B188C sqlpForwardRecovery__FP20sqle_agent_privatecbUsPcPiPlT5PUlT2
+ 0x60C
0xD19ABF18
sqlufrol__FP20sqle_agent_privatecbUsT2PcP11sqlurf_infoT2PiT4P17SQLB_POOL
_ID_LISTPUlN34P5sqlca + 0x8BC
0xD233D1F4
db2RollforwardRouteIn__FPcUlT2T1P26sqlu_tablespace_bkrst_listlP17SQLB_PO
OL_ID_LISTN22P11sqlurf_infoPUcT1N22T1T2T1T2P17sqlurf_newlogpathT1PlT21_T
1P13sqle_agent_cbP5sqlca + 0xBC0
0xD233D8DC db2RollforwardDRDARouteIn__FP5sqldaT1P13sqle_agent_cbP5sqlca
+ 0x41C
0xD2E69248
sqlerKnownProcedure__FlPcPlP5sqldaT4P13sqlerFmpTableP13sqle_agent_cbP5sq
lca + 0xA20
0xD2E6ED5C sqlerCallDL__FP7UCintfcP9UCstpInfo + 0x848
0xD28C6024 sqljs_ddm_excsqlstt__FP7UCintfcP14sqljsDDMObject + 0x470
0xD28BA63C
sqljsParseRdbAccessed__FP13sqljsDrdaAsCbP14sqljsDDMObjectP7UCintfc +
0x58
0xD28BA4DC sqljsParse__FP13sqljsDrdaAsCbP7UCintfc + 0x27C
0xD2AE0038 sqljsSqlam__FP7UCintfcP13sqle_agent_cbb + 0x138
0xD2AE05D4 sqljsDriveRequests__FP13sqle_agent_cbP11UCconHandle + 0x88
0xD2AE0478 sqljsDrdaAsInnerDriver__FP17sqlcc_init_structb + 0xB0
0xD2AE024C sqljsDrdaAsDriver__FP17sqlcc_init_struct + 0x84
0xD29F7284 sqleRunAgent__FPcUl + 0x2D4
0xD2CD8BA0 sqloCreateEDU__FPFPcUl_vPcUlP13SQLO_EDU_INFOPl + 0x198
0xD2CD87D4 sqloRunGDS__Fv + 0xA0
0xD2CD91F8 sqloInitEDUServices + 0x144
0xD2CC7F58 sqloRunInstance + 0x47C
0x100028EC DB2main + 0x8A4
0x10003470 main + 0xC
*** End stack traceback ***
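It is clear from these traces that all three samples are inside sqlpCleanHistFile, and that the code is looping there:
In the first stack trace we have
0xD19A49DC sqluhReadEntry__F12SQLO_FHANDLEP11SQLUH_ENTRYP14SQLUH_WORKAREAPcN34PUl + 0x534
while the second shows:
0xD19A45B8 sqluhReadEntry__F12SQLO_FHANDLEP11SQLUH_ENTRYP14SQLUH_WORKAREAPcN34PUl + 0x110
Here 0x534 > 0x110, yet the first trace was taken before the second. Execution inside sqluhReadEntry went back to an earlier offset while the calling frames (sqluhUpdate + 0x75C and everything above it) stayed identical, which shows that the program is stuck looping in this function or in the level above it.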
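The comparison above can be done mechanically when the traces are long. What follows is a rough sketch, not a DB2 tool: parse_frames and stable_outer_frames are hypothetical helper names, the samples are the innermost five frames of stack1 and stack2 from this post, and the parser assumes 8-digit (32-bit) return addresses and the line wrapping seen in these trap files.

```python
import re

# Sketch: find which frames stay identical across stack samples.
# Assumes 8-hex-digit return addresses and frames wrapped over several lines.

FRAME_RE = re.compile(r"^(0x[0-9A-Fa-f]{8})(.+)\+(0x[0-9A-Fa-f]+)$")
ADDR_RE = re.compile(r"0x[0-9A-Fa-f]{8}(\s|$)")

SAMPLE_1 = """
0xD033591C read + 0x1A8
0xD2A5E34C sqloread + 0x84
0xD19A49DC
sqluhReadEntry__F12SQLO_FHANDLEP11SQLUH_ENTRYP14SQLUH_WORKAREAPcN34PUl +
0x534
0xD19A7DA0 sqluhUpdate + 0x75C
0xD19A0048 sqlpCleanHistFile__FP9SQLP_DBCBUl + 0x784
"""

SAMPLE_2 = """
0xD03331F0 lseek64 + 0x14
0xD2A5F29C sqloseek + 0x184
0xD19A45B8
sqluhReadEntry__F12SQLO_FHANDLEP11SQLUH_ENTRYP14SQLUH_WORKAREAPcN34PUl +
0x110
0xD19A7DA0 sqluhUpdate + 0x75C
0xD19A0048 sqlpCleanHistFile__FP9SQLP_DBCBUl + 0x784
"""

def parse_frames(trace_text):
    """Return [(symbol, offset), ...], innermost frame first."""
    chunks = []
    for line in trace_text.splitlines():
        line = line.strip()
        if not line or line.startswith("***"):
            continue
        if ADDR_RE.match(line):
            chunks.append(line)      # a new frame starts with an address
        elif chunks:
            chunks[-1] += line       # continuation of a wrapped frame
    frames = []
    for chunk in chunks:
        match = FRAME_RE.match(re.sub(r"\s+", "", chunk))
        if match:
            frames.append((match.group(2), int(match.group(3), 16)))
    return frames

def stable_outer_frames(samples):
    """Frames, outermost first, that are identical in every sample."""
    common = []
    for group in zip(*(reversed(sample) for sample in samples)):
        if any(frame != group[0] for frame in group):
            break
        common.append(group[0])
    return common

frames_1 = parse_frames(SAMPLE_1)
frames_2 = parse_frames(SAMPLE_2)
common = stable_outer_frames([frames_1, frames_2])
symbol, offset = common[-1]
print(f"deepest unchanged frame: {symbol} + {offset:#x}")
```

The deepest frame that never changes is the call site the process keeps returning to; everything below it varies from sample to sample, which is the signature of a loop rather than a single blocked call.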
Knowing this, we could already guess the cause: history file corruption.
So we told the customer to restore again, delete db2rhist.asc and db2rhist.bak, and then run the rollforward once more. This time it succeeded.