Let's discuss a recovery question

askgyliu · posted 2008-11-29 12:54

Originally posted by wangzhonnew on 2008-11-29 01:48

askgyliu, do you have the answer to the above question?
I really wonder how Oracle handles large objects in this case; it would surprise me if Oracle kept an SCN even for large-object data pages...

As far as I recall, I have never heard of Oracle recording BLOBs differently at the storage level. This could be taken to the Oracle forum for discussion.

Pythagoras · posted 2008-12-1 15:22

P520 (this is about distributed data; probably roughly equivalent to DB2 EEE?): Point-in-time recovery (to the last image copy or to a relative byte address (RBA)) presents other challenges. You cannot control a utility in one subsystem from another subsystem. In practice, you cannot quiesce two sets of table spaces, or make image copies of them, in two different subsystems at exactly the same instant. Neither can you recover them to exactly the same instant, because two different logs are involved, and a RBA does not mean the same thing for both of them.
In planning, the best approach is to consider carefully what the QUIESCE, COPY, and RECOVER utilities do for you, and then plan not to place data that must be closely coordinated on separate subsystems. After that, recovery planning is a matter of agreement among database administrators at separate locations.

This paragraph simply tells us that, when planning recovery, we should always place application-related data in the same subsystem, because it is NOT possible to set a consistent point across subsystems.
It is not related to our discussion.

Pythagoras · posted 2008-12-1 15:29

P521： DB2 can recover a page set by using an image copy or system-level backup, the recovery log, or both.
<Two things that can be used to recover DB2 are mentioned here: an image copy or a system-level backup. Is the image copy also created by a DB2 utility?>

An image copy (IC) is created by the COPY utility, and a system-level backup is created by the BACKUP SYSTEM utility; that much is clear.
If we are talking about a backup taken outside DB2's control (e.g. an OS-level backup), we correspondingly have to use a method outside DB2 to restore it. You will not find those methods in this DB2 admin guide.

Pythagoras · posted 2008-12-1 15:35

Pages 521/522 contain a DB2 recovery scenario.
P522： create a full image copy by using the MERGECOPY utility to merge the incremental image copy with the full image copy
P522： An unsuccessful write operation occurs and you need to recover the table space. You run the RECOVER utility. The utility restores the table space from the full image copy that was made by MERGECOPY on Wednesday and the incremental image copies that were made on Thursday and Friday, and includes all changes that are made to the recovery log since Friday morning.
P522： To recover a page set, the RECOVER utility typically uses these items:
- A backup that is a full image copy or a system-level backup.
- For table spaces only, when the COPY utility is used, any later incremental image copies; each incremental image copy summarizes all the changes that were made to the table space from the time that the previous image copy was made.
- All log records for the page set that were created since the image copy. If the log has been damaged or discarded, or if data has been changed erroneously and then committed, you can recover to a particular point in time by limiting the range of log records that are to be applied by the RECOVER utility.
<So the full image copy here is the basis of the recovery? And this full image copy is also created by the DB2 COPY utility (see DB2 Online Utilities - COPY)>

It's a typical recovery scenario; I guess its intent is to introduce and explain incremental copies and the MERGECOPY utility. These copies are made by DB2 and then recovered by DB2. What we are talking about is a backup taken outside DB2 and restored outside DB2. So, this paragraph, hmmmm... please just forget it.
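
For anyone following along, the selection rule in the quoted P522 passage (latest full copy, then any later incremental copies, then the log records after the last copy) can be sketched roughly as below. This is only a toy model: the `Copy` class, `plan_recovery`, and the integer timestamps are hypothetical names invented for illustration, not anything from DB2 or the RECOVER utility itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    kind: str      # "full" or "incremental"
    taken_at: int  # logical timestamp (hypothetical stand-in for a log point)


def plan_recovery(copies, log_points, recover_to=None):
    """Pick the base copy, later incrementals, and log records to apply."""
    limit = recover_to if recover_to is not None else float("inf")
    fulls = [c for c in copies if c.kind == "full" and c.taken_at <= limit]
    if not fulls:
        raise ValueError("no full image copy available before the target point")
    # The most recent full copy is the base of the recovery.
    base = max(fulls, key=lambda c: c.taken_at)
    # Only incrementals taken after the base (and before the limit) are used.
    incrementals = sorted(
        (c for c in copies
         if c.kind == "incremental" and base.taken_at < c.taken_at <= limit),
        key=lambda c: c.taken_at,
    )
    last_copy = incrementals[-1].taken_at if incrementals else base.taken_at
    # Log records created since the last copy, optionally limited to a point in time.
    log_records = [t for t in sorted(log_points) if last_copy < t <= limit]
    return base, incrementals, log_records


# Loosely mirrors the scenario: a merged full copy (30), two later
# incrementals (40, 50), then log records written afterwards.
copies = [Copy("full", 10), Copy("incremental", 20), Copy("full", 30),
          Copy("incremental", 40), Copy("incremental", 50)]
base, incs, logs = plan_recovery(copies, log_points=[35, 45, 55, 60])
```

Passing `recover_to=45` limits the range of log records, which corresponds to the point-in-time case mentioned at the end of the quoted list.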

Pythagoras · posted 2008-12-1 15:44

Oops... It would take too long to reply paragraph by paragraph.
I'll go straight to the key point.
There are three methods of backing up with FlashCopy for disaster recovery.

1, SET LOG SUSPEND-FlashCopy-SET LOG RESUME.
This is the method askgyliu referred to above. It is a "perfect" (but heavily constrained) way to create a consistent copy, combining DB2 involvement with disk technology. It requires suspending write I/O, which is a service and availability outage, even if only for a few minutes. For a critical, hot OLTP production system, that outage is sometimes too expensive, so few customers choose this method. However, some OLAP and DW customers, such as sites running SAP or PeopleSoft, find it valuable because they can tolerate a short outage while taking a consistent copy.
Finally, I have to point out that, unfortunately, it serves its original DR purpose very weakly. You cannot forecast a disaster and issue the SET LOG command beforehand. If you run this process periodically (e.g. once per day), the data loss (RPO) can be up to 24 hours.
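
The RPO arithmetic behind that last remark is simple enough to sketch. The function below is a hypothetical helper, assuming copies are taken at hours 0, N, 2N, ... and that a restart uses only the most recent copy with no log applied.

```python
def data_loss_hours(copy_interval_hours, disaster_hour):
    """Hours of updates lost when restarting from the latest periodic copy."""
    if copy_interval_hours <= 0:
        raise ValueError("interval must be positive")
    # Time elapsed since the most recent copy at the moment of the disaster.
    return disaster_hour % copy_interval_hours


# A daily copy with a disaster half an hour before the next copy is due.
loss = data_loss_hours(copy_interval_hours=24, disaster_hour=47.5)
```

The loss approaches the full copy interval when the disaster strikes just before the next copy, which is why a once-per-day process gives an RPO of up to 24 hours.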

2, BACKUP SYSTEM.
This method does not suspend I/O, unless you have 32K-page objects and your CISIZE is not 32K. I have talked about this utility above. The backup is taken under DB2's control and recovered under DB2's control, but it calls FlashCopy at the storage level.

3, FlashCopy Version 2 Consistency Group.
With FlashCopy V2 it is now possible to gather a unique global consistent point for all the disk data in a DBMS, without the need to stop or suspend any running application or database (i.e. I/O suspension). This unique consistent point can be used in case of any recovery/restart.
This is what I previously meant by "represent a static database image" (a real snapshot).

To end this post, I just want to say that the evolution of modern disk technology has been changing the modes and methods we use to administer a DBMS.

[ Last edited by Pythagoras on 2008-12-1 16:57 ]

askgyliu · posted 2008-12-1 21:38

Originally posted by Pythagoras on 2008-12-1 15:44
1,SET LOG SUSPEND-FlashCopy-SET LOG RESUME.

This looks like "write suspend" in DB2 for LUW, followed by taking the snapshot at the storage level.

Originally posted by Pythagoras on 2008-12-1 15:44
2, BACKUP SYSTEM.

Well, this is a DB2 utility, and db2 involvement is expected, and there is no doubt this should be a valid method of DB backup. So this is out of the discussion since it does not meet Wangzhonnew's criteria.

But again, what do <<Suspend system checkpoints/Prevents data set from pseudo close>> and all the other "suspend" items in L67 really mean? What if someone manually issues a "checkpoint" command to the system while "backup system" is running? Will that session have to wait for "backup system" to complete, even if it only has to wait 1 ns?

This is more like the "db2 backup database" command in DB2 for LUW: although the underlying backup is not a snapclone, there is no need to stop the database, nor is I/O suspension required.

I also notice there is no "backup database" command in DB2 for z/OS.

Originally posted by Pythagoras on 2008-12-1 15:44
3, FlashCopy Version 2 Consistency Group.

This sounds like what I described for Global Mirroring for DB2 for LUW.

And for DB2 for z/OS, "a unique global consistent point for all the disk data in a DBMS" is required. As I described for Global Mirroring, no I/O suspension is required on the online/primary database, but some I/O suspension is needed at the mirroring site to get the consistent DB image; all the transactions are recovered, no data is lost at the point of consistency, and no extract step is required.

http://recoveryspecialties.com/flashcopy.html

"By using the Freeze FlashCopy Consistency Group option, the disk subsystem will hold off I/O activity to a volume for a brief time period by putting the source volume in an extended long busy state. In this way, a window is created during which dependent write updates will not occur and FlashCopy will use that window to obtain a consistent point-in-time copy of the related volumes. I/O activity resumes when a FlashCopy consistency group is created"

Well, certain level of IO suspension is still expected, although it is not at the DB level, and this is different from Wangzhonnew's original question.
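
The "dependent write" idea in the quoted description can be illustrated with a toy model: the DBMS writes a log record first and only then the data page that depends on it. The sketch below assumes just two volumes (one log, one data) and hypothetical function names; it is not how the disk subsystem is actually implemented. It shows why cutting all volumes at the same instant yields a restartable image, while copying them at different instants may not.

```python
def run_writes(n_updates):
    """Ordered write stream: each log write precedes its dependent data write."""
    stream = []
    for i in range(1, n_updates + 1):
        stream.append(("log", i))
        stream.append(("data", i))
    return stream


def volume_state(stream, volume, upto):
    """Highest update number present on `volume` after the first `upto` writes."""
    ids = [i for vol, i in stream[:upto] if vol == volume]
    return max(ids, default=0)


def copy_volumes(stream, log_cut, data_cut):
    """Copy the log volume after `log_cut` writes and the data volume after `data_cut`."""
    return {"log": volume_state(stream, "log", log_cut),
            "data": volume_state(stream, "data", data_cut)}


def is_restartable(image):
    # Every update on the data volume must also be covered by the log volume.
    return image["data"] <= image["log"]


stream = run_writes(5)
# Volumes copied at different instants, with writes continuing in between.
unfrozen = copy_volumes(stream, log_cut=3, data_cut=8)
# Freeze: I/O held on both volumes, so both are cut at the same instant.
frozen = copy_volumes(stream, log_cut=5, data_cut=5)
```

In this model `unfrozen` contains data updates that the copied log does not know about, while any image cut at a single instant stays restartable.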

Can you confirm this is the only requirement to get a consistent DB image, but no other action required?

Another question about this backup method: can such an image be used to "roll forward" the database? Or, as in Wangzhonnew's original question, can it be used as a base to which some delta is applied to bring the base to the next stage of a consistent DB image?

<<In short, do not use DB2's backup; instead, back up with the file system's backup tools, and then at recovery time restore to a specific point in time from file-system-level incremental backups>>

<<L22 However, this process is NOT regular and can NOT be standardized and processed fixedly>>

And, if this is a valid and working solution, I don't see why IBM will not just give a firm answer. We have been implementing Global Mirroring for DB2 for LUW, Global Mirroring for Oracle for DR site, and we have been cloning DB2 databases using Global Mirroring all the time without affecting the primary database. And of course, we are also using "write suspend" and FlashCopy to take our daily DB backup.

I find that L22 very much contradicts method 3.

<<A point-in-time FS image has one big problem: it can guarantee consistency at the file-system page level, but it cannot guarantee consistency at the database page level.>>

I fully agree with this comment. And after reading all the replies, I find it to be more agreeable.

<<because, judging from the database page header, the page has already been written, but in reality only the first few pages were actually written>> Status consistency. After thinking a lot, this is the only phrase I can think of to describe this requirement.
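
A toy model of that torn-page situation, for illustration only: one database page is assumed to span eight file-system blocks (e.g. a 32K page on 4K blocks), and the header/trailer comparison is a made-up check, not the actual page format of DB2 or Oracle. The snapshot copies every block intact, yet the database page it captures is half old and half new.

```python
BLOCKS_PER_DB_PAGE = 8  # assumed: one 32K database page on 4K file-system blocks


def write_page(disk, version, blocks_written):
    """Overwrite the first `blocks_written` blocks of the page with `version`."""
    for i in range(blocks_written):
        disk[i] = version
    return disk


def snapshot(disk):
    # Each block is copied whole, so the image is consistent block by block.
    return list(disk)


def is_torn(page):
    # Illustrative check: the first block (header) and last block (trailer) must agree.
    return page[0] != page[-1]


disk = [1] * BLOCKS_PER_DB_PAGE
write_page(disk, version=2, blocks_written=3)   # the page write is still in flight
mid_write_image = snapshot(disk)
write_page(disk, version=2, blocks_written=BLOCKS_PER_DB_PAGE)
after_write_image = snapshot(disk)
```

Here `mid_write_image` is torn at the database-page level even though no individual block is damaged, which is the gap between file-system-level and database-level consistency described above.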

<<if a database only contain tablespaces with 4k pagesize, is it possible to do flashcopy>> DB can have pagesize=4/8/16/32k

<<that is interesting to know db2 for z support this kind of recovery strategy don't know when this kind of feature can be port to LUW>> Global Mirroring, it is already there, not just for DB2, and also for Oracle.

<<L46 DB2-trigger flashcopy>> I assume this is meant by "db2 backup system", which is out of the discussion now since it does not meet Wangzhonnew's original criteria?

When we are doing an OS-level backup, the FS can be changing continuously while we back up the files, and that FS backup copy CAN'T be used for DB restoration.

To tackle the continuous-change issue, Oracle uses a baseline method, i.e. it records a starting time (SCN), then records the original copy of the data and the change vector whenever a block is about to be changed, while DB2 uses the more forceful approach of I/O suspension to ensure a consistent copy of the database image.

No doubt the backup and recovery method has evolved over the years, but the principle remains.

And this has been a good discussion, an eye-opening session.

askgyliu · posted 2008-12-1 21:47

BTW, I know nothing about DB2 for z/OS, so whatever comment I made about DB2 for z/OS might be incorrect, or the terms I have been using for DB2 for z/OS may not be accurate.

Pythagoras · posted 2008-12-1 22:40

"By using the Freeze FlashCopy Consistency Group option, the disk subsystem will hold off I/O activity to a volume for a brief time period by putting the source volume in an extended long busy state. In this way, a window is created during which dependent write updates will not occur and FlashCopy will use that window to obtain a consistent point-in-time copy of the related volumes. I/O activity resumes when a FlashCopy consistency group is created"

Well, certain level of IO suspension is still expected, although it is not at the DB level, and this is different from Wangzhonnew's original question.

The quoted description of FlashCopy Consistency Group is correct. But the I/O activity is held by the disk subsystem, NOT by the database subsystem or the operating system. In other words, it is transparent to the OS and the DBMS. The statement that a "certain level of IO suspension is still expected" is also absolutely correct: because I/O is always discrete when viewed in time slots, this kind of I/O suspension always exists.

Can you confirm this is the only requirement to get a consistent DB image, but no other action required?

We have been using this method for years to get a full set of replicated data from production OLTP, never with an I/O suspension outage. The recovery process is one part of the DR solution; all that needs to be done is to restart DB2. This process has been standardized at our site, but different sites lead to different processes, which is the reason "why IBM will not just give a firm answer".

Another question about this backup method: can such an image be used to "roll forward" the database? Or, as in Wangzhonnew's original question, can it be used as a base to which some delta is applied to bring the base to the next stage of a consistent DB image?

My answer is YES. You can use the RECOVER LOGONLY option to apply the log after restoring the copy, as I mentioned previously.

And, if this is a valid and working solution, I don't see why IBM will not just give a firm answer. We have been implementing Global Mirroring for DB2 for LUW, Global Mirroring for Oracle for DR site, and we have been cloning DB2 databases using Global Mirroring all the time without affecting the primary database. And of course, we are also using "write suspend" and FlashCopy to take our daily DB backup.

FlashCopy, also known as IBM TotalStorage FlashCopy.
Asynchronous PPRC, also known as IBM TotalStorage Global Mirror.
XRC, also known as IBM TotalStorage z/OS Global Mirror.

So, I think what you are using is Asynchronous PPRC (PPRC-ASYN, or Global Mirror PPRC). It applies to both open platforms and z/OS, and is based on PPRC-XD, FlashCopy, and the periodic creation of Consistency Groups across one or more ESS boxes. It is designed to provide a long-distance remote copy solution across two sites using asynchronous technology, and to provide a consistent and restartable copy of the data at the remote site, created with minimal impact on applications at the local site. Asynchronous PPRC eliminates the requirement to do a manual, periodic suspend at the local site in order to create a consistent and restartable copy at the remote site. The lag between the primary and secondary sites is typically less than 10 seconds, minimizing the amount of data exposure in the event of an unplanned outage/disaster.

A typical Global Mirror PPRC DR solution is:
Your DBMS runs at your local site, where the disks (the first set) are the PPRC Primary.
Through PPRC-XD, the data at the local site (PPRC Primary) is asynchronously copied to the PPRC Secondary disks (the second set), which reside at the remote site; the PPRC Secondary is also used as the FlashCopy source.
And, at the remote site, you have the FlashCopy target disks (the third set).
The key points of the DR structure are:
1. Automatically and periodically create a Consistency Group of volumes at the local site.
2. Send the increment of consistent data to the remote site.
3. FlashCopy at the remote site.
4. Repeat from step 1 after a user-defined time period.

When unplanned outage/disaster happens:
1. TERMINATE SESSION to stop creating Consistency Groups.
2. FAILOVER to the PPRC Secondary disks at the remote site.
3. Establish a FlashCopy from the PPRC Secondary to the FlashCopy target disks. This is a safe copy.
4. Put the PPRC Secondary disks online at the remote site, start applications.
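
The cycle and the failover steps above can be mimicked with a very small in-memory model. Everything here (`GlobalMirrorSketch`, its attributes, dictionaries standing in for disk sets) is a hypothetical simplification for illustration; in particular, the secondary is updated in one step, which hides the in-flight transfer that the real product has to deal with.

```python
class GlobalMirrorSketch:
    """Three disk sets modeled as dictionaries of block -> value."""

    def __init__(self):
        self.primary = {}        # first set: PPRC Primary at the local site
        self.secondary = {}      # second set: PPRC Secondary at the remote site
        self.flash_target = {}   # third set: FlashCopy target at the remote site
        self._pending = {}       # changes made since the last consistency group
        self.session_active = True

    def write(self, block, value):
        """Application write at the local site."""
        self.primary[block] = value
        self._pending[block] = value

    def cycle(self):
        """One pass of steps 1 to 3 of the DR structure."""
        if not self.session_active:
            return
        group = dict(self._pending)               # 1. form a consistency group
        self._pending.clear()
        self.secondary.update(group)              # 2. send the increment
        self.flash_target = dict(self.secondary)  # 3. FlashCopy at the remote site

    def failover(self):
        """Steps taken after an unplanned outage."""
        self.session_active = False               # 1. terminate the session
        # 2. fail over to the secondary disks (nothing to model here)
        self.flash_target = dict(self.secondary)  # 3. take the safe copy
        return self.secondary                     # 4. bring the secondary online


gm = GlobalMirrorSketch()
gm.write("a", 1)
gm.cycle()
gm.write("a", 2)          # written after the last cycle, not yet mirrored
restart_image = gm.failover()
```

In this run the restart image still holds the value from the last completed cycle; the later write is the "data exposure" that the short lag between sites is meant to minimize.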

If "write suspend" does not impact your SLA and you can tolerate a short outage, write suspend + FlashCopy is fine for a daily DB backup. That copy should be an offline (or clean) copy, assuming you ensure the data in the buffers is also copied. If not, the copy is still a fuzzy copy, no different from a direct FlashCopy without write suspend. We still have methods to recover such a fuzzy copy to a consistent point, i.e. the fuzzy copy made by FlashCopy is still recoverable.

[ Last edited by Pythagoras on 2008-12-2 14:54 ]

hank_2008 · posted 2008-12-2 04:04

Originally posted by wangzhonnew on 2008-11-29 10:08

write suspend
that will flush dirty pages to bufferpool and suspend all write requests

Yes, write suspend is an option on some not-too-busy systems, but a large number of OLTP systems are not allowed to suspend the database.
There should be more answers too; what are they?

wangzhonnew · posted 2008-12-2 05:25

then online backup

that will not need any io suspension
