Let's discuss a recovery question

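macrozeng · posted 2008-11-11 13:18

Ha, with a file-system-level backup, if the user misses a single file, or something goes wrong during the copy and a file ends up incomplete, the database is left in a completely uncontrollable state. (Even with FlashCopy technology, every vendor has its own products and solutions, and testing them is a huge job.) Then you hand those files to IBM and say, "Please recover this for me." Do you think IBM would be willing? By contrast, with the backup command, a single command takes care of everything, and there are tools such as db2ckbkp to check that the backup is usable. For a database, backups are critically important, and for something this important, the fewer uncertainties the better.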

Pythagoras · posted 2008-11-12 10:23

So, this is an interesting topic, and NOT a short story.

Given a complete and consistent copy of the files (data files, log files, control information, etc.), an experienced DBA could recover the database to a point in time (PIT, relative to the time the copy was taken) or even to the current state.
However, this process is NOT routine and can NOT be standardized into a fixed procedure. It generally requires various manual decisions and operations rather than a single command.

There are also two important points to keep in mind. The first is locks, which protect your online transactions. Neither locks nor in-flight transactions are represented by the database files. So a file copy can NOT represent the transactions that are passing through the database (that is, being processed) at that moment. In other words, a FLASHCOPY or SNAPSHOT represents a static image of the database but says nothing about transactions. The log does capture them, of course, but you then need to rebuild the locks and the current status from the logs in order to resolve those transactions (determine whether to commit, abort, or back out), and unfortunately you have to do that manually. The second point is that a file copy made by the OS is NOT known to the database. This is relatively simple but still a challenge: things like versioning the copies and keeping data files and log files coupled are complex.

Finally, database vendors have put a lot of effort into this, building on new disk technology. For example, the BACKUP SYSTEM and RESTORE SYSTEM utilities were introduced in DB2 for z/OS V8 and improved in DB2 9.

Still, recovery from a file copy is not yet something database vendors can easily promise; it remains too difficult.
Keep Things Stupidly Simple: a golden rule of marketing, and of technology too.

[ Last edited by Pythagoras on 2008-11-13 12:41 ]

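wangzhonnew · posted 2008-11-12 10:50

What the poster above says makes a lot of sense. But as I understand it, a point-in-time FS image has one big problem: it can guarantee consistency at the file-system page level, but not at the database page level.
A database data page can be 4K, 8K, 16K, or 32K, but the file-system page size is almost always 4K. So at a given point in time, it is possible that only part of a database page has been flushed to disk (say, only the first two 4K pages). The FS consistency group is then consistent at the file-system level but inconsistent at the database level... And this cannot be repaired by crash recovery, because judging from the database page header the page has already been written, while in reality only the first few 4K pages were actually written...

This is different from a server crash. In a server crash, if a write request has been sent to the SAN in full, the SAN will definitely write the complete data. If the database server goes down when the request is only half sent, the TCP/IP connection is broken and the SAN will not write the half-sent request. So for a server crash, crash recovery can guarantee consistency at the database page level... But if the SAN itself loses power, you are on your own...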
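
To make the partial-write case concrete, here is a minimal Python sketch. It is only a toy model, not how DB2 or any storage system works internally: the "disk" is a byte array, every byte of a page carries a version stamp, and the helper names (make_page, write_page, is_torn, snapshot_after_partial_write) are made up for illustration. The 4K file-system block size is the assumption taken from the post.

```python
FS_BLOCK = 4096  # file-system block size assumed in the post


def make_page(version: int, page_size: int) -> bytes:
    """Build a database page whose every byte carries the same version stamp."""
    return bytes([version]) * page_size


def write_page(disk: bytearray, page_no: int, page: bytes, blocks_flushed: int) -> None:
    """Copy only the first `blocks_flushed` FS blocks of `page` to the disk image."""
    start = page_no * len(page)
    flushed = min(blocks_flushed * FS_BLOCK, len(page))
    disk[start:start + flushed] = page[:flushed]


def is_torn(image: bytes, page_no: int, page_size: int) -> bool:
    """A page is torn if its 4K blocks do not all carry the same version stamp."""
    start = page_no * page_size
    stamps = {image[start + off] for off in range(0, page_size, FS_BLOCK)}
    return len(stamps) > 1


def snapshot_after_partial_write(page_size: int, blocks_flushed: int) -> bytes:
    """Take a point-in-time image while a page overwrite is only partly on disk."""
    disk = bytearray(make_page(1, page_size))  # version 1 is fully on disk
    write_page(disk, 0, make_page(2, page_size), blocks_flushed)
    return bytes(disk)  # the snapshot freezes whatever is on disk right now


if __name__ == "__main__":
    # 16K page, snapshot taken after only two 4K blocks were flushed
    print(is_torn(snapshot_after_partial_write(16384, 2), 0, 16384))
    # 4K page: one FS block is the whole page, so it is either old or new
    print(is_torn(snapshot_after_partial_write(4096, 1), 0, 4096))
```

In this model a 16K page can end up half new and half old in the snapshot, while a 4K page is always entirely one version, which is the distinction the post is drawing.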

That's my understanding. I'm not sure whether I've got anything wrong, so corrections are welcome.

Pythagoras · posted 2008-11-12 11:30

Yes, the point you mentioned above is what I forgot.

This WAS a problem, for DB2 for z/OS, before V8.

Since V8, there is a system parameter that controls the CI size (a CI is like a data block on the LUW platform), which makes it possible to have CI size = page size (even for page sizes larger than 4K). This improvement eliminates the partial-write problem in which a 32K page crosses a device boundary (on z/OS, a 16K block size was used). So FLASHCOPY is OK whatever the page size: 4K, 8K, 16K, or 32K.

Things may be different on non-System z platforms.

[ Last edited by Pythagoras on 2008-11-12 11:32 ]

wangzhonnew · posted 2008-11-12 11:57

right

In most Unix systems the default page size is 4K. Of course users can set it to 32K as a large page size, but not everyone does that.

So the next question, which I really don't know the answer to, is: if a database only contains tablespaces with a 4K page size, is it possible to do a FlashCopy (with a consistency group, of course) without stopping the database? ^_^

Pythagoras · posted 2008-11-12 12:25

What about DR recovery? It is of course possible, though with a little data loss. In fact, you would recover to the PIT at which the consistency group completed, NOT exactly the PIT at which the crash/disaster occurred. But data consistency is assured.
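
As a rough illustration of that recovery point, the sketch below picks the last consistency group that completed before the disaster and reports the gap as the data-loss window. The function names and the timestamps in the usage example are invented for illustration; nothing here reflects a real replication product's interface.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import List, Optional


def recovery_point(completed_groups: List[datetime], disaster: datetime) -> Optional[datetime]:
    """Return the completion time of the last consistency group at or before the disaster."""
    ordered = sorted(completed_groups)
    idx = bisect_right(ordered, disaster)
    return ordered[idx - 1] if idx else None


def data_loss_window(completed_groups: List[datetime], disaster: datetime) -> Optional[timedelta]:
    """Work done between the recovery point and the disaster is what gets lost."""
    point = recovery_point(completed_groups, disaster)
    return None if point is None else disaster - point


if __name__ == "__main__":
    # Hypothetical timeline: groups complete every five minutes
    groups = [
        datetime(2008, 11, 12, 10, 0),
        datetime(2008, 11, 12, 10, 5),
        datetime(2008, 11, 12, 10, 10),
    ]
    disaster = datetime(2008, 11, 12, 10, 8)
    print(recovery_point(groups, disaster))
    print(data_loss_window(groups, disaster))
```

The image is consistent as of the recovery point; only the work done inside the window is lost.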

[ Last edited by Pythagoras on 2008-11-12 12:32 ]

macrozeng · posted 2008-11-12 13:27

This discussion keeps getting better. I've learned a lot, haha.

wangzhonnew · posted 2008-11-12 19:10

Originally posted by Pythagoras on 2008-11-12 13:25
What about DR recovery? It is of course possible, though with a little data loss. In fact, you would recover to the PIT at which the consistency group completed, NOT exactly the PIT at which the crash/disaster occurred. But data consistency is assured.

Yeah, it's a similar idea to the server crash: DR will guarantee data consistency at the database level, because an incomplete data packet won't be flushed to disk, so there won't be any partial writes at the standby site.

askgyliu · posted 2008-11-18 00:23

Think about how IBM's Global Mirroring does it, and things should become clearer.

FS-level copy/clone, ordinary storage-level clone, and so on cannot guarantee DB-level transaction consistency.

wangzhonnew · posted 2008-11-18 03:35

Originally posted by askgyliu on 2008-11-18 01:23
Think about how IBM's Global Mirroring does it, and things should become clearer.

FS-level copy/clone, ordinary storage-level clone, and so on cannot guarantee DB-level transaction consistency.

Actually, I think that if there were some way to prevent the FS consistency group from partially writing a database page, and to flush all of the OS FS cache to disk (for example, DIO for everything), it should be okay for crash recovery to recover the database.
Crash recovery just goes through each log record starting from min(minbufLSN, lowtranLSN), which means that with DIO, all pages below minbufLSN are on disk. So if the FS consistency-group FlashCopy can guarantee that no DB2 page is partially written (for example, all pages are 4K), it should be fine to use it to back up the database and then issue RESTART DATABASE to start crash recovery....
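
A minimal sketch of that idea in Python, under loud assumptions: minbufLSN and lowtranLSN are the names used in this post, but LogRecord, recovery_start, and crash_recover are hypothetical, and the redo-then-undo loop is a textbook-style simplification (whole-page values, no page LSN checks), not DB2's actual recovery code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LogRecord:
    lsn: int
    txn: str
    kind: str          # "update" or "commit"
    page: int = 0
    before: str = ""   # page content before the update (used for undo)
    after: str = ""    # page content after the update (used for redo)


def recovery_start(minbuf_lsn: int, lowtran_lsn: int) -> int:
    """Start at the older of: oldest unflushed page change, oldest open transaction."""
    return min(minbuf_lsn, lowtran_lsn)


def crash_recover(disk: Dict[int, str], log: List[LogRecord],
                  minbuf_lsn: int, lowtran_lsn: int) -> Dict[int, str]:
    """Redo every logged update from the start LSN, then undo uncommitted ones."""
    start = recovery_start(minbuf_lsn, lowtran_lsn)
    tail = sorted((r for r in log if r.lsn >= start), key=lambda r: r.lsn)
    committed = {r.txn for r in tail if r.kind == "commit"}
    pages = dict(disk)
    for rec in tail:                      # redo pass, in LSN order
        if rec.kind == "update":
            pages[rec.page] = rec.after
    for rec in reversed(tail):            # undo pass, newest first
        if rec.kind == "update" and rec.txn not in committed:
            pages[rec.page] = rec.before
    return pages


if __name__ == "__main__":
    # Image at copy time: page 1 not yet flushed, page 2 flushed by open txn T2
    disk = {1: "a0", 2: "b1"}
    log = [
        LogRecord(10, "T1", "update", page=1, before="a0", after="a1"),
        LogRecord(20, "T2", "update", page=2, before="b0", after="b1"),
        LogRecord(30, "T1", "commit"),
    ]
    print(crash_recover(disk, log, minbuf_lsn=10, lowtran_lsn=20))
```

The sketch only works because each page in the image is entirely one version; a torn page would break the assumption that redo and undo operate on whole pages.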

But anyway, this approach is not supported; I'm just thinking out loud.
