联系:手机/微信(+86 17813235971) QQ(107644445)
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
悲剧的客户因为IBM p系列小机更换电源导致主机直接掉电,起来后发现数据库出现不少坏块,而且还有部分坏块中含有回滚事务,导致alert日志一直报smon回滚遇到坏块错误,该数据库版本是9.2.0.8 RAC,根据客户的备份情况,为了减少对业务的影响,决定使用blockrecover对其处理.这里通过10g数据库大概模拟出现含事务坏块的情况以及处理过程,重现了我们在处理的时候不确定的一些知识.
创建测试表
SQL> select * from v$version; BANNER ---------------------------------------------------------------- Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod PL/SQL Release 10.2.0.4.0 - Production CORE 10.2.0.4.0 Production TNS for Linux: Version 10.2.0.4.0 - Production NLSRTL Version 10.2.0.4.0 - Production SQL> conn /as sysdba Connected. SQL> archive log list; Database log mode Archive Mode Automatic archival Enabled Archive destination /u01/oracle/oradata/XFF/archivelog Oldest online log sequence 231 Next log sequence to archive 233 Current log sequence 233 SQL> conn chf/xifenfei Connected. SQL> create table t_xifenfei 2 as 3 select * from dba_objects where rownum<10; Table created. SQL> select rowid, 2 dbms_rowid.rowid_relative_fno(rowid) rel_fno, 3 dbms_rowid.rowid_block_number(rowid) block 4 from chf.t_xifenfei; ROWID REL_FNO BLOCK ------------------ ---------- ---------- AAANIqAAEAAAAAcAAA 4 28 AAANIqAAEAAAAAcAAB 4 28 AAANIqAAEAAAAAcAAC 4 28 AAANIqAAEAAAAAcAAD 4 28 AAANIqAAEAAAAAcAAE 4 28 AAANIqAAEAAAAAcAAF 4 28 AAANIqAAEAAAAAcAAG 4 28 AAANIqAAEAAAAAcAAH 4 28 AAANIqAAEAAAAAcAAI 4 28 9 rows selected.
当前的seq是233(也就是说我在233归档上创建了t_xienfei表)
dbv检查block
[oracle@xifenfei ~]$ dbv file='/u01/oracle/oradata/XFF/users01.dbf' DBVERIFY: Release 10.2.0.4.0 - Production on Sun Jan 20 22:16:16 2013 Copyright (c) 1982, 2007, Oracle. All rights reserved. DBVERIFY - Verification starting : FILE = /u01/oracle/oradata/XFF/users01.dbf DBVERIFY - Verification complete Total Pages Examined : 1280 Total Pages Processed (Data) : 904 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 38 Total Pages Failing (Index): 0 Total Pages Processed (Other): 44 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 294 Total Pages Marked Corrupt : 0 Total Pages Influx : 0 Highest block SCN : 3224018224 (2860.3224018224)
证明无任何坏块
切换归档
SQL> alter system switch logfile; System altered. SQL> / System altered. SQL> archive log list; Database log mode Archive Mode Automatic archival Enabled Archive destination /u01/oracle/oradata/XFF/archivelog Oldest online log sequence 233 Next log sequence to archive 235 Current log sequence 235
现在已经切换seq到235
rman备份我们需要测试block(file 4 block 28)对应的数据文件
RMAN> backup datafile 4 format '/u01/oracle/oradata/xff_4.rman';
具体见:rman制造坏块,bbed修复坏块
模拟数据库进行其他操作
SQL> alter system switch logfile; System altered. SQL> / System altered. SQL> archive log list; Database log mode Archive Mode Automatic archival Enabled Archive destination /u01/oracle/oradata/XFF/archivelog Oldest online log sequence 235 Next log sequence to archive 237 Current log sequence 237 SQL> conn chf/xifenfei Connected. SQL> create table t_xff 2 as 3 select * from dba_objects; Table created. SQL> alter system switch logfile; System altered.
主要是为了模拟对其他block操作,对于block 28的恢复影响
对block 28进行操作
SQL> archive log list; Database log mode Archive Mode Automatic archival Enabled Archive destination /u01/oracle/oradata/XFF/archivelog Oldest online log sequence 237 Next log sequence to archive 239 Current log sequence 239 SQL> update chf.t_xifenfei set object_name='www.xifenfei.com'; 9 rows updated. SQL> commit; Commit complete. SQL> alter system switch logfile; System altered. SQL> / System altered. SQL> / System altered.
在seq为239的时候对block 28进行了一次update操作
模拟其他业务操作
SQL> archive log list; Database log mode Archive Mode Automatic archival Enabled Archive destination /u01/oracle/oradata/XFF/archivelog Oldest online log sequence 240 Next log sequence to archive 242 Current log sequence 242 SQL> delete from chf.t_xff ; 50491 rows deleted. SQL> commit; Commit complete. SQL> alter system switch logfile; System altered. SQL> / System altered. SQL> archive log list; Database log mode Archive Mode Automatic archival Enabled Archive destination /u01/oracle/oradata/XFF/archivelog Oldest online log sequence 242 Next log sequence to archive 244 Current log sequence 244 SQL> alter system switch logfile; System altered. SQL> / System altered.
这里可以知道在seq为246的时候做了备份归档操作
备份归档操作
RMAN> backup archivelog all format '/u01/oracle/oradata/xff_arch_%U' delete input;
模拟继续操作
SQL> alter system switch logfile; System altered. SQL> / System altered. SQL> archive log list; Database log mode Archive Mode Automatic archival Enabled Archive destination /u01/oracle/oradata/XFF/archivelog Oldest online log sequence 248 Next log sequence to archive 250 Current log sequence 250 SQL> update chf.t_xifenfei set object_name='www.orasos.com' where rownum<5; 4 rows updated.
这里可以发现,在seq为250的时候我们再次对block 28进行了操作
使用rman制造坏块
RMAN> BLOCKRECOVER DATAFILE 4 block 28 clear;
dbv检查坏块
[oracle@xifenfei ~]$ dbv file='/u01/oracle/oradata/XFF/users01.dbf' DBVERIFY: Release 10.2.0.4.0 - Production on Sun Jan 20 23:01:24 2013 Copyright (c) 1982, 2007, Oracle. All rights reserved. DBVERIFY - Verification starting : FILE = /u01/oracle/oradata/XFF/users01.dbf Page 28 is influx - most likely media corrupt Corrupt block relative dba: 0x0100001c (file 4, block 28) Fractured block found during dbv: Data in bad block: type: 6 format: 2 rdba: 0x0100001c last change scn: 0x0b2c.c02ab081 seq: 0x1 flg: 0x04 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0xb0c4a6ea check value in block header: 0x393f computed block checksum: 0xc917 DBVERIFY - Verification complete Total Pages Examined : 1280 Total Pages Processed (Data) : 903 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 38 Total Pages Failing (Index): 0 Total Pages Processed (Other): 47 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 291 Total Pages Marked Corrupt : 1 Total Pages Influx : 1 Highest block SCN : 3224022228 (2860.3224022228)
强制kill数据库
[oracle@xifenfei ~]$ ps -ef|grep pmon oracle 9744 9638 0 23:03 pts/1 00:00:00 grep pmon oracle 32156 1 0 14:17 ? 00:00:10 ora_pmon_XFF [oracle@xifenfei ~]$ kill -9 32156 [oracle@xifenfei ~]$ ps -ef|grep pmon oracle 9751 9638 0 23:03 pts/1 00:00:00 grep pmon
为了模拟含事务的block出现坏块
启动数据库
SQL> startup mount ORACLE instance started. Total System Global Area 306184192 bytes Fixed Size 1267164 bytes Variable Size 109054500 bytes Database Buffers 188743680 bytes Redo Buffers 7118848 bytes Database mounted. SQL> ALTER DATABASE OPEN; Database altered.
数据库启动正常
查询坏块
SQL> select count(*) from chf.t_xifenfei; select count(*) from chf.t_xifenfei * ERROR at line 1: ORA-01578: ORACLE data block corrupted (file # 4, block # 28) ORA-01110: data file 4: '/u01/oracle/oradata/XFF/users01.dbf'
alert日志
Sun Jan 20 23:04:37 2013 SMON: enabling tx recovery Sun Jan 20 23:04:37 2013 Database Characterset is ZHS16GBK Sun Jan 20 23:04:37 2013 Hex dump of (file 4, block 28) in trace file /u01/oracle/admin/XFF/bdump/xff_smon_9775.trc Corrupt block relative dba: 0x0100001c (file 4, block 28) Fractured block found during buffer read Data in bad block: type: 6 format: 2 rdba: 0x0100001c last change scn: 0x0b2c.c02ab081 seq: 0x1 flg: 0x04 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0xb0c4a6ea check value in block header: 0x393f computed block checksum: 0xc917 Reread of rdba: 0x0100001c (file 4, block 28) found same corrupted data ORACLE Instance XFF (pid = 8) - Error 1578 encountered while recovering transaction (9, 37) on object 53802. Sun Jan 20 23:04:38 2013 Errors in file /u01/oracle/admin/XFF/bdump/xff_smon_9775.trc: ORA-01578: ORACLE data block corrupted (file # 4, block # 28) ORA-01110: data file 4: '/u01/oracle/oradata/XFF/users01.dbf'
通过试验步骤和alert日志可以发现因为block有事务,但是被标记为了坏块,所以smon无法回滚该事务,从而出现alert中类似提示
继续切换归档
SQL> alter system switch logfile; System altered. SQL> / System altered. SQL> / System altered. SQL> / System altered. SQL> archive log list; Database log mode Archive Mode Automatic archival Enabled Archive destination /u01/oracle/oradata/XFF/archivelog Oldest online log sequence 252 Next log sequence to archive 254 Current log sequence 254
移走归档
[oracle@xifenfei archivelog]$ ls -l total 2224 -rw-r----- 1 oracle oinstall 360960 Jan 20 22:59 1_247_792679299.dbf -rw-r----- 1 oracle oinstall 1024 Jan 20 22:59 1_248_792679299.dbf -rw-r----- 1 oracle oinstall 1630208 Jan 20 23:04 1_249_792679299.dbf -rw-r----- 1 oracle oinstall 249344 Jan 20 23:09 1_250_792679299.dbf -rw-r----- 1 oracle oinstall 1024 Jan 20 23:09 1_251_792679299.dbf -rw-r----- 1 oracle oinstall 4608 Jan 20 23:09 1_252_792679299.dbf -rw-r----- 1 oracle oinstall 1024 Jan 20 23:09 1_253_792679299.dbf [oracle@xifenfei archivelog]$ mkdir bak [oracle@xifenfei archivelog]$ mv *.dbf bak [oracle@xifenfei archivelog]$ ll total 4 drwxr-xr-x 2 oracle oinstall 4096 Jan 20 23:11 bak
为了重现,当我们使用blockrecover恢复的时候,如果缺少归档会怎么样
尝试blockrecover
RMAN> BLOCKRECOVER DATAFILE 4 block 28; Starting blockrecover at 20-JAN-13 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: sid=135 devtype=DISK allocated channel: ORA_DISK_2 channel ORA_DISK_2: sid=126 devtype=DISK allocated channel: ORA_DISK_3 channel ORA_DISK_3: sid=125 devtype=DISK allocated channel: ORA_DISK_4 channel ORA_DISK_4: sid=124 devtype=DISK channel ORA_DISK_1: restoring block(s) channel ORA_DISK_1: specifying block(s) to restore from backup set restoring blocks of datafile 00004 channel ORA_DISK_1: reading from backup piece /u01/oracle/oradata/xff_4.rman channel ORA_DISK_1: restored block(s) from backup piece 1 piece handle=/u01/oracle/oradata/xff_4.rman tag=TAG20130120T222333 channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01 starting media recovery RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of blockrecover command at 01/20/2013 23:11:41 RMAN-06053: unable to perform media recovery because of missing log RMAN-06025: no backup of log thread 1 seq 253 lowscn 12286830510311 found to restore RMAN-06025: no backup of log thread 1 seq 252 lowscn 12286830510307 found to restore RMAN-06025: no backup of log thread 1 seq 251 lowscn 12286830510305 found to restore RMAN-06025: no backup of log thread 1 seq 250 lowscn 12286830509979 found to restore RMAN-06025: no backup of log thread 1 seq 249 lowscn 12286830489543 found to restore RMAN-06025: no backup of log thread 1 seq 248 lowscn 12286830489541 found to restore RMAN-06025: no backup of log thread 1 seq 247 lowscn 12286830489279 found to restore
alert日志记录
Sun Jan 20 23:11:38 2013 alter database recover datafile list clear Sun Jan 20 23:11:38 2013 Completed: alter database recover datafile list clear Sun Jan 20 23:11:38 2013 Starting block media recovery Sun Jan 20 23:11:39 2013 Media Recovery Log /u01/oracle/oradata/XFF/archivelog/1_235_792679299.dbf Sun Jan 20 23:11:41 2013 alter database recover cancel Sun Jan 20 23:11:41 2013 Media Recovery Canceled Completed: alter database recover cancel
blockrecover恢复途中或者异常终止,dbv检测
[oracle@xifenfei ~]$ dbv file=/u01/oracle/oradata/XFF/users01.dbf DBVERIFY: Release 10.2.0.4.0 - Production on Sun Jan 20 23:18:29 2013 Copyright (c) 1982, 2007, Oracle. All rights reserved. DBVERIFY - Verification starting : FILE = /u01/oracle/oradata/XFF/users01.dbf Page 28 is marked corrupt Corrupt block relative dba: 0x0100001c (file 4, block 28) Bad check value found during dbv: Data in bad block: type: 6 format: 2 rdba: 0x0100001c last change scn: 0x0b2c.c02b0248 seq: 0x1 flg: 0x04 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0x02480601 check value in block header: 0x13fc computed block checksum: 0x663b DBVERIFY - Verification complete Total Pages Examined : 1280 Total Pages Processed (Data) : 903 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 38 Total Pages Failing (Index): 0 Total Pages Processed (Other): 47 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 291 Total Pages Marked Corrupt : 1 Total Pages Influx : 0 Highest block SCN : 3224022228 (2860.3224022228)
在blockrecover处理过程,或者处理失败的后,block依然是坏块,如果遇到这类情况,没有事务可以直接指定seq/scn/time的方法来恢复,如果有事务,需要指定恢复时间点过该事务的时间点,让smon能够正常回滚,从而使得smon进程正常工作
还原归档后继续测试
[oracle@xifenfei archivelog]$ mv bak/* ./ RMAN> BLOCKRECOVER DATAFILE 4 block 28; starting media recovery archive log thread 1 sequence 247 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_247_792679299.dbf ………… media recovery complete, elapsed time: 00:00:01 Finished blockrecover at 20-JAN-13 [oracle@xifenfei ~]$ dbv file=/u01/oracle/oradata/XFF/users01.dbf DBVERIFY: Release 10.2.0.4.0 - Production on Sun Jan 20 23:15:43 2013 Copyright (c) 1982, 2007, Oracle. All rights reserved. DBVERIFY - Verification starting : FILE = /u01/oracle/oradata/XFF/users01.dbf DBVERIFY - Verification complete Total Pages Examined : 1280 Total Pages Processed (Data) : 904 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 38 Total Pages Failing (Index): 0 Total Pages Processed (Other): 47 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 291 Total Pages Marked Corrupt : 0 Total Pages Influx : 0 Highest block SCN : 3224023169 (2860.3224023169)
>
blockrecover处理alert日志
Sun Jan 20 23:15:01 2013 Media Recovery Log /u01/oracle/oradata/XFF/archivelog/1_250_792679299.dbf Sun Jan 20 23:15:01 2013 Media Recovery Log /u01/oracle/oradata/XFF/archivelog/1_251_792679299.dbf Sun Jan 20 23:15:01 2013 Recovery of Online Redo Log: Thread 1 Group 3 Seq 252 Reading mem 0 Mem# 0: /u01/oracle/oradata/XFF/redo03.log Sun Jan 20 23:15:01 2013 Recovery of Online Redo Log: Thread 1 Group 1 Seq 253 Reading mem 0 Mem# 0: /u01/oracle/oradata/XFF/redo01.log Sun Jan 20 23:15:01 2013 Recovery of Online Redo Log: Thread 1 Group 2 Seq 254 Reading mem 0 Mem# 0: /u01/oracle/oradata/XFF/redo02.log Sun Jan 20 23:15:02 2013 Completed block media recovery
补充说明
1.在9i中使用blockrecover会在Archive destination中生产block的备份文件,类似469_519791_3063_2442393528.bkd(file 469 block 519791),需要注意Archive destination目录结尾需要”/”,不然可能出现直接写入和该目录并列的Archive destination+blockrecover产生文件;10g中不生成该文件;如果该block在9i中未备份,也不会在生产相关文件,而是利用归档恢复.
2.blockrecover需要还原从该block从备份之后的所有归档(如果被备份起来需要还原出来),如果缺少归档可能导致恢复失败,包括基于scn/seq/time的等
3.blockrecover在执行过程中或者执行异常终止,该block依然是坏块,不会对其他block产生影响.在使用blockrecover出现异常终止后的block修复,在后续blog中提供解决方法
4.对于不含事务的坏块,如果数据允许丢失可以通过设置event跳过坏块(特殊block除外),然后重建对象;对于含event的block 坏块,建议使用blockrecover处理或者直接恢复数据文件,如果没有备份,考试使用event跳过事务回滚,然后屏蔽坏块处理
在9i中如果出现含事务block坏块错误类此如下
使用bbed,直接修改scn应该是可以的,因为可以估算出来需要的scn
Red Hat Enterprise Linux 4
10g Enterprise Edition Release 10.2.0.1.0
RMAN> blockrecover datafile 6 block 37 clear;
Starting blockrecover at 09-JUN-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=128 devtype=DISK
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of blockrecover command at 06/09/2013 09:40:42
ORA-00600: internal error code, arguments: [krbrbmrs_clear_open], [], [], [], [], [], [], []
这个问题怎么解决。
发我邮箱,相关的alert日志,trace文件,进一步分析,单纯该报错,无法定位问题