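I previously wrote a post about how misusing the _allow_resetlogs_corruption parameter ends in disaster. Last night I ran into another case: a friend used _allow_resetlogs_corruption without due care and ended up with an ORA-00704/ORA-01555 failure.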
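Environment
Operating system: Solaris
Database version: 10.2.0.5.7
Storage: ASM
Data volume: more than 15 TB
Additional note: the database SCN was only 54 days away from its headroom limit
ORA-00020 raised, instance crashes
Because the maximum number of processes was exceeded (ORA-00020), the DBWn process failed while writing to a data file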
    Sun Aug 25 16:00:41 CST 2013
    Errors in file /opt/oracle/admin/orcl/bdump/orcl_dbw0_7490.trc:
    ORA-01148: cannot refresh file size for datafile 22
    ORA-01110: data file 22: '+DATA/orcl/datafile/index_jh.dbf'
    ORA-00020: maximum number of processes () exceeded
    Sun Aug 25 16:00:41 CST 2013
    Errors in file /opt/oracle/admin/orcl/bdump/orcl_dbw0_7490.trc:
    ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
    ORA-01110: data file 22: '+DATA/orcl/datafile/index_jh.dbf'
    Sun Aug 25 16:00:41 CST 2013
    DBW0: terminating instance due to error 1242
    Termination issued to instance processes. Waiting for the processes to exit
    Sun Aug 25 16:00:51 CST 2013
    Instance termination failed to kill one or more processes
    Instance terminated by DBW0, pid = 7490
ORA-00600[kcbtema_10]
Instance recovery fails with ORA-00600: internal error code, arguments: [kcbtema_10], [1], [], [], [], [], [], []
    Sun Aug 25 19:19:23 CST 2013
    ALTER DATABASE OPEN
    Sun Aug 25 19:19:38 CST 2013
    Beginning crash recovery of 1 threads
     parallel recovery started with 16 processes
    Sun Aug 25 19:19:40 CST 2013
    Started redo scan
    Sun Aug 25 19:20:07 CST 2013
    Completed redo scan
     12016413 redo blocks read, 93405 data blocks need recovery
    Sun Aug 25 19:20:19 CST 2013
    Started redo application at
     Thread 1: logseq 53681, block 1091966
    Sun Aug 25 19:20:19 CST 2013
    Recovery of Online Redo Log: Thread 1 Group 1 Seq 53681 Reading mem 0
      Mem# 0: +DATA/orcl/onlinelog/redo_1_1.log
      Mem# 1: +DATA/orcl/onlinelog/redo_1_2.log
    Sun Aug 25 19:20:21 CST 2013
    Errors in file /opt/oracle/admin/orcl/bdump/orcl_p011_16944.trc:
    ORA-00600: internal error code, arguments: [kcbtema_10], [1], [], [], [], [], [], []
    Sun Aug 25 19:20:23 CST 2013
    Errors in file /opt/oracle/admin/orcl/bdump/orcl_p011_16944.trc:
    ORA-00600: internal error code, arguments: [kcbtema_10], [1], [], [], [], [], [], []
    Sun Aug 25 19:20:23 CST 2013
    Aborting crash recovery due to slave death, attempting serial crash recovery
    Sun Aug 25 19:20:23 CST 2013
    Beginning crash recovery of 1 threads
    Sun Aug 25 19:20:23 CST 2013
    Started redo scan
    Sun Aug 25 19:20:47 CST 2013
    Completed redo scan
     12016413 redo blocks read, 93405 data blocks need recovery
    Sun Aug 25 19:20:54 CST 2013
    Started redo application at
     Thread 1: logseq 53681, block 1091966
    Sun Aug 25 19:20:54 CST 2013
    Recovery of Online Redo Log: Thread 1 Group 1 Seq 53681 Reading mem 0
      Mem# 0: +DATA/orcl/onlinelog/redo_1_1.log
      Mem# 1: +DATA/orcl/onlinelog/redo_1_2.log
    Sun Aug 25 19:20:54 CST 2013
    Errors in file /opt/oracle/admin/orcl/udump/orcl_ora_16751.trc:
    ORA-00600: internal error code, arguments: [kcbtema_10], [1], [], [], [], [], [], []
    Sun Aug 25 19:20:56 CST 2013
    Aborting crash recovery due to error 600
    Sun Aug 25 19:20:56 CST 2013
    Errors in file /opt/oracle/admin/orcl/udump/orcl_ora_16751.trc:
    ORA-00600: internal error code, arguments: [kcbtema_10], [1], [], [], [], [], [], []
    ORA-600 signalled during: ALTER DATABASE OPEN...
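Using the hidden parameter
    ALTER SYSTEM SET "_allow_resetlogs_corruption"=TRUE SCOPE=SPFILE;
ORA-00704/ORA-01555 raised
Because an incomplete recovery had already been performed earlier, the hidden parameter was set at this point and an open resetlogs was attempted, which produced the following errors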
    Sun Aug 25 20:11:54 CST 2013
    alter database open resetlogs
    Sun Aug 25 20:12:10 CST 2013
    RESETLOGS is being done without consistancy checks. This may result
    in a corrupted database. The database should be recreated.
    RESETLOGS after incomplete recovery UNTIL CHANGE 13429649847189
    Resetting resetlogs activation ID 1312390734 (0x4e397e4e)
    Sun Aug 25 20:16:25 CST 2013
    Setting recovery target incarnation to 2
    Sun Aug 25 20:16:42 CST 2013
    ************************************************************
    Warning: The SCN headroom for this database is only 54 days!
    ************************************************************
    Sun Aug 25 20:16:43 CST 2013
    Assigning activation ID 1352200163 (0x5098efe3)
    Thread 1 opened at log sequence 1
      Current log# 1 seq# 1 mem# 0: +DATA/orcl/onlinelog/redo_1_1.log
      Current log# 1 seq# 1 mem# 1: +DATA/orcl/onlinelog/redo_1_2.log
    Successful open of redo thread 1
    Sun Aug 25 20:16:43 CST 2013
    MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
    Sun Aug 25 20:16:52 CST 2013
    SMON: enabling cache recovery
    Sun Aug 25 20:16:52 CST 2013
    ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0c36.d582339b):
    Sun Aug 25 20:16:52 CST 2013
    select ctime, mtime, stime from obj$ where obj# = :1
    Sun Aug 25 20:16:52 CST 2013
    Errors in file /opt/oracle/admin/orcl/udump/orcl_ora_2859.trc:
    ORA-00704: bootstrap process failure
    ORA-00704: bootstrap process failure
    ORA-00604: error occurred at recursive SQL level 1
    ORA-01555: snapshot too old: rollback segment number 143 with name "_SYSSMU143$" too small
    Error 704 happened during db open, shutting down database
    USER: terminating instance due to error 704
    Termination issued to instance processes. Waiting for the processes to exit
    Sun Aug 25 20:17:02 CST 2013
    Instance termination failed to kill one or more processes
    Instance terminated by USER, pid = 2859
    ORA-1092 signalled during: alter database open resetlogs...
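The ORA-01555 message prints its SCN in hex wrap.base form, while the RESETLOGS line prints a decimal SCN, so the two are hard to compare by eye. Here is a minimal Python sketch of the conversion, assuming the usual layout in which the wrap is the high part and the base is the low 32 bits. The function name `scn_to_decimal` is made up for this illustration.

```python
def scn_to_decimal(scn_hex: str) -> int:
    """Convert an SCN written as 'wrap.base' in hex into a decimal SCN."""
    text = scn_hex.lower()
    if text.startswith("0x"):
        text = text[2:]
    wrap_hex, base_hex = text.split(".")
    # Assumption: wrap is the high part, base is the low 32 bits.
    return (int(wrap_hex, 16) << 32) | int(base_hex, 16)


# Values copied from the alert log above.
query_scn = scn_to_decimal("0x0c36.d582339b")
resetlogs_scn = 13429649847189

print(query_scn)
print(query_scn - resetlogs_scn)
```

With the values from this alert log, the bootstrap query on obj$ ran at SCN 13429649847195, only 6 above the UNTIL CHANGE value. That fits the common explanation for this failure pattern: the query SCN sits right at the resetlogs point, while the blocks it reads may carry newer changes that cannot be rolled back.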
Current database SCN
    SQL> select CHECKPOINT_CHANGE# from v$database;

    CHECKPOINT_CHANGE#
    ------------------
        13429649947222

    SQL> select distinct CHECKPOINT_CHANGE# from v$datafile_header;

    CHECKPOINT_CHANGE#
    ------------------
        13429649947222
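To put the "54 days" headroom warning next to this SCN, here is a rough Python sketch of the widely circulated headroom approximation. It assumes a soft limit of 16384 SCNs per second counted from 1988 with every month treated as 31 days. That formula is an assumption here, not something taken from this database. The time the queries were run is not recorded, so the timestamp of the alert-log warning is used instead.

```python
from datetime import datetime

SCN_PER_SECOND = 16384  # assumed soft-limit growth rate (16K SCNs per second)


def max_reasonable_scn(now: datetime) -> int:
    """Approximate SCN soft limit at `now`, using 31-day months since 1988."""
    months = (now.year - 1988) * 12 + (now.month - 1)
    seconds = (
        months * 31 * 86400
        + (now.day - 1) * 86400
        + now.hour * 3600
        + now.minute * 60
        + now.second
    )
    return seconds * SCN_PER_SECOND


def headroom_days(current_scn: int, now: datetime) -> float:
    """Days until the soft limit is reached if the SCN stopped advancing now."""
    return (max_reasonable_scn(now) - current_scn) / SCN_PER_SECOND / 86400


# Timestamp of the headroom warning in the alert log; SCN from v$database.
warning_time = datetime(2013, 8, 25, 20, 16, 42)
days = headroom_days(13429649947222, warning_time)
print(int(days))
```

Under these assumptions the result truncates to 54 days, which matches the warning in the alert log. With so little headroom there is not much room to push the SCN forward during a recovery.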
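Solution
This database is at version 10.2.0.5.7, which already includes the SCN patch, so the SCN cannot be advanced with an event or a hidden parameter. The database is also larger than 15 TB and stored on ASM, so editing the data file headers with bbed is not feasible either. In the end we decided to use oradebug to solve the problem
Use oradebug DUMPvar SGA kcsgscn_
Use oradebug poke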
    sqlplus / as sysdba
    startup mount
    oradebug setmypid
    oradebug DUMPvar SGA kcsgscn_
    oradebug poke
    recover database;
    alter database open;
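The arguments passed to oradebug poke are not shown in this post. Whatever they are, the value written has to be expressed in hex, so this small Python sketch shows only that arithmetic: splitting a decimal SCN into wrap and base. The helper names and the target SCN are hypothetical, and how the two parts map onto the memory behind kcsgscn_ depends on platform and version, which this sketch does not cover.

```python
from typing import Tuple


def split_scn(scn: int) -> Tuple[int, int]:
    """Split a decimal SCN into (wrap, base), base being the low 32 bits."""
    wrap, base = divmod(scn, 1 << 32)
    return wrap, base


def format_scn(scn: int) -> str:
    """Render a decimal SCN in the 'wrap.base' hex form seen in alert logs."""
    wrap, base = split_scn(scn)
    return f"0x{wrap:04x}.{base:08x}"


# Hypothetical target: slightly above the checkpoint SCN reported earlier.
# This value is for illustration only; it is not the one used in the recovery.
target_scn = 13429649947222 + 1_000_000
wrap, base = split_scn(target_scn)
print(hex(wrap), hex(base))
print(format_scn(target_scn))
```

Any target chosen this way must still stay below the SCN soft limit, which matters here because the database has only 54 days of headroom.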
Post-mortem
A search on MOS turned up ORA-00600[kcbtema_10] Raised During Recovery Operations (Doc ID 472282.1)
    --Cause
    The cause of this problem has been identified and verified in unpublished Bug 5184359 ORA-600 [KCBTEMA_10].
    Due to this bug, during recovery, the class designation of a data block has changed.

    --Solution
    SQL>startup mount
    SQL>recover database;
    SQL>alter database open;
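The approach given on MOS can no longer be tried on this database, so there is no way to be sure it would have worked. Even so, it is a great pity that no plain recover database was ever attempted during this recovery (only a single incomplete recovery was performed). The right order here would have been to search MOS first and try that method, and to use the _allow_resetlogs_corruption parameter only with great caution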