Oracle故障第一现场被恢复混乱的数据库恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle故障第一现场被恢复混乱的数据库恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户数据库断电异常之后,由于第三方进行了一系列的恢复,现场比较混乱,无法判断最初情况,通过Oracle Database Recovery Check收集的结果进行初步判断
1. 有三个文件处于丢失状态,并且数据库在故障之后被人强制resetlogs拉过库
file-missing


2. 数据文件头scn不一致,而且相差的日志序列还比较大(该数据库为非归档模式)
seq

3. 该库多次重建ctl(alert日志中也有相关记录)
rectl

现在恢复这个库需要做的几件事情:
1. 由于没有任何原始故障之后的控制文件,需要从服务器上找出来所有故障之时的数据文件,担心被人重建ctl使用了错误的数据文件
2. 对于三个file missing的进行分析,并确认磁盘上是否存在,是否是好的,如果是好的需要和现在的文件一起作为一个整体进行恢复,并打开库
3. 打开数据库过程可能遇到的错误处理

通过obet中近期增加的get_dbinfo功能来解析所有可能的数据文件头(obet官方说明),结合文件头的信息判断,发现磁盘上名称dbf结尾的文件号重复
1

这样的情况下,我们取filesize大,(scn大不一定正确,可能由于被强制resetlogs导致scn比正确的文件大),同时也结合这个收集的信息,确认三个丢失的文件中两个为undotbs1表空间文件,另外一个为112k的数据文件,这里让我学习到了新知识,oracle的数据文件最小可以多少个block(通过试验测试,最小可以16个block,文件大小即为:16+1(block 0)*block_size)

C:\Users\XFF>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on 星期三 5月 13 22:05:52 2026

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


连接到:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> create tablespace tbs datafile 'e:/tbs01.dbf' size 8k;
create tablespace tbs datafile 'e:/tbs01.dbf' size 8k
*
第 1 行出现错误:
ORA-03214: 指定的文件大小小于所需的最小值


SQL> create tablespace tbs datafile 'e:/tbs01.dbf' size 80k;
create tablespace tbs datafile 'e:/tbs01.dbf' size 80k
*
第 1 行出现错误:
ORA-03214: 指定的文件大小小于所需的最小值


SQL> create tablespace tbs datafile 'e:/tbs01.dbf' size 96k;

表空间已创建。

确认好相关信息之后,然后对于三个file missing状态的文件进行dbv检测确认undotbs01.dbf(file 3)基本上全部损坏(大量全0块),另外两个文件正常
obet_dbv


对于正常的文件通过obet修改相关scn信息
obet_resetlogs_scn

然后重建控制文件(丢弃undotbs01.dbf文件),由于确认undo已经异常,直接设置undo为manual管理方式并屏蔽回滚段,然后屏蔽一致性,强制打开数据库,结果报ORA-600 2662错误
ora-600-2662

使用Patch_SCN工具修改数据库scn(Patch_SCN工具说明)
patch_scn

然后数据库顺利打开,重建新undo,增加temp,删除老undo,导出数据完成本次恢复任务

一次断电引起的Oracle故障恢复-ora-600 2662故障

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:一次断电引起的Oracle故障恢复-ora-600 2662故障

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接手一个oracle恢复case,由于断电导致oracle数据库异常,现场人员进行了一系列的恢复成功,但是没有成功open库,我接手故障之后尝试做recover 报ORA-16433错误

[oracle@xifenfei.com ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Sun May 10 09:27:51 2026

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> 
SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode.

根据经验这种错误一般是由于强制打开库失败导致,回溯oracle alert日志发现类似操作

Fri May 08 18:47:59 2026
ALTER DATABASE OPEN RESETLOGS
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 138986145572
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_ora_37813.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/data/oracle/oradata/xff/redo01.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Clearing online redo logfile 1 /data/oracle/oradata/xff/redo01.log
Clearing online log 1 of thread 1 sequence number 2389
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_ora_37813.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/data/oracle/oradata/xff/redo01.log'
………………
Clearing online redo logfile 3 complete
Resetting resetlogs activation ID 677485461 (0x28619b95)
Online log /data/oracle/oradata/xff/redo01.log: Thread 1 Group 1 was previously cleared
Online log /data/oracle/oradata/xff/redo02.log: Thread 1 Group 2 was previously cleared
Online log /data/oracle/oradata/xff/redo03.log: Thread 1 Group 3 was previously cleared
Fri May 08 18:48:17 2026
Setting recovery target incarnation to 4
Fri May 08 18:48:17 2026
Assigning activation ID 677634815 (0x2863e2ff)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /data/oracle/oradata/xff/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri May 08 18:48:18 2026
SMON: enabling cache recovery
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_ora_37813.trc  (incident=170321):
ORA-00600: internal error code, arguments: [2662], [32], [1547192107], [32], [1547212241], [12583040], []
Fri May 08 18:48:20 2026
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_ora_37813.trc:
ORA-00600: internal error code, arguments: [2662], [32], [1547192107], [32], [1547212241], [12583040]
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_ora_37813.trc:
ORA-00600: internal error code, arguments: [2662], [32], [1547192107], [32], [1547212241], [12583040]
Error 600 happened during db open, shutting down database
USER (ospid: 37813): terminating the instance due to error 600
Instance terminated by USER, pid = 37813
ORA-1092 signalled during: ALTER DATABASE OPEN RESETLOGS...

对于这种情况,先重建控制文件

SQL> @/tmp/rectl.sql

Control file created.

尝试recover database恢复

ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 96 slaves
Sun May 10 09:33:03 2026
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: /data/oracle/oradata/xff/redo02.log
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_pr00_250074.trc  (incident=254397):
ORA-00353: log corruption near block 3 change 138986165586 time 05/08/2026 18:55:37
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/xff/redo02.log'
Incident details in: /data/oracle/diag/rdbms/xff/xff/incident/incdir_254397/xff_pr00_250074_i254397.trc
Sun May 10 09:33:04 2026
Media Recovery failed with error 399
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_pr00_250074.trc:
ORA-00283: recovery session canceled due to errors
ORA-00399: corrupt change description in redo log
ORA-00353: log corruption near block 3 change 138986165586 time 05/08/2026 18:55:37
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/xff/redo02.log'
ORA-10877 signalled during: ALTER DATABASE RECOVER  database  ...
Sun May 10 09:33:04 2026
Sweep [inc][254397]: completed
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_m000_250348.trc  (incident=255173):
ORA-00353: log corruption near block 3 change 138986165586 time 05/08/2026 18:55:37
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/xff/redo02.log'
Errors in file /data/oracle/diag/rdbms/xff/xff/incident/incdir_254397/xff_m000_250348_i254397_a.trc:
ORA-00399: corrupt change description in redo log
ORA-00353: log corruption near block 3 change 138986165586 time 05/08/2026 18:55:37
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/xff/redo02.log'
Errors in file /data/oracle/diag/rdbms/xff/xff/incident/incdir_254397/xff_m000_250348_i254397_a.trc:
ORA-00399: corrupt change description in redo log
ORA-00353: log corruption near block 3 change 138986165586 time 05/08/2026 18:55:37
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/xff/redo02.log'
Sun May 10 09:34:05 2026

由于无法正常recover 操作,数据库需要强制打开,考虑到之前该库open的过程有ORA-600 2662错误,这次在打开之前使用Patch_SCN调整SCN(关于Patch_SCN文章:Patch_SCN for Linux 功能完善

[oracle@xifenfei.com tmp]$ ./Patch_SCN 249756 0x205D38954E
Successfully obtained address automatically: 0x6001ae70
Original Oracle SCN at Address 0x6001ae70: 0x0
Are you sure you want to modify Oracle SCN? (yes/no): yes
New SCN at Address 0x6001ae70: 0x205d38954e
Oracle SCN successfully modified.

然后数据库顺利打开
open1


在expdp导出数据过程中遇到了硬件错误
ora-27072
io-error

为了安全性采用库只读情况下exp进行导出,运气不错所有数据顺利导出,完成本次数据恢复任务
exp

Oracle典型故障:The controlfile header block returned by the OS has a sequence number that is too old

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle典型故障:The controlfile header block returned by the OS has a sequence number that is too old

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

这个是一例子客户数据库运行过程中突然报:The controlfile header block returned by the OS has a sequence number that is too old.然后数据库无法正常启动的数据库恢复case
以前处理过一些类似case:Controlfile sequence number in file header is different from the one in memory
故障现象
alert日志中报The controlfile header block returned by the OS has a sequence number that is too old.错误,然后数据库crash


Wed Mar 18 12:00:44 2026
********************* ATTENTION: ******************** 
 The controlfile header block returned by the OS
 has a sequence number that is too old. 
 The controlfile might be corrupted.
 PLEASE DO NOT ATTEMPT TO START UP THE INSTANCE 
 without following the steps below.
 RE-STARTING THE INSTANCE CAN CAUSE SERIOUS DAMAGE 
 TO THE DATABASE, if the controlfile is truly corrupted.
 In order to re-start the instance safely, 
 please do the following:
 (1) Save all copies of the controlfile for later 
     analysis and contact your OS vendor and Oracle support.
 (2) Mount the instance and issue: 
     ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
 (3) Unmount the instance. 
 (4) Use the script in the trace file to
     RE-CREATE THE CONTROLFILE and open the database. 
*****************************************************
USER (ospid: 15912): terminating the instance

这个错误比较明显是由于控制文件的sequence number比较老导致,出现这种问题,一般是由于io过慢,或者底层不稳定(比如虚拟化平台,文件系统异常,硬件不稳定等)导致(官方参考文档:The controlfile header block returned by the OS has a sequence number that is too old.)

尝试重启数据库报ORA-01207错误

Wed Mar 18 18:51:54 2026
alter database mount exclusive
Successful mount of redo thread 1, with mount id 1534819594
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: alter database mount exclusive
alter database open
Errors in file e:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2992.trc:
ORA-01122: ????? 18 ????
ORA-01110: ???? 18: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\XIFENFEI.DBF'
ORA-01207: ????????? - ??????
ORA-1122 signalled during: alter database open...
Wed Mar 18 18:52:01 2026
Checker run found 1 new persistent data failures

该错误的官方解释

[oracle@xifenfei.com ~]$ oerr ora 1207
01207, 00000, "file is more recent than control file - old control file"
// *Cause:  The control file change sequence number in the data file is 
//         greater than the number in the control file. This implies that
//         the wrong control file is being used. Note that repeatedly causing
//         this error can make it stop happening without correcting the real
//         problem. Every attempt to open the database will advance the
//         control file change sequence number until it is great enough.
// *Action: Use the current control file or do backup control file recovery to 
//         make the control file current. Be sure to follow all restrictions 
//         on doing a backup control file recovery.

由于数据文件比控制文件更新,导致该问题,通过查询v$datafile_header发现更多类似异常文件(可以使用Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)收集信息)
111


alert日志文件中也有明显的写错日志信息
222

这些二进制内容和监听日志内容被写入到了alert日志中,证明当时文件系统或者操作系统甚至更底层出现了异常,这个客户是运行在云平台上的,具体运营需要平台厂商才能分析

故障处理
重建控制文件并进行recover

SQL> startup nomount pfile='e:/pfile.txt' ;
ORACLE 例程已经启动。

Total System Global Area 2.9931E+10 bytes
Fixed Size                  2190296 bytes
Variable Size            1946158120 bytes
Database Buffers         2.7917E+10 bytes
Redo Buffers               64905216 bytes
SQL> @rectl.sql

控制文件已创建。
SQL> recover database;

完成介质恢复。

尝试启动数据库,报ora-600 2662错误

SQL> alter database open;

alter database open
*
第 1 行出现错误:

ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [1], [45773288], [1], [45777527], [301990016]
ORA-00600: internal error code, arguments: [2662], [1], [45773287], [1], [45777527], [301990016]
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [1], [45773285], [1], [45777527], [301990016]
进程 ID: 2708
会话 ID: 148 序列号: 5

数据库启动报ORA-600 2662错误,这个是典型的文件头scn过小的问题,通过自研小工具Patch_SCN可以快速解决,以前类似文章:
Patch SCN一键解决ORA-600 2662故障
Patch SCN工具一键恢复ORA-600 kcbzib_kcrsds_1
ORA-600 kcratr_nab_less_than_odr和ORA-600 2662故障处理
333


修改scn之后,数据库顺利打开

SQL> startup mount pfile='e:/pfile.txt';
ORACLE 例程已经启动。

Total System Global Area 2.9931E+10 bytes
Fixed Size                  2190296 bytes
Variable Size            1946158120 bytes
Database Buffers         2.7917E+10 bytes
Redo Buffers               64905216 bytes
数据库装载完毕。
SQL> alter database open;

alter database open
*
第 1 行出现错误:
ORA-01113: 文件 1 需要介质恢复
ORA-01110: 数据文件 1: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'
SQL> recover database;

完成介质恢复。
SQL> alter database open;

数据库已更改。

ORA-600 kcratr_nab_less_than_odr和ORA-600 2662故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 kcratr_nab_less_than_odr和ORA-600 2662故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

异常断电之后,oracle启动报ORA-600 kcratr_nab_less_than_odr错误

Sun Aug 17 11:06:09 2025
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 11 processes
Started redo scan
Completed redo scan
 read 0 KB redo, 0 data blocks need recovery
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [40785], [33267], [40630]
Sun Aug 17 11:06:20 2025
Aborting crash recovery due to error 600
Sun Aug 17 11:06:20 2025
Trace dumping is performing id=[cdmp_20250817110620]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2920.trc:
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [40785], [33267], [40630]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2920.trc:
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [40785], [33267], [40630]
ORA-600 signalled during: ALTER DATABASE OPEN...

这个错误处理有多次处理经验
12c启动报kcratr_nab_less_than_odr
又一例ORA-600 kcratr_nab_less_than_odr
ORA-600 kcratr_nab_less_than_odr故障解决
差点被误操作的ORA-600 kcratr_nab_less_than_odr故障
11.2.0.4库中遇到ORA-600 kcratr_nab_less_than_odr报错
ORA-600 kcratr_nab_less_than_odr和ORA-600 4194故障处理
一般重建ctl或者using backup ctl方式恢复即可实现0丢失打开库,但是这个库尝试打开报ORA-600 2662错误

SQL> startup nomount;
ORACLE 例程已经启动。

Total System Global Area 6847938560 bytes
Fixed Size                  2188768 bytes
Variable Size            4680845856 bytes
Database Buffers         2147483648 bytes
Redo Buffers               17420288 bytes
SQL> @rectl.sql

控制文件已创建。

SQL>
SQL>
SQL> recover database;
完成介质恢复。
SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [4413], [1200914792],
[4413], [1201104184], [12583040], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [2662], [4413], [1200914791],
[4413], [1201104184], [12583040], [], [], [], [], [], []
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [4413], [1200914789],
[4413], [1201104184], [12583040], [], [], [], [], [], []
进程 ID: 3132
会话 ID: 3018 序列号: 1

对于这样的ORA-600 2662错误是比较常见的问题,直接通过Patch_SCN工具修改scn即可正常打开库
patch_scn-ora-600-2662

Mon Aug 18 03:30:41 2025
ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 12 slaves
Mon Aug 18 03:30:41 2025
Recovery of Online Redo Log: Thread 1 Group 2 Seq 40787 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO02.LOG
Completed: ALTER DATABASE RECOVER  database  
alter database open 
Beginning crash recovery of 1 threads
 parallel recovery started with 11 processes
Started redo scan
Completed redo scan
 read 1 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 40787, block 2, scn 18954891612040
Recovery of Online Redo Log: Thread 1 Group 2 Seq 40787 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO02.LOG
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 40787, block 4, scn 18954891632047
 0 data blocks read, 0 data blocks written, 1 redo k-bytes read
Mon Aug 18 03:30:47 2025
Thread 1 advanced to log sequence 40788 (thread open)
Thread 1 opened at log sequence 40788
  Current log# 3 seq# 40788 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO03.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Aug 18 03:30:47 2025
SMON: enabling cache recovery
Dictionary check beginning
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
*********************************************************************
WARNING: The following temporary tablespaces contain no files.
         This condition can occur when a backup controlfile has
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
 
         ALTER TABLESPACE <tablespace_name> ADD TEMPFILE
 
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Completed: alter database open

后续增加tempfile,导出数据完成本次恢复任务