asm dd 10M导致system文件部分坏块修复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:asm dd 10M导致system文件部分坏块修复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户的数据库由于不当操作导致asm磁盘头损坏,进行的操作命令类似

dd if=/dev/zero of=/dev/dm-29 bs=1024K count=10

asm磁盘组无法mount,提示ORA-15042

SQL> ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:1712:2} */ 
2026-05-19T21:57:16.517284-04:00
NOTE: cache registered group DATA 5/0xAE103296
NOTE: cache began mount (first) of group DATA 5/0xAE103296
NOTE: Assigning number (5,10) to disk (/dev/dm-29)
…………
NOTE: Assigning number (5,18) to disk (/dev/dm-15)
NOTE: Assigning number (5,0) to disk (/dev/dm-7)
NOTE: Assigning number (5,15) to disk (/dev/dm-5)
2026-05-19T21:57:16.650481-04:00
cluster guid (b216d47e2bf86f6aff34a36119b0c161) generated for PST Hbeat for instance 1
2026-05-19T21:57:22.659262-04:00
NOTE: GMON heartbeating for grp 5 (DATA)
GMON querying group 5 at 28 for pid 42, osid 12996
2026-05-19T21:57:22.661447-04:00
NOTE: Assigning number (5,14) to disk ()
2026-05-19T21:57:22.662476-04:00
GMON querying group 5 at 29 for pid 42, osid 12996
2026-05-19T21:57:22.663144-04:00
NOTE: cache dismounting (clean) group 5/0xAE103296 (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 12996, image: oracle@dlycdb1 (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: LGWR not being messaged to dismount
NOTE: cache dismounted group 5/0xAE103296 (DATA)
NOTE: cache ending mount (fail) of group DATA number=5 incarn=0xae103296
NOTE: cache deleting context for group DATA 5/0xae103296
2026-05-19T21:57:22.720673-04:00
GMON dismounting group 5 at 30 for pid 42, osid 12996
2026-05-19T21:57:22.721055-04:00
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
…………
NOTE: Disk DATA_0013 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0015 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0016 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0017 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0018 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "14" is missing from group number "5" 

2026-05-19T21:57:22.745140-04:00
ERROR: ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:1712:2} */

由于客户这个是19c版本,直接使用备份au还原,然后mount磁盘组成功

SQL> alter diskgroup data mount 
2026-05-19T22:40:59.137657+08:00
NOTE: cache registered group DATA 2/0xB126DA41
NOTE: cache began mount (first) of group DATA 2/0xB126DA41
NOTE: Assigning number (2,14) to disk (/dev/dm-29)
NOTE: Assigning number (2,8) to disk (/dev/dm-28)
…………
NOTE: Assigning number (2,2) to disk (/dev/dm-7)
NOTE: Assigning number (2,0) to disk (/dev/dm-6)
NOTE: Assigning number (2,1) to disk (/dev/dm-3)
2026-05-19T22:40:59.303908+08:00
cluster guid (b216d47e2bf86f6aff34a36119b0c161) generated for PST Hbeat for instance 2
2026-05-19T22:41:05.312792+08:00
NOTE: GMON heartbeating for grp 2 (DATA)
GMON querying group 2 at 80 for pid 35, osid 53051
2026-05-19T22:41:05.314925+08:00
NOTE: cache is mounting group DATA created on 2024/11/28 17:55:45
NOTE: cache opening disk 0 of grp 2: DATA_0000 path:/dev/dm-6
NOTE: 05/20/26 16:41:04 DATA.F1X0 found on disk 0 au 10 fcn 0.0 datfmt 1
NOTE: cache opening disk 1 of grp 2: DATA_0001 path:/dev/dm-3
…………
NOTE: cache opening disk 13 of grp 2: DATA_0013 path:/dev/dm-18
NOTE: cache opening disk 14 of grp 2: DATA_0014 path:/dev/dm-29
NOTE: cache opening disk 15 of grp 2: DATA_0015 path:/dev/dm-16
NOTE: cache opening disk 16 of grp 2: DATA_0016 path:/dev/dm-17
NOTE: cache opening disk 17 of grp 2: DATA_0017 path:/dev/dm-20
NOTE: cache opening disk 18 of grp 2: DATA_0018 path:/dev/dm-21
2026-05-19T22:41:05.317307+08:00
NOTE: cache mounting (first) external redundancy group 2/0xB126DA41 (DATA)
2026-05-19T22:41:05.522191+08:00
NOTE: attached to recovery domain 2
2026-05-19T22:41:05.558136+08:00
validate pdb 2, flags x4, valid 0, pdb flags x204 
* validated domain 2, flags = 0x200
NOTE: cache recovered group 2 to fcn 0.46336611
NOTE: redo buffer size is 512 blocks (2105344 bytes)
2026-05-19T22:41:05.569479+08:00
NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (DATA)
NOTE: LGWR found thread 1 closed at ABA 110.6507 lock domain=0 inc#=0 instnum=1
NOTE: LGWR mounted thread 1 for diskgroup 2 (DATA)
2026-05-19T22:41:05.574499+08:00
mntstmp=2026/05/20 16:41:05.572000
2026-05-19T22:41:05.574854+08:00
NOTE: cache mounting group 2/0xB126DA41 (DATA) succeeded
NOTE: cache ending mount (success) of group DATA number=2 incarn=0xb126da41
2026-05-19T22:41:05.616227+08:00
NOTE: Instance updated compatible.asm to 19.0.0.0.0 for grp 2 (DATA).
2026-05-19T22:41:05.616754+08:00
NOTE: Instance updated compatible.asm to 19.0.0.0.0 for grp 2 (DATA).
2026-05-19T22:41:05.617932+08:00
NOTE: Instance updated compatible.rdbms to 19.0.0.0.0 for grp 2 (DATA).
2026-05-19T22:41:05.618303+08:00
NOTE: Instance updated compatible.rdbms to 19.0.0.0.0 for grp 2 (DATA).
2026-05-19T22:41:05.643844+08:00
SUCCESS: diskgroup DATA was mounted

虽然磁盘组mount成功,但是asm依旧在报错

这个客户胆子也真够大的,这样mount起来的磁盘组,还往里面加入磁盘出发Rebalance操作
1
2026-05-20T09:18:34.292103-04:00
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_60220.trc:
ORA-15196: invalid ASM block header [kfc.c:30747] [endian_kfbh] [2147483662] [1014] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:30747] [endian_kfbh] [2147483662] [1014] [0 != 1]
NOTE: cache repaired a corrupt block: group=2(DATA) dsk=14 blk=1014 on disk 14 from disk=14 (DATA_0014) incarn=4042690154 au=11 blk=1014 count=1
2026-05-20T09:18:36.721669-04:00
WARNING: cache read a corrupt block: group=2(DATA) dsk=14 blk=1015 disk=14 (DATA_0014) incarn=4042690154 au=0 blk=1015 count=1
2026-05-20T09:18:36.721982-04:00
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_60220.trc:
ORA-15196: invalid ASM block header [kfc.c:30747] [endian_kfbh] [2147483662] [1015] [0 != 1]
NOTE: a corrupted block from group DATA was dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_60220.trc
WARNING: cache read (retry) a corrupt block: group=2(DATA) dsk=14 blk=1015 disk=14 (DATA_0014) incarn=4042690154 au=0 blk=1015 count=1
2026-05-20T09:18:36.724279-04:00
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_60220.trc:
ORA-15196: invalid ASM block header [kfc.c:30747] [endian_kfbh] [2147483662] [1015] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:30747] [endian_kfbh] [2147483662] [1015] [0 != 1]
NOTE: cache repaired a corrupt block: group=2(DATA) dsk=14 blk=1015 on disk 14 from disk=14 (DATA_0014) incarn=4042690154 au=11 blk=1015 count=1
2026-05-20T09:44:20.469792-04:00
NOTE: Starting expel slave for group 2/0xcd67eb4 (DATA)
2026-05-20T09:44:20.473880-04:00
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=2
2026-05-20T09:44:20.530101-04:00
NOTE: membership refresh pending for group 2/0xcd67eb4 (DATA)
2026-05-20T09:44:20.534097-04:00
GMON querying group 2 at 178 for pid 27, osid 70130
2026-05-20T09:44:20.557651-04:00
SUCCESS: refreshed membership for 2/0xcd67eb4 (DATA)
2026-05-20T09:44:23.489304-04:00
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
2026-05-20T11:20:49.804413-04:00
NOTE: stopping process ARB0
NOTE: stopping process ARBA
2026-05-20T11:20:51.538777-04:00
SUCCESS: rebalance completed for group 2/0xcd67eb4 (DATA)
2026-05-20T23:20:51.541462+08:00
SUCCESS: ALTER DISKGROUP DATA ADD  DISK '/dev/dm-20' SIZE 4194304M
 REBALANCE WAIT

算幸运,由于ORA-15196: invalid ASM block header [kfc.c:30747] [endian_kfbh] 错误导致rebalance没有真正运行起来,从而该磁盘组没有dismount(19c这个方面确实增强不少,如果以前版本大概率会直接dismount掉)

客户在这样mount的磁盘组上尝试启动库,报ORA-01578错误,无法启动成功

2026-05-20T17:46:07.060794+08:00
Error attempting to elevate LMHB's priority: no further priority changes will be attempted for this process
2026-05-20T17:46:07.647114+08:00
Undo initialization recovery: Parallel FPTR complete: start:26091908 end:26096229 diff:4321 ms (4.3 seconds)
Undo initialization recovery: err:0 start: 26091907 end: 26096229 diff: 4322 ms (4.3 seconds)
[50880] Successfully onlined Undo Tablespace 5.
Undo initialization online undo segments: err:0 start: 26096229 end: 26096576 diff: 347 ms (0.3 seconds)
Undo initialization finished serial:0 start:26091907 end:26096591 diff:4684 ms (4.7 seconds)
Database Characterset is AL32UTF8
No Resource Manager plan active
2026-05-20T17:46:08.734113+08:00

Corrupt block relative dba: 0x004030ee (file 1, block 12526)
Completely zero block found during buffer read

Reread (file 1, block 12526) found same corrupt data (no logical check)
2026-05-20T17:46:08.811455+08:00
Corrupt Block Found
         TIME STAMP (GMT) = 05/20/2026 17:46:07
         CONT = 0, TSN = 0, TSNAME = SYSTEM
         RFN = 1, BLK = 12526, RDBA = 4206830
         OBJN = 37, OBJD = 37, OBJECT = I_OBJ2, SUBOBJECT = 
         SEGMENT OWNER = SYS, SEGMENT TYPE = Index Segment
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_ora_50880.trc  (incident=800708):
ORA-01578: ORACLE data block corrupted (file # 1, block # 12526)
ORA-01110: data file 1: '+DATA/xff/DATAFILE/system.257.1186720165'
2026-05-20T05:46:09.047772-04:00
ALTER SYSTEM SET remote_listener=' xff-scan:11521' SCOPE=MEMORY SID='xff2';
2026-05-20T05:46:09.049615-04:00
ALTER SYSTEM SET listener_networks='' SCOPE=MEMORY SID='xff2';
2026-05-20T17:46:09.812271+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************

Corrupt block relative dba: 0x004030de (file 1, block 12510)
Completely zero block found during buffer read

Reread (file 1, block 12510) found same corrupt data (no logical check)
2026-05-20T17:46:10.350573+08:00
Corrupt Block Found
         TIME STAMP (GMT) = 05/20/2026 17:46:09
         CONT = 0, TSN = 0, TSNAME = SYSTEM
         RFN = 1, BLK = 12510, RDBA = 4206814
         OBJN = 83, OBJD = 83, OBJECT = DEPENDENCY$, SUBOBJECT = 
         SEGMENT OWNER = SYS, SEGMENT TYPE = Table Segment
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_ora_50880.trc  (incident=800709):
ORA-01578: ORACLE data block corrupted (file # 1, block # 12510)
ORA-01110: data file 1: '+DATA/xff/DATAFILE/system.257.1186720165'
2026-05-20T17:46:10.694365+08:00

Corrupt block relative dba: 0x00403000 (file 1, block 12288)

Completely zero block found during validation
Reread of blocknum=12288, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12288, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12288, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12288, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12288, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data

Corrupt block relative dba: 0x00403001 (file 1, block 12289)

Completely zero block found during validation
Reread of blocknum=12289, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12289, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12289, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12289, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12289, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
………………
Corrupt block relative dba: 0x004030ff (file 1, block 12543)

Completely zero block found during validation
Reread of blocknum=12543, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12543, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12543, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12543, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12543, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data

Corrupt block relative dba: 0x0040301d (file 1, block 12317)
Completely zero block found during buffer read

Reread (file 1, block 12317) found same corrupt data (no logical check)
2026-05-20T17:46:12.183545+08:00
Corrupt Block Found
         TIME STAMP (GMT) = 05/20/2026 17:46:11
         CONT = 0, TSN = 0, TSNAME = SYSTEM
         RFN = 1, BLK = 12317, RDBA = 4206621
         OBJN = 37, OBJD = 37, OBJECT = I_OBJ2, SUBOBJECT = 
         SEGMENT OWNER = SYS, SEGMENT TYPE = Index Segment
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_ora_50880.trc  (incident=800710):
ORA-01578: ORACLE data block corrupted (file # 1, block # 12317)
ORA-01110: data file 1: '+DATA/xff/DATAFILE/system.257.1186720165'
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff2/incident/incdir_800710/xff2_ora_50880_i800710.trc
2026-05-20T05:46:12.371040-04:00
ALTER SYSTEM SET remote_listener=' xff-scan:11521' SCOPE=MEMORY SID='xff2';
2026-05-20T05:46:12.372924-04:00
ALTER SYSTEM SET listener_networks='' SCOPE=MEMORY SID='xff2';
2026-05-20T17:46:13.814346+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_ora_50880.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 12317)
ORA-01110: data file 1: '+DATA/xff/DATAFILE/system.257.1186720165'
2026-05-20T17:46:13.814407+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_ora_50880.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 12317)
ORA-01110: data file 1: '+DATA/xff/DATAFILE/system.257.1186720165'
2026-05-20T17:46:13.814442+08:00
Error 604 happened during db open, shutting down database
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_ora_50880.trc  (incident=800711):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 12317)
ORA-01110: data file 1: '+DATA/xff/DATAFILE/system.257.1186720165'

Corrupt block relative dba: 0x00403097 (file 1, block 12439)
Completely zero block found during buffer read

Reread (file 1, block 12439) found same corrupt data (no logical check)
2026-05-20T17:46:14.259044+08:00
Corrupt Block Found
         TIME STAMP (GMT) = 05/20/2026 17:46:13
         CONT = 0, TSN = 0, TSNAME = SYSTEM
         RFN = 1, BLK = 12439, RDBA = 4206743
         OBJN = 18, OBJD = 18, OBJECT = OBJ$, SUBOBJECT = 
         SEGMENT OWNER = SYS, SEGMENT TYPE = Table Segment
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_gen0_48604.trc  (incident=798852):
ORA-01578: ORACLE data block corrupted (file # 1, block # 12439)
ORA-01110: data file 1: '+DATA/xff/DATAFILE/system.257.1186720165'

2026-05-20T17:46:13.814346+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_ora_50880.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 12317)
ORA-01110: data file 1: '+DATA/xff/DATAFILE/system.257.1186720165'
2026-05-20T17:46:13.814407+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_ora_50880.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 12317)
ORA-01110: data file 1: '+DATA/xff/DATAFILE/system.257.1186720165'
2026-05-20T17:46:13.814442+08:00
Error 604 happened during db open, shutting down database
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_ora_50880.trc  (incident=800711):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 12317)
ORA-01110: data file 1: '+DATA/xff/DATAFILE/system.257.1186720165'

Corrupt block relative dba: 0x00403097 (file 1, block 12439)
Completely zero block found during buffer read

Reread (file 1, block 12439) found same corrupt data (no logical check)
2026-05-20T17:46:14.259044+08:00
Corrupt Block Found
         TIME STAMP (GMT) = 05/20/2026 17:46:13
         CONT = 0, TSN = 0, TSNAME = SYSTEM
         RFN = 1, BLK = 12439, RDBA = 4206743
         OBJN = 18, OBJD = 18, OBJECT = OBJ$, SUBOBJECT = 
         SEGMENT OWNER = SYS, SEGMENT TYPE = Table Segment
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_gen0_48604.trc  (incident=798852):
ORA-01578: ORACLE data block corrupted (file # 1, block # 12439)
ORA-01110: data file 1: '+DATA/xff/DATAFILE/system.257.1186720165'
2026-05-20T05:46:14.436597-04:00
ALTER SYSTEM SET remote_listener=' xff-scan:11521' SCOPE=MEMORY SID='xff2';
2026-05-20T05:46:14.438492-04:00
ALTER SYSTEM SET listener_networks='' SCOPE=MEMORY SID='xff2';
2026-05-20T17:46:15.486758+08:00
opiodr aborting process unknown ospid (50880) as a result of ORA-603
2026-05-20T17:46:15.498707+08:00
ORA-603 : opitsk aborting process
License high water mark = 423
USER(prelim) (ospid: 50880): terminating the instance due to ORA error 604
2026-05-20T17:46:15.536740+08:00
opiodr aborting process unknown ospid (69547) as a result of ORA-1092
2026-05-20T05:46:16.321597-04:00
ORA-1092 : opitsk aborting process

这里基本上可以看出来是由于在数据库启动过程中递归调用一些sql,但是由于遭遇到坏块导致启动失败,通过dbv检查system数据文件发现256个坏块
dbv-zero


256个连续的全0坏块,怀疑是2M的数据被dd全空覆盖,这样的情况,也就是怀疑是au=2的后面2M被覆盖(ausize为4M),分析system的数据分布情况
au

这里可以确认system的第24个au(从0开始)在14号盘au 2 上面,也就是数据块起始损坏为block:12288-12543(24M*4/8K[有block 0 需要考虑]),对于这种彻底损坏而且比较靠前的system中block,通过人工构造出来这些block的方式进行修复,在自研的Oracle Recovery Tools和obet工具都有该功能.运气不错,通过这个修复之后,直接expdp导出数据没有大问题,比较完美的恢复了这个故障.

Oracle故障第一现场被恢复混乱的数据库恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle故障第一现场被恢复混乱的数据库恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户数据库断电异常之后,由于第三方进行了一系列的恢复,现场比较混乱,无法判断最初情况,通过Oracle Database Recovery Check收集的结果进行初步判断
1. 有三个文件处于丢失状态,并且数据库在故障之后被人强制resetlogs拉过库
file-missing


2. 数据文件头scn不一致,而且相差的日志序列还比较大(该数据库为非归档模式)
seq

3. 该库多次重建ctl(alert日志中也有相关记录)
rectl

现在恢复这个库需要做的几件事情:
1. 由于没有任何原始故障之后的控制文件,需要从服务器上找出来所有故障之时的数据文件,担心被人重建ctl使用了错误的数据文件
2. 对于三个file missing的进行分析,并确认磁盘上是否存在,是否是好的,如果是好的需要和现在的文件一起作为一个整体进行恢复,并打开库
3. 打开数据库过程可能遇到的错误处理

通过obet中近期增加的get_dbinfo功能来解析所有可能的数据文件头(obet官方说明),结合文件头的信息判断,发现磁盘上名称dbf结尾的文件号重复
1

这样的情况下,我们取filesize大,(scn大不一定正确,可能由于被强制resetlogs导致scn比正确的文件大),同时也结合这个收集的信息,确认三个丢失的文件中两个为undotbs1表空间文件,另外一个为112k的数据文件,这里让我学习到了新知识,oracle的数据文件最小可以多少个block(通过试验测试,最小可以16个block,文件大小即为:16+1(block 0)*block_size)

C:\Users\XFF>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on 星期三 5月 13 22:05:52 2026

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


连接到:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> create tablespace tbs datafile 'e:/tbs01.dbf' size 8k;
create tablespace tbs datafile 'e:/tbs01.dbf' size 8k
*
第 1 行出现错误:
ORA-03214: 指定的文件大小小于所需的最小值


SQL> create tablespace tbs datafile 'e:/tbs01.dbf' size 80k;
create tablespace tbs datafile 'e:/tbs01.dbf' size 80k
*
第 1 行出现错误:
ORA-03214: 指定的文件大小小于所需的最小值


SQL> create tablespace tbs datafile 'e:/tbs01.dbf' size 96k;

表空间已创建。

确认好相关信息之后,然后对于三个file missing状态的文件进行dbv检测确认undotbs01.dbf(file 3)基本上全部损坏(大量全0块),另外两个文件正常
obet_dbv


对于正常的文件通过obet修改相关scn信息
obet_resetlogs_scn

然后重建控制文件(丢弃undotbs01.dbf文件),由于确认undo已经异常,直接设置undo为manual管理方式并屏蔽回滚段,然后屏蔽一致性,强制打开数据库,结果报ORA-600 2662错误
ora-600-2662

使用Patch_SCN工具修改数据库scn(Patch_SCN工具说明)
patch_scn

然后数据库顺利打开,重建新undo,增加temp,删除老undo,导出数据完成本次恢复任务

一次断电引起的Oracle故障恢复-ora-600 2662故障

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:一次断电引起的Oracle故障恢复-ora-600 2662故障

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接手一个oracle恢复case,由于断电导致oracle数据库异常,现场人员进行了一系列的恢复成功,但是没有成功open库,我接手故障之后尝试做recover 报ORA-16433错误

[oracle@xifenfei.com ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Sun May 10 09:27:51 2026

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> 
SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode.

根据经验这种错误一般是由于强制打开库失败导致,回溯oracle alert日志发现类似操作

Fri May 08 18:47:59 2026
ALTER DATABASE OPEN RESETLOGS
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 138986145572
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_ora_37813.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/data/oracle/oradata/xff/redo01.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Clearing online redo logfile 1 /data/oracle/oradata/xff/redo01.log
Clearing online log 1 of thread 1 sequence number 2389
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_ora_37813.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/data/oracle/oradata/xff/redo01.log'
………………
Clearing online redo logfile 3 complete
Resetting resetlogs activation ID 677485461 (0x28619b95)
Online log /data/oracle/oradata/xff/redo01.log: Thread 1 Group 1 was previously cleared
Online log /data/oracle/oradata/xff/redo02.log: Thread 1 Group 2 was previously cleared
Online log /data/oracle/oradata/xff/redo03.log: Thread 1 Group 3 was previously cleared
Fri May 08 18:48:17 2026
Setting recovery target incarnation to 4
Fri May 08 18:48:17 2026
Assigning activation ID 677634815 (0x2863e2ff)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /data/oracle/oradata/xff/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri May 08 18:48:18 2026
SMON: enabling cache recovery
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_ora_37813.trc  (incident=170321):
ORA-00600: internal error code, arguments: [2662], [32], [1547192107], [32], [1547212241], [12583040], []
Fri May 08 18:48:20 2026
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_ora_37813.trc:
ORA-00600: internal error code, arguments: [2662], [32], [1547192107], [32], [1547212241], [12583040]
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_ora_37813.trc:
ORA-00600: internal error code, arguments: [2662], [32], [1547192107], [32], [1547212241], [12583040]
Error 600 happened during db open, shutting down database
USER (ospid: 37813): terminating the instance due to error 600
Instance terminated by USER, pid = 37813
ORA-1092 signalled during: ALTER DATABASE OPEN RESETLOGS...

对于这种情况,先重建控制文件

SQL> @/tmp/rectl.sql

Control file created.

尝试recover database恢复

ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 96 slaves
Sun May 10 09:33:03 2026
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: /data/oracle/oradata/xff/redo02.log
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_pr00_250074.trc  (incident=254397):
ORA-00353: log corruption near block 3 change 138986165586 time 05/08/2026 18:55:37
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/xff/redo02.log'
Incident details in: /data/oracle/diag/rdbms/xff/xff/incident/incdir_254397/xff_pr00_250074_i254397.trc
Sun May 10 09:33:04 2026
Media Recovery failed with error 399
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_pr00_250074.trc:
ORA-00283: recovery session canceled due to errors
ORA-00399: corrupt change description in redo log
ORA-00353: log corruption near block 3 change 138986165586 time 05/08/2026 18:55:37
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/xff/redo02.log'
ORA-10877 signalled during: ALTER DATABASE RECOVER  database  ...
Sun May 10 09:33:04 2026
Sweep [inc][254397]: completed
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_m000_250348.trc  (incident=255173):
ORA-00353: log corruption near block 3 change 138986165586 time 05/08/2026 18:55:37
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/xff/redo02.log'
Errors in file /data/oracle/diag/rdbms/xff/xff/incident/incdir_254397/xff_m000_250348_i254397_a.trc:
ORA-00399: corrupt change description in redo log
ORA-00353: log corruption near block 3 change 138986165586 time 05/08/2026 18:55:37
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/xff/redo02.log'
Errors in file /data/oracle/diag/rdbms/xff/xff/incident/incdir_254397/xff_m000_250348_i254397_a.trc:
ORA-00399: corrupt change description in redo log
ORA-00353: log corruption near block 3 change 138986165586 time 05/08/2026 18:55:37
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/xff/redo02.log'
Sun May 10 09:34:05 2026

由于无法正常recover 操作,数据库需要强制打开,考虑到之前该库open的过程有ORA-600 2662错误,这次在打开之前使用Patch_SCN调整SCN(关于Patch_SCN文章:Patch_SCN for Linux 功能完善

[oracle@xifenfei.com tmp]$ ./Patch_SCN 249756 0x205D38954E
Successfully obtained address automatically: 0x6001ae70
Original Oracle SCN at Address 0x6001ae70: 0x0
Are you sure you want to modify Oracle SCN? (yes/no): yes
New SCN at Address 0x6001ae70: 0x205d38954e
Oracle SCN successfully modified.

然后数据库顺利打开
open1


在expdp导出数据过程中遇到了硬件错误
ora-27072
io-error

为了安全性采用库只读情况下exp进行导出,运气不错所有数据顺利导出,完成本次数据恢复任务
exp

oracleasm createdisk破坏的acfs文件系统恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:oracleasm createdisk破坏的acfs文件系统恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到一个朋友请求,客户12.2.0.1的asm被执行了oracleasm createdisk,把之前的asmdisk给重建了
oracleasm-createdisk


根据以前恢复经验,这个故障会把前面1M的数据全部重置
删除asmlib磁盘导致磁盘组故障恢复
分享oracleasm createdisk重新创建asm disk后数据0丢失恢复案例
这个客户有点特殊,他的asm 磁盘组中跑的不是oracle 数据库而是直接跑acfs,然后再里面跑mysql数据库,也就是利用grid实现mysql的底层高可用,acfs实现共享挂载(我的理解一次也只能启动一个节点的mysql),现在asm disk的头被oracleasm createdisk重置之后,导致asm磁盘组无法mount,从而acfs也无法mount.对于这个故障,让现场提供被破坏磁盘使用dd前面100M 发给我进行分析
使用kfed读取asm disk磁盘头信息

H:\TEMP\0423\0423>kfed read data3_100m
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                  1096040823 ; 0x00c: 0x41544177
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
005B78600 00000000 00000000 00000000 41544177  [............wATA]
005B78610 00000000 00000000 00000000 00000000  [................]
005B78620 4C43524F 4B534944 41544144 00000033  [ORCLDISKDATA3...]
005B78630 00000000 00000000 00000000 00000000  [................]
  Repeat 252 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

这里可以发现比较明显的asmlib的标记信息ORCLDISKDATA3,证明该磁盘被oracleasm createdisk重建之后,没有再进行kfed修复或者重建新磁盘组

SUCCESS: CREATE DISKGROUP DATA EXTERNAL REDUNDANCY  DISK '/dev/oracleasm/disks/DATA1' SIZE 1430507M
 DISK '/dev/oracleasm/disks/DATA2' SIZE 1430507M
 DISK '/dev/oracleasm/disks/DATA3' SIZE 1430507M
 ATTRIBUTE 'compatible.asm'='12.2.0.1','compatible.advm'='12.2.0.1','au_size'='4M'

SUCCESS: CREATE DISKGROUP CRS EXTERNAL REDUNDANCY  DISK '/dev/oracleasm/disks/CRS' SIZE 190732M
 ATTRIBUTE 'compatible.asm'='12.2.0.1','compatible.advm'='12.2.0.1','au_size'='4M' /* ASMCA */

通过alert日志中磁盘组的创建语句,确认该磁组是ausize为4M,这样的情况下,asm的磁盘头备份备份和au备份应该都是好的,直接通过winhex来确认
orcldisk


再次通过kfed来确认相关磁盘头信息

H:\TEMP\0423\0423>kfed read data3_100m aus=4096k blkn=1022 aun=1|grep name
kfdhdb.dskname:               DATA_0002 ; 0x028: length=9
kfdhdb.grpname:                    DATA ; 0x048: length=4
kfdhdb.fgname:                DATA_0002 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0

H:\TEMP\0423\0423>kfed read data3_100m aus=4096k blkn=0 aun=11|grep name
kfdhdb.dskname:               DATA_0002 ; 0x028: length=9
kfdhdb.grpname:                    DATA ; 0x048: length=4
kfdhdb.fgname:                DATA_0002 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0

基于上述情况,证明磁盘头和au的备份都还在,而且au的备份是4M,完全包含了被createdisk破坏的1M数据,直接吧这个au给还原回去理论上就可以了,但是很不幸,这样处理之后,crs启动过程依旧无法找到表决盘
vote


通过分析crs盘的情况,发现它的偏移量是错误的
voteoffset

通过分析是由于磁盘分区问题导致

磁盘 /dev/sdc:200.0 GB, 199996997632 字节,390619136 个扇区
Units = 扇区 of 1 * 512 = 512 bytes
扇区大小(逻辑/物理):512 字节 / 512 字节
I/O 大小(最小/最佳):512 字节 / 1048576 字节
磁盘标签类型:dos
磁盘标识符:0x00000000

   设备 Boot      Start         End      Blocks   Id  System
/dev/sdc1               1   390619135   195309567+  ee  GPT

磁盘 /dev/sdd:1500.0 GB, 1499996356608 字节,2929680384 个扇区
Units = 扇区 of 1 * 512 = 512 bytes
扇区大小(逻辑/物理):512 字节 / 512 字节
I/O 大小(最小/最佳):512 字节 / 1048576 字节
磁盘标签类型:gpt
Disk identifier: 99F1679E-DC32-4F6A-B85D-D91C87B09775


#         Start          End    Size  Type            Name
 1         2048   2929678335    1.4T  Linux LVM       

处理好分区问题之后,重启crs,一切自动恢复成功
acfs-mysql


至此这个被oracleasm createdisk的case完美恢复,数据0丢失

先offline数据文件,再resetlogs导致恢复复杂的故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:先offline数据文件,再resetlogs导致恢复复杂的故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

本来是一个简单的数据文件被误删除,然后通过底层恢复出来数据文件,再启动库就可以的事情,结果由于对oracle的不了解和自以为是,直接把丢失的文件不存在的情况下,offline文件,然后尝试resetlogs打开库,并且进行了各种尝试,结果使得问题比较麻烦.
故障之后现象
通过分析alert日志大概的主要错误,大概梳理故障情况

1. 启动数据库报control03.ctl丢失

Fri Apr 17 21:53:03 2026
MMNL started with pid=16, OS id=3613 
ORACLE_BASE from environment = /data/oracle
Fri Apr 17 21:53:08 2026
alter database mount
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/data/oracle/oradata/orcl/control03.ctl'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/data/oracle/oradata/orcl/control02.ctl'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-205 signalled during: alter database mount...

如果只是这个文件丢失(这里还没有看到其他数据文件丢失的报错),本身是一个非常简单的故障,直接修改control_files参数即可

2. 结果当时操作的人直接rectl

Fri Apr 17 21:57:01 2026
Successful mount of redo thread 1, with mount id 1758675116
Completed: CREATE CONTROLFILE REUSE DATABASE "ORCL" NORESETLOGS NOARCHIVELOG
    MAXLOGFILES 16
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 292
LOGFILE
  GROUP 1 '/data/oracle/oradata/orcl/redo01.log'  SIZE 50M BLOCKSIZE 512,
  GROUP 2 '/data/oracle/oradata/orcl/redo02.log'  SIZE 50M BLOCKSIZE 512,
  GROUP 3 '/data/oracle/oradata/orcl/redo03.log'  SIZE 50M BLOCKSIZE 512
DATAFILE
  '/data/oracle/oradata/orcl/system01.dbf',
  '/data/oracle/oradata/orcl/sysaux01.dbf',
  '/data/oracle/oradata/orcl/undotbs01.dbf',
  '/data/oracle/oradata/orcl/users01.dbf'
CHARACTER SET ZHS16GBK

3.然后启动数据库报错

Fri Apr 17 22:02:43 2026
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 3 processes
Started redo scan
Completed redo scan
 read 39020 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 11590, block 2, scn 137806010
Recovery of Online Redo Log: Thread 1 Group 1 Seq 11590 Reading mem 0
  Mem# 0: /data/oracle/oradata/orcl/redo01.log
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 11590, block 78042, scn 137831847
 0 data blocks read, 0 data blocks written, 39020 redo k-bytes read
Fri Apr 17 22:02:44 2026
Thread 1 advanced to log sequence 11591 (thread open)
Thread 1 opened at log sequence 11591
  Current log# 2 seq# 11591 mem# 0: /data/oracle/oradata/orcl/redo02.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Apr 17 22:02:44 2026
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Dictionary check beginning
Tablespace 'TEMP' #3  found in data dictionary,
but not in the controlfile. Adding to controlfile.
Tablespace 'ERP_XXXX' #6  found in data dictionary,
but not in the controlfile. Adding to controlfile.
Tablespace 'ERP_AAAA' #7  found in data dictionary,
but not in the controlfile. Adding to controlfile.
Tablespace 'ABCD' #8  found in data dictionary,
but not in the controlfile. Adding to controlfile.
Tablespace 'ERP_BBBB' #9  found in data dictionary,
but not in the controlfile. Adding to controlfile.
Tablespace 'ERP_XXD' #10  found in data dictionary,
but not in the controlfile. Adding to controlfile.
Tablespace 'ERP_12SF' #11  found in data dictionary,
but not in the controlfile. Adding to controlfile.
Tablespace 'XXX14' #12  found in data dictionary,
but not in the controlfile. Adding to controlfile.
Tablespace 'P_ZY' #13 found in data dictionary,
but not in the controlfile. Adding to controlfile.
File #5 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00005' in the controlfile.
File #6 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00006' in the controlfile.
File #7 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00007' in the controlfile.
File #8 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00008' in the controlfile.
File #9 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00009' in the controlfile.
File #10  found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00010' in the controlfile.
File #11  found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00011' in the controlfile.
File #12 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00012' in the controlfile.

4.然后尝试resetlogs操作

Sat Apr 18 05:55:10 2026
ALTER DATABASE   MOUNT
Successful mount of redo thread 1, with mount id 1758652862
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Sat Apr 18 05:55:14 2026
ALTER DATABASE OPEN RESETLOGS
ORA-1139 signalled during: ALTER DATABASE OPEN RESETLOGS...
Sat Apr 18 05:56:29 2026
Starting ORACLE instance (normal)
ALTER DATABASE RECOVER  DATABASE UNTIL CANCEL  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 4 slaves
Sat Apr 18 05:56:29 2026
Warning: Datafile 5 (/data/oracle/orcl/xxxx.dbf) is offline during full database recovery and will not be recovered
Warning: Datafile 6 (/data/oracle/orcl/xxxx.dbf) is offline during full database recovery and will not be recovered
Warning: Datafile 7 (/data/oracle/orcl/xxxx.dbf) is offline during full database recovery and will not be recovered
Warning: Datafile 8 (/data/oracle/orcl/xxxx.dbf) is offline during full database recovery and will not be recovered
Media Recovery Not Required
Completed: ALTER DATABASE RECOVER  DATABASE UNTIL CANCEL  
Sat Apr 18 05:57:45 2026
ALTER DATABASE OPEN RESETLOGS
RESETLOGS after complete recovery through change 137865786
Resetting resetlogs activation ID 1645665187 (0x6216dba3)
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2549.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 1 of thread 1 is not current copy
ORA-00312: online log 1 thread 1: '/data/oracle/oradata/orcl/redo01.log'
Sat Apr 18 05:57:45 2026
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_2554.trc:
ORA-00316: log 1 of thread 1, type 0 in header is not log file
ORA-00312: online log 1 thread 1: '/data/oracle/oradata/orcl/redo01.log'
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2549.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 2 of thread 1 is not current copy
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/orcl/redo02.log'
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_2554.trc:
ORA-00316: log 2 of thread 1, type 0 in header is not log file
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/orcl/redo02.log'
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_2554.trc:
ORA-00322: log 3 of thread 1 is not current copy
ORA-00312: online log 3 thread 1: '/data/oracle/oradata/orcl/redo03.log'
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2549.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 3 of thread 1 is not current copy
ORA-00312: online log 3 thread 1: '/data/oracle/oradata/orcl/redo03.log'
Sat Apr 18 05:57:46 2026
Setting recovery target incarnation to 2
Sat Apr 18 05:57:46 2026
Assigning activation ID 1758652862 (0x68d2e9be)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /data/oracle/oradata/orcl/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Apr 18 05:57:46 2026
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Dictionary check beginning
File #5 is offline, but is part of an online tablespace.
data file 5: '/data/oracle/oradata/orcl/xxxx.dbf'
File #6 is offline, but is part of an online tablespace.
data file 6: '/data/oracle/oradata/orcl/xxxx.dbf'
File #7 is offline, but is part of an online tablespace.
data file 7: '/data/oracle/oradata/orcl/xxxx.dbf'
File #8 is offline, but is part of an online tablespace.
data file 8: '/data/oracle/oradata/orcl/xxxx.dbf'

到这一步悲剧基本上已经发生,犯了一个在oracle恢复里面比较忌讳的事情,有数据文件offline的情况下,执行resetlogs操作,导致部分数据文件的resetlogs信息没有被及时更新,导致一套库里面,被offline的这个部分数据文件resetlogs信息小于其他online的数据文件的。

5. 后续其他操作各种报错

Completed: ALTER DATABASE   MOUNT
Sun Apr 19 08:13:02 2026
ALTER DATABASE DATAFILE 5 OFFLINE DROP
Sun Apr 19 08:13:02 2026
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_dbw0_9212.trc  (incident=67094):
ORA-00600: internal error code, arguments: [3600], [5], [14], [], [], [], [], [], [], [], [], []
Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_67094/orcl_dbw0_9212_i67094.trc
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_dbw0_9212.trc:
ORA-00600: internal error code, arguments: [3600], [5], [14], [], [], [], [], [], [], [], [], []
DBW0 (ospid: 9212): terminating the instance due to error 471
Tue Apr 21 22:31:23 2026
Assigning activation ID 1758985759 (0x68d7fe1f)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /data/oracle/oradata/orcl/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Apr 21 22:31:23 2026
SMON: enabling cache recovery
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4951.trc  (incident=87950):
ORA-00600: internal error code, arguments: [2662], [0], [137890858], [0], [137891091], [12583056], []
Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_87950/orcl_ora_4951_i87950.trc
Errors in file /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_87950/orcl_ora_4951_i87950.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/data/oracle/oradata/orcl/redo03.log'
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/data/oracle/oradata/orcl/redo02.log'
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/data/oracle/oradata/orcl/redo02.log'
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/data/oracle/oradata/orcl/redo03.log'
ORA-00600: internal error code, arguments: [2662], [0], [137890858], [0], [137891091], [12583056], []
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4951.trc:
ORA-00600: internal error code, arguments: [2662], [0], [137890858], [0], [137891091], [12583056], []
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4951.trc:
ORA-00600: internal error code, arguments: [2662], [0], [137890858], [0], [137891091], [12583056], []
Error 600 happened during db open, shutting down database
USER (ospid: 4951): terminating the instance due to error 600

接手故障之后分析
使用obet工具直接快速的检查坏块情况和文件头信息,关于obet的介绍参考:
obet实现对数据文件坏块检测功能
Oracle数据块编辑工具( Oracle Block Editor Tool)-obet
dbv


dbv检测没任何坏块,比较好好的消息
reset

但是检测数据文件头信息,发现有三种类型的resetlogs的信息,证明进行了多次部分文件的情况下进行了resetlogs操作

恢复处理
1. 使用Oracle Recovery Tools工具修改 resetlogs 信息
由于大量reseltogs 信息不一致,先使用Oracle Recovery Tools修改scn等相关信息Oracle Recovery Tools恢复案例总结—202505(注意选择resetlogs scn最大的文件为参照文件)
orarec

2. 重建ctl,打开库
open-db

比较幸运直接打开成功(本来也就应该成功,因为客户本身之前丢失主要业务文件的时候多次打开过库)


3. 然后增加temp文件,并expdp导出数据,完成本次恢复工作

exp dmp导入报IMP-00098: INTERNAL ERROR: impgst2故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:exp dmp导入报IMP-00098: INTERNAL ERROR: impgst2故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到客户反馈,exp导出来的dmp无法导入到库中,而且原库已经被删除并且创建了新库,希望我们协助把dmp里面几个核心表给恢复出来
故障现象
imp导入dmp文件报 IMP-00098: INTERNAL ERROR: impgst2错误
imp


imp导入操作直接终止,无法恢复需要的数据
故障原因
分析导出日志,发现”ORA-24801: 在 OCI lob 函数中非法的参数值” 错误
ORA-24801

查询mos发现EXP-56 ORA-24801 During Export KB83982文章
nls

进一步和客户确认,他们确实修改过该库的字符集。基本上可以确认是由于修改字符集导致lob数据损坏,然后exp导出dmp中这些损坏的lob破坏了dmp的完整性,使得dmp无法正常导入
故障解决
对于这样的情况,由于损坏的lob比较靠前,而且不是客户业务用户中数据.处理方法有两种:
1. 直接使用winhex把损坏的lob表从dmp中剔除掉,然后导入数据
2. 直接使用工具从dmp中提取需要的表数据,以前处理过类似文章:
解决imp导入数据报IMP-00098错误
IMP-00098: INTERNAL ERROR: impgst2
exp dmp文件损坏(坏块/corruption)恢复—跳过dmp坏块

一次运气好的ORA-600 kcratr_nab_less_than_odr故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:一次运气好的ORA-600 kcratr_nab_less_than_odr故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户由于虚拟化环境空间不足,导致数据库异常,启动报ORA-600 kcratr_nab_less_than_odr错误

Mon Apr 06 00:13:16 2026
Completed: alter database mount exclusive
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 3 processes
Started redo scan
Mon Apr 06 00:13:26 2026
Completed redo scan
 read 5480 KB redo, 459 data blocks need recovery
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2324.trc  (incident=418959):
ORA-00600: ??????, ??: [kcratr_nab_less_than_odr], [1], [53856], [40105], [43042], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_418959\orcl_ora_2324_i418959.trc
Aborting crash recovery due to error 600
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2324.trc:
ORA-00600: ??????, ??: [kcratr_nab_less_than_odr], [1], [53856], [40105], [43042], [], [], [], [], [], [], []
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2324.trc:
ORA-00600: ??????, ??: [kcratr_nab_less_than_odr], [1], [53856], [40105], [43042], [], [], [], [], [], [], []
ORA-600 signalled during: alter database open...
Mon Apr 06 00:13:33 2026
Trace dumping is performing id=[cdmp_20260406001333]

由于客户自己不熟悉,故障之后,没有再次继续操作,一直保留着现场。这个故障一般是由于ctl写丢失导致,一般首先选择rectl
11111


然后尝试open库,运气不错,直接打开成功
22222

这样就完成了本次恢复工作,数据库一切正常,运气不错。对于ORA-600 kcratr_nab_less_than_odr错误大部分时候,可以这样简单的恢复,但是也遇到过rectl之后,继续报ORA-600等错误的情况:
ORA-600 kcratr_nab_less_than_odr和ORA-600 4194故障处理
ORA-600 kcratr_nab_less_than_odr和ORA-600 2662故障处理
ORA-600 kcratr_nab_less_than_odr和ORA-600 4193故障处理

obet一键恢复offline数据文件

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:obet一键恢复offline数据文件

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户有一个数据库由于异常断电之后无法正常启动,自行尝试恢复之后但是没有open成功,让可以通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)脚本收集信息进行评估,发现两个问题:
1. 根据查询信息确认users01.dbf(file# 4)文件处于offline状态,而且checkpoint scn明显小于其他文件
offline


2. 通过分析alert日志确认客户在尝试offline file 4 之后open数据库报ORA-600 4194错误,数据库没有open成功

Mon Mar 30 06:26:30 2026
Starting ORACLE instance (normal)
Mon Mar 30 06:27:00 2026
ALTER DATABASE DATAFILE 4 OFFLINE DROP
Completed: ALTER DATABASE DATAFILE 4 OFFLINE DROP
Mon Mar 30 06:27:26 2026
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 7 processes
Started redo scan
Completed redo scan
 read 88 KB redo, 103 data blocks need recovery
Started redo application at
 Thread 1: logseq 3, block 3
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0
  Mem# 0: /home/oracle/app/oracle/oradata/orcl/redo03.log
Completed redo application of 0.07MB
Completed crash recovery at
 Thread 1: logseq 3, block 180, scn 415466134
 103 data blocks read, 103 data blocks written, 88 redo k-bytes read
Mon Mar 30 06:27:28 2026
Thread 1 advanced to log sequence 4 (thread open)
Thread 1 opened at log sequence 4
  Current log# 1 seq# 4 mem# 0: /home/oracle/app/oracle/oradata/orcl/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Mar 30 06:27:28 2026
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
Errors in file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_17628.trc(incident=186839):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /home/oracle/app/oracle/diag/rdbms/orcl/orcl/orcl_smon_17628_i186839.trc
Exception [type: SIGBUS, Non-existent physical address] [ADDR:0x6C0C4B62] [PC:0x2297750, kgegpa()+40]
Exception [type: SIGBUS, Non-existent physical address] [ADDR:0x6C0C4B62] [PC:0x229597B, kgebse()+279]
Mon Mar 30 06:27:28 2026
PMON (ospid: 17604): terminating the instance due to error 397
Errors in file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_17628.trc:
ORA-00328: archived log ends at change 415466135, need later change 415466136
ORA-00334: archived log: '/home/oracle/app/oracle/oradata/orcl/redo03.log'
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Instance terminated by PMON, pid = 17604

接手这个故障之后,由于数据库是非归档模式,而且已经被屏蔽一致性强制打开过(open过程没成功),redo已经被clear过,因此基于这样的情况,直接上obet工具(Oracle Block Editor Tool修改file# 4的文件头状态
obet修复之前文件状态

STATUS	CHECKPOINT_TIME 			 FUZ CHECKPOINT_CHANGE# 	 ROW_NUM
------- ---------------------------------------- --- ------------------ ----------------
OFFLINE 2026-03-30 06:24:41			 YES	      415446014 	       1
ONLINE	2026-03-30 06:28:26			 YES	      415486184 	       7

obet修改文件头操作

OBET> open listfile.txt
Loaded 8 files from  datafile list 'listfile.txt'.

OBET> info

Loaded files (2 total):
----------------------------------------
Number  Path
----------------------------------------
     1  /home/oracle/app/oracle/oradata/orcl/system01.dbf
     4  /opt/oradata/orcl/users01.dbf
----------------------------------------

OBET> set mode edit
mode set to: edit


OBET> set file 4
filename set to: /opt/oradata/orcl/users01.dbf (file#4)

OBET> backup block 1
Created backup directory: backup_blk
Successfully backed up block 1 from current file to /tmp/backup_blk/users01.dbf_1.20260331092333


OBET> copy chkscn file 1 to file 4

Confirm Modify chkscn:
Source: file#1 (/home/oracle/app/oracle/oradata/orcl/system01.dbf)
Target: file#4 (/opt/oradata/orcl/users01.dbf)
Proceed? (Y/YES to confirm): y
Successfully copied checkpoint SCN information from file#1 to file#4.

OBET> exit
Exiting OBET.

再次查询文件头scn信息

STATUS	CHECKPOINT_TIME 			 FUZ CHECKPOINT_CHANGE# 	 ROW_NUM
------- ---------------------------------------- --- ------------------ ----------------
OFFLINE 2026-03-30 06:28:26			 NO	      415486184 	       1
ONLINE	2026-03-30 06:28:26			 YES	      415486184 	       7

尝试online文件,并open库成功

SQL> recover datafile 4;
Media recovery complete.
SQL> alter database datafile 4 online;

Database altered.

SQL> alter database open;

Database altered.

然后expdp导出数据成功,基本完成本次数据恢复工作
dump


Oracle典型故障:The controlfile header block returned by the OS has a sequence number that is too old

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle典型故障:The controlfile header block returned by the OS has a sequence number that is too old

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

这个是一例子客户数据库运行过程中突然报:The controlfile header block returned by the OS has a sequence number that is too old.然后数据库无法正常启动的数据库恢复case
以前处理过一些类似case:Controlfile sequence number in file header is different from the one in memory
故障现象
alert日志中报The controlfile header block returned by the OS has a sequence number that is too old.错误,然后数据库crash


Wed Mar 18 12:00:44 2026
********************* ATTENTION: ******************** 
 The controlfile header block returned by the OS
 has a sequence number that is too old. 
 The controlfile might be corrupted.
 PLEASE DO NOT ATTEMPT TO START UP THE INSTANCE 
 without following the steps below.
 RE-STARTING THE INSTANCE CAN CAUSE SERIOUS DAMAGE 
 TO THE DATABASE, if the controlfile is truly corrupted.
 In order to re-start the instance safely, 
 please do the following:
 (1) Save all copies of the controlfile for later 
     analysis and contact your OS vendor and Oracle support.
 (2) Mount the instance and issue: 
     ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
 (3) Unmount the instance. 
 (4) Use the script in the trace file to
     RE-CREATE THE CONTROLFILE and open the database. 
*****************************************************
USER (ospid: 15912): terminating the instance

这个错误比较明显是由于控制文件的sequence number比较老导致,出现这种问题,一般是由于io过慢,或者底层不稳定(比如虚拟化平台,文件系统异常,硬件不稳定等)导致(官方参考文档:The controlfile header block returned by the OS has a sequence number that is too old.)

尝试重启数据库报ORA-01207错误

Wed Mar 18 18:51:54 2026
alter database mount exclusive
Successful mount of redo thread 1, with mount id 1534819594
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: alter database mount exclusive
alter database open
Errors in file e:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2992.trc:
ORA-01122: ????? 18 ????
ORA-01110: ???? 18: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\XIFENFEI.DBF'
ORA-01207: ????????? - ??????
ORA-1122 signalled during: alter database open...
Wed Mar 18 18:52:01 2026
Checker run found 1 new persistent data failures

该错误的官方解释

[oracle@xifenfei.com ~]$ oerr ora 1207
01207, 00000, "file is more recent than control file - old control file"
// *Cause:  The control file change sequence number in the data file is 
//         greater than the number in the control file. This implies that
//         the wrong control file is being used. Note that repeatedly causing
//         this error can make it stop happening without correcting the real
//         problem. Every attempt to open the database will advance the
//         control file change sequence number until it is great enough.
// *Action: Use the current control file or do backup control file recovery to 
//         make the control file current. Be sure to follow all restrictions 
//         on doing a backup control file recovery.

由于数据文件比控制文件更新,导致该问题,通过查询v$datafile_header发现更多类似异常文件(可以使用Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)收集信息)
111


alert日志文件中也有明显的写错日志信息
222

这些二进制内容和监听日志内容被写入到了alert日志中,证明当时文件系统或者操作系统甚至更底层出现了异常,这个客户是运行在云平台上的,具体运营需要平台厂商才能分析

故障处理
重建控制文件并进行recover

SQL> startup nomount pfile='e:/pfile.txt' ;
ORACLE 例程已经启动。

Total System Global Area 2.9931E+10 bytes
Fixed Size                  2190296 bytes
Variable Size            1946158120 bytes
Database Buffers         2.7917E+10 bytes
Redo Buffers               64905216 bytes
SQL> @rectl.sql

控制文件已创建。
SQL> recover database;

完成介质恢复。

尝试启动数据库,报ora-600 2662错误

SQL> alter database open;

alter database open
*
第 1 行出现错误:

ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [1], [45773288], [1], [45777527], [301990016]
ORA-00600: internal error code, arguments: [2662], [1], [45773287], [1], [45777527], [301990016]
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [1], [45773285], [1], [45777527], [301990016]
进程 ID: 2708
会话 ID: 148 序列号: 5

数据库启动报ORA-600 2662错误,这个是典型的文件头scn过小的问题,通过自研小工具Patch_SCN可以快速解决,以前类似文章:
Patch SCN一键解决ORA-600 2662故障
Patch SCN工具一键恢复ORA-600 kcbzib_kcrsds_1
ORA-600 kcratr_nab_less_than_odr和ORA-600 2662故障处理
333


修改scn之后,数据库顺利打开

SQL> startup mount pfile='e:/pfile.txt';
ORACLE 例程已经启动。

Total System Global Area 2.9931E+10 bytes
Fixed Size                  2190296 bytes
Variable Size            1946158120 bytes
Database Buffers         2.7917E+10 bytes
Redo Buffers               64905216 bytes
数据库装载完毕。
SQL> alter database open;

alter database open
*
第 1 行出现错误:
ORA-01113: 文件 1 需要介质恢复
ORA-01110: 数据文件 1: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'
SQL> recover database;

完成介质恢复。
SQL> alter database open;

数据库已更改。

通过alert日志回顾其他dba oracle异常恢复故障处理以及后续open数据库操作

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:通过alert日志回顾其他dba oracle异常恢复故障处理以及后续open数据库操作

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户有一个数据库故障,是其他工程师进行恢复操作,最后搞不定通过朋友介绍找到我的.我通过分析alert日志,大概追述故障经过
1. 数据库断电之后启动报ORA-01172 ORA-01151错误,直接启动数据库失败,从报错看是由于数据库在open过程中前滚redo日志异常导致

Thu Feb 26 06:48:35 2026
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 23 processes
Started redo scan
Completed redo scan
 read 73194 KB redo, 37226 data blocks need recovery
Thu Feb 26 06:48:49 2026
Started redo application at
 Thread 1: logseq 869366, block 3
Recovery of Online Redo Log: Thread 1 Group 2 Seq 869366 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO02.LOG
Thu Feb 26 06:48:50 2026
RECOVERY OF THREAD 1 STUCK AT BLOCK 16938 OF FILE 3
Slave exiting with ORA-1172 exception
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_p016_4672.trc:
ORA-01172: recovery of thread 1 stuck at block 16938 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
Aborting crash recovery due to slave death, attempting serial crash recovery
Beginning crash recovery of 1 threads
Started redo scan
Thu Feb 26 06:49:00 2026
Completed redo scan
 read 73194 KB redo, 37226 data blocks need recovery
Started redo application at
 Thread 1: logseq 869366, block 3
Thu Feb 26 06:49:10 2026
Recovery of Online Redo Log: Thread 1 Group 2 Seq 869366 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO02.LOG
RECOVERY OF THREAD 1 STUCK AT BLOCK 16938 OF FILE 3
Aborting crash recovery due to error 1172
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_9260.trc:
ORA-01172: recovery of thread 1 stuck at block 16938 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_9260.trc:
ORA-01172: recovery of thread 1 stuck at block 16938 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
ORA-1172 signalled during: alter database open...

2. 尝试recover database报ORA-600 3020、ORA-600 17147、ORA-600 17114、ORA-600 17182等错误,这个报错比较明确是由于redo的block信息和datafile的block不一致,导致实例recover database失败

Thu Feb 26 06:50:54 2026
ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 24 slaves
Thu Feb 26 06:50:57 2026
Recovery of Online Redo Log: Thread 1 Group 2 Seq 869366 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO02.LOG
Thu Feb 26 06:50:59 2026
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_pr0i_9232.trc  (incident=79538):
ORA-00600: internal error code, arguments: [3020], [3], [16934], [12599846], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 16934, file offset is 138723328 bytes)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 3: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\UNDOTBS01.DBF'
ORA-10560: block type 'KTU UNDO BLOCK'
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_79538\orcl_pr0i_9232_i79538.trc
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_pr0i_9232.trc  (incident=79539):
ORA-00600: internal error code, arguments: [17114], [0x0381C4C60], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [17182], [0x0381C6948], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [17147], [0x0381C4C60], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [3020], [3], [16934], [12599846], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 16934, file offset is 138723328 bytes)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 3: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\UNDOTBS01.DBF'
ORA-10560: block type 'KTU UNDO BLOCK'

3. 使用隐含参数尝试强制拉库,报ORA-600 2662错误,导致强制拉库没有成功,这个错误相对比较简单,一般修改数据库scn即可

Thu Feb 26 07:04:39 2026
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 16794964253372
Resetting resetlogs activation ID 1548038913 (0x5c453301)
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_8712.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 1 of thread 1 is not current copy
ORA-00312: online log 1 thread 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG'
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_8712.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 2 of thread 1 is not current copy
ORA-00312: online log 2 thread 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO02.LOG'
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_8712.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 3 of thread 1 is not current copy
ORA-00312: online log 3 thread 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO03.LOG'
Thu Feb 26 07:04:49 2026
Setting recovery target incarnation to 3
Thu Feb 26 07:04:50 2026
Assigning activation ID 1753992092 (0x688bcb9c)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Successful open of redo thread 1
Thu Feb 26 07:04:50 2026
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Feb 26 07:04:50 2026
SMON: enabling cache recovery
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_8712.trc  (incident=82031):
ORA-00600: internal error code, arguments: [2662], [3910], [1642126020], [3910], [1642126047], [4194432]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_8712.trc:
ORA-00600: internal error code, arguments: [2662], [3910], [1642126020], [3910], [1642126047], [4194432]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_8712.trc:
ORA-00600: internal error code, arguments: [2662], [3910], [1642126020], [3910], [1642126047], [4194432]
Error 600 happened during db open, shutting down database
USER (ospid: 8712): terminating the instance due to error 600
Thu Feb 26 07:05:00 2026
Instance terminated by USER, pid = 8712
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (8712) as a result of ORA-1092

4. 尝试重新open库,数据库open成功但是报ORA-600 kturbleurec1、ORA-600 kcbgtcr_13错误,数据库运行一会就直接crash,这个错误一般是由于undo异常导致

Thu Feb 26 07:09:10 2026
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 23 processes
Started redo scan
Completed redo scan
 read 0 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 1, block 3, scn 16794964253378
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 1, block 3, scn 16794964273379
 0 data blocks read, 0 data blocks written, 0 redo k-bytes read
Thu Feb 26 07:09:14 2026
Thread 1 advanced to log sequence 2 (thread open)
Thread 1 opened at log sequence 2
  Current log# 2 seq# 2 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO02.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Feb 26 07:09:14 2026
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Dictionary check beginning
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
Thu Feb 26 07:09:21 2026
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_p002_10096.trc  (incident=84847):
ORA-00600: internal error code, arguments: [kturbleurec1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_84847\orcl_p002_10096_i84847.trc
Thu Feb 26 07:09:21 2026
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_4600.trc  (incident=84831):
ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_84831\orcl_ora_4600_i84831.trc
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_4600.trc  (incident=84832):
ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_84832\orcl_ora_4600_i84832.trc

客户那边还做了各种恢复尝试,最终依旧无法正常open库,让我这边进行恢复支持.由于客户库不大,而且可以提供数据进行恢复,我让客户发生我数据之后,在本地电脑上进行恢复,下载文件之后,重命名相关路径然后尝试open库

SQL> recover database;
ORA-00283: 恢复会话因错误而取消
ORA-16433: 必须以读/写模式打开数据库。

重建控制文件

C:\Users\XFF>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on 星期五 2月 27 14:36:57 2026

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

已连接到空闲例程。

SQL> startup nomount pfile='d:/pfile.txt'
ORACLE 例程已经启动。

Total System Global Area 4275781632 bytes
Fixed Size                  2182592 bytes
Variable Size             973079104 bytes
Database Buffers         3288334336 bytes
Redo Buffers               12185600 bytes
SQL> @rectl.sql

控制文件已创建。

SQL> recover database;
完成介质恢复。

尝试open库报ORA-600 2663错误

SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2663], [3910], [1642323772],
[3910], [1642327019], [], [], [], [], [], [], []
进程 ID: 27296
会话 ID: 14 序列号: 3

使用Patch_SCN工具修改数据库scn
ora-600-2663-patch_scn


再次尝试open数据库

SQL> recover database;
完成介质恢复。
SQL> alter database open ;

数据库已更改。

alert日志报ORA-600 6856错误

Fri Feb 27 14:40:03 2026
QMNC started with pid=62, OS id=18360 
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Completed: alter database open 
Fri Feb 27 14:40:03 2026
Errors in file c:\app\xff\diag\rdbms\orcl\orcl\trace\orcl_p001_21540.trc  (incident=15787):
ORA-00600: 内部错误代码, 参数: [6856], [0], [479], [], [], [], [], [], [], [], [], []
Incident details in: c:\app\xff\diag\rdbms\orcl\orcl\incident\incdir_15787\orcl_p001_21540_i15787.trc
Fri Feb 27 14:40:04 2026
Starting background process CJQ0
Fri Feb 27 14:40:04 2026
CJQ0 started with pid=63, OS id=15592 
Doing block recovery for file 8 block 1541549
Resuming block recovery (PMON) for file 8 block 1541549
Block recovery from logseq 3, block 108 to scn 16794965431265
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0
  Mem# 0: H:\BAIDUNETDISK\20260227\REDO03.LOG
Block recovery completed at rba 3.16415.16, scn 3910.1643303906
Fri Feb 27 14:40:04 2026
Trace dumping is performing id=[cdmp_20260227144004]
SMON: ignoring slave err,downgrading to serial rollback
Errors in file c:\app\xff\diag\rdbms\orcl\orcl\trace\orcl_smon_21696.trc  (incident=15723):
ORA-00600: 内部错误代码, 参数: [6856], [0], [479], [], [], [], [], [], [], [], [], []
Incident details in: c:\app\xff\diag\rdbms\orcl\orcl\incident\incdir_15723\orcl_smon_21696_i15723.trc
………………
ORACLE Instance orcl (pid = 15) - Error 607 encountered while recovering transaction (3, 10) on object 81310.
Errors in file c:\app\xff\diag\rdbms\orcl\orcl\trace\orcl_smon_21696.trc:
ORA-00607: 当更改数据块时出现内部错误
ORA-00600: 内部错误代码, 参数: [6856], [0], [479], [], [], [], [], [], [], [], [], []
Process debug not enabled via parameter _debug_enable
Trace dumping is performing id=[cdmp_20260227144011]
PMON (ospid: 23760): terminating the instance due to error 474

该错误是undo异常引起,屏蔽掉异常undo之后,正常open,并顺利导出所有数据,完成本次恢复任务
dmp