Oracle recovery after NTFS MFT corruption (NTFS file system failure)

Contact 惜分飞: phone/WeChat (+86 17813235971), QQ (107644445)


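Author: 惜分飞 © All rights reserved. [No reproduction in any form without the author's consent; the author reserves the right to pursue legal liability.]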

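In a customer's virtualized environment, a power failure left the database failing at startup with ORA-01157. At the operating-system level the data file was visibly present, yet dbv reported it as inaccessible.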


This looked like file system corruption, so I tried copying the file to another disk.

The operating system event log confirmed that the NTFS MFT was corrupted.

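Given this, I tried to recover the file with a file system recovery tool, which warned that the size of the recovered file did not match the size recorded in the file's metadata.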

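Comparing the size actually recovered with the size the file should have gives 7811899392 - 7791460352 = 20439040 bytes, roughly 20M (in other words, the recovered data file is about 20M short). The database alert log confirms that the file had been extended by exactly 20M shortly before the failure (the data file was added with an autoextend increment of 20M):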

2023-08-11T11:29:21.397236+08:00
ALTER TABLESPACE "HSHIS" ADD DATAFILE
'D:\APP\ADMINISTRATOR\ORADATA\HIS\HSHIS01.DBF' SIZE 10M AUTOEXTEND ON NEXT 20M MAXSIZE 8001M
Completed: ALTER TABLESPACE "HSHIS" ADD DATAFILE
'D:\APP\ADMINISTRATOR\ORADATA\HIS\HSHIS01.DBF' SIZE 10M AUTOEXTEND ON NEXT 20M MAXSIZE 8001M

2024-10-09T00:18:31.058537+08:00
Resize operation completed for file# 66, old size 7608320K, new size 7628800K
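
The size comparison above can be double-checked with a few lines of arithmetic. This is only a sketch: the byte counts and the KB sizes are taken from the text and the alert log, while the 8 KB block size and the idea that the operating-system file is one block larger than the size Oracle reports are assumptions that are not stated in the original.

```python
# Sizes in bytes as reported by the file system recovery tool (from the text).
METADATA_SIZE = 7_811_899_392   # size recorded in the file system metadata
RECOVERED_SIZE = 7_791_460_352  # size that was actually recovered

# Sizes in KB from the alert log line "Resize operation completed for file# 66".
OLD_SIZE_KB = 7_608_320
NEW_SIZE_KB = 7_628_800

BLOCK_SIZE = 8192  # assumption: 8 KB database block size


def mib(n_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return n_bytes / (1024 * 1024)


# How much of the file is missing after recovery.
missing = METADATA_SIZE - RECOVERED_SIZE

# How much the last autoextend added, according to the alert log.
extension = (NEW_SIZE_KB - OLD_SIZE_KB) * 1024

# Assumption: the OS file carries one extra header block in front of the
# blocks that Oracle counts, so OS size = Oracle size + one block.
expected_os_size = NEW_SIZE_KB * 1024 + BLOCK_SIZE

print(f"missing bytes       : {missing} ({mib(missing):.2f} MiB)")
print(f"last autoextend     : {extension} ({mib(extension):.2f} MiB)")
print(f"metadata size match : {expected_os_size == METADATA_SIZE}")
```

Under these assumptions the metadata size matches the post-resize size exactly, and the missing range (about 19.49 MiB) is slightly smaller than the 20 MiB extension, which is consistent with "almost 20M" in the text.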

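Block-level analysis of the file confirmed that the lost blocks are exactly the final 20M (the rdba of every block in the recovered part of the file is correct). For this kind of failure, the file can be recovered by padding the tail of the data file so that the database accepts it (any business data written to the final 20M may be lost). With the file repaired, I tried to open the database; the result was not encouraging, because the redo was damaged as well.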
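
To illustrate what "the rdba of each block is correct" means, here is a minimal sketch of the check. It assumes a smallfile tablespace, where the 32-bit rdba packs a 10-bit relative file number and a 22-bit block number; the helper names, the sample list, and the use of 66 as the relative file number are hypothetical and only for illustration (the original does not show the tool that was used).

```python
BLOCK_BITS = 22                      # low 22 bits: block number
BLOCK_MASK = (1 << BLOCK_BITS) - 1   # 0x3FFFFF


def make_rdba(rel_file_no: int, block_no: int) -> int:
    """Build the rdba expected for a given relative file number and block."""
    return (rel_file_no << BLOCK_BITS) | block_no


def split_rdba(rdba: int) -> tuple[int, int]:
    """Split an rdba into (relative file number, block number)."""
    return rdba >> BLOCK_BITS, rdba & BLOCK_MASK


def first_bad_block(stored_rdbas, rel_file_no: int, start_block: int = 0):
    """Return the first block number whose stored rdba differs from the
    expected one, or None if every block matches."""
    for offset, stored in enumerate(stored_rdbas):
        block_no = start_block + offset
        if stored != make_rdba(rel_file_no, block_no):
            return block_no
    return None


# Hypothetical data: blocks 1..5 carry the right rdba, the tail is zeroed.
stored = [make_rdba(66, n) for n in range(1, 6)] + [0, 0]
print(first_bad_block(stored, rel_file_no=66, start_block=1))
```

In a real check the stored values would be read from each block header in the recovered file; the sketch keeps them in a list so that it runs without any data file.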


With the consistency checks bypassed, the database was force-opened successfully:

2024-10-18T04:24:43.911107+08:00
ALTER DATABASE RECOVER    CANCEL  
2024-10-18T04:24:47.098637+08:00
Errors in file E:\TRACE\diag\rdbms\his\his\trace\his_pr00_2608.trc:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: 'E:\ORADATA\SYSTEM01.DBF'
2024-10-18T04:24:47.114278+08:00
ORA-1547 signalled during: ALTER DATABASE RECOVER    CANCEL  ...
ALTER DATABASE RECOVER CANCEL 
ORA-1112 signalled during: ALTER DATABASE RECOVER CANCEL ...
2024-10-18T04:25:03.989398+08:00
alter database open resetlogs
2024-10-18T04:25:05.598781+08:00
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 2666786639 time 
Resetting resetlogs activation ID 3659241623 (0xda1b9897)
2024-10-18T04:25:12.380089+08:00
Setting recovery target incarnation to 3
2024-10-18T04:25:15.052071+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
Endian type of dictionary set to little
2024-10-18T04:25:15.458286+08:00
Assigning activation ID 3703362676 (0xdcbcd474)
2024-10-18T04:25:15.505102+08:00
TT00 (PID:4092): Gap Manager starting
2024-10-18T04:25:15.551992+08:00
Redo log for group 1, sequence 1 is not located on DAX storage
2024-10-18T04:25:17.833250+08:00
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: E:\ORADATA\REDO01.LOG
Successful open of redo thread 1
2024-10-18T04:25:17.848888+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
stopping change tracking
2024-10-18T04:25:22.052035+08:00
Undo initialization recovery: err:0 start: 24275578 end: 24276578 diff: 1000 ms (1.0 seconds)
Undo initialization online undo segments: err:0 start: 24276578 end: 24276593 diff: 15 ms (0.0 seconds)
Undo initialization finished serial:0 start:24275578 end:24276640 diff:1062 ms (1.1 seconds)
Dictionary check beginning
Dictionary check complete
Verifying minimum file header compatibility for tablespace encryption..
Verifying file header compatibility for tablespace encryption completed for pdb 0
2024-10-18T04:25:23.114610+08:00
Database Characterset is AL32UTF8
No Resource Manager plan active
2024-10-18T04:25:29.036475+08:00
replication_dependency_tracking turned off (no async multimaster replication found)
2024-10-18T04:25:32.833386+08:00
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Starting background process AQPC
2024-10-18T04:25:33.145881+08:00
AQPC started with pid=37, OS id=5560 
2024-10-18T04:25:35.677167+08:00
Starting background process CJQ0
2024-10-18T04:25:35.708430+08:00
CJQ0 started with pid=39, OS id=2728 
2024-10-18T04:25:36.724036+08:00
Completed: alter database open resetlogs

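The data was then exported to a new database. During the export, the 20M missing from the end of file# 66 prevented some data from being exported normally; this was handled by discarding the damaged portion and restoring the remaining good table data into the new database.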

Emptying redo leads to ORA-27048: skgfifi: file header information is invalid


Short of disk space, the customer emptied Oracle's redo files with the "> redo" shell redirection.


After the database went down, startup failed with the following errors:

Fri Oct 04 10:32:57 2024
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 31 processes
Started redo scan
Errors in file /home/oracle/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_24876.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/u01/app/oracle/oradata/xifenfei/redo03.log'
ORA-27048: skgfifi: file header information is invalid
Additional information: 13
Aborting crash recovery due to error 313
Errors in file /home/oracle/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_24876.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/u01/app/oracle/oradata/xifenfei/redo03.log'
ORA-27048: skgfifi: file header information is invalid
Additional information: 13
Errors in file /home/oracle/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_24876.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/u01/app/oracle/oradata/xifenfei/redo03.log'
ORA-27048: skgfifi: file header information is invalid
Additional information: 13
ORA-313 signalled during: alter database open...
Fri Oct 04 10:32:58 2024
Errors in file /home/oracle/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_m000_29646.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/u01/app/oracle/oradata/xifenfei/redo01.log'
ORA-27047: unable to read the header block of file
Linux-x86_64 Error: 25: Inappropriate ioctl for device
Additional information: 1
Errors in file /home/oracle/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_m000_29646.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/u01/app/oracle/oradata/xifenfei/redo02.log'
ORA-27047: unable to read the header block of file
Linux-x86_64 Error: 25: Inappropriate ioctl for device
Additional information: 1
Errors in file /home/oracle/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_m000_29646.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/u01/app/oracle/oradata/xifenfei/redo03.log'
ORA-27048: skgfifi: file header information is invalid
Additional information: 11
Checker run found 6 new persistent data failures
Fri Oct 04 10:47:32 2024
db_recovery_file_dest_size of 4182 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.

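In this situation every redo log had been emptied (including the current and active ones), so the only option was to force the database open. Luckily, the forced open succeeded.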
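
As a small illustration of why the redo headers became unreadable, the sketch below reproduces the effect of a "> file" redirection on a throwaway file and shows a simple pre-check for zero-length files. The temporary file, its 512 bytes of content, and the helper name find_empty_files are made up for the demonstration; nothing here touches a real redo log.

```python
import os
import tempfile


def find_empty_files(paths):
    """Return the paths that exist as regular files but have zero length."""
    return [p for p in paths if os.path.isfile(p) and os.path.getsize(p) == 0]


with tempfile.TemporaryDirectory() as workdir:
    redo = os.path.join(workdir, "redo01.log")

    # Stand-in for a real log file: 512 bytes of placeholder content.
    with open(redo, "wb") as f:
        f.write(b"\x00" * 512)
    size_before = os.path.getsize(redo)

    # Same effect as the shell redirection "> redo01.log":
    # open for writing (which truncates) and write nothing.
    open(redo, "wb").close()
    size_after = os.path.getsize(redo)

    empty = find_empty_files([redo])
    print(size_before, size_after, empty == [redo])
```

A file truncated this way has no header block left to read, which is consistent with the ORA-27047 "unable to read the header block of file" messages reported for redo01.log and redo02.log in the alert log above.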

Sun Oct 06 10:09:01 2024
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 25668466513
Resetting resetlogs activation ID 4222555315 (0xfbaf14b3)
Sun Oct 06 10:09:10 2024
Setting recovery target incarnation to 3
Sun Oct 06 10:09:10 2024
Assigning activation ID 79943739 (0x4c3d83b)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /u01/app/oracle/oradata/xifenfei/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sun Oct 06 10:09:11 2024
SMON: enabling cache recovery
Undo initialization finished serial:0 start:70198684 end:70198794 diff:110 (1 seconds)
Dictionary check beginning
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is AL32UTF8
No Resource Manager plan active
Sun Oct 06 10:09:12 2024
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Sun Oct 06 10:09:13 2024
QMNC started with pid=23, OS id=4328 
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Sun Oct 06 10:09:16 2024
db_recovery_file_dest_size of 4182 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Sun Oct 06 10:09:16 2024
Starting background process CJQ0
Sun Oct 06 10:09:16 2024
CJQ0 started with pid=25, OS id=4413 
Completed: alter database open resetlogs

Reconstructing and reviewing a customer's own database recovery attempt from the alert log


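A 12.1.0.2 database could not start normally after an unexpected power failure. The customer's entire sequence of operations is reconstructed here from the alert log (my own operations are not included).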


According to the alert log, the initial failure was caused by bad blocks in the control file.

Tue Sep 24 11:49:48 2024
alter database open
Tue Sep 24 11:49:48 2024
Ping without log force is disabled
.
Tue Sep 24 11:49:48 2024
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4715.trc:
ORA-01113: file 10 needs media recovery
ORA-01110: data file 10: '/u01/app/oracle/oradata/xifenfei.dbf'
ORA-1113 signalled during: alter database open...
alter database recover datafile '/u01/app/oracle/oradata/xifenfei.dbf'

The data files that could not be recovered normally were taken offline:

Tue Sep 24 13:13:30 2024
Media Recovery Complete (orcl)
Completed: ALTER DATABASE RECOVER  datafile 15  
ALTER DATABASE DATAFILE '/u01/app/oracle/oradata/xifenfei.dbf' END BACKUP
ORA-1235 signalled during: ALTER DATABASE DATAFILE '/u01/app/oracle/oradata/xifenfei.dbf' END BACKUP...
ALTER DATABASE DATAFILE '/u01/app/oracle/oradata/xifenfei.dbf' offline
Completed: ALTER DATABASE DATAFILE '/u01/app/oracle/oradata/xifenfei.dbf' offline
Tue Sep 24 13:25:16 2024
 ALTER DATABASE DATAFILE '/u01/app/oracle/oradata/xff.dbf' offline
Completed:  ALTER DATABASE DATAFILE '/u01/app/oracle/oradata/xff.dbf' offline

The customer then tried to open the database, hit ORA-600 4193, and the open failed:

Tue Sep 24 13:27:06 2024
Media Recovery Complete (orcl)
Completed: ALTER DATABASE RECOVER  datafile 13   
alter database open
Tue Sep 24 13:27:16 2024
Ping without log force is disabled
.
Tue Sep 24 13:27:16 2024
Beginning crash recovery of 1 threads
 parallel recovery started with 7 processes
Tue Sep 24 13:27:16 2024
Started redo scan
Tue Sep 24 13:27:16 2024
Completed redo scan
 read 67 KB redo, 0 data blocks need recovery
Tue Sep 24 13:27:16 2024
Started redo application at
 Thread 1: logseq 7422, block 2, scn 119284797
Tue Sep 24 13:27:16 2024
Recovery of Online Redo Log: Thread 1 Group 3 Seq 7422 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/orcl/redo03.log
Tue Sep 24 13:27:16 2024
Completed redo application of 0.00MB
Tue Sep 24 13:27:16 2024
Completed crash recovery at
 Thread 1: logseq 7422, block 136, scn 119284798
 0 data blocks read, 0 data blocks written, 67 redo k-bytes read
Initializing SCN for created control file
Database SCN compatibility initialized to 3
Starting background process TMON
Tue Sep 24 13:27:16 2024
TMON started with pid=32, OS id=10617 
Tue Sep 24 13:27:16 2024
Thread 1 advanced to log sequence 7423 (thread open)
Thread 1 opened at log sequence 7423
  Current log# 1 seq# 7423 mem# 0: /u01/app/oracle/oradata/orcl/redo01.log
Successful open of redo thread 1
Tue Sep 24 13:27:16 2024
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Sep 24 13:27:16 2024
SMON: enabling cache recovery
Tue Sep 24 13:27:20 2024
[10553] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:6974064 end:6975474 diff:1410 ms (1.4 seconds)
Dictionary check beginning
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
File #10 is offline, but is part of an online tablespace.
data file 10: '/u01/app/oracle/oradata/tbs_data.dbf'
File #14 is offline, but is part of an online tablespace.
data file 14: '/u01/app/oracle/oradata/corsmf03.dbf'
Dictionary check complete
Verifying minimum file header compatibility (11g) for tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
Tue Sep 24 13:27:21 2024
SMON: enabling tx recovery
Tue Sep 24 13:27:21 2024
*********************************************************************
WARNING: The following temporary tablespaces contain no files.
         This condition can occur when a backup controlfile has
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
 
         ALTER TABLESPACE <tablespace_name> ADD TEMPFILE
 
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
Updating character set in controlfile to AL32UTF8
Starting background process SMCO
Tue Sep 24 13:27:21 2024
SMCO started with pid=34, OS id=10632 
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_10523.trc  (incident=108129):
ORA-00600: internal error code, arguments: [4193], [21368], [21372], [], [], [], [], [], [], [], [], []
Incident details in:/u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_108129/orcl_smon_10523_i108129.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
…………
Tue Sep 24 13:27:24 2024
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_10553.trc:
ORA-00600: internal error code, arguments: [4193], [21652], [21539], [], []
Tue Sep 24 13:27:24 2024
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_10553.trc:
ORA-00600: internal error code, arguments: [4193], [21652], [21539], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 10553): terminating the instance due to error 600
Tue Sep 24 13:27:25 2024
Instance terminated by USER, pid = 10553
ORA-1092 signalled during: alter database open...

The control file was then recreated, the hidden parameter _allow_resetlogs_corruption was set, and an open with resetlogs was attempted, which failed with ORA-600 2662:

Tue Sep 24 14:30:22 2024
alter database open RESETLOGS
Tue Sep 24 14:32:09 2024
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 119237645 time 
Online log /u01/app/oracle/oradata/orcl/redo01.log: Thread 1 Group 1 was previously cleared
Online log /u01/app/oracle/oradata/orcl/redo02.log: Thread 1 Group 2 was previously cleared
Online log /u01/app/oracle/oradata/orcl/redo03.log: Thread 1 Group 3 was previously cleared
Tue Sep 24 14:32:09 2024
Setting recovery target incarnation to 2
Tue Sep 24 14:32:09 2024
Ping without log force is disabled
.
Initializing SCN for created control file
Database SCN compatibility initialized to 3
Tue Sep 24 14:32:09 2024
Warning - High Database SCN: Current SCN value is 119237648, threshold SCN value is 0
If you have not previously reported this warning on this database, 
please notify Oracle Support so that additional diagnosis can be performed.
Starting background process TMON
Tue Sep 24 14:32:09 2024
TMON started with pid=25, OS id=15032 
Tue Sep 24 14:32:09 2024
Assigning activation ID 1708301307 (0x65d29bfb)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /u01/app/oracle/oradata/orcl/redo01.log
Successful open of redo thread 1
Tue Sep 24 14:32:09 2024
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Sep 24 14:32:09 2024
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_14937.trc  (incident=122458):
ORA-00600: internal error code, arguments: [2662], [0], [119484861], [0], [119484868], [16777344]……
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_122458/orcl_ora_14937_i122458.trc
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_14937.trc  (incident=122459):
………………
Tue Sep 24 14:32:16 2024
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_124802/orcl_ora_14937_i124802.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [0], [119484866], [0], [119484868], [16777344]……
ORA-00600: internal error code, arguments: [2662], [0], [119484865], [0], [119484868], [16777344]……
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [0], [119484861], [0], [119484868], [16777344]……

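The customer's own recovery attempt ended here without success. No fundamental mistake was made (such as damaging the files' resetlogs information), but neither of the two ORA-600 errors was resolved:
1. With some files offline, the database was opened without resetlogs, which avoided further damage to the resetlogs information of the offline files; however, the open failed with ORA-600 4193.
2. Before the later forced open, the control file was recreated. This avoided the risk that, after resetlogs, the resetlogs information in the headers of the offline data files would no longer match that of the other files (recreating the control file automatically brings offline files back online).
3. The ORA-600 4193 raised at startup after the data files were first taken offline was never resolved. This error is usually caused by damaged undo and is very likely to show up again during the later forced open.
4. The forced open hit ORA-600 2662, which requires adjusting the SCN; until this is resolved the database cannot be opened.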
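
The ORA-600 [2662] lines in the alert log above can be decoded with a short script. This sketch relies on the commonly cited meaning of the arguments (current SCN wrap and base, dependent SCN wrap and base, then the rdba of the block involved) and on the 10-bit file / 22-bit block rdba layout; both are assumptions here, not something the original states, and parse_2662 is a hypothetical helper name.

```python
import re

# Matches only bracketed decimal numbers, e.g. "[2662]" or "[119484861]".
ARG_RE = re.compile(r"\[(\d+)\]")


def parse_2662(line: str) -> dict:
    """Decode an ORA-600 [2662] message into SCNs and a file/block pair.

    Assumed argument order: [2662], [current SCN wrap], [current SCN base],
    [dependent SCN wrap], [dependent SCN base], [rdba].
    """
    args = [int(a) for a in ARG_RE.findall(line)]
    if len(args) < 6 or args[0] != 2662:
        raise ValueError("not an ORA-600 [2662] line with five numeric arguments")
    _, cur_wrap, cur_base, dep_wrap, dep_base, rdba = args[:6]
    current_scn = (cur_wrap << 32) | cur_base
    dependent_scn = (dep_wrap << 32) | dep_base
    return {
        "current_scn": current_scn,
        "dependent_scn": dependent_scn,
        "gap": dependent_scn - current_scn,  # how far the database SCN lags
        "file_no": rdba >> 22,               # upper 10 bits
        "block_no": rdba & 0x3FFFFF,         # lower 22 bits
    }


line = ("ORA-00600: internal error code, arguments: "
        "[2662], [0], [119484861], [0], [119484868], [16777344]")
info = parse_2662(line)
print(info)
```

For the first error in the log, this reading gives a dependent SCN that is 7 ahead of the current SCN, on file 4, block 128. That is why item 4 says the SCN has to be moved forward before the database can open.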

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor


An application connecting to a 10g database reported ORA-12514: TNS:listener does not currently know of service requested in connect descriptor.


Analysis of the alert log confirmed that the database was failing at startup with ORA-600 4194:

Mon Sep 23 16:12:42 2024
SMON: enabling cache recovery
Mon Sep 23 16:12:43 2024
Successfully onlined Undo Tablespace 1.
Mon Sep 23 16:12:43 2024
SMON: enabling tx recovery
Mon Sep 23 16:12:43 2024
Database Characterset is ZHS16GBK
Mon Sep 23 16:12:43 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\udump\xifenfei_ora_7832.trc:
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []

DEBUG: Replaying xcb 0xae312888, pmd 0x9058f4d4 for failed op 8
Doing block recovery for file 2 block 5547
No block recovery was needed
Mon Sep 23 16:13:31 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\udump\xifenfei_ora_7832.trc:
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []

Mon Sep 23 16:13:32 2024
DEBUG: Replaying xcb 0xae312888, pmd 0x9058f4d4 for failed op 8
Mon Sep 23 16:13:32 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\udump\xifenfei_ora_7832.trc:
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []

Doing block recovery for file 2 block 5547
No block recovery was needed
Mon Sep 23 16:13:33 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\udump\xifenfei_ora_7832.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []

Mon Sep 23 16:14:18 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_smon_5880.trc:
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []

Mon Sep 23 16:14:19 2024
DEBUG: Replaying xcb 0xae312888, pmd 0x9058f4d4 for failed op 8
Mon Sep 23 16:14:19 2024
Non-fatal internal error happenned while SMON was doing shrinking of rollback segments.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Mon Sep 23 16:14:19 2024
Doing block recovery for file 2 block 5547
No block recovery was needed
Mon Sep 23 16:15:06 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_pmon_6952.trc:
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []

Mon Sep 23 16:15:06 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_pmon_6952.trc:
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []

Mon Sep 23 16:15:06 2024
PMON: terminating instance due to error 472
Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_psp0_2104.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_lgwr_3200.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_dbw1_448.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_dbw0_7436.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_mman_1704.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_dbw2_5072.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_ckpt_6628.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_reco_7924.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_smon_5880.trc:
ORA-00472: PMON  process terminated with error

Instance terminated by PMON, pid = 6952

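This case is fairly simple: the error generally points to damaged undo. Switching undo management to manual and then rebuilding the undo tablespace completed the recovery.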
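
When an alert log is as repetitive as the one above, it can help to count the ORA-600 errors by their first argument before reading the details. The following is a minimal sketch of that idea; the function name summarize_ora600 and the sample lines (copied from the excerpt above) are illustrative only.

```python
import re
from collections import Counter

# Captures the first bracketed argument after "ORA-00600:", whatever the
# language of the message text in between.
ORA600_RE = re.compile(r"ORA-00600:.*?\[([^\]]*)\]")


def summarize_ora600(lines) -> Counter:
    """Count ORA-00600 occurrences keyed by their first argument."""
    counts = Counter()
    for line in lines:
        match = ORA600_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "Mon Sep 23 16:12:43 2024",
    "ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []",
    "Doing block recovery for file 2 block 5547",
    "ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []",
    "ORA-00472: PMON  process terminated with error",
]
counts = summarize_ora600(sample)
print(counts.most_common())
```

In practice the lines would come from iterating over the alert log file; a single dominant first argument such as 4194 quickly narrows the diagnosis to the undo-related case described above.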

Oracle 19c abnormal recovery: ORA-01209/ORA-65088


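A RAID controller bug damaged the file system, so the database could no longer start normally. Before the customer came to me, several other people had analyzed the case without success. The alert log shows that during their attempts they tried to open the database with resetlogs and then hit ORA-600 kcbzib_kcrsds_1: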

2024-09-15T17:07:32.553215+08:00
alter database open resetlogs
2024-09-15T17:07:32.569110+08:00
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 274757454692 time 
.... (PID:8074): Clearing online redo logfile 1 /opt/oracle/oradata/XFF/onlinelog/o1_mf_1_j3k201g9_.log
.... (PID:8074): Clearing online redo logfile 2 /opt/oracle/oradata/XFF/onlinelog/o1_mf_2_j3k201h3_.log
.... (PID:8074): Clearing online redo logfile 3 /opt/oracle/oradata/XFF/onlinelog/o1_mf_3_j3k201hk_.log
Clearing online log 1 of thread 1 sequence number 0
Clearing online log 2 of thread 1 sequence number 0
Clearing online log 3 of thread 1 sequence number 0
2024-09-15T17:07:34.939550+08:00
.... (PID:8074): Clearing online redo logfile 1 complete
.... (PID:8074): Clearing online redo logfile 2 complete
.... (PID:8074): Clearing online redo logfile 3 complete
Online log /opt/oracle/oradata/XFF/onlinelog/o1_mf_1_j3k201g9_.log: Thread 1 Group 1 was previously cleared
Online log /opt/oracle/fast_recovery_area/XFF/onlinelog/o1_mf_1_j3k201l4_.log: Thread 1 Group 1 was previously cleared
Online log /opt/oracle/oradata/XFF/onlinelog/o1_mf_2_j3k201h3_.log: Thread 1 Group 2 was previously cleared
Online log /opt/oracle/fast_recovery_area/XFF/onlinelog/o1_mf_2_j3k201kw_.log: Thread 1 Group 2 was previously cleared
Online log /opt/oracle/oradata/XFF/onlinelog/o1_mf_3_j3k201hk_.log: Thread 1 Group 3 was previously cleared
Online log /opt/oracle/fast_recovery_area/XFF/onlinelog/o1_mf_3_j3k201mt_.log: Thread 1 Group 3 was previously cleared
2024-09-15T17:07:34.966674+08:00
Setting recovery target incarnation to 2
2024-09-15T17:07:34.992357+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
Buffer Cache Full DB Caching mode changing from FULL CACHING DISABLED to FULL CACHING ENABLED 
2024-09-15T17:07:34.994329+08:00
Crash Recovery excluding pdb 2 which was cleanly closed.
2024-09-15T17:07:34.994390+08:00
Crash Recovery excluding pdb 3 which was cleanly closed.
2024-09-15T17:07:34.994433+08:00
Crash Recovery excluding pdb 4 which was cleanly closed.
2024-09-15T17:07:34.994474+08:00
Crash Recovery excluding pdb 5 which was cleanly closed.
Initializing SCN for created control file
Database SCN compatibility initialized to 3
Endian type of dictionary set to little
2024-09-15T17:07:35.001752+08:00
Assigning activation ID 2966012017 (0xb0c9c071)
Redo log for group 1, sequence 1 is not located on DAX storage
2024-09-15T17:07:35.015921+08:00
TT00 (PID:8113): Gap Manager starting
2024-09-15T17:07:35.034047+08:00
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /opt/oracle/oradata/XFF/onlinelog/o1_mf_1_j3k201g9_.log
  Current log# 1 seq# 1 mem# 1: /opt/oracle/fast_recovery_area/XFF/onlinelog/o1_mf_1_j3k201l4_.log
Successful open of redo thread 1
2024-09-15T17:07:35.034573+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
stopping change tracking
2024-09-15T17:07:35.063726+08:00
TT03 (PID:8119): Sleep 5 seconds and then try to clear SRLs in 2 time(s)
2024-09-15T17:07:35.129748+08:00
Undo initialization recovery: Parallel FPTR failed: start:2528681 end:2528684 diff:3 ms (0.0 seconds)
Errors in file /opt/oracle/diag/rdbms/xff/XFF/trace/XFF_ora_8074.trc  (incident=146455) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/xff/XFF/incident/incdir_146455/XFF_ora_8074_i146455.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Please look for redo dump in pinned buffers history in incident trace file, if not dumped for what so ever reason,
use the following command to dump it at the earliest. ALTER SYSTEM DUMP REDO DBA MIN 4 128 DBA MAX 4 128 SCN MIN 1;
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
Undo initialization recovery: err:600 start: 2528681 end: 2529341 diff: 660 ms (0.7 seconds)
2024-09-15T17:07:35.786923+08:00
Errors in file /opt/oracle/diag/rdbms/xff/XFF/trace/XFF_ora_8074.trc:
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2024-09-15T17:07:35.786967+08:00
Errors in file /opt/oracle/diag/rdbms/xff/XFF/trace/XFF_ora_8074.trc:
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
Errors in file /opt/oracle/diag/rdbms/xff/XFF/trace/XFF_ora_8074.trc  (incident=146456) (PDBNAME=CDB$ROOT):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/xff/XFF/incident/incdir_146456/XFF_ora_8074_i146456.trc
2024-09-15T17:07:36.291884+08:00
opiodr aborting process unknown ospid (8074) as a result of ORA-603
2024-09-15T17:07:36.299928+08:00
ORA-603 : opitsk aborting process
License high water mark = 4
USER(prelim) (ospid: 8074): terminating the instance due to ORA error 600

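They then recreated the control file. A check with the Oracle Database Recovery Check script revealed several problems:
1. PDB$SEED is missing from the database's records (this PDB holds no business data, so it can be ignored)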


2. Some files have incorrect resetlogs information (probably because they were taken offline, or were left out when the control file was recreated)
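
The second finding, inconsistent resetlogs information across file headers, comes down to grouping files by their resetlogs SCN and looking for outliers. The sketch below shows that grouping on made-up data; the (file number, resetlogs SCN) pairs are hypothetical, and in practice they might come from a query on v$datafile_header (FILE#, RESETLOGS_CHANGE#). Treating the largest group as the correct one is also an assumption, and this is not the author's check script.

```python
from collections import defaultdict


def group_by_resetlogs(rows):
    """Group file numbers by the resetlogs SCN stored in their headers."""
    groups = defaultdict(list)
    for file_no, resetlogs_scn in rows:
        groups[resetlogs_scn].append(file_no)
    return dict(groups)


def odd_files(rows):
    """Return the files whose resetlogs SCN differs from the majority.

    Assumption: the largest group carries the correct resetlogs SCN.
    """
    groups = group_by_resetlogs(rows)
    if not groups:
        return []
    majority = max(groups, key=lambda scn: len(groups[scn]))
    return sorted(
        file_no
        for scn, files in groups.items()
        if scn != majority
        for file_no in files
    )


# Hypothetical header data: (file number, resetlogs SCN). Values are made up.
rows = [(1, 1000), (3, 1000), (4, 1000), (7, 1000), (9, 900), (13, 900)]
print(group_by_resetlogs(rows))
print(odd_files(rows))
```

Files flagged this way are the ones to expect ORA-01209 ("data file is from before the last RESETLOGS") on after a resetlogs open.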

After taking over the recovery, I tried to open the database with resetlogs:

[oracle@localhost check_db]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Tue Sep 17 11:29:28 2024
Version 19.9.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.9.0.0.0

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-65088: database open should be retried
Process ID: 101712
Session ID: 105 Serial number: 4711

The corresponding errors in the alert log:

Endian type of dictionary set to little
2024-09-17T11:29:46.691904+08:00
Assigning activation ID 2966261119 (0xb0cd8d7f)
Redo log for group 1, sequence 1 is not located on DAX storage
2024-09-17T11:29:46.714594+08:00
TT00 (PID:101731): Gap Manager starting
2024-09-17T11:29:46.735407+08:00
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /opt/oracle/oradata/XFF/onlinelog/o1_mf_1_j3k201g9_.log
Successful open of redo thread 1
2024-09-17T11:29:46.736182+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
stopping change tracking
2024-09-17T11:29:46.774207+08:00
TT03 (PID:101737): Sleep 5 seconds and then try to clear SRLs in 2 time(s)
2024-09-17T11:29:46.793381+08:00
Undo initialization recovery: Parallel FPTR complete: start:99831350 end:99831351 diff:1 ms (0.0 seconds)
Undo initialization recovery: err:0 start: 99831349 end: 99831351 diff: 2 ms (0.0 seconds)
Undo initialization online undo segments: err:0 start: 99831351 end: 99831353 diff: 2 ms (0.0 seconds)
Undo initialization finished serial:0 start:99831349 end:99831356 diff:7 ms (0.0 seconds)
Dictionary check beginning
2024-09-17T11:29:46.817810+08:00
Errors in file /opt/oracle/diag/rdbms/xff/XFF/trace/XFF_ora_101712.trc:
ORA-65106: Pluggable database #2 (PDB$SEED) is in an invalid state.
Pluggable Database PDB$SEED (#2) found in data dictionary,
but not in the control file. Adding it to control file.
Pluggable Database PDB1 (#3) found in data dictionary,
but not in the control file. Adding it to control file.
Pluggable Database PDB2 (#4) found in data dictionary,
but not in the control file. Adding it to control file.
Pluggable Database PDB3 (#5) found in data dictionary,
but not in the control file. Adding it to control file.
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
2024-09-17T11:29:46.878684+08:00
Read of datafile '/opt/oracle/oradata/XFF/PDB/datafile/o1_mf_system_j3kc9hl0_.dbf'(fno 9)header failed with ORA-01209
Rereading datafile 9 header failed with ORA-01209
2024-09-17T11:29:46.921314+08:00
Errors in file /opt/oracle/diag/rdbms/xff/XFF/trace/XFF_dbw0_100632.trc:
ORA-01186: file 9 failed verification tests
ORA-01122: database file 9 failed verification check
ORA-01110: data file 9: '/opt/oracle/oradata/XFF/PDB/datafile/o1_mf_system_j3kc9hl0_.dbf'
ORA-01209: data file is from before the last RESETLOGS
File 9 not verified due to error ORA-01122
…………
Read of datafile '/opt/oracle/oradata/XFF/datafile/users07.dbf' (fno 39) header failed with ORA-01209
Rereading datafile 39 header failed with ORA-01209
2024-09-17T11:29:46.983955+08:00
Errors in file /opt/oracle/diag/rdbms/xff/XFF/trace/XFF_dbw0_100632.trc:
ORA-01186: file 39 failed verification tests
ORA-01122: database file 39 failed verification check
ORA-01110: data file 39: '/opt/oracle/oradata/XFF/datafile/users07.dbf'
ORA-01209: data file is from before the last RESETLOGS
File 39 not verified due to error ORA-01122
2024-09-17T11:29:46.987947+08:00
Dictionary check complete
Verifying minimum file header compatibility for tablespace encryption for pdb 1..
Verifying file header compatibility for tablespace encryption completed for pdb 1
*********************************************************************
WARNING: The following temporary tablespaces in container(CDB$ROOT)
         contain no files.
         This condition can occur when a backup controlfile has
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
 
         ALTER TABLESPACE <tablespace_name> ADD TEMPFILE
 
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
Database Characterset is AL32UTF8
2024-09-17T11:29:47.059806+08:00
Errors in file /opt/oracle/diag/rdbms/xff/XFF/trace/XFF_mz00_101739.trc:
ORA-01110: data file 9: '/opt/oracle/oradata/XFF/PDB/datafile/o1_mf_system_j3kc9hl0_.dbf'
ORA-01209: data file is from before the last RESETLOGS
…………
**********************************************************
WARNING: Files may exists in db_recovery_file_dest
that are not known to the database. Use the RMAN command
CATALOG RECOVERY AREA to re-catalog any such files.
If files cannot be cataloged, then manually delete them
using OS command.
One of the following events caused this:
1. A backup controlfile was restored.
2. A standby controlfile was restored.
3. The controlfile was re-created.
4. db_recovery_file_dest had previously been enabled and
   then disabled.
**********************************************************
Starting background process IMCO
2024-09-17T11:29:47.340660+08:00
2024-09-17T11:29:47.382153+08:00
Errors in file /opt/oracle/diag/rdbms/xff/XFF/trace/XFF_mz00_101739.trc:
ORA-01110: data file 13: '/opt/oracle/oradata/XFF/PDB/datafile/o1_mf_users_j3kckos2_.dbf'
ORA-01209: data file is from before the last RESETLOGS
replication_dependency_tracking turned off (no async multimaster replication found)
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
2024-09-17T11:29:47.464233+08:00
Errors in file /opt/oracle/diag/rdbms/xff/XFF/trace/XFF_mz00_101739.trc:
ORA-01110: data file 14: '/opt/oracle/oradata/XFF/PDB/datafile/o1_mf_users_j3kckqfx_.dbf'
ORA-01209: data file is from before the last RESETLOGS
AQ Processes can not start in restrict mode
Could not open PDB$SEED error=65106
2024-09-17T11:29:47.522825+08:00
Errors in file /opt/oracle/diag/rdbms/xff/XFF/trace/XFF_ora_101712.trc:
ORA-65106: Pluggable database #2 (PDB$SEED) is in an invalid state.
ORA-65106: Pluggable database #2 (PDB$SEED) is in an invalid state.
2024-09-17T11:29:47.525249+08:00
db_recovery_file_dest_size of 65536 MB is 0.05% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
2024-09-17T11:29:47.529134+08:00
Errors in file /opt/oracle/diag/rdbms/xff/XFF/trace/XFF_ora_101712.trc:
ORA-65088: database open should be retried
2024-09-17T11:29:47.529202+08:00
Errors in file /opt/oracle/diag/rdbms/xff/XFF/trace/XFF_ora_101712.trc:
ORA-65088: database open should be retried
2024-09-17T11:29:47.529253+08:00
Error 65088 happened during db open, shutting down database
2024-09-17T11:29:47.545440+08:00
Errors in file /opt/oracle/diag/rdbms/xff/XFF/trace/XFF_mz00_101739.trc:
ORA-01110: data file 15: '/opt/oracle/oradata/XFF/PDB/datafile/o1_mf_users_j3kckstd_.dbf'
ORA-01209: data file is from before the last RESETLOGS
Errors in file /opt/oracle/diag/rdbms/xff/XFF/trace/XFF_ora_101712.trc(incident=775863)(PDBNAME=CDB$ROOT):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-65088: database open should be retried
2024-09-17T11:29:48.046698+08:00
Errors in file /opt/oracle/diag/rdbms/xff/XFF/trace/XFF_mz00_101739.trc:
ORA-01110: data file 21: '/opt/oracle/oradata/XFF/PDB2/datafile/o1_mf_users_j45x90oq_.dbf'
ORA-01209: data file is from before the last RESETLOGS
2024-09-17T11:29:48.073328+08:00
opiodr aborting process unknown ospid (101712) as a result of ORA-603
2024-09-17T11:29:48.081576+08:00
ORA-603 : opitsk aborting process
License high water mark = 122
USER(prelim) (ospid: 101712): terminating the instance due to ORA error 65088
2024-09-17T11:29:49.104770+08:00
Instance terminated by USER(prelim), pid = 101712
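
With an alert log this long, a quick tally of the ORA- codes helps show which errors dominate. The following is a minimal sketch, not part of the original recovery: the excerpt is a handful of lines copied from the log above, and the function name `tally_ora_codes` is only illustrative. The pattern counts five-digit `ORA-nnnnn` codes only, so short forms such as `ORA-603` are ignored.

```python
import re
from collections import Counter

# A few lines copied from the alert log above; in practice, read the whole file.
ALERT_EXCERPT = """\
ORA-01186: file 9 failed verification tests
ORA-01122: database file 9 failed verification check
ORA-01110: data file 9: '/opt/oracle/oradata/XFF/PDB/datafile/o1_mf_system_j3kc9hl0_.dbf'
ORA-01209: data file is from before the last RESETLOGS
ORA-65088: database open should be retried
ORA-01209: data file is from before the last RESETLOGS
"""

# Match only the canonical five-digit form, e.g. ORA-01209.
ORA_CODE = re.compile(r"\bORA-(\d{5})\b")


def tally_ora_codes(text):
    """Count how often each five-digit ORA-nnnnn code appears in the text."""
    return Counter(m.group(1) for m in ORA_CODE.finditer(text))


counts = tally_ora_codes(ALERT_EXCERPT)
for code, n in counts.most_common():
    print(f"ORA-{code}: {n}")
```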

There are two main errors:
ORA-01209: data file is from before the last RESETLOGS, and
ORA-65088: database open should be retried
Looking up both errors:

[oracle@ora19c:/home/oracle]$ oerr ora 65088
65088, 00000, "database open should be retried"
// *Cause:   An inconsistency between the control file and the data dictionary
//           was found and fixed during the database open. The database open
//           needs to be executed again.
// *Action:  Retry the database open.
//
[oracle@ora19c:/home/oracle]$ oerr ora 01209
01209, 00000, "data file is from before the last RESETLOGS"   
// *Cause:  The reset log data in the file header does not match the   
//         control file. If the database is closed or the file is offline,  
//         the backup is old because it was taken before the last ALTER   
//         DATABASE OPEN RESETLOGS command. If opening a database that is   
//         open already by another instance, or if another instance just   
//         brought this file online, the file accessed by this instance is 
//         probably a different version. Otherwise, a backup of the file 
//         probably was restored while the file was in use.   
// *Action: Make the correct file available to the database. Then, either open
//         the database, or execute ALTER SYSTEM CHECK DATAFILES.  

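For ORA-65088, see the official note: ORA-65088 while opening DB with resetlogs for multi-tenant DB in 12.2 (Doc ID 2449591.1). It should not be a real technical problem here (it is a side effect of recreating the control file and then opening with resetlogs).
ORA-01209: data file is from before the last RESETLOGS can be understood simply as the resetlogs information being newer than the checkpoint information in the datafile. Given this, and the inconsistent resetlogs information in some of the files noted above, I simply used the m_scn utility to process the files in batch.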


Running the Oracle Database Recovery Check script again confirmed that the resetlogs problem was fixed.

The database then opened without problems, the data was exported, and the recovery task was complete.

ORA-600 16703 failure reappears


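It has been more than seven years since ORA-600 16703 was first discovered (see "Warning: Oracle installation media on the Internet injected with malicious code leads to ORA-600 16703"), and customers are still being hit by it. Please watch out for this problem.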

Sat Sep 14 21:43:29 2024
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Sep 14 21:43:29 2024
SMON: enabling cache recovery
Errors in file D:\ORACLE\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_6264.trc  (incident=8561):
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Incident details in: D:\ORACLE\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_8561\orcl_ora_6264_i8561.trc
Sat Sep 14 21:43:31 2024
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file D:\ORACLE\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_6264.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Errors in file D:\ORACLE\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_6264.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 6264): terminating the instance due to error 704
Instance terminated by USER, pid = 6264
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (6264) as a result of ORA-1092

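Because this kind of failure occurs frequently and is becoming more destructive, I studied it in depth. As long as the scene has not been damaged further, rebuilding tab$ directly recovers the database completely (zero data loss, no logical migration needed; the original database is directly usable).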


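Previous articles on this kind of error:
10g database hits ORA-600 16703
12C database hits ORA-600 16703
Handling ORA-600 kzrini:!uprofile
ORA-600 16703 failure analysis: tab$ table emptied
Recently encountered ORA-600 16703 and ORA-702 failures again
Cause of ORA-00600: internal error code, arguments: [16703], [1403], [4]
Handling ORA-600 13304 reported after tab$ was repaired incorrectly
Several recent ORA-600 16703 failures (tab$ emptied); please take this seriously
ORA-600 16703: recovering by inserting the orachk backup table directly into tab$
Warning: Oracle installation media on the Internet injected with malicious code leads to ORA-600 16703
On AIX, a deleted tab$ may produce ORA-600 [16703], [1403], [28]
Handling ORA-00600: internal error code, arguments: [16703], [1403], [4]
ORA-00600: internal error code, arguments: [16703], [1403], [32]
ORA-600 16703 failure: the customer hired someone to recover the database and it was further maliciously damaged (ORA-00704 ORA-00922)
Avoid downloading Oracle installation media and patches from the Internet wherever possible, so that no malicious script is injected, and check the sha256 of installation media you already have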

Handling ORA-600 krhpfh_03-1210


In a RAC database, several nodes were all open and queries worked normally, but application inserts sometimes failed with errors like ORA-01187: cannot read from file because it failed verification tests.


The failure originally started with a bad disk; after the disk was replaced, the database instances on two nodes crashed.

Mon Aug 19 21:16:47 2024
Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff5/trace/xff5_ckpt_75779.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
Errors in file /u01/app/oracle/diag/rdbms/xff/xff5/trace/xff5_ckpt_75779.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
CKPT (ospid: 75779): terminating the instance due to error 1242
Mon Aug 19 21:16:47 2024
System state dump requested by (instance=5, osid=75779 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff5/trace/xff5_diag_75725.trc
Mon Aug 19 21:16:52 2024
ORA-1092 : opitsk aborting process
Mon Aug 19 21:16:53 2024
ORA-1092 : opitsk aborting process
Mon Aug 19 21:16:53 2024
License high water mark = 131
Termination issued to instance processes. Waiting for the processes to exit
Mon Aug 19 21:17:02 2024
Instance termination failed to kill one or more processes
Instance terminated by CKPT, pid = 75779
Mon Aug 19 21:17:03 2024
USER (ospid: 33495): terminating the instance
Termination issued to instance processes. Waiting for the processes to exit
Mon Aug 19 21:17:13 2024
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 33495

However, the database could be started manually, and a query showed all datafiles online.


But some loading processes were very slow, with many sessions waiting on enq: HW - contention.

The alert logs of all database nodes occasionally reported errors such as ORA-01186: file 1399 failed verification tests.

Tue Aug 20 21:30:02 2024
Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_dbw0_43828.trc:
ORA-01186: file 1399 failed verification tests
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
File 1399 not verified due to error ORA-01122
Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_dbw0_43828.trc:
ORA-01186: file 1399 failed verification tests
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
File 1399 not verified due to error ORA-01122

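Based on this, my initial assessment was:
1. Because the cluster has multiple nodes (6 nodes), as long as at least one node is open, the other nodes can be shut down and restarted normally, but data cannot be written to the datafile reporting ORA-01207 (it can still be read).
2. If all nodes are shut down, the database will fail to start with ORA-01207: file is more recent than control file.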

From past experience, ORA-01207: file is more recent than control file can be fixed by recreating the control file. First shut down all nodes, then try to start one node.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
alter database open
Wed Aug 21 14:14:22 2024
SUCCESS: diskgroup REDO was mounted
Wed Aug 21 14:14:22 2024
NOTE: dependency between database xff and diskgroup resource ora.REDO.dg is established
Wed Aug 21 14:14:27 2024
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_47884.trc:
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
ORA-1122 signalled during: alter database open...

As expected, the open failed. I then recreated the control file, after which recovery reported ORA-00600 [krhpfh_03-1210].

SQL> shutdown immediate;
ORA-01109: database not open


Database dismounted.
ORACLE instance shut down.
SQL> startup nomount pfile='/tmp/xff/pfile';
ORACLE instance started.

Total System Global Area 1.3255E+11 bytes
Fixed Size		    2244832 bytes
Variable Size		 9.7442E+10 bytes
Database Buffers	 3.4897E+10 bytes
Redo Buffers		  208654336 bytes
SQL> @rectl

Control file created.

SQL> 
SQL> 
SQL> 
SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done


SQL> recover database using backup controlfile;
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [krhpfh_03-1210], [fno =], [1399],
[fhcpc =], [274968], [fhccc =], [274983], [], [], [], [], []
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'

The message indicates that the fhcpc and fhccc values are wrong. Checking the corresponding values with bbed:

BBED> set file 1399
	FILE#          	1399

BBED> p kcvfhccc
ub4 kcvfhccc                                @148      0x00043227 ===>274983 (decimal)

BBED> p kcvfhcpc
ub4 kcvfhcpc                                @140      0x00043218 ===>274968 (decimal)
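
As a sanity check, the hex values printed by BBED can be compared with the decimal arguments in the ORA-00600 [krhpfh_03-1210] message. This is a small sketch using only numbers copied from the output above; the dictionary names are illustrative.

```python
# Hex values printed by BBED for file 1399, and the decimal arguments from the
# ORA-00600 [krhpfh_03-1210] message (fhcpc / fhccc), both copied from above.
bbed_values = {"kcvfhcpc": 0x00043218, "kcvfhccc": 0x00043227}
ora600_args = {"kcvfhcpc": 274968, "kcvfhccc": 274983}

for field, value in bbed_values.items():
    status = "match" if value == ora600_args[field] else "MISMATCH"
    print(f"{field}: 0x{value:08x} = {value} ({status})")

# Difference between the two header counters as reported by the error.
gap = bbed_values["kcvfhccc"] - bbed_values["kcvfhcpc"]
print(f"kcvfhccc - kcvfhcpc = {gap}")
```

Both fields match the error arguments, and the two counters differ by 15.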

The error is fairly clear: modify these two values with bbed.

BBED> m /x 2a390400 offset 148
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /tmp/xff/1399.dbf.header (1399)
 Block: 1                Offsets:  148 to  659           Dba:0x5dc00001
------------------------------------------------------------------------
 2a390400 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 0c000000 0f004441 
 5441315f 5442535f 45515f30 31000000 00000000 00000000 00000000 78010000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 cfebdd33 01000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 419333df 81001c0a 6ab13046 06000000 
 c1520400 02000000 10000000 7e000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 0d000d00 0d000100 00000000 00000000 

 <32 bytes per line>

BBED> m /x 2b390400 offset 140
 File: /tmp/xff/1399.dbf.header (1399)
 Block: 1                Offsets:  140 to  651           Dba:0x5dc00001
------------------------------------------------------------------------
 2b390400 e6ef524d 2a390400 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 0c000000 0f004441 5441315f 5442535f 45515f30 31000000 00000000 00000000 
 00000000 78010000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 cfebdd33 01000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 419333df 81001c0a 
 6ab13046 06000000 c1520400 02000000 10000000 7e000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 0d000d00 0d000100 

 <32 bytes per line>
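
The byte strings passed to `m /x` are in on-disk byte order. Assuming a little-endian platform (which the dumps above are consistent with), a sketch like the following converts between the integer value and the hex string seen in the dump. The helper names `to_bbed_hex` and `from_bbed_hex` are illustrative, not BBED features.

```python
import struct


def to_bbed_hex(value):
    """Encode an unsigned 32-bit value as little-endian hex, as it appears in the dump."""
    return struct.pack("<I", value).hex()


def from_bbed_hex(hex_bytes):
    """Decode an 8-character little-endian hex string back into an integer."""
    return struct.unpack("<I", bytes.fromhex(hex_bytes))[0]


print(to_bbed_hex(0x00043227))    # original kcvfhccc -> 27320400
print(from_bbed_hex("2a390400"))  # value written at offset 148 -> 276778
print(from_bbed_hex("2b390400"))  # value written at offset 140 -> 276779
```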

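After these values were modified, recover database and open both succeeded; the dictionary check was clean and application reads and writes were normal, completing this recovery task.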

SQL> @hcheck
HCheck Version 07MAY18 on 21-AUG-2024 15:13:02
----------------------------------------------
Catalog Version 11.2.0.3.0 (1102000300)
db_name: XFF

				   Catalog	 Fixed
Procedure Name			   Version    Vs Release    Timestamp
Result
------------------------------ ... ---------- -- ---------- --------------
------
.- LobNotInObj		       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- MissingOIDOnObjCol	       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- SourceNotInObj	       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- OversizedFiles	       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- PoorDefaultStorage	       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- PoorStorage		       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- TabPartCountMismatch        ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- OrphanedTabComPart	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- MissingSum$		       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- MissingDir$		       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- DuplicateDataobj	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- ObjSynMissing	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- ObjSeqMissing	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- OrphanedUndo 	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- OrphanedIndex	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- OrphanedIndexPartition      ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- OrphanedIndexSubPartition   ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedTable	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedTablePartition      ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedTableSubPartition   ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- MissingPartCol	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedSeg$ 	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedIndPartObj#	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- DuplicateBlockUse	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- FetUet		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- Uet0Check		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- SeglessUET		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadInd$		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadTab$		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadIcolDepCnt	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- ObjIndDobj		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- TrgAfterUpgrade	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- ObjType0		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadOwner		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- StmtAuditOnCommit	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadPublicObjects	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadSegFreelist	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadDepends		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- CheckDual		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- ObjectNames		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- BadCboHiLo		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- ChkIotTs		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- NoSegmentIndex	       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- BadNextObject	       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- DroppedROTS		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- FilBlkZero		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- DbmsSchemaCopy	       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- OrphanedObjError	       ... 1102000300 >  1102000000 08/21 15:13:05 PASS
.- ObjNotLob		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- MaxControlfSeq	       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- SegNotInDeferredStg	       ... 1102000300 >  1102000000 08/21 15:13:06 PASS
.- SystemNotRfile1	       ... 1102000300 >   902000000 08/21 15:13:06 PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000300 <=  *All Rel* 08/21 15:13:07 PASS
.- OrphanTrigger	       ... 1102000300 <=  *All Rel* 08/21 15:13:07 PASS
.- ObjNotTrigger	       ... 1102000300 <=  *All Rel* 08/21 15:13:07 PASS
---------------------------------------
21-AUG-2024 15:13:07  Elapsed: 5 secs
---------------------------------------
Found 0 potential problem(s) and 0 warning(s)

PL/SQL procedure successfully completed.

Statement processed.

Complete output is in trace file:
/u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_70961_HCHECK.trc

19c database startup reports ORA-600 kcbzib_kcrsds_1


A 19c database ran into a problem for some reason, and the technician at the time forced it open with hidden parameters, which led to ORA-00704 and ORA-600 kcbzib_kcrsds_1 at startup.

2024-08-24T06:11:25.494304+08:00
ALTER DATABASE OPEN
2024-08-24T06:11:25.494370+08:00
TMI: adbdrv open database BEGIN 2024-08-24 06:11:25.494324
Smart fusion block transfer is disabled:
  instance mounted in exclusive mode.
2024-08-24T06:11:25.515306+08:00
Beginning crash recovery of 1 threads
 parallel recovery started with 7 processes
 Thread 1: Recovery starting at checkpoint rba (logseq 2 block 3), scn 286550073
2024-08-24T06:11:25.567011+08:00
Started redo scan
2024-08-24T06:11:25.587170+08:00
Completed redo scan
 read 0 KB redo, 0 data blocks need recovery
2024-08-24T06:11:25.595192+08:00
Started redo application at
 Thread 1: logseq 2, block 3, offset 0, scn 0x0000000011146839
2024-08-24T06:11:25.595552+08:00
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: /dbf/RLZY/redo02.log
2024-08-24T06:11:25.595712+08:00
Completed redo application of 0.00MB
2024-08-24T06:11:25.596058+08:00
Completed crash recovery at
 Thread 1: RBA 2.3.0, nab 3, scn 0x000000001114683a
 0 data blocks read, 0 data blocks written, 0 redo k-bytes read
Endian type of dictionary set to little
2024-08-24T06:11:25.648152+08:00
LGWR (PID:1614826): STARTING ARCH PROCESSES
2024-08-24T06:11:25.661738+08:00
TT00 (PID:1614908): Gap Manager starting
Starting background process ARC0
2024-08-24T06:11:25.677246+08:00
ARC0 started with pid=54, OS id=1614910 
2024-08-24T06:11:25.687525+08:00
LGWR (PID:1614826): ARC0: Archival started
LGWR (PID:1614826): STARTING ARCH PROCESSES COMPLETE
2024-08-24T06:11:25.687733+08:00
ARC0 (PID:1614910): Becoming a 'no FAL' ARCH
ARC0 (PID:1614910): Becoming the 'no SRL' ARCH
2024-08-24T06:11:25.696437+08:00
TMON (PID:1614886): STARTING ARCH PROCESSES
Starting background process ARC1
2024-08-24T06:11:25.711645+08:00
Thread 1 advanced to log sequence 3 (thread open)
Redo log for group 3, sequence 3 is not located on DAX storage
2024-08-24T06:11:25.715270+08:00
ARC1 started with pid=56, OS id=1614914 
Starting background process ARC2
Thread 1 opened at log sequence 3
  Current log# 3 seq# 3 mem# 0: /dbf/RLZY/redo03.log
Successful open of redo thread 1
2024-08-24T06:11:25.728586+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Stopping change tracking
2024-08-24T06:11:25.734124+08:00
ARC2 started with pid=57, OS id=1614916 
Starting background process ARC3
2024-08-24T06:11:25.752891+08:00
ARC3 started with pid=58, OS id=1614918 
2024-08-24T06:11:25.752979+08:00
TMON (PID:1614886): ARC1: Archival started
TMON (PID:1614886): ARC2: Archival started
TMON (PID:1614886): ARC3: Archival started
TMON (PID:1614886): STARTING ARCH PROCESSES COMPLETE
2024-08-24T06:11:25.802551+08:00
ARC0 (PID:1614910): Archived Log entry 2828 added for T-1.S-2 ID 0x74f18f91 LAD:1
2024-08-24T06:11:25.806845+08:00
TT03 (PID:1614922): Sleep 5 seconds and then try to clear SRLs in 2 time(s)
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_1614892.trc  (incident=124865):
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/xff/xff/incident/incdir_124865/xff_ora_1614892_i124865.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2024-08-24T06:11:25.871925+08:00
2024-08-24T06:11:26.772652+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2024-08-24T06:11:26.872265+08:00
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_1614892.trc:
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2024-08-24T06:11:26.872351+08:00
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_1614892.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2024-08-24T06:11:26.872412+08:00
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_1614892.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2024-08-24T06:11:26.872455+08:00
Error 704 happened during db open, shutting down database
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_1614892.trc  (incident=124866):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/xff/xff/incident/incdir_124866/xff_ora_1614892_i124866.trc
opiodr aborting process unknown ospid (1614892) as a result of ORA-603
2024-08-24T06:11:27.498146+08:00
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_1614892.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2024-08-24T06:11:27.501122+08:00
ORA-603 : opitsk aborting process
License high water mark = 8
USER(prelim) (ospid: 1614892): terminating the instance due to ORA error 704
2024-08-24T06:11:28.526358+08:00
Instance terminated by USER(prelim), pid = 1614892

The only official explanation of kcbzib_kcrsds_1 is: Bug 31887074 – sr21.1bigscn_hipu3 – trc – ksfdopn2 – ORA-600 [kcbzib_kcrsds_1] (Doc ID 31887074.8)


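Although Oracle gives no official fix for ORA-600 [kcbzib_kcrsds_1], many past recovery cases show that the error can be bypassed by adjusting the Oracle SCN. Earlier similar recovery cases:
ORA-600 kcbzib_kcrsds_1 error
Handling ORA-600 kcbzib_kcrsds_1 on a 12C database
ORA-00603 ORA-01092 ORA-600 kcbzib_kcrsds_1
Fixing ORA-600 kcbzib_kcrsds_1 after forcing the database open with damaged redo
One-step recovery of ORA-600 kcbzib_kcrsds_1 with the Patch SCN tool
Storage failure: handling ORA-600 kcbzib_kcrsds_1 after forcing the database open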

Lost redo write causes ORA-600 kcrf_resilver_log_1


A customer had a hardware failure; after the hardware was recovered, database startup reported ORA-600 kcrf_resilver_log_1.

Thu Aug 22 13:37:50 2024
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 3 processes
Started redo scan
Errors in file e:\oracle\zy\diag\rdbms\orcl\orcl\trace\orcl_ora_1640.trc  (incident=9767):
ORA-00600: internal error code, arguments: [kcrf_resilver_log_1], [0x7DCEBE020], [2], [], [], [], [], [], [], [], [], []
Incident details in: e:\oracle\zy\diag\rdbms\orcl\orcl\incident\incdir_9767\orcl_ora_1640_i9767.trc
Thu Aug 22 13:37:55 2024
Trace dumping is performing id=[cdmp_20240822133755]
Aborting crash recovery due to error 600
Errors in file e:\oracle\zy\diag\rdbms\orcl\orcl\trace\orcl_ora_1640.trc:
ORA-00600: internal error code, arguments: [kcrf_resilver_log_1], [0x7DCEBE020], [2], [], [], [], [], [], [], [], [], []
Errors in file e:\oracle\zy\diag\rdbms\orcl\orcl\trace\orcl_ora_1640.trc:
ORA-00600: internal error code, arguments: [kcrf_resilver_log_1], [0x7DCEBE020], [2], [], [], [], [], [], [], [], [], []

According to MOS, this problem is generally caused by a lost redo log write.


Recovery is not difficult: generally you try forcing the database open. Earlier similar recovery cases:
Handling ORA-600 kcrf_resilver_log_1
Recovery from ORA-00600 [kcrf_resilver_log_1]

Recovering a 200T database in NOARCHIVELOG mode with no backup


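A 6-node RAC of nearly 200T had an unstable storage link, so the servers frequently lost disks, the ASM disk groups were repeatedly dismounted and mounted, and the cluster nodes kept restarting. After the link problem was fixed, database startup reported ORA-01113 and ORA-01110.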


The Oracle Database Recovery Check script found 10 abnormal datafiles that could not be recovered normally.

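The database is large, nearly 200T, so the recovery had to be done with care (an on-site backup was not possible, and the customer required recovery within 2 days).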

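Because the database is in NOARCHIVELOG mode, these files cannot be recovered by applying archived logs. In this situation, use dbms_diskgroup to copy the datafile header directly to the file system, with an operation like this: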

SQL> @dbms_diskgroup_get_block.sql  +DATA/xifenfei.dbf 1 1 /tmp/xff/xifenfei.dbf.header

Parameter 1:
ASM_file_name (required)


Parameter 2:
block_to_extract (required)


Parameter 3
number_of_blocks_to_extract (required)


Parameter 4:
FileSystem_File_Name (required)

old  14:  v_AsmFilename := '&ASM_File_Name';
new  14:  v_AsmFilename := '+DATA/xifenfei.dbf';
old  15:  v_offstart := '&block_to_extract';
new  15:  v_offstart := '1';
old  16:  v_numblks := '&number_of_blocks_to_extract';
new  16:  v_numblks := '1';
old  17:  v_FsFilename := '&FileSystem_File_Name';
new  17:  v_FsFilename := '/tmp/xff/xifenfei.dbf.header';
File: +DATA/xifenfei.dbf
Type: 2 Data File
Size (in logical blocks): 3978880
Logical Block Size: 16384
Physical Block Size: 512

PL/SQL procedure successfully completed.
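
The sizes reported above show why copying only the header block is practical on a database this large. A rough sketch of the arithmetic, using the block count and block size printed by the script; that the extracted file holds exactly the requested number of blocks is an assumption about the script, not something the output states.

```python
logical_blocks = 3978880      # "Size (in logical blocks)" from the output above
logical_block_size = 16384    # "Logical Block Size" in bytes

# Full size of the ASM datafile.
size_bytes = logical_blocks * logical_block_size
print(f"{size_bytes} bytes = {size_bytes / 2**30:.2f} GiB")

# Only one block (the datafile header) was requested, so the copy is tiny.
blocks_copied = 1
header_copy_bytes = blocks_copied * logical_block_size
print(f"header copy: {header_copy_bytes} bytes")
```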

Then modify the relevant SCN with bbed:

BBED> set filename 'xifenfei.dbf.header'
	FILENAME       	xifenfei.dbf.header

BBED> set blocksize 16384
	BLOCKSIZE      	16384

BBED> map
 File: xifenfei.dbf.header (0)
 Block: 1                                     Dba:0x00000000
------------------------------------------------------------
 Data File Header

 struct kcvfh, 860 bytes                    @0       

 ub4 tailchk                                @16380   


BBED> p kcvfh.kcvfhckp.kcvcpscn
struct kcvcpscn, 8 bytes                    @484     
   ub4 kscnbas                              @484      0xa8061324
   ub2 kscnwrp                              @488      0x0081

BBED> assign file 295 block 1 kcvfh.kcvfhckp.kcvcpscn = file 1 block 1 kcvfh.kcvfhckp.kcvcpscn;
struct kcvcpscn, 8 bytes                    @484     
   ub4 kscnbas                              @484      0xa8133e2b
   ub2 kscnwrp                              @488      0x0081
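
BBED shows the checkpoint SCN as a 4-byte base (kscnbas) and a 2-byte wrap (kscnwrp); the full SCN is wrap * 2^32 + base. The sketch below reads both fields from a header block at the offsets printed above (484 and 488) and combines them. Little-endian byte order is an assumption, the block is built synthetically instead of being read from the extracted header file, and `checkpoint_scn` is an illustrative name.

```python
import struct

BLOCK_SIZE = 16384
KSCNBAS_OFFSET = 484  # ub4 base, offset taken from the BBED print above
KSCNWRP_OFFSET = 488  # ub2 wrap, offset taken from the BBED print above


def checkpoint_scn(block):
    """Return wrap * 2**32 + base for the checkpoint SCN stored in a header block."""
    (base,) = struct.unpack_from("<I", block, KSCNBAS_OFFSET)
    (wrap,) = struct.unpack_from("<H", block, KSCNWRP_OFFSET)
    return (wrap << 32) | base


# Synthetic block carrying the values shown after the assign command.
block = bytearray(BLOCK_SIZE)
struct.pack_into("<I", block, KSCNBAS_OFFSET, 0xA8133E2B)
struct.pack_into("<H", block, KSCNWRP_OFFSET, 0x0081)

scn = checkpoint_scn(block)
print(scn)  # 556870614571
```

The result, 556870614571, is the CHECKPOINT_CHANGE# value that the v$datafile_header query reports for files 1 and 295.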

Then write the modified datafile header back to ASM:

SQL> @dbms_diskgroup_cp_block_to_asm.sql  /tmp/xff/xifenfei.dbf.header  +DATA/xifenfei.dbf 1 1 

Parameter 1:
v_FsFileName (required)


Parameter 2:
v_AsmFileName (required)


Parameter 3
v_offstart (required)


Parameter 4
v_numblks (required)

old  16: v_FsFileName := '&v_FsFileName';
new  16: v_FsFileName := '/tmp/xff/xifenfei.dbf.header';
old  17: v_AsmFileName := '&v_AsmFileName';
new  17: v_AsmFileName := '+DATA/xifenfei.dbf';
old  18: v_offstart := '&v_offstart';
new  18: v_offstart := '1';
old  19:  v_numblks := '&v_numblks';
new  19:  v_numblks := '1';
File: +DATA/xifenfei.dbf
Type: 2 Data File
Size (in logical blocks): 3978880
Logical Block Size: 16384

PL/SQL procedure successfully completed.

Check whether the file header was modified successfully:

[oracle@xff1 xff]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Sat Aug 10 16:45:02 2024

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> set numw 16
SQL> select CHECKPOINT_CHANGE# from v$datafile_header where file# in (1,295);

CHECKPOINT_CHANGE#
------------------
      556870614571
      556870614571

SQL> recover datafile 295;
Media recovery complete.
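
The hexadecimal values shown by bbed and the decimal CHECKPOINT_CHANGE# returned by the query can be cross-checked by hand. The sketch below assumes the usual layout in which kscnwrp holds the high part of the SCN and kscnbas the low 32 bits; the function name scn_from_parts is made up for illustration.

```python
def scn_from_parts(wrap: int, base: int) -> int:
    """Combine kscnwrp (high part) and kscnbas (low 32 bits) into one SCN."""
    return (wrap << 32) | base

# Value printed by bbed before the assign command.
old_scn = scn_from_parts(0x0081, 0xA8061324)
# Value printed by bbed after the assign command.
new_scn = scn_from_parts(0x0081, 0xA8133E2B)

print(new_scn)            # 556870614571, the CHECKPOINT_CHANGE# shown for both files
print(new_scn - old_scn)  # how far the header checkpoint SCN was moved forward
```

The computed value matches the 556870614571 reported by v$datafile_header for files 1 and 295, which is what the query above is meant to confirm.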

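The steps above confirm that the bbed change to the file header worked. The other 9 files were then modified in the same way, and the database was opened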

SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.

The alert log shows

Sat Aug 10 16:46:11 2024
ALTER DATABASE RECOVER  datafile 295  
Media Recovery Start
Serial Media Recovery started
WARNING! Recovering data file 295 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
Media Recovery Complete (xff1)
Completed: ALTER DATABASE RECOVER  datafile 295  
Sat Aug 10 16:46:39 2024
ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Sat Aug 10 16:46:51 2024
WARNING! Recovering data file 1139 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1140 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1601 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1803 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1827 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1931 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 2185 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 2473 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 2616 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
Sat Aug 10 16:46:54 2024
Parallel Media Recovery started with 64 slaves
Media Recovery Complete (xff1)
Completed: ALTER DATABASE RECOVER  database  
Sat Aug 10 17:19:58 2024
alter database open
This instance was first to open
Sat Aug 10 17:19:58 2024
SUCCESS: diskgroup DATA was mounted
Sat Aug 10 17:19:58 2024
NOTE: dependency between database xff and diskgroup resource ora.DATA.dg is established
Sat Aug 10 17:20:10 2024
Picked broadcast on commit scheme to generate SCNs
Sat Aug 10 17:20:10 2024
SUCCESS: diskgroup REDO was mounted
Sat Aug 10 17:20:10 2024
NOTE: dependency between database xff and diskgroup resource ora.REDO.dg is established
Thread 1 opened at log sequence 124958
  Current log# 14 seq# 124958 mem# 0: +REDO/xff/log2.ora
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Aug 10 17:20:14 2024
SMON: enabling cache recovery
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
[33770] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:261099864 end:261100854 diff:990 (9 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Sat Aug 10 17:20:16 2024
minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:33650 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
Starting background process GTX0
Sat Aug 10 17:20:16 2024
GTX0 started with pid=45, OS id=34119 
Starting background process RCBG
Sat Aug 10 17:20:16 2024
RCBG started with pid=46, OS id=34121 
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Sat Aug 10 17:20:16 2024
QMNC started with pid=47, OS id=34134 
Starting background process SMCO
Completed: alter database open
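
To double-check which files went through this header fix, the fuzzy-backup warnings can be pulled out of the alert log. The following is a minimal sketch, assuming the warning text has exactly the wording seen above; the excerpt is shortened, and in practice the text would be read from the real alert log file.

```python
import re

# Shortened excerpt in the same format as the alert log above.
alert_excerpt = """\
WARNING! Recovering data file 295 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1139 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1140 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
"""

FUZZY_RE = re.compile(r"WARNING! Recovering data file (\d+) from a fuzzy backup")

def fuzzy_files(text):
    """Return the distinct file numbers flagged as fuzzy, in first-seen order."""
    numbers = (int(m.group(1)) for m in FUZZY_RE.finditer(text))
    return list(dict.fromkeys(numbers))

print(fuzzy_files(alert_excerpt))  # [295, 1139, 1140]
```

Run against the full log above, this would list file 295 plus the nine files 1139, 1140, 1601, 1803, 1827, 1931, 2185, 2473 and 2616, which is consistent with ten files being repaired in total.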

The database instances on the other cluster nodes were checked as well, and everything was normal
[screenshot: 20240814162201]


Check data dictionary consistency

SQL> @hcheck.sql
HCheck Version 07MAY18 on 10-AUG-2024 18:24:49
----------------------------------------------
Catalog Version 11.2.0.3.0 (1102000300)
db_name: XFF

                                   Catalog       Fixed
Procedure Name                     Version    Vs Release    Timestamp      Result
------------------------------ ... ---------- -- ---------- -------------- ------
.- LobNotInObj		       ... 1102000300 <=  *All Rel* 08/10 18:24:49 PASS
.- MissingOIDOnObjCol	       ... 1102000300 <=  *All Rel* 08/10 18:24:49 PASS
.- SourceNotInObj	       ... 1102000300 <=  *All Rel* 08/10 18:24:49 PASS
.- OversizedFiles	       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- PoorDefaultStorage	       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- PoorStorage		       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- TabPartCountMismatch        ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- OrphanedTabComPart	       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- MissingSum$		       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- MissingDir$		       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- DuplicateDataobj	       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- ObjSynMissing	       ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- ObjSeqMissing	       ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- OrphanedUndo 	       ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- OrphanedIndex	       ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- OrphanedIndexPartition      ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- OrphanedIndexSubPartition   ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedTable	       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedTablePartition      ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedTableSubPartition   ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- MissingPartCol	       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedSeg$ 	       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedIndPartObj#	       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- DuplicateBlockUse	       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- FetUet		       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- Uet0Check		       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- SeglessUET		       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- BadInd$		       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- BadTab$		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadIcolDepCnt	       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- ObjIndDobj		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- TrgAfterUpgrade	       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- ObjType0		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadOwner		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- StmtAuditOnCommit	       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadPublicObjects	       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadSegFreelist	       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadDepends		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- CheckDual		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- ObjectNames		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadCboHiLo		       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- ChkIotTs		       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- NoSegmentIndex	       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- BadNextObject	       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- DroppedROTS		       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- FilBlkZero		       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- DbmsSchemaCopy	       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- OrphanedObjError	       ... 1102000300 >  1102000000 08/10 18:24:54 PASS
.- ObjNotLob		       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- MaxControlfSeq	       ... 1102000300 <=  *All Rel* 08/10 18:24:55 PASS
.- SegNotInDeferredStg	       ... 1102000300 >  1102000000 08/10 18:25:18 PASS
.- SystemNotRfile1	       ... 1102000300 >   902000000 08/10 18:25:18 PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000300 <=  *All Rel* 08/10 18:25:18 PASS
.- OrphanTrigger	       ... 1102000300 <=  *All Rel* 08/10 18:25:18 PASS
.- ObjNotTrigger	       ... 1102000300 <=  *All Rel* 08/10 18:25:18 PASS
---------------------------------------
10-AUG-2024 18:25:18  Elapsed: 29 secs
---------------------------------------
Found 0 potential problem(s) and 0 warning(s)

PL/SQL procedure successfully completed.

Statement processed.

Complete output is in trace file:
/u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_71148_HCHECK.trc
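
With this many checks it is easy to miss a single line that did not pass. A small helper can scan the hcheck output for any result other than PASS. This is a sketch only: it assumes every check line starts with ".- " and ends with the result word, as in the output above, and the function name is hypothetical.

```python
# Two check lines and the summary line, copied from the output above.
hcheck_excerpt = """\
.- LobNotInObj                 ... 1102000300 <=  *All Rel* 08/10 18:24:49 PASS
.- OrphanedObjError            ... 1102000300 >  1102000000 08/10 18:24:54 PASS
Found 0 potential problem(s) and 0 warning(s)
"""

def non_passing_checks(text):
    """Return (check name, result) for every check line whose result is not PASS."""
    flagged = []
    for line in text.splitlines():
        if not line.startswith(".- "):
            continue  # header, separator, or summary line
        fields = line.split()
        name, result = fields[1], fields[-1]
        if result != "PASS":
            flagged.append((name, result))
    return flagged

print(non_passing_checks(hcheck_excerpt))  # []
```

An empty list agrees with the summary line "Found 0 potential problem(s) and 0 warning(s)".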

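Luckily, the data dictionary itself was not damaged, so the application went straight back into service and everything ran normally (mainly because the customer had already stopped writing to the database while the fibre link was unstable)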