通过alert日志分析客户自行对一个数据库恢复的来龙去脉和点评

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:通过alert日志分析客户自行对一个数据库恢复的来龙去脉和点评

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

12.1.0.2数据库由于异常断电,导致无法正常启动,通过alert日志对客户的整个操作过程进行分析(不含我的操作部分)
12.1.0.2


通过alert日志分析最初故障原因是由于控制文件有坏块导致

Tue Sep 24 11:49:48 2024
alter database open
Tue Sep 24 11:49:48 2024
Ping without log force is disabled
.
Tue Sep 24 11:49:48 2024
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4715.trc:
ORA-01113: file 10 needs media recovery
ORA-01110: data file 10: '/u01/app/oracle/oradata/xifenfei.dbf'
ORA-1113 signalled during: alter database open...
alter database recover datafile '/u01/app/oracle/oradata/xifenfei.dbf'

offline 无法正常recover的数据文件

Tue Sep 24 13:13:30 2024
Media Recovery Complete (orcl)
Completed: ALTER DATABASE RECOVER  datafile 15  
ALTER DATABASE DATAFILE '/u01/app/oracle/oradata/xifenfei.dbf' END BACKUP
ORA-1235 signalled during: ALTER DATABASE DATAFILE '/u01/app/oracle/oradata/xifenfei.dbf' END BACKUP...
ALTER DATABASE DATAFILE '/u01/app/oracle/oradata/xifenfei.dbf' offline
Completed: ALTER DATABASE DATAFILE '/u01/app/oracle/oradata/xifenfei.dbf' offline
Tue Sep 24 13:25:16 2024
 ALTER DATABASE DATAFILE '/u01/app/oracle/oradata/xff.dbf' offline
Completed:  ALTER DATABASE DATAFILE '/u01/app/oracle/oradata/xff.dbf' offline

然后尝试打开数据库,遭遇ORA-600 4193错误,没有open成功

Tue Sep 24 13:27:06 2024
Media Recovery Complete (orcl)
Completed: ALTER DATABASE RECOVER  datafile 13   
alter database open
Tue Sep 24 13:27:16 2024
Ping without log force is disabled
.
Tue Sep 24 13:27:16 2024
Beginning crash recovery of 1 threads
 parallel recovery started with 7 processes
Tue Sep 24 13:27:16 2024
Started redo scan
Tue Sep 24 13:27:16 2024
Completed redo scan
 read 67 KB redo, 0 data blocks need recovery
Tue Sep 24 13:27:16 2024
Started redo application at
 Thread 1: logseq 7422, block 2, scn 119284797
Tue Sep 24 13:27:16 2024
Recovery of Online Redo Log: Thread 1 Group 3 Seq 7422 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/orcl/redo03.log
Tue Sep 24 13:27:16 2024
Completed redo application of 0.00MB
Tue Sep 24 13:27:16 2024
Completed crash recovery at
 Thread 1: logseq 7422, block 136, scn 119284798
 0 data blocks read, 0 data blocks written, 67 redo k-bytes read
Initializing SCN for created control file
Database SCN compatibility initialized to 3
Starting background process TMON
Tue Sep 24 13:27:16 2024
TMON started with pid=32, OS id=10617 
Tue Sep 24 13:27:16 2024
Thread 1 advanced to log sequence 7423 (thread open)
Thread 1 opened at log sequence 7423
  Current log# 1 seq# 7423 mem# 0: /u01/app/oracle/oradata/orcl/redo01.log
Successful open of redo thread 1
Tue Sep 24 13:27:16 2024
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Sep 24 13:27:16 2024
SMON: enabling cache recovery
Tue Sep 24 13:27:20 2024
[10553] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:6974064 end:6975474 diff:1410 ms (1.4 seconds)
Dictionary check beginning
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
File #10 is offline, but is part of an online tablespace.
data file 10: '/u01/app/oracle/oradata/tbs_data.dbf'
File #14 is offline, but is part of an online tablespace.
data file 14: '/u01/app/oracle/oradata/corsmf03.dbf'
Dictionary check complete
Verifying minimum file header compatibility (11g) for tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
Tue Sep 24 13:27:21 2024
SMON: enabling tx recovery
Tue Sep 24 13:27:21 2024
*********************************************************************
WARNING: The following temporary tablespaces contain no files.
         This condition can occur when a backup controlfile has
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
 
         ALTER TABLESPACE <tablespace_name> ADD TEMPFILE
 
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
Updating character set in controlfile to AL32UTF8
Starting background process SMCO
Tue Sep 24 13:27:21 2024
SMCO started with pid=34, OS id=10632 
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_10523.trc  (incident=108129):
ORA-00600: internal error code, arguments: [4193], [21368], [21372], [], [], [], [], [], [], [], [], []
Incident details in:/u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_108129/orcl_smon_10523_i108129.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
…………
Tue Sep 24 13:27:24 2024
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_10553.trc:
ORA-00600: internal error code, arguments: [4193], [21652], [21539], [], []
Tue Sep 24 13:27:24 2024
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_10553.trc:
ORA-00600: internal error code, arguments: [4193], [21652], [21539], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 10553): terminating the instance due to error 600
Tue Sep 24 13:27:25 2024
Instance terminated by USER, pid = 10553
ORA-1092 signalled during: alter database open...

重建了ctl,加入_allow_resetlogs_corruption隐含参数,尝试使用resetlogs方式打开数据库,报ORA-600 2662错误

Tue Sep 24 14:30:22 2024
alter database open RESETLOGS
Tue Sep 24 14:32:09 2024
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 119237645 time 
Online log /u01/app/oracle/oradata/orcl/redo01.log: Thread 1 Group 1 was previously cleared
Online log /u01/app/oracle/oradata/orcl/redo02.log: Thread 1 Group 2 was previously cleared
Online log /u01/app/oracle/oradata/orcl/redo03.log: Thread 1 Group 3 was previously cleared
Tue Sep 24 14:32:09 2024
Setting recovery target incarnation to 2
Tue Sep 24 14:32:09 2024
Ping without log force is disabled
.
Initializing SCN for created control file
Database SCN compatibility initialized to 3
Tue Sep 24 14:32:09 2024
Warning - High Database SCN: Current SCN value is 119237648, threshold SCN value is 0
If you have not previously reported this warning on this database, 
please notify Oracle Support so that additional diagnosis can be performed.
Starting background process TMON
Tue Sep 24 14:32:09 2024
TMON started with pid=25, OS id=15032 
Tue Sep 24 14:32:09 2024
Assigning activation ID 1708301307 (0x65d29bfb)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /u01/app/oracle/oradata/orcl/redo01.log
Successful open of redo thread 1
Tue Sep 24 14:32:09 2024
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Sep 24 14:32:09 2024
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_14937.trc  (incident=122458):
ORA-00600: internal error code, arguments: [2662], [0], [119484861], [0], [119484868], [16777344]……
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_122458/orcl_ora_14937_i122458.trc
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_14937.trc  (incident=122459):
………………
Tue Sep 24 14:32:16 2024
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_124802/orcl_ora_14937_i124802.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [0], [119484866], [0], [119484868], [16777344]……
ORA-00600: internal error code, arguments: [2662], [0], [119484865], [0], [119484868], [16777344]……
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [0], [119484861], [0], [119484868], [16777344]……

客户的自行恢复到此为止,没有成功,这里客户的恢复没有犯原则性错误(破坏文件的resetlogs 信息),同时也没有解决两个ORA-600错误
1. 在offline部分文件的情况下,打开数据库(没有使用resetlogs,避免了进一步破坏offline文件的resetlogs 信息),但是数据库报ORA-600 4193错误没有打开库成功
2. 后面强制拉库之前重建了ctl文件,避免了offline数据文件在resetlogs之后导致文件头resetlogs 信息和其他文件不一致的可能(因为重建ctl,offline的文件自动onlinne)
3. 最初offline数据文件,启动库报ORA-600 4193故障没有解决,这个故障一般是undo异常导致,这个故障大概率在后面强制拉库open过程中还可能遇到
4. 强制拉库过程中遭遇ORA-600 2662问题,需要修改scn,如果这个问题不解决,数据库无法open成功

存储宕机导致Oracle异常故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:存储宕机导致Oracle异常故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

存储突然掉线,导致数据库crash,报大量ORA-00206 ORA-00202 ORA-15081以及Linux-x86_64 Error: 5: Input/output error之类的错误

Sun Jul 21 20:00:11 2024
Thread 1 advanced to log sequence 1594398 (LGWR switch)
  Current log# 5 seq# 1594398 mem# 0: +DATA/xff/onlinelog/group_5.412.906718739
Sun Jul 21 20:53:17 2024
WARNING: Write Failed. group:2 disk:0 AU:506916 offset:49152 size:16384
Sun Jul 21 20:53:17 2024
WARNING: Read Failed. group:2 disk:2 AU:506931 offset:49152 size:16384
WARNING: failed to read mirror side 1 of virtual extent 4 logical extent 0 of file 415 in group [2.34109396] 
from disk ORACLE_DATA_0002  allocation unit 506931 reason error; if possible, will try another mirror side
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ckpt_42142.trc:
ORA-15080: 与磁盘的同步 I/O 操作失败
ORA-27061: 异步 I/O 等待失败
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 16384
WARNING: failed to write mirror side 1 of virtual extent 0 logical extent 0 
of file 415 in group 2 on disk 0 allocation unit 506916 
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ckpt_42142.trc:
ORA-00206: 写入控制文件时出错 (块 3, # 块 1)
ORA-00202: 控制文件: ''+DATA/xff/controlfile/current.415.906718737''
ORA-15081: 无法将 I/O 操作提交到磁盘
ORA-15081: 无法将 I/O 操作提交到磁盘
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ckpt_42142.trc:
ORA-00221: 写入控制文件时出错
ORA-00206: 写入控制文件时出错 (块 3, # 块 1)
ORA-00202: 控制文件: ''+DATA/xff/controlfile/current.415.906718737''
ORA-15081: 无法将 I/O 操作提交到磁盘
ORA-15081: 无法将 I/O 操作提交到磁盘
CKPT (ospid: 42142): terminating the instance due to error 221
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_lmon_42087.trc:
ORA-00202: 控制文件: ''+DATA/xff/controlfile/current.415.906718737''
ORA-15081: 无法将 I/O 操作提交到磁盘
ORA-27072: 文件 I/O 错误
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 1038194784
Additional information: -1
Sun Jul 21 20:53:19 2024
ORA-1092 : opitsk aborting process
Sun Jul 21 20:53:24 2024
ORA-1092 : opitsk aborting process
Sun Jul 21 20:53:24 2024
License high water mark = 59
Sun Jul 21 20:53:28 2024
Instance terminated by CKPT, pid = 42142
USER (ospid: 64660): terminating the instance
Instance terminated by USER, pid = 64660

存储恢复之后启动数据库报ORA-600 2131错误

Mon Jul 22 09:10:04 2024
ALTER DATABASE   MOUNT
This instance was first to mount
Mon Jul 22 09:10:04 2024
Sweep [inc][490008]: completed
Sweep [inc2][490008]: completed
NOTE: Loaded library: System 
SUCCESS: diskgroup ORACLE_DATA was mounted
NOTE: dependency between database rac and diskgroup resource ora.ORACLE_DATA.dg is established
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ora_14301.trc  (incident=492409):
ORA-00600: ??????, ??: [2131], [33], [32], [], [], [], [], [], [], [], [], []
Incident details in: /users/oracle/app/db/diag/rdbms/xff/xff1/incident/incdir_492409/xff1_ora_14301_i492409.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORA-600 signalled during: ALTER DATABASE   MOUNT...

客户尝试重建ctl进行恢复,结果由于分析不正确,导致在重建ctl的时候,遗漏了3个数据文件,并且在屏蔽一致性的情况下,强制resetlogs操作,结果数据库没有被正常打开,而是报ORA-600 2662错误

alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 9965567206652
Clearing online redo logfile 1 +DATA/xff/onlinelog/group_1.414.906718739
Clearing online log 1 of thread 1 sequence number 0
Clearing online redo logfile 1 complete
Clearing online redo logfile 2 +DATA/xff/onlinelog/group_2.413.906718739
Clearing online log 2 of thread 1 sequence number 0
Clearing online redo logfile 2 complete
Clearing online redo logfile 5 +DATA/xff/onlinelog/group_5.412.906718739
Clearing online log 5 of thread 1 sequence number 0
Clearing online redo logfile 5 complete
Expanded controlfile section 2 from 1 to 63 records
The number of logical blocks in section 2 remains the same
Expanded controlfile section 1 from 4 to 66 records
Requested to grow by 62 records; added 32 blocks of records
Expanded controlfile section 30 from 1 to 63 records
The number of logical blocks in section 30 remains the same
Expanded controlfile section 29 from 1 to 63 records
The number of logical blocks in section 29 remains the same
Control file has been expanded to support 63 threads
Mon Jul 22 23:04:07 2024
Redo thread 2 enabled by open resetlogs or standby activation
Online log +DATA/xff/onlinelog/group_1.414.906718739: Thread 1 Group 1 was previously cleared
Online log +DATA/xff/onlinelog/group_2.413.906718739: Thread 1 Group 2 was previously cleared
Online log +DATA/xff/onlinelog/group_3.501.1175036643: Thread 2 Group 3 was previously cleared
Online log +DATA/xff/onlinelog/group_4.502.1175036645: Thread 2 Group 4 was previously cleared
Online log +DATA/xff/onlinelog/group_5.412.906718739: Thread 1 Group 5 was previously cleared
Mon Jul 22 23:04:08 2024
Setting recovery target incarnation to 2
Initializing SCN for created control file
Database SCN compatibility initialized to 3
Warning - High Database SCN: Current SCN value is 9965567206655, threshold SCN value is 0
If you have not previously reported this warning on this database, 
please notify Oracle Support so that additional diagnosis can be performed.
Mon Jul 22 23:04:09 2024
Assigning activation ID 2763017873 (0xa4b04e91)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: +DATA/xff/onlinelog/group_1.414.906718739
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Jul 22 23:04:10 2024
SMON: enabling cache recovery
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ora_64210.trc  (incident=624374):
ORA-00600: 内部错误代码, 参数: [2662], [2320], [1243079939], [2320], [1243211805], [12583040], [], [], [], [], [], []
Incident details in: /users/oracle/app/db/diag/rdbms/xff/xff1/incident/incdir_624374/xff1_ora_64210_i624374.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ora_64210.trc:
ORA-00600: 内部错误代码, 参数: [2662], [2320], [1243079939], [2320], [1243211805], [12583040], [], [], [], [], [], []
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ora_64210.trc:
ORA-00600: 内部错误代码, 参数: [2662], [2320], [1243079939], [2320], [1243211805], [12583040], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 64210): terminating the instance due to error 600
Instance terminated by USER, pid = 64210
ORA-1092 signalled during: alter database open resetlogs...

操作到这里,后续问题就比较麻烦了,因为在asm磁盘组中数据文件重建ctl的时候遗漏3个并且还被resetlogs操作过,导致这三个文件的resetlogs scn和其他数据文件不一致,对于这个问题,解决办法通过Oracle Recovery Tools工具或者bbed修改相关resetlogs scn,然后重建ctl

SQL> @rectl.sql

Control file created.

SQL> RECOVER DATABASE;
Media recovery complete

然后解决之前数据库启动报ORA-600 2662问题,通过修改数据库scn进行解决,可以使用Patch_SCN工具进行快速解决,然后open数据库成功

SQL> ALTER DATABASE OPEN;
 
Database altered.

但是查看alert日志数据库报大量ORA-600 4194、ORA-01595Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xC21D511] [PC:0x97F4EFA, kgegpa()+40]之类错误

Wed Jul 24 15:24:21 2024
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 32 processes
Started redo scan
Completed redo scan
 read 0 KB redo, 0 data blocks need recovery
…………
Database Characterset is ZHS16GBK
No Resource Manager plan active
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_smon_40279.trc  (incident=777938):
ORA-00600: 内部错误代码, 参数: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Wed Jul 24 15:24:40 2024
QMNC started with pid=79, OS id=54632 
Block recovery from logseq 2, block 74 to scn 9965587206835
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: +DATA/xff/onlinelog/redo02
LOGSTDBY: Validating controlfile with logical metadata
Wed Jul 24 15:24:40 2024
Block recovery stopped at EOT rba 2.82.16
Block recovery completed at rba 2.82.16, scn 2320.1263080114
Block recovery from logseq 2, block 74 to scn 9965587206833
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: +DATA/xff/onlinelog/redo02
Block recovery completed at rba 2.82.16, scn 2320.1263080114
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_smon_40279.trc:
ORA-01595: 释放区 (4) 回退段 (20) 时出错
ORA-00600: 内部错误代码, 参数: [4194], [], [], [], [], [], [], [], [], [], [], []
LOGSTDBY: Validation complete
Wed Jul 24 15:24:41 2024
Sweep [inc][777938]: completed
Sweep [inc2][777938]: completed
Wed Jul 24 15:24:41 2024
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_q001_54657.trc  (incident=778362):
ORA-00600: 内部错误代码, 参数: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Starting background process SMCO
Wed Jul 24 15:24:42 2024
SMCO started with pid=83, OS id=54691 
Block recovery from logseq 2, block 74 to scn 9965587206835
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: +DATA/xff/onlinelog/redo02
Block recovery completed at rba 2.82.16, scn 2320.1263080118
Block recovery from logseq 2, block 74 to scn 9965587206838
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: +DATA/xff/onlinelog/redo02
Block recovery completed at rba 2.83.16, scn 2320.1263080119
Error 600 in kwqmnpartition(), aborting txn 
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_q001_54657.trc  (incident=778363):
ORA-25319: 队列表重新分区已中止
Completed: alter database open
Block recovery from logseq 2, block 74 to scn 9965587206835
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: +DATA/rac/onlinelog/redo02
Block recovery completed at rba 2.82.16, scn 2320.1263080118
Block recovery from logseq 2, block 74 to scn 9965587207538
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: +DATA/rac/onlinelog/redo02
Block recovery completed at rba 2.1097.16, scn 2320.1263080819
Errors in file /users/oracle/app/db/diag/rdbms/rac/rac1/trace/rac1_cjq0_55657.trc  (incident=778427):
ORA-00600: 内部错误代码, 参数: [600], [ORA-00600: 内部错误代码, 参数: 
[4194], [], [], [], [], [], [], [], [], [], [], []], [], [], [], [], [], [], [], [], [], []
Incident details in: /users/oracle/app/db/diag/rdbms/xff/xff1/incident/incdir_778427/xff1_cjq0_55657_i778427.trc
Exception [type:SIGSEGV, Address not mapped to object][ADDR:0xC21D511][PC:0x97F4EFA, kgegpa()+40][flags: 0x0, count: 1]
Exception [type:SIGSEGV, Address not mapped to object][ADDR:0xC21D511][PC:0x97F396E, kgebse()+776][flags: 0x2, count: 2]
Exception [type:SIGSEGV, Address not mapped to object][ADDR:0xC21D511][PC:0x97F396E, kgebse()+776][flags: 0x2, count: 2]

从报错分析是由于undo异常导致,处理异常undo回滚段之后,数据库open正常,安排逻辑迁移数据,完成本次恢复

数据库open报ORA-600 kcratr_scan_lastbwr故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:数据库open报ORA-600 kcratr_scan_lastbwr故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

由于断电,导致数据库正常open报ORA-600 kcratr_scan_lastbwr错误

Wed Jan 17 18:23:26 2024
ALTER DATABASE   MOUNT
Successful mount of redo thread 1, with mount id 1028618590
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Wed Jan 17 18:23:30 2024
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 32 processes
Started redo scan
Hex dump of (file 3, block 144) in trace file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_66361.trc
Reading datafile '/database/oracle/app/oracle/oradata/xff/datafile/o1_mf_undotbs1_hct7001s_.dbf' 
  for corruption at rdba: 0x00c00090 (file 3, block 144)
Reread (file 3, block 144) found same corrupt data (logically corrupt)
Write verification failed for File 3 Block 144 (rdba 0xc00090)
Errors in file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_66361.trc  (incident=672241):
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [], [], []
Incident details in: /database/oracle/app/oracle/diag/rdbms/xff/xff/incident/incdir_672241/xff_ora_66361_i672241.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Aborting crash recovery due to error 600
Errors in file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_66361.trc:
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [], [], []
Errors in file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_66361.trc:
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...

尝试recover 数据库报ORA-600 3020错误

Wed Jan 17 18:28:38 2024
ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 96 slaves
Wed Jan 17 18:28:41 2024
Recovery of Online Redo Log: Thread 1 Group 2 Seq 410864 Reading mem 0
  Mem# 0: /database/oracle/app/oracle/oradata/xff/onlinelog/o1_mf_2_hct740hq_.log
Wed Jan 17 18:28:42 2024
ORA-00600: internal error code, arguments: [3020], [3], [240], [12583152], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 240, file offset is 1966080 bytes)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 3: '/database/oracle/app/oracle/oradata/xff/datafile/o1_mf_undotbs1_hct7001s_.dbf'
ORA-10560: block type 'KTU SMU HEADER BLOCK'
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Slave exiting with ORA-600 exception
Errors in file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_pr19_68212.trc:
ORA-00600: internal error code, arguments: [3020], [3], [240], [12583152], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 240, file offset is 1966080 bytes)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 3: '/database/oracle/app/oracle/oradata/xff/datafile/o1_mf_undotbs1_hct7001s_.dbf'
ORA-10560: block type 'KTU SMU HEADER BLOCK'
Errors in file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_68038.trc  (incident=672243):
ORA-00600: internal error code, arguments: [3020], [3], [240], [12583152], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 240, file offset is 1966080 bytes)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 3: '/database/oracle/app/oracle/oradata/xff/datafile/o1_mf_undotbs1_hct7001s_.dbf'
ORA-10560: block type 'KTU SMU HEADER BLOCK'
Wed Jan 17 18:28:43 2024
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Recovery Slave PR19 previously exited with exception 600
Wed Jan 17 18:28:43 2024
Sweep [inc][672865]: completed
Media Recovery failed with error 448
Errors in file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_pr00_68115.trc:
ORA-00283: recovery session canceled due to errors
ORA-00448: normal completion of background process
ORA-600 signalled during: ALTER DATABASE RECOVER  database  ...

加上隐含参数尝试强制拉库

alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 6165467436
Clearing online redo logfile 1 /database/oracle/app/oracle/oradata/xff/onlinelog/o1_mf_1_hct740fp_.log
Clearing online log 1 of thread 1 sequence number 0
Clearing online redo logfile 1 complete
Clearing online redo logfile 2 /database/oracle/app/oracle/oradata/xff/onlinelog/o1_mf_2_hct740hq_.log
Clearing online log 2 of thread 1 sequence number 0
Clearing online redo logfile 2 complete
Clearing online redo logfile 3 /database/oracle/app/oracle/oradata/xff/onlinelog/o1_mf_3_hct740k7_.log
Clearing online log 3 of thread 1 sequence number 0
Clearing online redo logfile 3 complete
Online log /database/oracle/app/oracle/oradata/xff/onlinelog/o1_mf_1_hct740fp_.log: Thread 1 Group 1 was previously cleared
Online log /database/oracle/app/oracle/oradata/xff/onlinelog/o1_mf_2_hct740hq_.log: Thread 1 Group 2 was previously cleared
Online log /database/oracle/app/oracle/oradata/xff/onlinelog/o1_mf_3_hct740k7_.log: Thread 1 Group 3 was previously cleared
Fri Jan 19 09:24:59 2024
Setting recovery target incarnation to 2
Initializing SCN for created control file
Database SCN compatibility initialized to 3
Warning - High Database SCN: Current SCN value is 6165467439, threshold SCN value is 0
If you have not previously reported this warning on this database,
  please notify Oracle Support so that additional diagnosis can be performed.
Fri Jan 19 09:24:59 2024
Assigning activation ID 1028784413 (0x3d52011d)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /database/oracle/app/oracle/oradata/xff/onlinelog/o1_mf_1_hct740fp_.log
Successful open of redo thread 1
Fri Jan 19 09:24:59 2024
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Jan 19 09:24:59 2024
SMON: enabling cache recovery
Errors in file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_84860.trc  (incident=1344255):
ORA-00600: internal error code, arguments: [2662], [1], [1870500147], [1], [1870515285], [12583040]
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_84860.trc:
ORA-00600: internal error code, arguments: [2662], [1], [1870500147], [1], [1870515285], [12583040]
Errors in file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_84860.trc:
ORA-00600: internal error code, arguments: [2662], [1], [1870500147], [1], [1870515285], [12583040]
Error 600 happened during db open, shutting down database
USER (ospid: 84860): terminating the instance due to error 600
Instance terminated by USER, pid = 84860
ORA-1092 signalled during: alter database open resetlogs...

客户自行恢复到这一步,后面无法处理,接手之后进行恢复,其实后面比较简单了,就是修改下数据库scn,数据库就可以open起来,然后处理异常的undo和对象即可,可以参考以前类似文章:
ORA-600 2662快速恢复之Patch scn工具
硬件故障导致ORA-600 2662错误处理
ORA-00600 [2662]和ORA-00600 [4194]恢复
更多参考:惜分飞blog中ORA-600 2662文章

resetlogs强制拉库失败并使用备份system文件还原数据库故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:resetlogs强制拉库失败并使用备份system文件还原数据库故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接手一个库,在open的过程中遭遇到ORA-600 2662错误

Sun May 26 10:15:54 2024
alter database open RESETLOGS
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 84303583
Clearing online redo logfile 1 /data/OracleData/xff/redo01.log
Clearing online log 1 of thread 1 sequence number 8330
Clearing online redo logfile 1 complete
Clearing online redo logfile 2 /data/OracleData/xff/redo02.log
Clearing online log 2 of thread 1 sequence number 8327
Clearing online redo logfile 2 complete
Clearing online redo logfile 3 /data/OracleData/xff/redo03.log
Clearing online log 3 of thread 1 sequence number 8329
Clearing online redo logfile 3 complete
Clearing online redo logfile 4 /data/OracleData/xff/redo04.log
Clearing online log 4 of thread 1 sequence number 8328
Clearing online redo logfile 4 complete
Resetting resetlogs activation ID 1431370398 (0x5550fa9e)
Online log /data/OracleData/xff/redo01.log: Thread 1 Group 1 was previously cleared
Online log /data/OracleData/xff/redo02.log: Thread 1 Group 2 was previously cleared
Online log /data/OracleData/xff/redo03.log: Thread 1 Group 3 was previously cleared
Online log /data/OracleData/xff/redo04.log: Thread 1 Group 4 was previously cleared
Sun May 26 10:15:59 2024
Setting recovery target incarnation to 3
Sun May 26 10:15:59 2024
Read of datafile '/data/OracleData/xff/temp01.dbf' (fno 201) header failed with ORA-01200
Rereading datafile 201 header failed with ORA-01200
Errors in file /data/u01/app/oracle/diag/rdbms/xff/xff/trace/xff_dbw0_1563.trc:
ORA-01186: file 201 failed verification tests
ORA-01122: database file 201 failed verification check
ORA-01110: data file 201: '/data/OracleData/xff/temp01.dbf'
ORA-01200: actual file size of 3711 is smaller than correct size of 3712 blocks
File 201 not verified due to error ORA-01122
Sun May 26 10:15:59 2024
Assigning activation ID 1509069065 (0x59f29109)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /data/OracleData/xff/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sun May 26 10:15:59 2024
SMON: enabling cache recovery
Errors in file /data/u01/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_1590.trc  (incident=163897):
ORA-00600: internal error code, arguments: [2662], [0], [84303590], [0], [84314659], [12583040] 
Incident details in:/data/u01/app/oracle/diag/rdbms/xff/xff/incident/incdir_163897/xff_ora_1590_i163897.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /data/u01/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_1590.trc:
ORA-00600: internal error code, arguments: [2662], [0], [84303590], [0], [84314659], [12583040] 
Errors in file /data/u01/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_1590.trc:
ORA-00600: internal error code, arguments: [2662], [0], [84303590], [0], [84314659], [12583040] 
Error 600 happened during db open, shutting down database
USER (ospid: 1590): terminating the instance due to error 600

然后客户使用备份的system01.dbf文件替换了被resetlogs之后文件,导致数据库后续操作无法继续

SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-19909: datafile 1 belongs to an orphan incarnation
ORA-01110: data file 1: '/data/OracleData/xff/system01.dbf'

这个问题比较简单,通过bbed或者Oracle Recovery Tools修改文件头相关信息,然后open数据库成功
重建控制文件丢失数据文件导致悲剧
Oracle Recovery Tools快速恢复ORA-19909

SQL> recover datafile 1;
Media recovery complete.
SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.

但是由于system文件有大量坏块导致数据库无法正常登录和导出

[oracle@et-dbserver ~]$ exp "'/ as sysdba'" owner=USERNAME  file=/tmp/2user.dmp log=/tmp/2user.log 

Export: Release 11.2.0.4.0 - Production on Sun May 26 13:00:50 2024

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

EXP-00056: ORACLE error 604 encountered
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 86500)
ORA-01110: data file 1: '/data/OracleData/xff/system01.dbf'
Username: / as sysdba

EXP-00056: ORACLE error 604 encountered
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 86500)
ORA-01110: data file 1: '/data/OracleData/xff/system01.dbf'
Username:
Password:

EXP-00056: ORACLE error 604 encountered
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 86500)
ORA-01110: data file 1: '/data/OracleData/xff/system01.dbf'
ORA-01017: invalid username/password; logon denied
EXP-00005: all allowable logon attempts failed
EXP-00000: Export terminated unsuccessfully

通过dbv检查system数据文件

DBVERIFY: Release 11.2.0.4.0 - Production on Sun May 26 12:33:28 2024

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


DBVERIFY - Verification starting : FILE = /data/OracleData/xff/system01.dbf
Page 1044 is influx - most likely media corrupt
Corrupt block relative dba: 0x00400414 (file 1, block 1044)
Fractured block found during dbv: 
Data in bad block:
 type: 0 format: 2 rdba: 0x00400414
 last change scn: 0x0000.00000000 seq: 0x1 flg: 0x05
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x1d7f550b
 check value in block header: 0xa354
 computed block checksum: 0x6830

Page 1103 is marked corrupt
Corrupt block relative dba: 0x0040044f (file 1, block 1103)
Bad header found during dbv: 
Data in bad block:
 type: 0 format: 0 rdba: 0x00000000
 last change scn: 0x508f.5f74492e seq: 0x53 flg: 0x0c
 spare1: 0xc spare2: 0xa6 spare3: 0xc757
 consistency value in tail: 0x00000001
 check value in block header: 0x8925
 computed block checksum: 0x5d3b

Page 1143 is marked corrupt
Corrupt block relative dba: 0x00400477 (file 1, block 1143)
Bad header found during dbv: 
Data in bad block:
 type: 0 format: 0 rdba: 0x00000001
 last change scn: 0x65c4.52eb202e seq: 0x28 flg: 0x0e
 spare1: 0xe spare2: 0xe2 spare3: 0xfa46
 consistency value in tail: 0x00000001
 check value in block header: 0x6405
 computed block checksum: 0x28b1

………………

Page 124805 is influx - most likely media corrupt
Corrupt block relative dba: 0x0041e785 (file 1, block 124805)
Fractured block found during dbv: 
Data in bad block:
 type: 6 format: 2 rdba: 0x0041e785
 last change scn: 0x0000.0434fc6c seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x1991255b
 check value in block header: 0x6386
 computed block checksum: 0x1384


DBVERIFY - Verification complete

Total Pages Examined         : 130560
Total Pages Processed (Data) : 95634
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 14949
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1774
Total Pages Processed (Seg)  : 1669
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 16251
Total Pages Marked Corrupt   : 283
Total Pages Influx           : 149
Total Pages Encrypted        : 0
Highest block SCN            : 84314727 (0.84314727)

对于这样问题,通过Oracle Recovery Tools实战批量坏块修复,实现数据库可以完美导出数据

存储故障后oracle报—ORA-01122/ORA-01207故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:存储故障后oracle报—ORA-01122/ORA-01207故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户存储异常,通过硬件恢复解决存储故障之后,oracle数据库无法正常启动(存储cache丢失),尝试recover数据库报ORA-00283 ORA-01122 ORA-01110 ORA-01207错误
以前处理过比较类似的存储故障case:
又一起存储故障导致ORA-00333 ORA-00312恢复
存储故障,强制拉库报ORA-600 kcbzib_kcrsds_1处理

SQL> recover database until cancel;
ORA-00283: 恢复会话因错误而取消
ORA-01122: 数据库文件 536 验证失败
ORA-01110: 数据文件 536: '+DATA/orcl/dt_img_dat511.ora'
ORA-01207: 文件比控制文件更新 - 旧的控制文件
Sun May 05 00:09:03 2024
ALTER DATABASE RECOVER  database until cancel  
Media Recovery Start
 started logmerger process
Sun May 05 00:09:10 2024
SUCCESS: diskgroup FRA was mounted
Sun May 05 00:09:10 2024
NOTE: dependency between database orcl and diskgroup resource ora.FRA.dg is established
Sun May 05 00:09:14 2024
WARNING! Recovering data file 1 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
Media Recovery failed with error 1122
Slave exiting with ORA-283 exception
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_pr00_8048.trc:
ORA-00283: 恢复会话因错误而取消
ORA-01122: 数据库文件 536 验证失败
ORA-01110: 数据文件 536: '+DATA/orcl/dt_img_dat511.ora'
ORA-01207: 文件比控制文件更新 - 旧的控制文件
Sun May 05 00:09:16 2024
Recovery Slave PR00 previously exited with exception 283
ORA-283 signalled during: ALTER DATABASE RECOVER  database until cancel  ...

using backup controlfile进行恢复

SQL> recover database using backup controlfile until cancel;
ORA-00279: 更改 18646239951 (在 04/25/2024 17:14:50 生成) 对于线程 1 是必需的
ORA-00289: 建议:
+FRA/orcl/archivelog/2024_04_25/thread_1_seq_1003886.199435.1167240505
ORA-00280: 更改 18646239951 (用于线程 1) 在序列 #1003886 中


指定日志: {<RET>=suggested | filename | AUTO | CANCEL}
auto
ORA-00279: 更改 18646239951 (在 04/25/2024 17:11:40 生成) 对于线程 2 是必需的
ORA-00289: 建议:
+FRA/orcl/archivelog/2024_04_25/thread_2_seq_677876.199531.1167239807
ORA-00280: 更改 18646239951 (用于线程 2) 在序列 #677876 中


ORA-00279: 更改 18646255791 (在 04/25/2024 17:16:46 生成) 对于线程 2 是必需的
ORA-00289: 建议:
+FRA/orcl/archivelog/2024_04_25/thread_2_seq_677877.199472.1167240099
ORA-00280: 更改 18646255791 (用于线程 2) 在序列 #677877 中
ORA-00278: 此恢复不再需要日志文件
'+FRA/orcl/archivelog/2024_04_25/thread_2_seq_677876.199531.1167239807'


ORA-00279: 更改 18646295647 (在 04/25/2024 17:21:38 生成) 对于线程 2 是必需的
ORA-00289: 建议:
+FRA/orcl/archivelog/2024_04_25/thread_2_seq_677878.199379.1167240623
ORA-00280: 更改 18646295647 (用于线程 2) 在序列 #677878 中
ORA-00278: 此恢复不再需要日志文件
'+FRA/orcl/archivelog/2024_04_25/thread_2_seq_677877.199472.1167240099'


ORA-00279: 更改 18646331784 (在 04/25/2024 17:28:25 生成) 对于线程 1 是必需的
ORA-00289: 建议:
+FRA/orcl/archivelog/2024_04_25/thread_1_seq_1003887.199320.1167241507
ORA-00280: 更改 18646331784 (用于线程 1) 在序列 #1003887 中
ORA-00278: 此恢复不再需要日志文件
'+FRA/orcl/archivelog/2024_04_25/thread_1_seq_1003886.199435.1167240505'


ORA-00308: 无法打开归档日志
'+FRA/orcl/archivelog/2024_04_25/thread_1_seq_1003887.199320.1167241507'
ORA-17503: ksfdopn: 2 未能打开文件
+FRA/orcl/archivelog/2024_04_25/thread_1_seq_1003887.199320.1167241507
ORA-15012: ASM file
'+FRA/orcl/archivelog/2024_04_25/thread_1_seq_1003887.199320.1167241507' does not exist


ORA-10879: error signaled in parallel recovery slave
ORA-01547: 警告: RECOVER 成功但 OPEN RESETLOGS 将出现如下错误
ORA-01194: 文件 1 需要更多的恢复来保持一致性
ORA-01110: 数据文件 1: '+DATA/orcl/system01.dbf'

通过分析,确认由于cache丢失导致thread_1_seq_1003887这个日志丢失(而且redo已经被覆盖)
20240506-2


20240506-1

数据库无法通过正常recover的思路解决.通过屏蔽一致性,强制打开数据库,alert日志报ORA-600 2662错误

Sat May 04 17:23:00 2024
Redo thread 2 internally disabled at seq 1 (CKPT)
ARC1: Archiving disabled thread 2 sequence 1
Archived Log entry 2 added for thread 2 sequence 1 ID 0x0 dest 1:
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_ora_3684.trc  (incident=47066):
ORA-00600: ??????, ??: [2662], [4], [1466533588], [4], [1466584862], [12583040], [], [], [], [], [], []
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_ora_3684.trc:
ORA-00600: ??????, ??: [2662], [4], [1466533588], [4], [1466584862], [12583040], [], [], [], [], [], []
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_ora_3684.trc:
ORA-00600: ??????, ??: [2662], [4], [1466533588], [4], [1466584862], [12583040], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 3684): terminating the instance due to error 600
Instance terminated by USER, pid = 3684
ORA-1092 signalled during: alter database open resetlogs...

通过修改数据库scn,open数据库报ORA-600 4137

Sun May 05 00:12:41 2024
replication_dependency_tracking turned off (no async multimaster replication found)
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Completed: alter database open resetlogs
Sun May 05 00:12:56 2024
Trace dumping is performing id=[cdmp_20240505001256]
Sun May 05 00:12:56 2024
ORACLE Instance orcl1 (pid = 22) - Error 600 encountered while recovering transaction (28, 21).
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_smon_5896.trc:
ORA-00600: ??????, ??: [4137], [28.21.42965783], [0], [0], [], [], [], [], [], [], [], []

这个错误比较明显,由于28号回滚段异常导致,对异常回滚段进行处理,重建undo,数据库恢复主要工作完成

数据库open成功后报ORA-00353 ORA-00354错误引起的一系列问题(本质ntfs文件系统异常)

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:数据库open成功后报ORA-00353 ORA-00354错误引起的一系列问题(本质ntfs文件系统异常)

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

服务器异常断电之后,开机启动数据库启动成功,但是报ORA-00353 ORA-00354以及ORA-600 kdsgrp1错误

Sun Mar 31 01:19:51 2024
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sun Mar 31 01:19:51 2024
SMON: enabling cache recovery
[4528] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:64250 end:64859 diff:609 (6 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Sun Mar 31 01:19:57 2024
QMNC started with pid=40, OS id=4912 
Sun Mar 31 01:20:03 2024
Completed: alter database open
Sun Mar 31 01:20:04 2024
Starting background process CJQ0
Sun Mar 31 01:20:04 2024
CJQ0 started with pid=20, OS id=5104 
Setting Resource Manager plan SCHEDULER[0x32DE]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Sun Mar 31 01:20:07 2024
Starting background process VKRM
Sun Mar 31 01:20:07 2024
VKRM started with pid=48, OS id=4160 
Sun Mar 31 01:20:32 2024
Incomplete read from log member 'E:\APP\ARCHIVELOG\ARC0000005281_1157968160.0001'. Trying next member.
Incomplete read from log member 'E:\APP\ARCHIVELOG\ARC0000005281_1157968160.0001'. Trying next member.
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_j000_4180.trc(incident=132393):
ORA-00353: 日志损坏接近块 13792 更改 33686946 时间 03/31/2024 00:17:54
ORA-00334: 归档日志: 'E:\APP\ARCHIVELOG\ARC0000005281_1157968160.0001'
Incomplete read from log member 'E:\APP\ARCHIVELOG\ARC0000005281_1157968160.0001'. Trying next member.
Incomplete read from log member 'E:\APP\ARCHIVELOG\ARC0000005281_1157968160.0001'. Trying next member.
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_j000_4180.trc(incident=132394):
ORA-00353: 日志损坏接近块 13792 更改 33686946 时间 03/31/2024 00:17:54
ORA-00334: 归档日志: 'E:\APP\ARCHIVELOG\ARC0000005281_1157968160.0001'
ORA-00353: 日志损坏接近块 13792 更改 33686946 时间 03/31/2024 00:17:54
ORA-00334: 归档日志: 'E:\APP\ARCHIVELOG\ARC0000005281_1157968160.0001'
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\xifenfei\xifenfei\incident\incdir_132393\xifenfei_j000_4180_i132393.trc:
ORA-00354: 损坏重做日志块标头
ORA-00353: 日志损坏接近块 13792 更改 33686946 时间 03/31/2024 00:17:54
ORA-00334: 归档日志: 'E:\APP\ARCHIVELOG\ARC0000005281_1157968160.0001'
ORA-00353: 日志损坏接近块 13792 更改 33686946 时间 03/31/2024 00:17:54
ORA-00334: 归档日志: 'E:\APP\ARCHIVELOG\ARC0000005281_1157968160.0001'
Sun Mar 31 01:20:35 2024
Sweep [inc][132394]: completed
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_j000_4180.trc:
ORA-00354: 损坏重做日志块标头
ORA-00353: 日志损坏接近块 13792 更改 33686946 时间 03/31/2024 00:17:54
ORA-00334: 归档日志: 'E:\APP\ARCHIVELOG\ARC0000005281_1157968160.0001'
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_j000_4180.trc(incident=132395):
ORA-00600: 内部错误代码, 参数: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Incomplete read from log member 'E:\APP\ARCHIVELOG\ARC0000005281_1157968160.0001'. Trying next member.
Incomplete read from log member 'E:\APP\ARCHIVELOG\ARC0000005281_1157968160.0001'. Trying next member.
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

再次重启数据库报ORA-600 2131错误,数据库无法正常mount

Sun Mar 31 08:59:20 2024
alter database mount exclusive
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_3232.trc  (incident=144175):
ORA-00600: ??????, ??: [2131], [9], [8], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORA-600 signalled during: alter database mount exclusive...

因为这个库有ctl备份,通过rman还原ctl备份,然后尝试recover库,结果报ORA-00310 ORA-00334(由于需要的redo无法正常应用导致)
20240403223225


对于这类情况,只能通过屏蔽一致性强制打开库
20240403223432

数据报ORA-600 2662错误,此类错误比较简单,使用patch scn工具一键搞定(ORA-600 2662快速恢复之Patch scn工具),数据库open成功,导出数据完整恢复
对于正常open的库,出现此类问题属于反常现象,通过分析系统事件确定是由于ntfs文件系统本身有问题导致
20240403223855

ORA-600 2662快速恢复之Patch scn工具

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 2662快速恢复之Patch scn工具

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户数据库启动报ORA-600 2662错误

SQL> recover database;
完成介质恢复。
SQL> alter database open ;
alter database open 
*
第 1 行出现错误:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [0], [272845978], [0], [272853062], [12583040]
ORA-00600: internal error code, arguments: [2662], [0], [272845977], [0], [272853062], [12583040]
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [0], [272845973], [0], [272853062], [12583040]
进程 ID: 3528
会话 ID: 33 序列号: 1

通过自研开发的patch scn工具,修改数据库scn值
20240313180657


然后open数据库成功
20240313181402

对于这类故障,patch scn工具是最快速的解决方案

又一例:ORA-600 kclchkblk_4和2662故障

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:又一例:ORA-600 kclchkblk_4和2662故障

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户恢复请求:由于未知原因导致aix环境的rac两台主机同时重启之后数据库无法正常启动,初步判断是由于写丢失导致故障(ORA-00742 ORA-00353)

Wed Feb 21 09:23:06 2024
ALTER DATABASE OPEN
This instance was first to open
Abort recovery for domain 0
Errors in file /oracle/db/diag/rdbms/xff/xff1/trace/xff1_ora_5767246.trc:
ORA-01113: file 32 needs media recovery
ORA-01110: data file 32: '+DATA/xff/datafile/x5sys_cs.dbf'
ORA-1113 signalled during: ALTER DATABASE OPEN...
Wed Feb 21 09:23:27 2024
ALTER DATABASE RECOVER  datafile '+DATA/xff/datafile/x5sys_cs.dbf'  
Media Recovery Start
Serial Media Recovery started
WARNING! Recovering data file 32 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
Recovery of Online Redo Log: Thread 2 Group 14 Seq 48490 Reading mem 0
  Mem# 0: +DATA/xff/onlinelog/group_14.313.1060528521
Recovery of Online Redo Log: Thread 1 Group 7 Seq 64195 Reading mem 0
  Mem# 0: +DATA/xff/onlinelog/group_7.306.1060527979
Wed Feb 21 09:24:10 2024
Errors in file /oracle/db/diag/rdbms/xff/xff1/trace/xff1_ora_5767246.trc:
ORA-00742: Log read detects lost write in thread %d sequence %d block %d
ORA-00334: archived log: '+DATA/xff/onlinelog/group_14.313.1060528521'
Errors in file /oracle/db/diag/rdbms/xff/xff1/trace/xff1_ora_5767246.trc  (incident=336478):
ORA-00353: log corruption near block 139727 change 26346459680 time 02/20/2024 20:13:50
ORA-00312: online log 14 thread 2: '+DATA/xff/onlinelog/group_14.313.1060528521'

尝试屏蔽一致性强制拉库后数据库报ORA-600 kclchkblk_4
参考:ora-600 2662和ora-600 kclchkblk_4恢复redo异常 ORA-600 kclchkblk_4 故障恢复

Wed Feb 21 09:55:26 2024
SMON: enabling cache recovery
Wed Feb 21 09:55:26 2024
Redo thread 2 internally disabled at seq 5 (CKPT)
Archived Log entry 112707 added for thread 2 sequence 4 ID 0xffffffffe144183b dest 1:
ARC0: Archiving disabled thread 2 sequence 5
Archived Log entry 112708 added for thread 2 sequence 5 ID 0xffffffffe144183b dest 1:
Wed Feb 21 09:55:28 2024
Errors in file /oracle/db/diag/rdbms/xff/xff1/trace/xff1_ora_6423264.trc  (incident=360479):
ORA-00600: internal error code, arguments: [kclchkblk_4], [6], [576721660], [6], [576702892]
Incident details in: /oracle/db/diag/rdbms/xff/xff1/incident/incdir_360479/xff1_ora_6423264_i360479.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thread 1 advanced to log sequence 3 (LGWR switch)
  Current log# 7 seq# 3 mem# 0: +DATA/xff/onlinelog/group_7.306.1161510375
Archived Log entry 112709 added for thread 1 sequence 2 ID 0xffffffffe144183b dest 1:
Wed Feb 21 09:55:31 2024
Errors in file /oracle/db/diag/rdbms/xff/xff1/trace/xff1_ora_6423264.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kclchkblk_4], [6], [576721660], [6], [576702892]
Errors in file /oracle/db/diag/rdbms/xff/xff1/trace/xff1_ora_6423264.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kclchkblk_4], [6], [576721660], [6], [576702892]

后续处理中出现和这个错误类似的ORA-600 2662错误

Wed Feb 21 15:37:35 2024
SMON: enabling cache recovery
Errors in file /oracle/db/diag/rdbms/xff/xff1/trace/xff1_ora_6357664.trc  (incident=432423):
ORA-00600: internal error code, arguments: [2662], [6], [576742938], [6], [576834283], [12583104]
Incident details in: /oracle/db/diag/rdbms/xff/xff1/incident/incdir_432423/xff1_ora_6357664_i432423.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /oracle/db/diag/rdbms/xff/xff1/trace/xff1_ora_6357664.trc:
ORA-00600: internal error code, arguments: [2662], [6], [576742938], [6], [576834283], [12583104], [], [], [], [], [], []
Errors in file /oracle/db/diag/rdbms/xff/xff1/trace/xff1_ora_6357664.trc:
ORA-00600: internal error code, arguments: [2662], [6], [576742938], [6], [576834283], [12583104], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 6357664): terminating the instance due to error 600
Instance terminated by USER, pid = 6357664

通过对oracle scn进行修改,数据库open成功

SQL> recover database;
Media recovery complete.
SQL> 
SQL> 
SQL> 
SQL> oradebug setmypid
Statement processed.
SQL> oradebug DUMPvar SGA kcsgscn_
kcslf kcsgscn_ [700000000019B70, 700000000019BA0) = 00000000 00000000 00000000 00000000 
SQL> oradebug poke 
BEFORE: [700000000019B70, 700000000019B78) = 00000000 00000000
AFTER:  [700000000019B70, 700000000019B78) = 00000006 22710D2B
SQL> oradebug DUMPvar SGA kcsgscn_ 
kcslf kcsgscn_ [700000000019B70, 700000000019BA0) = 00000006 22710D2B 00000000 00000000 
SQL> alter database open;

Database altered.

后续检查发现obj$中的index异常(ORA-08102: index key not found, obj# 39)
类似文章:通过bbed修改obj$中dataobj$重现I_OBJ4索引报ORA-08102错误

SQL> truncate CLUSTER "SYS"."SMON_SCN_TO_TIME_AUX";

truncate CLUSTER "SYS"."SMON_SCN_TO_TIME_AUX"
                       *
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-08102: index key not found, obj# 39, file 1, block 967206 (2)

SQL> SQL> select object_name,object_type from dba_objects where object_id=39;

OBJECT_NAME                    OBJECT_TYPE
------------------------------ -------------------
I_OBJ4                         INDEX

对于此类问题使用非常规方法把obj$字典表进行重建(需要注意undo需要为自动管理方式,temp不能为空),参考:
bootstrap$核心index(I_OBJ1,I_USER1,I_FILE#_BLOCK#,I_IND1,I_TS#,I_CDEF1等)异常恢复—ORA-00701错误解决

从ORA-00283 ORA-16433报错开始恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:从ORA-00283 ORA-16433报错开始恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接手一个客户无法正常启动的故障数据库,尝试recover 报ORA-00283 ORA-16433错误

[oracle@xff trace]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Sat Jan 27 04:46:23 2024

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


???:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> show pdbs;
SQL> select open_mode from v$database;

OPEN_MODE
--------------------
MOUNTED

SQL>
SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode

通过对控制文件进行处理,再次尝试recover库

SQL> recover database;
ORA-00399: corrupt change description in redo log
ORA-00353: log corruption near block 134877 change 3249721295 time 01/27/2024 00:21:05
ORA-00312: online log 1 thread 1:'/u01/app/oracle/oradata/xff/redo01.log'

由于redo和数据文件不匹配,无法正常recover库,尝试强制打开库报ORA-600 2662错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [0], [3249721308], [0],[3249730440], [16777344],[],[],[],[],[],[]
ORA-00600: internal error code, arguments: [2662], [0], [3249721307], [0],[3249730440], [16777344],[],[],[],[],[],[]
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [0], [3249721303], [0],[3249730440], [16777344],[],[],[],[],[],[]
Process ID: 117336
Session ID: 1146 Serial number: 11764

基于这种错误,尝试oradebug修改scn

SQL> oradebug setmypid
oradebug DUMPvar SGA kcsgscn_
Statement processed.
SQL> kcslf kcsgscn_ [06001FBB0, 06001FBE0) = 00000000 00000000 00000000 00000000 00000000 
SQL> oradebug poke 0x06001FBB0 4 0x10000000
oradebug DUMPvar SGA kcsgscn_
ORA-32521: error parsing ORADEBUG command:

发现报ORA-32521错误,证明常规的oradebug方法无法修改scn,参考相关文章:
oradebug poke ORA-32521/ORA-32519故障解决
第一次通过其他方法处理,由于计算失误导致数据库启动报ORA-600 2252错误

SQL> ALTER DATABASE OPEN RESETLOGS;
ALTER DATABASE OPEN RESETLOGS
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [2252], [45264], [0], [11641],[3340959744], [],[],[],[],[],[]

该错误是相关文章参考:
记录一次ORA-00600[2252]故障解决
ORA-00600: internal error code, arguments: [2252], [3987]
主机断电系统回到N年前数据库报ORA-600 kcm_headroom_warn_1错误
处理正确的scn值之后,数据库open成功,然后逻辑方式导出数据,恢复工作完成

SQL> alter database open ;

Database altered.

ORA-600 kcrf_resilver_log_1故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 kcrf_resilver_log_1故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接手一个客户的数据库故障处理,最初数据库启动报ORA-600 kcrf_resilver_log_1错

Mon Jan 08 16:16:22 2024
ALTER DATABASE   MOUNT
Successful mount of redo thread 1, with mount id 2385308630
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Mon Jan 08 16:16:26 2024
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Started redo scan
Errors in file /u2/11G/base/diag/rdbms/xff/xff/trace/xff_ora_3350.trc  (incident=276167):
ORA-00600: internal error code, arguments: [kcrf_resilver_log_1], [0x1542C3A00], [2], , 
Incident details in:/u2/11G/base/diag/rdbms/xff/xff/incident/incdir_276167/xff_ora_3350_i276167.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Aborting crash recovery due to error 600
Errors in file /u2/11G/base/diag/rdbms/xff/xff/trace/xff_ora_3350.trc:
ORA-00600: internal error code, arguments: [kcrf_resilver_log_1], [0x1542C3A00], [2]
Errors in file /u2/11G/base/diag/rdbms/xff/xff/trace/xff_ora_3350.trc:
ORA-00600: internal error code, arguments: [kcrf_resilver_log_1], [0x1542C3A00], [2]
ORA-600 signalled during: ALTER DATABASE OPEN...

客户自行recover数据库之后报ORA-00283 ORA-00742 ORA-00312错

Mon Jan 08 17:05:34 2024
ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 16 slaves
Mon Jan 08 17:05:35 2024
Recovery of Online Redo Log: Thread 1 Group 2 Seq 63899 Reading mem 0
  Mem# 0: /u2/11G/data/xff/redo02.log
Media Recovery failed with error 742
Errors in file /u2/11G/base/diag/rdbms/xff/xff/trace/xff_pr00_3857.trc:
ORA-00283: recovery session canceled due to errors
ORA-00742: Log read detects lost write in thread %d sequence %d block %d
ORA-00312: online log 2 thread 1: '/u2/11G/data/xff/redo02.log'
ORA-283 signalled during: ALTER DATABASE RECOVER  database  ...

客户强制打开数据库

Tue Jan 09 17:37:51 2024
ALTER DATABASE OPEN
Errors in file /u2/11G/base/diag/rdbms/xff/xff/trace/xff_ora_3501.trc:
ORA-01589: must use RESETLOGS or NORESETLOGS option for database open
ORA-1589 signalled during: ALTER DATABASE OPEN...
Tue Jan 09 17:43:45 2024
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 4418119911
Resetting resetlogs activation ID 2289128497 (0x88715431)
Online log /u2/11G/data/xff/redo01.log: Thread 1 Group 1 was previously cleared
Online log /u2/11G/data/xff/redo02.log: Thread 1 Group 2 was previously cleared
Online log /u2/11G/data/xff/redo03.log: Thread 1 Group 3 was previously cleared
Tue Jan 09 17:43:46 2024
Setting recovery target incarnation to 3
Tue Jan 09 17:43:46 2024
Assigning activation ID 2385405291 (0x8e2e656b)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /u2/11G/data/xff/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Jan 09 17:43:46 2024
SMON: enabling cache recovery
Errors in file /u2/11G/base/diag/rdbms/xff/xff/trace/xff_ora_3501.trc  (incident=492171):
ORA-00600: internal error code, arguments: [2662], [1], [123152622], [1], [123176387], [12583040], []
Incident details in: /u2/11G/base/diag/rdbms/xff/xff/incident/incdir_492171/xff_ora_3501_i492171.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u2/11G/base/diag/rdbms/xff/xff/trace/xff_ora_3501.trc:
ORA-00600: internal error code, arguments: [2662], [1], [123152622], [1], [123176387], [12583040], []
Errors in file /u2/11G/base/diag/rdbms/xff/xff/trace/xff_ora_3501.trc:
ORA-00600: internal error code, arguments: [2662], [1], [123152622], [1], [123176387], [12583040], []
Error 600 happened during db open, shutting down database
USER (ospid: 3501): terminating the instance due to error 600
Instance terminated by USER, pid = 3501
ORA-1092 signalled during: alter database open resetlogs...

这个故障相对比较简单,修改数据库scn之后,即可open数据库,然后逻辑方式迁移数据到新库即可