RAC环境redo在各节点本地导致数据库故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:RAC环境redo在各节点本地导致数据库故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最近一个运行在win平台的rac,由于断电之后,集群两个节点均无法正常启动,客户进行了一系列尝试,结果到了ora-600 kclchkblk_4错误无法继续.
通过对数据库日志分析,回溯了故障大概的原因,启动的时候报错为:
节点1启动报错

Sun Aug 03 15:21:22 2025
alter database open
This instance was first to open
Beginning crash recovery of 2 threads
 parallel recovery started with 32 processes
Started redo scan
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_ora_7108.trc:
ORA-00314: 日志 11 (用于线程 2) 要求的 sequence# 147717 与 147541 不匹配
ORA-00312: 联机日志 11 线程 2: 'D:\REDOLOG\REDO011.LOG'
Abort recovery for domain 0
Aborting crash recovery due to error 314
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_ora_7108.trc:
ORA-00314: 日志 11 (用于线程 2) 要求的 sequence# 147717 与 147541 不匹配
ORA-00312: 联机日志 11 线程 2: 'D:\REDOLOG\REDO011.LOG'
Abort recovery for domain 0
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_ora_7108.trc:
ORA-00314: 日志 11 (用于线程 2) 要求的 sequence# 147717 与 147541 不匹配
ORA-00312: 联机日志 11 线程 2: 'D:\REDOLOG\REDO011.LOG'
ORA-314 signalled during: alter database open...

节点2启动报错

Sat Aug 02 15:45:43 2025
Successful mount of redo thread 2, with mount id 1735887907
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: ALTER DATABASE MOUNT /* db agent *//* {1:47460:124} */
ALTER DATABASE OPEN /* db agent *//* {1:47460:124} */
This instance was first to open
Beginning crash recovery of 2 threads
Sat Aug 02 15:45:49 2025
 parallel recovery started with 32 processes
Started redo scan
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl2\trace\orcl2_ora_3444.trc:
ORA-00314: ?? 1 (???? 1) ??? sequence# 67782 ? 60818 ???
ORA-00312: ???? 1 ?? 1: 'D:\REDOLOG\REDO01.LOG'
Abort recovery for domain 0
Aborting crash recovery due to error 314
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl2\trace\orcl2_ora_3444.trc:
ORA-00314: ?? 1 (???? 1) ??? sequence# 67782 ? 60818 ???
ORA-00312: ???? 1 ?? 1: 'D:\REDOLOG\REDO01.LOG'
Abort recovery for domain 0
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl2\trace\orcl2_ora_3444.trc:
ORA-00314: ?? 1 (???? 1) ??? sequence# 67782 ? 60818 ???
ORA-00312: ???? 1 ?? 1: 'D:\REDOLOG\REDO01.LOG'
ORA-314 signalled during: ALTER DATABASE OPEN /* db agent *//* {1:47460:124} */...

看到这两个报错信息得出两个结论:
1)比较明显节点1需要thead 2的 group 11 sequence为147717,但是实际group 11文件的sequence为147541;而节点2启动需要thread 1的group 1 sequence为67782,但是实际中group 1文件的sequnece为60818,这两个都相差比较多,属于非正常的情况,很可能是文件本身有问题
2)这是一套win的rac架构,理论上redo应该在共享文件系统(一般是asm中),而这个第一感觉很可能是本地文件系统中
客户当时恢复之时查询信息截图
ORA-00314-ORA-00312


查看了两个节点的最后redo切换信息

--节点1
Sat Aug 02 10:49:31 2025
Thread 1 advanced to log sequence 67782 (LGWR switch)
  Current log# 1 seq# 67782 mem# 0: D:\REDOLOG\REDO01.LOG

--节点2(redo每组2G,节点2长时间没跑业务,之时做数据库导出操作,所以切换时间比较久远)
Sat Jul 26 16:56:42 2025
Thread 2 advanced to log sequence 147717 (LGWR switch)
  Current log# 11 seq# 147717 mem# 0: D:\REDOLOG\REDO011.LOG

并查看两个机器d:/redolog信息(客户自行resetlogs之后的,非第一现场,但是可以确认两个节点各自有一份redo文件
redo


本来这个是一个比较小的故障,只要把节点2的thread 1的redo拷贝到到节点1或者节点1的thread 2的redo拷贝到节点2,然后正常open库即可,现场恢复对rac不太熟悉,直接按照互联网上检索的处理方法,加上_allow_resetlogs_corruption然后强制拉库,结果不太幸运,拉库失败报ORA-600 kclchkblk_4错误
kclchkblk_4


Sun Aug 03 18:59:24 2025
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 21497084214
Resetting resetlogs activation ID 1543012633 (0x5bf88119)
Sun Aug 03 18:59:46 2025
Setting recovery target incarnation to 3
Sun Aug 03 18:59:46 2025
This instance was first to open
Picked broadcast on commit scheme to generate SCNs
Sun Aug 03 18:59:48 2025
Assigning activation ID 1735960667 (0x6778a85b)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: D:\REDOLOG\REDO01.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sun Aug 03 18:59:49 2025
SMON: enabling cache recovery
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_ora_8508.trc  (incident=728324):
ORA-00600: 内部错误代码, 参数: [kclchkblk_4], [5], [200595988], [5], [22247740], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\incident\incdir_728324\orcl1_ora_8508_i728324.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sun Aug 03 18:59:53 2025
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_ora_8508.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [kclchkblk_4], [5], [200595988], [5], [22247740], [], [], [], [], [], [], []
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_ora_8508.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [kclchkblk_4], [5], [200595988], [5], [22247740], [], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 8508): terminating the instance due to error 704
Sun Aug 03 18:59:54 2025
opiodr aborting process unknown ospid (9480) as a result of ORA-1092
Sun Aug 03 19:00:09 2025
Instance terminated by USER, pid = 8508
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (8508) as a result of ORA-1092

这个故障之后,客户那边无法自行恢复,让我这边介入处理,对于这个错误以前处理比较多,一般就是scn问题,通过Patch SCN小工具快速解决
QQ20250808-185918


数据库open成功之后主要报一些ORA-600 4137,ORA-600 6006等错误

Database Characterset is ZHS16GBK
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_smon_8300.trc  (incident=1176205):
ORA-00600: 内部错误代码, 参数: [4137], [1.14.2713957], [0], [0], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\incident\incdir_1176205\orcl1_smon_8300_i1176205.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fri Aug 08 19:03:14 2025
ORACLE Instance orcl1 (pid = 25) - Error 600 encountered while recovering transaction (1, 14).
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_smon_8300.trc:
ORA-00600: 内部错误代码, 参数: [4137], [1.14.2713957], [0], [0], [], [], [], [], [], [], [], []
Fri Aug 08 19:03:15 2025
ORACLE Instance orcl1 (pid = 25) - Error 600 encountered while recovering transaction (5, 19).
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_smon_8300.trc:
ORA-00600: 内部错误代码, 参数: [4137], [5.19.2318502], [0], [0], [], [], [], [], [], [], [], []
Starting background process MMON
Fri Aug 08 19:03:18 2025
MMON started with pid=29, OS id=4624
Fri Aug 08 19:03:19 2025
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_smon_8300.trc  (incident=1176207):
ORA-00600: 内部错误代码, 参数: [6006], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\incident\incdir_1176207\orcl1_smon_8300_i1176207.trc
Starting background process MMNL
Fri Aug 08 19:03:19 2025
MMNL started with pid=30, OS id=8344
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORACLE Instance orcl1 (pid = 25) - Error 600 encountered while recovering transaction (46, 28) on object 197344.
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_smon_8300.trc:
ORA-00600: 内部错误代码, 参数: [6006], [1], [], [], [], [], [], [], [], [], [], []

通过重建undo解决该错误,数据库稳定运行,没有再crash和报明显错误,导出核心数据,完成本次恢复任务.

虚拟机故障引起ORA-00310 ORA-00334故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:虚拟机故障引起ORA-00310 ORA-00334故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户由于硬件底层问题,导致运行在虚拟机环境中的oracle数据库突然爆大量错误

Reread (file 5, block 2371528) found same corrupt data (no logical check)
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_j000_10927.trc  (incident=397049):
ORA-01578: ORACLE data block corrupted (file # 5, block # 2371528)
ORA-01110: data file 5: '/home/oracle/app/oradata/users01.dbf'

Wed Apr 02 23:10:24 2025
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_j000_10927.trc  (incident=397050):
ORA-00600: internal error code, arguments: [5400], [], [], [], [], [], [], [], [], [], [], []

Wed Apr 02 23:15:29 2025
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_11605.trc  (incident=397075):
ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], [], [], [], [], []

Wed Apr 02 23:20:32 2025
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_11530.trc  (incident=397034):
ORA-00600: internal error code, arguments: [25027], [6], [196610], [], [], [], [], [], [], [], [], []

Wed Apr 02 23:20:52 2025
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_11528.trc  (incident=397027):
ORA-00600: internal error code, arguments: [ktspfpblk:kcbz_objdchk], [0], [0], [1], [], [], [], [], [], [], [], []

Wed Apr 02 23:22:53 2025
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_11609.trc  (incident=397082):
ORA-00600: internal error code, arguments: [6002], [6], [189], [1], [0], [], [], [], [], [], [], []

Wed Apr 02 23:26:41 2025
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_m000_11966.trc  (incident=397035):
ORA-00600: internal error code, arguments: [dbgrmblur_update_range_1], [11], [6], [], [], [], [], [], [], [], [], []

Wed Apr 02 23:31:47 2025
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_j000_10927.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_SA_SPC_SY_49685"
ORA-08102: index key not found, obj# 39, file 1, block 55190 (2)

Thu Apr 03 00:15:18 2025
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x8] [PC:0xB9EC41, ksuloget()+421] [flags: 0x0, count: 1]
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_m000_12633.trc  (incident=400879):
ORA-07445: exception encountered:core dump [ksuloget()+421][SIGSEGV][ADDR:0x8][PC:0xB9EC41][Address not mapped to object]

Thu Apr 03 00:15:23 2025
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_pmon_4097.trc  (incident=396817):
ORA-00600: internal error code, arguments: [1100], [0x2E3947E78], [0x2E3947E78], [], [], [], [], [], [], [], [], []

数据库crash掉之后,处理好硬件环境和虚拟机启动之后,数据库直接启动失败,报ORA-01172 ORA-01151

Beginning crash recovery of 1 threads
Started redo scan
Completed redo scan
 read 29239 KB redo, 4020 data blocks need recovery
Started redo application at
 Thread 1: logseq 211603, block 9107
Recovery of Online Redo Log: Thread 1 Group 4 Seq 211603 Reading mem 0
  Mem# 0: /home/oracle/app/oradata/orcl/redo04.log
  Mem# 1: /home/oracle/app/oradata/orcl/redo041.log
Hex dump of (file 2, block 4835) in trace file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_19174.trc
Reading datafile '/home/oracle/app/oradata/orcl/sysaux01.dbf' for corruption at rdba: 0x008012e3 (file 2, block 4835)
Reread (file 2, block 4835) found same corrupt data (logically corrupt)
RECOVERY OF THREAD 1 STUCK AT BLOCK 4835 OF FILE 2
Aborting crash recovery due to error 1172
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_19174.trc:
ORA-01172: recovery of thread 1 stuck at block 4835 of file 2
ORA-01151: use media recovery to recover block, restore backup if needed
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_19174.trc:
ORA-01172: recovery of thread 1 stuck at block 4835 of file 2
ORA-01151: use media recovery to recover block, restore backup if needed
ORA-1172 signalled during: ALTER DATABASE OPEN...

然后再次尝试重启提示ORA-01113 ORA-01110

Fri Apr 04 09:34:36 2025
ALTER DATABASE OPEN
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_4076.trc:
ORA-01113: file 5 needs media recovery
ORA-01110: data file 5: '/home/oracle/app/oradata/users01.dbf'
ORA-1113 signalled during: ALTER DATABASE OPEN...

可以自行尝试了各种恢复,比如using backup controlfile,until cancel,rectl等操作,数据库均为open成功,基本上都是卡在类似如下报ORA-00310 ORA-00334错

Sat Apr 05 10:17:34 2025
ALTER DATABASE RECOVER  database using backup controlfile   
Media Recovery Start
 started logmerger process
Sat Apr 05 10:17:34 2025
WARNING! Recovering data file 1 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 2 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 3 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 4 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 5 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 6 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 7 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 8 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 9 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 10 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 11 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 12 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 13 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 14 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 15 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
Parallel Media Recovery started with 28 slaves
ORA-279 signalled during: ALTER DATABASE RECOVER  database using backup controlfile   ...
Sat Apr 05 10:17:59 2025
ALTER DATABASE RECOVER    LOGFILE '/home/oradata/redo02.log'  
Media Recovery Log /home/oradata/redo02.log
Sat Apr 05 10:17:59 2025
Errors with log /home/oradata/redo02.log
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_pr00_12141.trc:
ORA-00310: archived log contains sequence 211550; sequence 211603 required
ORA-00334: archived log: '/home/oradata/redo02.log'
ORA-310 signalled during: ALTER DATABASE RECOVER    LOGFILE '/home/oradata/redo02.log'  ...
ALTER DATABASE RECOVER CANCEL 
Media Recovery Canceled
Completed: ALTER DATABASE RECOVER CANCEL 

基于上述情况,数据库由于底层异常,导致所需要的redo和实际存在的redo文件内容不匹配,只能屏蔽一致性强制打开库

SQL> alter database open resetlogs ;
alter database open resetlogs 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [0], [1685409503], [0], [1685415469], [12583040], []
ORA-00600: internal error code, arguments: [2662], [0], [1685409502], [0], [1685415469], [12583040], []
ORA-01092: ORACLE instance terminated. Disconnection force
ORA-00600: internal error code, arguments: [2662], [0], [1685409498], [0], [1685415469], [12583040], []
Process ID: 10637
Session ID: 645 Serial number: 7

ORA-600 2662这个错误比较常见,通过修改数据库scn,进行规避然后尝试打开库

Sat Apr 05 10:31:45 2025
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 1685409495
Resetting resetlogs activation ID 1725417463 (0x66d7c7f7)
Sat Apr 05 10:31:46 2025
Setting recovery target incarnation to 2
Initializing SCN for created control file
Database SCN compatibility initialized to 3
Warning - High Database SCN: Current SCN value is 1685409498, threshold SCN value is 0
Sat Apr 05 10:31:46 2025
Assigning activation ID 1725412798 (0x66d7b5be)
Thread 1 opened at log sequence 1
  Current log# 2 seq# 1 mem# 0: /home/oradata/redo02.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Apr 05 10:31:46 2025
SMON: enabling cache recovery
Undo initialization finished serial:0 start:61632504 end:61632514 diff:10 (0 seconds)
Dictionary check beginning
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
*********************************************************************
WARNING: The following temporary tablespaces contain no files.
         This condition can occur when a backup controlfile has
SMON: enabling tx recovery
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
 
         ALTER TABLESPACE <tablespace_name> ADD TEMPFILE
 
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
Database Characterset is AL32UTF8
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_smon_22927.trc  (incident=8145):
ORA-00600: internal error code, arguments: [4137], [9.1.436887], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oorcl/oorcl/incident/incdir_8145/oorcl_smon_22927_i8145.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Stopping background process MMNL
ORACLE Instance oorcl (pid = 17) - Error 600 encountered while recovering transaction (9, 1).
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_smon_22927.trc:
ORA-00600: internal error code, arguments: [4137], [9.1.436887], [0], [0], [], [], [], [], [], [], [], []
Sat Apr 05 10:31:46 2025
Sweep [inc][8145]: completed
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_smon_22927.trc  (incident=8146):
ORA-00600: internal error code, arguments: [4137], [9.1.436887], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oorcl/oorcl/incident/incdir_8146/oorcl_smon_22927_i8146.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sat Apr 05 10:31:46 2025
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_p054_2643.trc  (incident=8625):
ORA-00600: internal error code, arguments: [kturbleurec1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oorcl/oorcl/incident/incdir_8625/oorcl_p054_2643_i8625.trc
Sat Apr 05 10:31:46 2025
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_p034_2603.trc  (incident=8465):
ORA-00600: internal error code, arguments: [kturbleurec1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oorcl/oorcl/incident/incdir_8465/oorcl_p034_2603_i8465.trc
replication_dependency_tracking turned off (no async multimaster replication found)
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Completed: alter database open resetlogs
Sat Apr 05 10:31:48 2025
Starting background process CJQ0
Sat Apr 05 10:31:48 2025
CJQ0 started with pid=80, OS id=2852 
SMON: Restarting fast_start parallel rollback
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_smon_22927.trc  (incident=8147):
ORA-00600: internal error code, arguments: [4137], [9.1.436887], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oorcl/oorcl/incident/incdir_8147/oorcl_smon_22927_i8147.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sat Apr 05 10:31:50 2025
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_p000_2535.trc  (incident=8169):
ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oorcl/oorcl/incident/incdir_8169/oorcl_p000_2535_i8169.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORACLE Instance oorcl (pid = 17) - Error 600 encountered while recovering transaction (9, 1).
Block recovery from logseq 1, block 19 to scn 2147483682
Recovery of Online Redo Log: Thread 1 Group 2 Seq 1 Reading mem 0
  Mem# 0: /home/oradata/redo02.log
Block recovery completed at rba 1.734.16, scn 0.2147483683
Block recovery from logseq 1, block 404 to scn 2147483682
Recovery of Online Redo Log: Thread 1 Group 2 Seq 1 Reading mem 0
  Mem# 0: /home/oradata/redo02.log
Block recovery completed at rba 1.734.16, scn 0.2147483683
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_smon_22927.trc  (incident=8148):
ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oorcl/oorcl/incident/incdir_8148/oorcl_smon_22927_i8148.trc
Sat Apr 05 10:31:50 2025
Sweep [inc][8147]: completed
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
SMON: Parallel transaction recovery slave got internal error
SMON: Downgrading transaction recovery to serial
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_smon_22927.trc  (incident=8149):
ORA-00600: internal error code, arguments: [4137], [10.28.1201778], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oorcl/oorcl/incident/incdir_8149/oorcl_smon_22927_i8149.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORACLE Instance oorcl (pid = 17) - Error 600 encountered while recovering transaction (10, 28).
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_smon_22927.trc:
ORA-00600: internal error code, arguments: [4137], [10.28.1201778], [0], [0], [], [], [], [], [], [], [], []
Sat Apr 05 10:31:50 2025
Sweep [inc][8149]: completed
Checker run found 1 new persistent data failures
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_smon_22927.trc  (incident=8150):
ORA-00600: internal error code, arguments: [4137], [10.28.1201778], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oorcl/oorcl/incident/incdir_8150/oorcl_smon_22927_i8150.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORACLE Instance oorcl (pid = 17) - Error 600 encountered while recovering transaction (10, 28).
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_smon_22927.trc  (incident=8151):
ORA-00600: internal error code, arguments: [4137], [10.28.1201778], [0], [0], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sat Apr 05 10:31:51 2025
Sweep [inc][8150]: completed
ORACLE Instance oorcl (pid = 17) - Error 600 encountered while recovering transaction (10, 28).

虽然数据库open成功,但是有ORA-600 4137/ORA-600 kturbleurec1/ORA-600 4198等错误,但是这里比较明显的undo有问题,对于异常undo进行处理,然后逻辑导出数据,导入新库完成本次恢复任务

存储故障后oracle报—ORA-01122/ORA-01207故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:存储故障后oracle报—ORA-01122/ORA-01207故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户存储异常,通过硬件恢复解决存储故障之后,oracle数据库无法正常启动(存储cache丢失),尝试recover数据库报ORA-00283 ORA-01122 ORA-01110 ORA-01207错误
以前处理过比较类似的存储故障case:
又一起存储故障导致ORA-00333 ORA-00312恢复
存储故障,强制拉库报ORA-600 kcbzib_kcrsds_1处理

SQL> recover database until cancel;
ORA-00283: 恢复会话因错误而取消
ORA-01122: 数据库文件 536 验证失败
ORA-01110: 数据文件 536: '+DATA/orcl/dt_img_dat511.ora'
ORA-01207: 文件比控制文件更新 - 旧的控制文件
Sun May 05 00:09:03 2024
ALTER DATABASE RECOVER  database until cancel  
Media Recovery Start
 started logmerger process
Sun May 05 00:09:10 2024
SUCCESS: diskgroup FRA was mounted
Sun May 05 00:09:10 2024
NOTE: dependency between database orcl and diskgroup resource ora.FRA.dg is established
Sun May 05 00:09:14 2024
WARNING! Recovering data file 1 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
Media Recovery failed with error 1122
Slave exiting with ORA-283 exception
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_pr00_8048.trc:
ORA-00283: 恢复会话因错误而取消
ORA-01122: 数据库文件 536 验证失败
ORA-01110: 数据文件 536: '+DATA/orcl/dt_img_dat511.ora'
ORA-01207: 文件比控制文件更新 - 旧的控制文件
Sun May 05 00:09:16 2024
Recovery Slave PR00 previously exited with exception 283
ORA-283 signalled during: ALTER DATABASE RECOVER  database until cancel  ...

using backup controlfile进行恢复

SQL> recover database using backup controlfile until cancel;
ORA-00279: 更改 18646239951 (在 04/25/2024 17:14:50 生成) 对于线程 1 是必需的
ORA-00289: 建议:
+FRA/orcl/archivelog/2024_04_25/thread_1_seq_1003886.199435.1167240505
ORA-00280: 更改 18646239951 (用于线程 1) 在序列 #1003886 中


指定日志: {<RET>=suggested | filename | AUTO | CANCEL}
auto
ORA-00279: 更改 18646239951 (在 04/25/2024 17:11:40 生成) 对于线程 2 是必需的
ORA-00289: 建议:
+FRA/orcl/archivelog/2024_04_25/thread_2_seq_677876.199531.1167239807
ORA-00280: 更改 18646239951 (用于线程 2) 在序列 #677876 中


ORA-00279: 更改 18646255791 (在 04/25/2024 17:16:46 生成) 对于线程 2 是必需的
ORA-00289: 建议:
+FRA/orcl/archivelog/2024_04_25/thread_2_seq_677877.199472.1167240099
ORA-00280: 更改 18646255791 (用于线程 2) 在序列 #677877 中
ORA-00278: 此恢复不再需要日志文件
'+FRA/orcl/archivelog/2024_04_25/thread_2_seq_677876.199531.1167239807'


ORA-00279: 更改 18646295647 (在 04/25/2024 17:21:38 生成) 对于线程 2 是必需的
ORA-00289: 建议:
+FRA/orcl/archivelog/2024_04_25/thread_2_seq_677878.199379.1167240623
ORA-00280: 更改 18646295647 (用于线程 2) 在序列 #677878 中
ORA-00278: 此恢复不再需要日志文件
'+FRA/orcl/archivelog/2024_04_25/thread_2_seq_677877.199472.1167240099'


ORA-00279: 更改 18646331784 (在 04/25/2024 17:28:25 生成) 对于线程 1 是必需的
ORA-00289: 建议:
+FRA/orcl/archivelog/2024_04_25/thread_1_seq_1003887.199320.1167241507
ORA-00280: 更改 18646331784 (用于线程 1) 在序列 #1003887 中
ORA-00278: 此恢复不再需要日志文件
'+FRA/orcl/archivelog/2024_04_25/thread_1_seq_1003886.199435.1167240505'


ORA-00308: 无法打开归档日志
'+FRA/orcl/archivelog/2024_04_25/thread_1_seq_1003887.199320.1167241507'
ORA-17503: ksfdopn: 2 未能打开文件
+FRA/orcl/archivelog/2024_04_25/thread_1_seq_1003887.199320.1167241507
ORA-15012: ASM file
'+FRA/orcl/archivelog/2024_04_25/thread_1_seq_1003887.199320.1167241507' does not exist


ORA-10879: error signaled in parallel recovery slave
ORA-01547: 警告: RECOVER 成功但 OPEN RESETLOGS 将出现如下错误
ORA-01194: 文件 1 需要更多的恢复来保持一致性
ORA-01110: 数据文件 1: '+DATA/orcl/system01.dbf'

通过分析,确认由于cache丢失导致thread_1_seq_1003887这个日志丢失(而且redo已经被覆盖)
20240506-2


20240506-1

数据库无法通过正常recover的思路解决.通过屏蔽一致性,强制打开数据库,alert日志报ORA-600 2662错误

Sat May 04 17:23:00 2024
Redo thread 2 internally disabled at seq 1 (CKPT)
ARC1: Archiving disabled thread 2 sequence 1
Archived Log entry 2 added for thread 2 sequence 1 ID 0x0 dest 1:
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_ora_3684.trc  (incident=47066):
ORA-00600: ??????, ??: [2662], [4], [1466533588], [4], [1466584862], [12583040], [], [], [], [], [], []
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_ora_3684.trc:
ORA-00600: ??????, ??: [2662], [4], [1466533588], [4], [1466584862], [12583040], [], [], [], [], [], []
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_ora_3684.trc:
ORA-00600: ??????, ??: [2662], [4], [1466533588], [4], [1466584862], [12583040], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 3684): terminating the instance due to error 600
Instance terminated by USER, pid = 3684
ORA-1092 signalled during: alter database open resetlogs...

通过修改数据库scn,open数据库报ORA-600 4137

Sun May 05 00:12:41 2024
replication_dependency_tracking turned off (no async multimaster replication found)
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Completed: alter database open resetlogs
Sun May 05 00:12:56 2024
Trace dumping is performing id=[cdmp_20240505001256]
Sun May 05 00:12:56 2024
ORACLE Instance orcl1 (pid = 22) - Error 600 encountered while recovering transaction (28, 21).
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_smon_5896.trc:
ORA-00600: ??????, ??: [4137], [28.21.42965783], [0], [0], [], [], [], [], [], [], [], []

这个错误比较明显,由于28号回滚段异常导致,对异常回滚段进行处理,重建undo,数据库恢复主要工作完成

云主机快照之后Oracle无法正常启动处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:云主机快照之后Oracle无法正常启动处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

某客户数据库放在x云上面,需要对数据库盘进行扩容,在扩容之前对该盘做了快照,结果没有想到悲剧发生了

[root@xifenfei ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        99G   64G   31G  68% /
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G     0   16G   0% /dev/shm
tmpfs            16G  720K   16G   1% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/vdb        2.0T  1.2T  910G  56% /www/xifenfei
tmpfs           3.2G     0  3.2G   0% /run/user/1004
tmpfs           3.2G     0  3.2G   0% /run/user/0

如上显示,客户的数据文件都放在/dev/vdb中了,但是很不幸,redo文件放在/data中(也就是vda磁盘组中),没有被做快照,结果客户还原vdb快照之后,发现现象如下

SQL> set pages 10000
SQL> set numw 16
SQL> SELECT status,
  2  checkpoint_change#,
  3  checkpoint_time,last_change#,
  4  count(*) ROW_NUM
  5  FROM v$datafile
  6  GROUP BY status, checkpoint_change#, checkpoint_time,last_change#
  7  ORDER BY status, checkpoint_change#, checkpoint_time;

STATUS         CHECKPOINT_CHANGE# CHECKPOINT_T     LAST_CHANGE#          ROW_NUM
-------------- ------------------ ------------ ---------------- ----------------
ONLINE                69632585947 04-JUL-22                                   38
SYSTEM                69632585947 04-JUL-22                                    2

SQL> set numw 16
SQL> col CHECKPOINT_TIME for a40
SQL> set lines 150
SQL> set pages 1000
SQL> SELECT status,
  2  to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss') checkpoint_time,FUZZY,checkpoint_change#,
  3  count(*) ROW_NUM
  4  FROM v$datafile_header
  5  GROUP BY status, checkpoint_change#, to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss'),fuzzy
  6  ORDER BY status, checkpoint_change#, checkpoint_time;

STATUS         CHECKPOINT_TIME                          FUZZY  CHECKPOINT_CHANGE#          ROW_NUM
-------------- ---------------------------------------- ------ ------------------ ----------------
ONLINE         2022-07-04 09:03:24                      YES           69631105424               40

20220704230638


通过上述分析,该库相当数据文件和redo文件之间相差了一段时间数据,而且该库为非归档,基于这种情况,该库只能强制打开,在打开过程中遇到ORA-600 ktpridestroy2错误

SMON: enabling tx recovery
Database Characterset is AL32UTF8
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
SMON: Restarting fast_start parallel rollback
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7332.trc  (incident=41257):
ORA-00600: internal error code, arguments: [ktpridestroy2], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_41257/orcl_smon_7332_i41257.trc
Starting background process QMNC
Mon Jul 04 16:31:44 2022
QMNC started with pid=36, OS id=7454 
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Fatal internal error happened while SMON was doing active transaction recovery.
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7332.trc:
ORA-00600: internal error code, arguments: [ktpridestroy2], [], [], [], [], [], [], [], [], [], [], []
SMON (ospid: 7332): terminating the instance due to error 474
Instance terminated by SMON, pid = 7332

对应trace文件

Dump continued from file: /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7332.trc
ORA-00600: internal error code, arguments: [ktpridestroy2], [], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 41257 (ORA 600 [ktpridestroy2]) ========

*** 2022-07-04 16:31:44.261
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
skdstdst()+36        call     kgdsdst()            000000000 ? 000000000 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   7FFCD123FE98 ? 000000000 ?
ksedst1()+98         call     skdstdst()           000000000 ? 000000000 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksedst()+34          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbkedDefDump()+2736  call     ksedst()             000000000 ? 000000001 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksedmp()+36          call     dbkedDefDump()       000000003 ? 000000002 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksfdmp()+64          call     ksedmp()             000000003 ? 000000002 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbgexPhaseII()+1764  call     ksfdmp()             000000003 ? 000000002 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbgexProcessError()  call     dbgexPhaseII()       7F3C5D15C6F0 ? 7F3C5A851598 ?
+2279                                              7FFCD1247C88 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbgeExecuteForError  call     dbgexProcessError()  7F3C5D15C6F0 ? 7F3C5A851598 ?
()+83                                              000000001 ? 000000000 ?
                                                   7FFC00000000 ? 000000000 ?
dbgePostErrorKGE()+  call     dbgeExecuteForError  7F3C5D15C6F0 ? 7F3C5A851598 ?
1615                          ()                   000000001 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   000000000 ? 7F3C5A6C1228 ?
63                                                 000000258 ? 7F3C5A851598 ?
                                                   000000000 ? 000000000 ?
kgeadse()+383        call     dbkePostKGE_kgsf()   00A984C60 ? 7F3C5A6C1228 ?
                                                   000000258 ? 7F3C5A851598 ?
                                                   000000000 ? 000000000 ?
kgerinv_internal()+  call     kgeadse()            00A984C60 ? 7F3C5A6C1228 ?
45                                                 000000258 ? 000000000 ?
                                                   000000000 ? 000000000 ?
kgerinv()+33         call     kgerinv_internal()   00A984C60 ? 7F3C5A6C1228 ?
                                                   D124022000000000 ?
                                                   000000258 ? 000000000 ?
                                                   000000000 ?
kgeasnmierr()+143    call     kgerinv()            00A984C60 ? 7F3C5A6C1228 ?
                                                   D124022000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ?
ktpridestroy()+912   call     kgeasnmierr()        00A984C60 ? 7F3C5A6C1228 ?
                                                   D124022000000000 ?
                                                   000000000 ? 1E0F02D40 ?
                                                   1EC6DA410 ?
ktprw1s()+527        call     ktpridestroy()       D124022000000000 ?
                                                   000000000 ? 1E7A1C2B0 ?
                                                   000000000 ? 1E0F02D40 ?
                                                   1EC6DA410 ?
ktprsched()+197      call     ktprw1s()            D124022000000000 ?
                                                   000000000 ? 1E7A1C2B0 ?
                                                   000000000 ? 1E0F02D40 ?
                                                   1EC6DA410 ?
kturRecoverUndoSegm  call     ktprsched()          D124022000000000 ?
ent()+1057                                         000000000 ? 1E7A1C2B0 ?
                                                   000000000 ? 1E0F02D40 ?
                                                   1EC6DA410 ?
kturRecoverActiveTx  call     kturRecoverUndoSegm  000000000 ? 000000000 ?
ns()+710                      ent()                000000001 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
ktprbeg()+2506       call     kturRecoverActiveTx  000000004 ? 000000000 ?
                              ns()                 000000027 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
ktmmon()+13588       call     ktprbeg()            000000000 ? 000000000 ?
                                                   000000027 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
ktmSmonMain()+201    call     ktmmon()             06002DEC0 ? 000000000 ?
                                                   000000027 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
ksbrdp()+923         call     ktmSmonMain()        06002DEC0 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
opirip()+618         call     ksbrdp()             06002DEC0 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
opidrv()+598         call     opirip()             000000032 ? 000000004 ?
                                                   7FFCD124B658 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
sou2o()+98           call     opidrv()             000000032 ? 000000004 ?
                                                   7FFCD124B658 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
opimai_real()+261    call     sou2o()              7FFCD124B630 ? 000000032 ?
                                                   000000004 ? 7FFCD124B658 ?
                                                   0D124FFFF ? 6200000005 ?
ssthrdmain()+209     call     opimai_real()        000000000 ? 7FFCD124B820 ?
                                                   000000004 ? 7FFCD124B658 ?
                                                   0D124FFFF ? 6200000005 ?
main()+196           call     ssthrdmain()         000000003 ? 7FFCD124B820 ?
                                                   000000001 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
__libc_start_main()  call     main()               000000003 ? 7FFCD124B9C0 ?
+245                                               000000001 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
_start()+36          call     __libc_start_main()  0009C12F0 ? 000000001 ?
                                                   7FFCD124B9B8 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
--------------------- Binary Stack Dump ---------------------

通过分析确认该错误和并行恢复有关系,绕过该错误之后,再次尝试启动库报错为ORA-600 4137

Mon Jul 04 16:33:41 2022
SMON: enabling cache recovery
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is AL32UTF8
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc  (incident=42457):
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_42457/orcl_smon_7554_i42457.trc
Stopping background process MMNL
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/data/oracle/oradata/orcl/redo03.log'
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/data/oracle/oradata/orcl/redo03.log'
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering transaction (6, 11).
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc:
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []

该错误比较常见,一般是由于undo中有异常事务,对异常事务进行处理,数据库open成功,并顺利导入数据到新库中,完成本次数据恢复

又一例存储cache丢失oracle数据库恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:又一例存储cache丢失oracle数据库恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

10.2.0.5 hp unix rac,由于存储掉电导致cache丢失,数据库无法正常启动,客户要求我们介入处理
数据库mount报ORA-00600 kccpb_sanity_check_2错误

Thu Jul 22 14:52:06 EAT 2021
alter database mount
Thu Jul 22 14:52:10 EAT 2021
Errors in file /oracle/admin/xff/udump/xff1_ora_4611.trc:
ORA-00600: internal error code, arguments: [kccpb_sanity_check_2], [4697564], [4697561], [0x000000000], [], [], [], []

该错误是由于控制文件损坏,尝试重建控制文件报ORA-01163,ORA-01517

'/dev/oradata/rxff_ls94'
CHARACTER SET ZHS16GBK
WARNING: Default Temporary Tablespace not specified in CREATE DATABASE command
Default Temporary Tablespace will be necessary for a locally managed database in future release
Thu Jul 22 14:54:02 EAT 2021
Errors in file /oracle/admin/xff/udump/xff1_ora_7283.trc:
ORA-01163: SIZE clause indicates 262144 (blocks), but should match header 204800
ORA-01517: log member: '/dev/oradata/rxff_redo1_1'
ORA-1503 signalled during: CREATE CONTROLFILE REUSE DATABASE "xff" NORESETLOGS  NOARCHIVELOG

由于redo大小错误导致该问题,设置正确的redo大小继续重建

'/dev/oradata/rxff_ls94'
CHARACTER SET ZHS16GBK
WARNING: Default Temporary Tablespace not specified in CREATE DATABASE command
Default Temporary Tablespace will be necessary for a locally managed database in future release
Thu Jul 22 15:01:00 EAT 2021
Errors in file /oracle/admin/xff/udump/xff1_ora_14737.trc:
ORA-00600: internal error code, arguments: [kccsga_update_ckpt_4], [32], [8], [], [], [], [], []
Thu Jul 22 15:01:01 EAT 2021
Errors in file /oracle/admin/xff/udump/xff1_ora_14737.trc:
ORA-00600: internal error code, arguments: [kccsga_update_ckpt_4], [32], [8], [], [], [], [], []
ORA-1503 signalled during: CREATE CONTROLFILE REUSE DATABASE "xff" NORESETLOGS  NOARCHIVELOG

报ORA-00600 kccsga_update_ckpt_4错误,导致控制文件失败,处理该错误之后,重建控制文件成功,分析文件头信息和redo信息,确认只能强制库,尝试强制open库

Thu Jul 22 16:02:05 EAT 2021
SMON: enabling cache recovery
Thu Jul 22 16:02:05 EAT 2021
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0002.cdad19ed):
Thu Jul 22 16:02:05 EAT 2021
select ctime, mtime, stime from obj$ where obj# = :1
Thu Jul 22 16:02:05 EAT 2021
Errors in file /oracle/admin/xff/udump/xff1_ora_23219.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 19 with name "_SYSSMU19$" too small
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 23219
ORA-1092 signalled during: alter database open resetlogs...

这个问题比较常见:ORA-00704 ORA-00604 ORA-01555,参考类似文章:
在数据库open过程中常遇到ORA-01555汇总
数据库open过程遭遇ORA-1555对应sql语句补充
数据库open成功但是报ORA-00600 4137

Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan 
Thu Jul 22 16:08:48 EAT 2021
Errors in file /oracle/admin/xff/bdump/xff1_smon_27436.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=30, OS id=997
Thu Jul 22 16:08:49 EAT 2021
LOGSTDBY: Validating controlfile with logical metadata
Thu Jul 22 16:08:49 EAT 2021
ORACLE Instance xff1 (pid = 11) - Error 600 encountered while recovering transaction (1, 43).
Thu Jul 22 16:08:49 EAT 2021
Errors in file /oracle/admin/xff/bdump/xff1_smon_27436.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Thu Jul 22 16:08:49 EAT 2021
Trace dumping is performing id=[cdmp_20210722160849]
Thu Jul 22 16:08:49 EAT 2021
LOGSTDBY: Validation complete
Completed: alter database open

该问题是由于undo异常,对undo进行处理,数据库无明显报错,安排导出数据

硬件故障导致ORA-600 2662错误处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:硬件故障导致ORA-600 2662错误处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

前几天恢复了一个40多T的CASE:ORA-00600: internal error code, arguments: [16513], [1403] 恢复,又一个近30T的库由于硬件故障,通过其他人一系列恢复之后,无法正常open,让我们提供技术支持:
故障最初原因是由于存储异常

Fri Feb 19 09:03:49 2021
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_3460.trc:
ORA-01114: 将块写入文件 849 时出现 IO 错误 (块 # 3871748)
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1167) 设备没有连接。
ORA-01114: 将块写入文件 849 时出现 IO 错误 (块 # 3871748)
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1167) 设备没有连接。

通过其他人一系列处理后,数据库报ORA-600 2662错误

Sat Feb 20 08:19:35 2021
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Feb 20 08:19:35 2021
SMON: enabling cache recovery
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_5304.trc(incident=1960181):
ORA-00600:internal error code,arguments:[2662],[4],[2185364344], [4],[2185453722],[893388032],[],[],[],[],[],[]
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_5304.trc:
ORA-00600:internal error code,arguments:[2662],[4],[2185364344], [4],[2185453722],[893388032],[],[],[],[],[],[]
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_5304.trc:
ORA-00600:internal error code,arguments:[2662],[4],[2185364344], [4],[2185453722],[893388032],[],[],[],[],[],[]
Error 600 happened during db open, shutting down database
USER (ospid: 5304): terminating the instance due to error 600
Instance terminated by USER, pid = 5304
ORA-1092 signalled during: ALTER DATABASE OPEN...
opiodr aborting process unknown ospid (5304) as a result of ORA-1092
Sat Feb 20 08:19:42 2021
ORA-1092 : opitsk aborting process

通过对scn处理,数据库顺利绕过该错误,然后报ORA-600 4194错误

Doing block recovery for file 213 block 4688
No block recovery was needed
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_smon_7048.trc(incident=1984136):
ORA-00600: internal error code, arguments: [4194], [38.4.1381252], [0], [], [],[],[],[],[],[],[],[]
Sat Feb 20 10:50:45 2021
Doing block recovery for file 213 block 4688
No block recovery was needed
Fatal internal error happened while SMON was doing active transaction recovery.
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_smon_7048.trc:
ORA-00600: internal error code, arguments: [4194], [38.4.1381252], [0], [], [],[],[],[],[],[],[],[]
SMON (ospid: 7048): terminating the instance due to error 474
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_6652.trc(incident=1984185):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Sat Feb 20 10:50:52 2021
Instance terminated by SMON, pid = 7048

通过对异常事务进行处理,屏蔽smon进程进行回滚,数据库open成功,但是报ORA-600 4137错误

Sat Feb 20 10:53:46 2021
Sweep [inc][1992133]: completed
Stopping background process MMNL
Sat Feb 20 10:53:47 2021
Trace dumping is performing id=[cdmp_20210220105347]
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_smon_6576.trc(incident=1992134):
ORA-00600: internal error code, arguments: [4137], [23.13.3094188], [0], [0], [], [], [], [], [], [], [], []
ORACLE Instance xifenfei (pid = 14) - Error 600 encountered while recovering transaction (23, 13).
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_smon_6576.trc:
ORA-00600: internal error code, arguments: [4137], [23.13.3094188], [0], [0], [], [], [], [], [], [], [], []
Sat Feb 20 10:53:47 2021
Sweep [inc2][1992133]: completed
Sat Feb 20 10:53:47 2021
Sweep [inc][1992134]: completed
Stopping background process MMON
Trace dumping is performing id=[cdmp_20210220105348]
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_smon_6576.trc(incident=1992135):
ORA-00600: internal error code, arguments: [4137], [38.4.1381252], [0], [0], [], [], [], [], [], [], [], []
Starting background process MMON
Starting background process MMNL
Sat Feb 20 10:53:48 2021
MMON started with pid=16, OS id=6448 
ALTER SYSTEM enable restricted session;
Sat Feb 20 10:53:48 2021
MMNL started with pid=36, OS id=6840 
ORACLE Instance xifenfei (pid = 14) - Error 600 encountered while recovering transaction (38, 4).
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_smon_6576.trc:
ORA-00600: internal error code, arguments: [4137], [38.4.1381252], [0], [0], [], [], [], [], [], [], [], []
Sat Feb 20 10:53:49 2021
Sweep [inc][1992135]: completed
Trace dumping is performing id=[cdmp_20210220105349]
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: alter database open

对异常回滚段进行处理,数据库后端启动正常,不再报明显ORA-错误.通过hcheck.sql检查字典正常

HCheck Version 07MAY18 on 20-FEB-2021 11:35:11
----------------------------------------------
Catalog Version 11.2.0.1.0 (1102000100)
db_name: JYJG

                                   Catalog       Fixed
Procedure Name                     Version    Vs Release    Timestamp
Result
------------------------------ ... ---------- -- ---------- --------------
------
.- LobNotInObj                 ... 1102000100 <=  *All Rel* 02/20 11:35:11 PASS
.- MissingOIDOnObjCol          ... 1102000100 <=  *All Rel* 02/20 11:35:11 PASS
.- SourceNotInObj              ... 1102000100 <=  *All Rel* 02/20 11:35:11 PASS
.- IndIndparMismatch           ... 1102000100 <= 1102000100 02/20 11:35:12 PASS
.- InvCorrAudit                ... 1102000100 <= 1102000100 02/20 11:35:12 PASS
.- OversizedFiles              ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- PoorDefaultStorage          ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- PoorStorage                 ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- PartSubPartMismatch         ... 1102000100 <= 1102000100 02/20 11:35:12 PASS
.- TabPartCountMismatch        ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- OrphanedTabComPart          ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- MissingSum$                 ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- MissingDir$                 ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- DuplicateDataobj            ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- ObjSynMissing               ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- ObjSeqMissing               ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedUndo                ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedIndex               ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedIndexPartition      ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedIndexSubPartition   ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedTable               ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedTablePartition      ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedTableSubPartition   ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- MissingPartCol              ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedSeg$                ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedIndPartObj#         ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- DuplicateBlockUse           ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- FetUet                      ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- Uet0Check                   ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- ExtentlessSeg               ... 1102000100 <= 1102000100 02/20 11:35:13 PASS
.- SeglessUET                  ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadInd$                     ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadTab$                     ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadIcolDepCnt               ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- ObjIndDobj                  ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- TrgAfterUpgrade             ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- ObjType0                    ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadOwner                    ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- StmtAuditOnCommit           ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadPublicObjects            ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadSegFreelist              ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadDepends                  ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- CheckDual                   ... 1102000100 <=  *All Rel* 02/20 11:35:14 PASS
.- ObjectNames                 ... 1102000100 <=  *All Rel* 02/20 11:35:14 PASS
.- BadCboHiLo                  ... 1102000100 <=  *All Rel* 02/20 11:35:14 PASS
.- ChkIotTs                    ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- NoSegmentIndex              ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- BadNextObject               ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- DroppedROTS                 ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- FilBlkZero                  ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- DbmsSchemaCopy              ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- OrphanedObjError            ... 1102000100 >  1102000000 02/20 11:35:15 PASS
.- ObjNotLob                   ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- MaxControlfSeq              ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- SegNotInDeferredStg         ... 1102000100 >  1102000000 02/20 11:35:18 PASS
.- SystemNotRfile1             ... 1102000100 >   902000000 02/20 11:35:18 PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000100 <=  *All Rel* 02/20 11:35:19 PASS
.- OrphanTrigger               ... 1102000100 <=  *All Rel* 02/20 11:35:19 PASS
.- ObjNotTrigger               ... 1102000100 <=  *All Rel* 02/20 11:35:19 PASS
---------------------------------------
20-FEB-2021 11:35:19  Elapsed: 8 secs
---------------------------------------
Found 0 potential problem(s) and 0 warning(s)

PL/SQL procedure successfully completed.

Statement processed.

虽然字典正常,但是由于数据库屏蔽了一致性,建议客户在条件允许的情况下,进行逻辑迁移,排除风险隐患.

ORA-15096: lost disk write detected

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-15096: lost disk write detected

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

又一例由于存储掉电导致asm磁盘组,由于ORA-15096: lost disk write detected,导致无法mount的恢复请求

SQL> ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:45277:148} */
NOTE: cache registered group DATA number=2 incarn=0x73886b6a
NOTE: cache began mount (first) of group DATA number=2 incarn=0x73886b6a
NOTE: Assigning number (2,2) to disk (/dev/asm-data3)
NOTE: Assigning number (2,1) to disk (/dev/asm-data2)
NOTE: Assigning number (2,0) to disk (/dev/asm-data1)
Fri Nov 06 19:06:56 2020
NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 94 for pid 30, osid 11596
NOTE: cache opening disk 0 of grp 2: DATA_0000 path:/dev/asm-data1
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DATA_0001 path:/dev/asm-data2
NOTE: cache opening disk 2 of grp 2: DATA_0002 path:/dev/asm-data3
NOTE: cache mounting (first) external redundancy group 2/0x73886B6A (DATA)
Fri Nov 06 19:06:57 2020
* allocate domain 2, invalid = TRUE
kjbdomatt send to inst 2
Fri Nov 06 19:06:57 2020
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=25.7986 group=2 (DATA)
NOTE: starting recovery of thread=2 ckpt=33.364 group=2 (DATA)
NOTE: BWR validation signaled ORA-15096
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11596.trc:
ORA-15096: lost disk write detected
NOTE: crash recovery signalled OER-15096
ERROR: ORA-15096 signalled during mount of diskgroup DATA
NOTE: cache dismounting (clean) group 2/0x73886B6A (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 11596, image: oracle@db1 (TNS V1-V3)
NOTE: lgwr not being msg'd to dismount
kjbdomdet send to inst 2
detach from dom 2, sending detach message to inst 2
freeing rdom 2
NOTE: detached from domain 2
NOTE: cache dismounted group 2/0x73886B6A (DATA)
NOTE: cache ending mount (fail) of group DATA number=2 incarn=0x73886b6a
NOTE: cache deleting context for group DATA 2/0x73886b6a
GMON dismounting group 2 at 95 for pid 30, osid 11596
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15096: lost disk write detected
ERROR: ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:45277:148} */

通过判断,通过一系列处理之后,数据库进行了mount操作发现报错ORA-600 2130

Fri Nov 06 17:03:27 2020
ALTER DATABASE RECOVER  database
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 40 slaves
Fri Nov 06 17:03:29 2020
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_pr00_7393.trc  (incident=195869):
ORA-00600: internal error code, arguments: [2130], [2], [1], [2], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/ynhis/ynhis1/incident/incdir_195869/ynhis1_pr00_7393_i195869.trc
Fri Nov 06 17:03:30 2020
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Media Recovery failed with error 600
ORA-10877 signalled during: ALTER DATABASE RECOVER  database  ...

判断redo异常,通过resetlogs打开库,发现报错ORA-00600 2662

Fri Nov 06 18:21:32 2020
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 8670753264
Resetting resetlogs activation ID 306909514 (0x124b114a)
Redo thread 2 enabled by open resetlogs or standby activation
Fri Nov 06 18:21:39 2020
Setting recovery target incarnation to 2
Initializing SCN for created control file
Database SCN compatibility initialized to 3
Warning - High Database SCN: Current SCN value is 8670753267, threshold SCN value is 0
Fri Nov 06 18:21:39 2020
Assigning activation ID 408224320 (0x18550240)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /orabak/data/group_1.289.954514319
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Nov 06 18:21:40 2020
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_ora_24310.trc  (incident=231847):
ORA-00600: internal error code, arguments: [2662], [2], [80818679], [2], [93545365], [4194545], [], [], [], [], [],[]
Incident details in: /u01/app/oracle/diag/rdbms/ynhis/ynhis1/incident/incdir_231847/ynhis1_ora_24310_i231847.trc
Fri Nov 06 18:21:42 2020
Dumping diagnostic data in directory=[cdmp_20201106182142],requested by(instance=1,osid=24310),summary=[incident=231847]
Fri Nov 06 18:21:43 2020
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_ora_24310.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [2], [80818679], [2], [93545365],[4194545],[],[],[],[],[],[]
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_ora_24310.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [2], [80818679], [2], [93545365],[4194545],[],[],[],[],[],[]
Error 704 happened during db open, shutting down database
USER (ospid: 24310): terminating the instance due to error 704
Instance terminated by USER, pid = 24310
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (24310) as a result of ORA-1092

处理该错误之后,数据库resetlog之后,数据库open成功但是报错ORA-00600 4137

Database Characterset is ZHS16GBK
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_smon_26195.trc  (incident=255799):
ORA-00600: internal error code, arguments: [4137], [25.33.122556], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/ynhis/ynhis1/incident/incdir_255799/ynhis1_smon_26195_i255799.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
No Resource Manager plan active
Fri Nov 06 18:30:46 2020
replication_dependency_tracking turned off (no async multimaster replication found)
ORACLE Instance ynhis1 (pid = 23) - Error 600 encountered while recovering transaction (25, 33).
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_smon_26195.trc:
ORA-00600: internal error code, arguments: [4137], [25.33.122556], [0], [0], [], [], [], [], [], [], [], []

对异常undo进行处理,数据库可以正常启动关闭,然后安排数据导出导入新库操作,恢复完成.

在数据库恢复遭遇ORA-07445 kgegpa错误

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:在数据库恢复遭遇ORA-07445 kgegpa错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到客户恢复请求,数据库启动报ORA-600 2662错误

Fri Apr 24 19:52:58 2020
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 15491509441794
Resetting resetlogs activation ID 1460987657 (0x5714e709)
Fri Apr 24 19:52:59 2020
Setting recovery target incarnation to 3
Fri Apr 24 19:52:59 2020
Assigning activation ID 1566342598 (0x5d5c7dc6)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: Y:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Apr 24 19:52:59 2020
SMON: enabling cache recovery
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_3860.trc  (incident=8561):
ORA-00600: 内部错误代码, 参数: [2662], [3606], [3857372426], [3606], [3857377059], [12583040], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_8561\orcl_ora_3860_i8561.trc
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_3860.trc:
ORA-00600: 内部错误代码, 参数: [2662], [3606], [3857372426], [3606], [3857377059], [12583040], [], [], [], [], [], []
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_3860.trc:
ORA-00600: 内部错误代码, 参数: [2662], [3606], [3857372426], [3606], [3857377059], [12583040], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 3860): terminating the instance due to error 600
Instance terminated by USER, pid = 3860
ORA-1092 signalled during: alter database open resetlogs...

这个错误比较常见,通过对数据库scn进行调整,顺利规避该错误,继续启动报如下错误

SQL> startup mount pfile='d:/pfile.txt';
ORACLE 例程已经启动。

Total System Global Area 1.3696E+10 bytes
Fixed Size                  2188768 bytes
Variable Size            6878661152 bytes
Database Buffers         6777995264 bytes
Redo Buffers               37044224 bytes
数据库装载完毕。
SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-03113: 通信通道的文件结尾
进程 ID: 5884
会话 ID: 66 序列号: 3
Fri Apr 24 20:57:49 2020
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Dictionary check beginning
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x898ADE43] [PC:0x9287D88, kgegpa()+38]
Dump file d:\app\administrator\diag\rdbms\orcl\orcl\trace\alert_orcl.log
Fri Apr 24 20:57:49 2020
ORACLE V11.2.0.1.0 - 64bit Production vsnsta=0
vsnsql=16 vsnxtr=3
Windows NT Version V6.1  
CPU                 : 16 - type 8664, 16 Physical Cores
Process Affinity    : 0x0x0000000000000000
Memory (Avail/Total): Ph:21429M/32767M, Ph+PgF:54255M/65533M 
Fri Apr 24 20:57:49 2020
Errors in file 
ORA-07445: caught exception [ACCESS_VIOLATION] at [kgegpa()+38] [0x0000000009287D88]
Fri Apr 24 20:57:52 2020
PMON (ospid: 2496): terminating the instance due to error 397
Instance terminated by PMON, pid = 2496

这里的主要错误是由于ORA-07445 kgegpa,根据以前恢复经验,该问题很可能和undo有关,对undo进行处理之后启动库

SQL> startup mount pfile='d:/pfile.txt' ;
ORACLE 例程已经启动。

Total System Global Area 1.3696E+10 bytes
Fixed Size                  2188768 bytes
Variable Size            6878661152 bytes
Database Buffers         6777995264 bytes
Redo Buffers               37044224 bytes
数据库装载完毕。
SQL> recover database;
完成介质恢复。
SQL> alter database open;

数据库已更改。

SMON: enabling tx recovery
Database Characterset is ZHS16GBK
SMON: Restarting fast_start parallel rollback
Fri Apr 24 21:01:28 2020
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_p000_4360.trc  (incident=13377):
ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_13377\orcl_p000_4360_i13377.trc
Stopping background process MMNL
Doing block recovery for file 3 block 296
Resuming block recovery (PMON) for file 3 block 296
Block recovery from logseq 3, block 25 to scn 15491947056761
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0
  Mem# 0: Y:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO03.LOG
Block recovery completed at rba 3.25.16, scn 3607.20090
Doing block recovery for file 6 block 165592
Resuming block recovery (PMON) for file 6 block 165592
Block recovery from logseq 3, block 33 to scn 15491947056769
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0
  Mem# 0: Y:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO03.LOG
Block recovery completed at rba 3.58.16, scn 3607.20098
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_4912.trc  (incident=13321):
ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_13321\orcl_smon_4912_i13321.trc
SMON: Parallel transaction recovery slave got internal error
SMON: Downgrading transaction recovery to serial
Stopping background process MMON
Fri Apr 24 21:01:29 2020
Trace dumping is performing id=[cdmp_20200424210129]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_4912.trc  (incident=13322):
ORA-00600: internal error code, arguments: [4137], [12.30.1712324], [0], [0], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_13322\orcl_smon_4912_i13322.trc
ORACLE Instance orcl (pid = 14) - Error 600 encountered while recovering transaction (12, 30).
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_4912.trc:
ORA-00600: internal error code, arguments: [4137], [12.30.1712324], [0], [0], [], [], [], [], [], [], [], []
Completed: alter database open upgrade
Fri Apr 24 21:01:30 2020
MMON started with pid=16, OS id=4980 
Fri Apr 24 21:01:31 2020
Sweep [inc][13322]: completed
Corrupt block relative dba: 0x00c395ee (file 3, block 234990)
Fractured block found during buffer read
Data in bad block:
 type: 2 format: 2 rdba: 0x00c395ee
 last change scn: 0x0e16.e5ead38b seq: 0x2b flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xdb720232
 check value in block header: 0xebe2
 computed block checksum: 0xb60b
Reading datafile'Y:\APP\ADMINISTRATOR\ORADATA\ORCL\UNDOTBS01.DBF'for corruption at rdba: 0x00c395ee (file 3,block 234990)
Reread (file 3, block 234990) found same corrupt data
Corrupt Block Found
         TSN = 2, TSNAME = UNDOTBS1
         RFN = 3, BLK = 234990, RDBA = 12817902
         OBJN = 0, OBJD = -1, OBJECT = , SUBOBJECT = 
         SEGMENT OWNER = , SEGMENT TYPE = 
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_m001_4852.trc  (incident=13641):
ORA-01578: ORACLE data block corrupted (file # 3, block # 234990)
ORA-01110: data file 3: 'Y:\APP\ADMINISTRATOR\ORADATA\ORCL\UNDOTBS01.DBF'
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_13641\orcl_m001_4852_i13641.trc
SQL> create undo tablespace undotbs2 datafile 
2   'Y:\APP\ADMINISTRATOR\ORADATA\ORCL\undo_xff02.dbf' size 128M autoextend on;

表空间已创建。

SQL> drop tablespace undotbs1 including contents and datafiles;

表空间已删除。

SQL> shutdown immediate;
数据库已经关闭。
已经卸载数据库。
ORACLE 例程已经关闭。
SQL> create spfile from pfile='d:/pfile.txt';

文件已创建。

SQL> startup mount
ORACLE 例程已经启动。

Total System Global Area 1.3696E+10 bytes
Fixed Size                  2188768 bytes
Variable Size            6878661152 bytes
Database Buffers         6777995264 bytes
Redo Buffers               37044224 bytes
数据库装载完毕。
SQL> alter database open;

数据库已更改。

数据库启动之后继续报出来的ORA-600 4198和ORA-600 4137以及undo坏块均证明是由于undo异常引起的问题,通过重建新undo,数据库open正常,安排客户进行数据导出导入到新库

记录一次200T的数据库恢复经历

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:记录一次200T的数据库恢复经历

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有一个客户恢复请求,6个节点11.2.0.3 RAC,非归档模式,数据量近200T
df_size


由于存储掉电导致数据库6个节点全部宕机,恢复硬件之后,数据库无法正常启动,报错如下:

SQL> recover database;
ORA-00279: change 318472018583 generated at 05/04/2019 17:58:05 needed for
thread 4
ORA-00289: suggestion :
/u01/app/oracle/product/11.2.0/db_1/dbs/arch4_322810_870181839.dbf
ORA-00280: change 318472018583 for thread 4 is in sequence #322810
Wed Aug 28 11:19:55 2019
ALTER DATABASE RECOVER  DATABAE
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 14 Seq 552 Reading mem 0
  Mem# 0: +REDO/xff/log2.ora
Recovery of Online Redo Log: Thread 2 Group 15 Seq 126 Reading mem 0
  Mem# 0: +REDO/xff/log3.ora
Recovery of Online Redo Log: Thread 3 Group 18 Seq 122 Reading mem 0
  Mem# 0: +REDO/xff/log6.ora
ORA-279 signalled during: ALTER DATABASE RECOVER  database  ...
Wed Aug 28 11:21:31 2019
ALTER DATABASE RECOVER CANCEL
Media Recovery Canceled
Completed: ALTER DATABASE RECOVER CANCEL

数据库恢复需要thread 4 sequence #322810,查询redo信息
redo


redo已经被覆盖,数据库无法通过正常途径恢复实现数据库open,尝试屏蔽一致性强制拉库操作后

Wed Aug 28 12:40:15 2019
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_smon_51338.trc  (incident=244209):
ORA-00600: internal error code, arguments: [4137], [44.47.613406], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_244209/xff1_smon_51338_i244209.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Wed Aug 28 12:40:16 2019
ORACLE Instance xff1 (pid = 26) - Error 600 encountered while recovering transaction (44, 47).
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_smon_51338.trc:
ORA-00600: internal error code, arguments: [4137], [44.47.613406], [0], [0], [], [], [], [], [], [], [], []
Wed Aug 28 12:40:20 2019
Exception[type: SIGSEGV,Address not mapped to object][ADDR:0x5122000000C8][PC:0xE1B4D3,ktugru()+87][flags:0x0,count:1]
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_p086_54066.trc  (incident=245017):
ORA-07445:exception encountered:core dump [ktugru()+87][SIGSEGV][ADDR:0x5122000000C8][Address not mapped to object]
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_245017/xff1_p086_54066_i245017.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Aug 28 12:40:20 2019
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_p000_53873.trc  (incident=244305):
ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_244305/xff1_p000_53873_i244305.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

提示undo异常,屏蔽回滚段之后,数据库正常打开没有任何报错信息

Wed Aug 28 12:57:15 2019
SMON: enabling cache recovery
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
[57676] Successfully onlined Undo Tablespace 22.
Undo initialization finished serial:0 start:2386111306 end:2386112316 diff:1010 (10 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Wed Aug 28 12:57:17 2019
minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:57624 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
No Resource Manager plan active
Starting background process GTX0
Wed Aug 28 12:57:18 2019
GTX0 started with pid=45, OS id=57777
Starting background process RCBG
Wed Aug 28 12:57:18 2019
RCBG started with pid=46, OS id=57779
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Wed Aug 28 12:57:19 2019
QMNC started with pid=47, OS id=57788
Completed: ALTER DATABASE OPEN

后续涉及创建新undo,删除老undo并处理一些类似,基本上恢复正常
OPEN


ORA-600 4194/ORA-600 4193/ORA-600 4137故障解决

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-600 4194/ORA-600 4193/ORA-600 4137故障解决

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

对于常见的undo异常错误,ORA-600 4193,ORA-600 4194,ORA-600 4137等错误的处理一般步骤.
适用版本

Oracle Database - Enterprise Edition - Version 9.2.0.1 to 11.2.0.4 [Release 9.2 to 11.2]
Information in this document applies to any platform.

报错现象

The following error is occurring in the alert.log right before the database crashes.
ORA-00600: internal error code, arguments: [4194], [#], [#], [], [], [], [], []
This error indicates that a mismatch has been detected between redo records and rollback (undo) records.
ARGUMENTS:
Arg [a] - Maximum Undo record number in Undo block
Arg [b] - Undo record number from Redo block
Since we are adding a new undo record to our undo block, we would expect that the new record number
 is equal to the maximum record number in the undo block plus one. Before Oracle can add
a new undo record to the undo block it validates that this is correct. If this validation fails,
 then an ORA-600 [4194] will be triggered.

报错原因

This also can be cause by the following defect
Bug 8240762 Abstract: Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] after SHRINK
Details:
Undo corruption may be caused after a shrink and the same undo block may be used
for two different transactions causing several internal errors like:
ORA-600 [4193] / ORA-600 [4194] for new transactions
ORA-600 [4137] for a transaction rollback

处理步骤

Best practice to create a new undo tablespace.
This method includes segment check.
Create pfile from spfile to edit
>create pfile from spfile;
1. Shutdown the instance
2. set the following parameters in the pfile
    undo_management = manual
    event = '10513 trace name context forever, level 2'
3. >startup restrict pfile=<initsid.ora>
4. >select tablespace_name, status, segment_name from dba_rollback_segs where status != 'OFFLINE';
This is critical - we are looking for all undo segments to be offline - System will always be online.
If any are 'PARTLY AVAILABLE' or 'NEEDS RECOVERY' - Please open an issue with Oracle Support or update the current SR.
If all offline then continue to the next step
5. Create new undo tablespace - example
>create undo tablespace <new undo tablespace> datafile <datafile> size 2000M;
6. Drop old undo tablespace
>drop tablespace <old undo tablespace> including contents and datafiles;
7. >shutdown immediate;
8 >startup nomount;  --> Using your Original spfile
9 modify the spfile with the new undo tablespace name
  Alter system set undo_tablespace = '<new tablespace created in step 5>' scope=spfile;
10. >shutdown immediate;
11. >startup;  --> Using spfile
The reason we create a new undo tablespace first is to use new undo segment numbers
 that are higher then the current segments being used.
This way when a transaction goes to do block clean-out
the reference to that undo segment does not exist and continues with the block clean-out.

参考:tep by step to resolve ORA-600 4194 4193 4197 on database crash (Doc ID 1428786.1)