又一起存储故障导致ORA-00333 ORA-00312恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:又一起存储故障导致ORA-00333 ORA-00312恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库启动报ORA-00333 ORA-00312错误,无法正常open数据库

Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc0_4724.trc:
ORA-00333: redo log read error block 63489 count 2048
ORA-00312: online log 2 thread 1: 'F:\ORADATA\SZCG\REDO02.LOG'
ORA-27091: unable to queue I/O
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc0_4724.trc:
ORA-00333: redo log read error block 63489 count 2048
Thu Aug 07 10:42:03  2014
ARC0: All Archive destinations made inactive due to error 333
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_1856.trc:
ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止
ORA-00340: 处理联机日志  (用于线程 ) 时出现 I/O 错误
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_6548.trc:
ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止
ORA-00340: 处理联机日志  (用于线程 ) 时出现 I/O 错误
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_8104.trc:
ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止
ORA-00340: 处理联机日志  (用于线程 ) 时出现 I/O 错误
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_lgwr_884.trc:
ORA-00340: IO error processing online log 3 of thread 1
ORA-00345: redo log write error block 65238 count 13
ORA-00312: online log 3 thread 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: async read/write failed
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 10:42:03  2014
LGWR: terminating instance due to error 340
Thu Aug 07 10:42:05  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_8104.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 340
ORA-00340: IO error processing online log  of thread
Thu Aug 07 10:42:05  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_1856.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 340
ORA-00340: IO error processing online log  of thread
Thu Aug 07 10:42:05  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_6548.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 340
ORA-00340: IO error processing online log  of thread
Thu Aug 07 17:40:05  2014
ALTER DATABASE OPEN
Thu Aug 07 17:40:05  2014
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Thu Aug 07 17:40:06  2014
Started redo scan
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27091: 无法将 I/O 排队
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 17:40:06  2014
Aborting crash recovery due to error 333
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-333 signalled during: ALTER DATABASE OPEN...

进一步检查发现在7月6日系统就已经报io异常

Sun Jul 06 10:05:23  2014
ARC0: All Archive destinations made inactive due to error 333
Sun Jul 06 10:06:07  2014
KCF: write/open error block=0xd03 online=1
     file=3 F:\ORADATA\SZCG\SYSAUX01.DBF
     error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。'
Automatic datafile offline due to write error on
file 3: F:\ORADATA\SZCG\SYSAUX01.DBF
Sun Jul 06 10:06:23  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc1_2676.trc:
ORA-00333: redo log read error block 63489 count 2048
ORA-00312: online log 2 thread 1: 'F:\ORADATA\SZCG\REDO02.LOG'
ORA-27091: unable to queue I/O
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 10:36:54  2014
ARC1: All Archive destinations made inactive due to error 333
Thu Aug 07 10:37:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_m000_5832.trc:
ORA-01135: file 3 accessed for DML/query is offline
ORA-01110: data file 3: 'F:\ORADATA\SZCG\SYSAUX01.DBF'

检查硬件发现raid一块盘完全损坏,另外一块盘也处于告警状态,保护现场拷贝文件过程中发现redo02,redo03,sysaux无法拷贝,使用rman检查发现
sysaux-block


因为redo完全损坏,使用工具跳过坏块,拷贝相关有坏块文件到其他目录,重命名相关文件尝试启动数据库,依然报ORA-00333 ORA-00312

Started redo scan
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27091: 无法将 I/O 排队
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 17:40:06  2014
Aborting crash recovery due to error 333
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-333 signalled during: ALTER DATABASE OPEN...

设置隐含参数_allow_resetlogs_corruption,尝试强制拉库

Started redo scan
Fri Aug 08 12:13:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
Fri Aug 08 12:13:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27091: 无法将 I/O 排队
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Fri Aug 08 12:13:25  2014
Aborting crash recovery due to error 333
Fri Aug 08 12:13:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-333 signalled during: ALTER DATABASE OPEN...
Fri Aug 08 12:13:45  2014
ALTER DATABASE RECOVER  database until cancel
Fri Aug 08 12:13:45  2014
Media Recovery Start
 parallel recovery started with 15 processes
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
Fri Aug 08 12:13:55  2014
ALTER DATABASE RECOVER    CANCEL
Fri Aug 08 12:13:59  2014
ORA-1547 signalled during: ALTER DATABASE RECOVER    CANCEL  ...
Fri Aug 08 12:13:59  2014
ALTER DATABASE RECOVER CANCEL
ORA-1112 signalled during: ALTER DATABASE RECOVER CANCEL ...
Fri Aug 08 12:14:12  2014
alter database open resetlogs
Fri Aug 08 12:14:13  2014
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
ORA-1245 signalled during: alter database open resetlogs...
Fri Aug 08 12:54:11  2014
alter tablespace sysaux offline
Fri Aug 08 12:54:11  2014
ORA-1109 signalled during: alter tablespace sysaux offline...
Fri Aug 08 13:05:30  2014
alter database open
Fri Aug 08 13:05:30  2014
ORA-1589 signalled during: alter database open...

在offline过程中,数据库检查到sysaux数据文件为offline状态,当表空间只有一个数据文件,而且该数据文件为offline,数据库将会尝试offline sysaux表空间,但是发现该表空间文件非正常scn,无法offline 表空间,导致resetlogs操作失败。这里是操作失误应该先online相关数据文件,然后再进行resetlogs操作

Sat Aug 09 11:56:03  2014
alter database datafile 3 online
Sat Aug 09 11:56:04  2014
Completed: alter database datafile 3 online
Sat Aug 09 11:56:08  2014
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
Sat Aug 09 11:56:18  2014
ARCH: Encountered disk I/O error 19502
Sat Aug 09 11:56:18  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512)
ORA-27072: 文件 I/O 错误
OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 1) 函数不正确。
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512)
Sat Aug 09 11:56:18  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512)
ORA-27072: 文件 I/O 错误
OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 1) 函数不正确。
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512)
ARCH: I/O error 19502 archiving log 3 to 'F:\ARCHIVE\ARC01745_0814618167.001'
Sat Aug 09 11:56:18  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-00265: 要求实例恢复, 无法设置 ARCHIVELOG 模式
Archive all online redo logfiles failed:265
RESETLOGS after incomplete recovery UNTIL CHANGE 77983856
Resetting resetlogs activation ID 3562192628 (0xd452bef4)
Online log F:\ORADATA\SZCG\REDO01.LOG: Thread 1 Group 1 was previously cleared
Online log F:\ORADATA\SZCG\REDO02.LOG: Thread 1 Group 2 was previously cleared
Online log D:\REDO04.LOG: Thread 1 Group 4 was previously cleared
Sat Aug 09 11:56:22  2014
Setting recovery target incarnation to 3
Sat Aug 09 11:56:23  2014
Assigning activation ID 3602586269 (0xd6bb1a9d)
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=33, OS id=5900
Sat Aug 09 11:56:23  2014
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=34, OS id=5776
Sat Aug 09 11:56:24  2014
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: F:\ORADATA\SZCG\REDO01.LOG
Successful open of redo thread 1
Sat Aug 09 11:56:24  2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Aug 09 11:56:24  2014
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
Sat Aug 09 11:56:24  2014
ARC0: Becoming the heartbeat ARCH
Sat Aug 09 11:56:24  2014
SMON: enabling cache recovery
Sat Aug 09 11:56:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-00600: 内部错误代码, 参数: [2662], [0], [77983864], [0], [77992379], [8388617], [], []
Sat Aug 09 11:56:26  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-00600: 内部错误代码, 参数: [2662], [0], [77983864], [0], [77992379], [8388617], [], []
Sat Aug 09 11:56:26  2014
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 4516
ORA-1092 signalled during: alter database open resetlogs...

ORA-600 2662这个错误很熟悉,直接推SCN,数据库open,但是报ORA-600 4194

Sat Aug 09 12:01:28  2014
SMON: enabling cache recovery
Dictionary check complete
Sat Aug 09 12:01:32  2014
SMON: enabling tx recovery
Sat Aug 09 12:01:32  2014
Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=34, OS id=6116
Sat Aug 09 12:01:34  2014
LOGSTDBY: Validating controlfile with logical metadata
Sat Aug 09 12:01:34  2014
LOGSTDBY: Validation complete
Sat Aug 09 12:01:34  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_smon_920.trc:
ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], []
Sat Aug 09 12:01:36  2014
Doing block recovery for file 2 block 319
Resuming block recovery (PMON) for file 2 block 319
Block recovery from logseq 2, block 56 to scn 1073742003
Sat Aug 09 12:01:36  2014
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: F:\ORADATA\SZCG\REDO02.LOG
Block recovery stopped at EOT rba 2.79.16
Block recovery completed at rba 2.79.16, scn 0.1073742002
Doing block recovery for file 2 block 153
Resuming block recovery (PMON) for file 2 block 153
Block recovery from logseq 2, block 56 to scn 1073741986
Sat Aug 09 12:01:36  2014
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: F:\ORADATA\SZCG\REDO02.LOG
Block recovery completed at rba 2.66.16, scn 0.1073741988
Sat Aug 09 12:01:36  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_smon_920.trc:
ORA-01595: error freeing extent (4) of rollback segment (10))
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], []
Sat Aug 09 12:01:36  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5272.trc:
ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], []
Sat Aug 09 12:01:36  2014
Completed: alter database open

尝试重建undo表空间并切换undo_tabspace到新undo表空间解决,因为数据库在恢复过程中使用了隐含参数强制拉库,不能保证数据一致性,强烈建议逻辑方式重建数据库
在本次故障中,所幸的是只有redo和sysaux文件损坏,如果是业务数据文件或者system数据文件损坏,恢复的后果可能更加麻烦,丢失数据可能更加多。再次说明:数据库备份非常重要,数据的安全性不能完全寄希望于硬件之上

ORA-607/ORA-600[4194]不一定是重大灾难

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-607/ORA-600[4194]不一定是重大灾难

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

以前解决过ORA-607/ORA-600[4194]和模拟过ORA-607/ORA-600[4194]错误,所以固定思维任务ORA-607/ORA-600[4194]可能就是重大灾难,通过这个案例来说明ORA-607/ORA-600[4194]可能也就是一个常规的不能再常规的错误:有一网友数据库因意外关闭电源导致启动过程出现ORA-00607/ORA-00600[4194]/ORA-00600[4097]的错误,使得数据库启动失败.

SMON: enabling tx recovery
Fri Aug 31 23:14:08 2012
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=19, OS id=15619
Fri Aug 31 23:14:10 2012
Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Fri Aug 31 23:14:12 2012
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Fri Aug 31 23:14:12 2012
Completed: alter database open
Fri Aug 31 23:14:14 2012
Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc:
ORA-01595: error freeing extent (2) of rollback segment (4))
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [48], [34], [], [], [], [], []
Fri Aug 31 23:29:41 2012
Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [17], [10], [], [], [], [], []
Fri Aug 31 23:29:43 2012
Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc:
ORA-00600: internal error code, arguments: [4194], [48], [34], [], [], [], [], []
Fri Aug 31 23:29:44 2012
Errors in file /u01/oradata/orcl/bdump/orcl_pmon_15577.trc:
ORA-00474: SMON process terminated with error
Fri Aug 31 23:29:44 2012
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 15577

通过alert日志可以定位到SMON_SCN_TIME表或者其回滚操作可能异常,结合alert和trace分析,发现这次错误的操作主要sql语句为:

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4194], [48], [34], [], [], [], [], []
Current SQL statement for this session:
UPDATE SYS.COL_USAGE$
   SET EQUALITY_PREDS    = EQUALITY_PREDS +
                           DECODE(BITAND(:FLAG, 1), 0, 0, 1),
       EQUIJOIN_PREDS    = EQUIJOIN_PREDS +
                           DECODE(BITAND(:FLAG, 2), 0, 0, 1),
       NONEQUIJOIN_PREDS = NONEQUIJOIN_PREDS +
                           DECODE(BITAND(:FLAG, 4), 0, 0, 1),
       RANGE_PREDS       = RANGE_PREDS + DECODE(BITAND(:FLAG, 8), 0, 0, 1),
       LIKE_PREDS        = LIKE_PREDS + DECODE(BITAND(:FLAG, 16), 0, 0, 1),
       NULL_PREDS        = NULL_PREDS + DECODE(BITAND(:FLAG, 32), 0, 0, 1),
       TIMESTAMP         = :TIME
 WHERE OBJ# = :OBJN
   AND INTCOL# = :COLN
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4194], [17], [10], [], [], [], [], []
Current SQL statement for this session:
UPDATE SYS.MON_MODS$
   SET INSERTS       = INSERTS + :INS,
       UPDATES       = UPDATES + :UPD,
       DELETES       = DELETES + :DEL,
       FLAGS        =
       (DECODE(BITAND(FLAGS, :FLAG), :FLAG, FLAGS, FLAGS + :FLAG)),
       DROP_SEGMENTS = DROP_SEGMENTS + :DROPSEG,
       TIMESTAMP     = :TIME
 WHERE OBJ# = :OBJN
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Current SQL statement for this session:
INSERT INTO SMON_SCN_TIME
  (THREAD,
   TIME_MP,
   TIME_DP,
   SCN,
   SCN_WRP,
   SCN_BAS,
   NUM_MAPPINGS,
   TIM_SCN_MAP)
VALUES
  (0, :1, :2, :3, :4, :5, :6, :7)

这里主要涉及到对oracle的三张表的操作
COL_USAGE$:主要是在收集统计信息的时候作为是否需要收集列直方图信息参考
MON_MODS$:Oracle主要利用该表来记录那些表的数据发生改变,方便收集统计信息
SMON_SCN_TIME:记录SCN和TIME的对应关系
通过这里的分析可以确定这三张表中的数据对于数据库来说不是致命的基表信息,在数据库运行过程中可以清理掉这些信息,最多就是因为数据库性能的下降或者SCN和TIME互转功能不完善.

解决思路
完整的undo异常处理顺序
1.从alert中可以看出来数据库是在open之后由于SMON回滚到上述几条sql异常导致数据库down,所以可以尝试使用system回滚段启动数据库,看看是否可以屏蔽相关问题
2.如果方法1不可行,那使用event屏蔽smon对回滚段的相关操作,使得数据库正常启动
3.如果由于存在特殊事务,event无法屏蔽,尝试使用隐含参数处理该问题
4.如果隐含参数尚无法解决给问题考虑使用bbed
5.如果bbed不能解决,那只能选择dul或者其类似工具处理
这个案例中我们明确的看到是因为上面的三条sql回滚异常出现问题导致,对于这样的问题,经过测试使用方法1和2都能够顺利解决问题(open库之后需要重建undo,删除有问题undo表空间,修改参数[可能包括event],切换undo表空间).因为遇到几次ORA-607/ORA-600[4194]是因为system rollback损坏导致,所以这次开始也认为是一次比较复杂的恢复,最后证明这次是一种非常常规的恢复.对于ORACLE的数据库恢复有经验可能会比较快的定位问题,但是如果按照固定的思路去想可能会让自己走进死胡同.

数据库报ORA-00607/ORA-00600[4194]错误

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:数据库报ORA-00607/ORA-00600[4194]错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

昨天晚上处理一起比较奇特的ORA-00600[4194]错误的数据库恢复案例,客户数据库刚刚上线,因为一时疏忽没有做备份.谁知天有不测风云,就这样的系统也会出问题(数据库文件总共 5g redo log sequence#=9).这个事故告诉我们:作为dba在任何时候都不要有侥幸心理,备份重于一切
数据库报ORA-00607/ORA-00600[4194]错误

Thu Jul 26 13:21:11 2012
SMON: enabling cache recovery
Thu Jul 26 13:21:11 2012
Errors in file /orasvr/admin/mispdata/udump/mispdata_ora_2865.trc:
ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], []
Thu Jul 26 13:21:11 2012
Doing block recovery for file 1 block 18
Block recovery from logseq 3994, block 3 to scn 89979535
Thu Jul 26 13:21:11 2012
Recovery of Online Redo Log: Thread 1 Group 1 Seq 3994 Reading mem 0
  Mem# 0: /orasvr/mispdata/redo01.log
Block recovery stopped at EOT rba 3994.5.16
Block recovery completed at rba 3994.5.16, scn 0.89979533
Doing block recovery for file 1 block 9
Block recovery from logseq 3994, block 3 to scn 89979532
Thu Jul 26 13:21:11 2012
Recovery of Online Redo Log: Thread 1 Group 1 Seq 3994 Reading mem 0
  Mem# 0: /orasvr/mispdata/redo01.log
Block recovery completed at rba 3994.5.16, scn 0.89979533
Thu Jul 26 13:21:11 2012
Errors in file /orasvr/admin/mispdata/udump/mispdata_ora_2865.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], []
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 2865
ORA-1092 signalled during: ALTER DATABASE OPEN...

通过alert日志中,我们可以发现是因为ORA-00600[4194]导致数据库不能被正常open,但是这次不同的是在报ORA-00600之前有ORA-00607的错误出现,根据这个提示,应该是一个基本的数据块有问题导致.而ORA-00600[4194]是因为undo和redo不一致导致.对于本错误放在一起分析,大概的评估是因为内部对象的异常出现ora-607,导致undo和redo不一致出现ORA-00600[4194].

trace文件分析

--dump redo
DUMP OF REDO FROM FILE '/orasvr/mispdata/redo02.log'
 Opcodes *.*
 DBAs (file#, block#):
      (1, 18)
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 FILE HEADER:
	Compatibility Vsn = 169870080=0xa200300
	Db ID=658120234=0x273a1e2a, Db Name='MISPDATA'
	Activation ID=658142762=0x273a762a
	Control Seq=16668=0x411c, File size=102400=0x19000
	File Number=2, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000003992, SCN 0x0000055c5e3c-0x0000055cac62"
 thread: 1 nab: 0x5 seq: 0x00000f98 hws: 0x6 eot: 0 dis: 0
 resetlogs count: 0x2d42646a scn: 0x0000.00000001 (1)
 resetlogs terminal rcv count: 0x0 scn: 0x0000.00000000
 prev resetlogs count: 0x0 scn: 0x0000.00000000
 prev resetlogs terminal rcv count: 0x0 scn: 0x0000.00000000
 Low  scn: 0x0000.055c5e3c (89939516) 07/26/2012 11:17:42
 Next scn: 0x0000.055cac62 (89959522) 07/26/2012 13:16:19
 Enabled scn: 0x0000.00000001 (1) 08/16/2011 11:50:10
 Thread closed scn: 0x0000.055cac61 (89959521) 07/26/2012 11:17:42
 Disk cksum: 0x3088 Calc cksum: 0x3088
 Terminal recovery stop scn: 0x0000.00000000
 Terminal recovery  01/01/1988 00:00:00
 Most recent redo scn: 0x0000.00000000
 Largest LWN: 0 blocks
 End-of-redo stream : No
 Unprotected mode
 Miscellaneous flags: 0x0
 Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000.00000000
--ORA-00600错误提示
*** 2012-07-26 13:21:11.566
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], []
Current SQL statement for this session:
update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,
xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1
--ora-607
Error 607 in redo application callback
TYP:0 CLS:16 AFN:1 DBA:0x00400012 OBJ:4294967295 SCN:0x0000.0551610e SEQ:  1 OP:5.1
ktudb redo: siz: 256 spc: 7892 flg: 0x0012 seq: 0x003d rec: 0x02
            xid:  0x0000.026.00000035
ktubl redo: slt: 38 rci: 0 opc: 11.1 objn: 15 objd: 15 tsn: 0
Undo type:  Regular undo        Begin trans    Last buffer split:  No
Temp Object:  No
Tablespace Undo:  No
             0x00000000  prev ctl uba: 0x00400012.003d.01
prev ctl max cmt scn:  0x0000.0550709b  prev tx cmt scn:  0x0000.0550709c
txn start scn:  0xffff.ffffffff  logon user: 0  prev brb: 4194318  prev bcl: 0 KDO undo record:
KTB Redo
op: 0x04  ver: 0x01
op: L  itl: xid:  0x0000.01e.00000035 uba: 0x00400012.003d.01
                      flg: C---    lkc:  0     scn: 0x0000.05511296
KDO Op code: URP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x0040006a  hdba: 0x00400069
itli: 1  ispac: 0  maxfr: 4863
tabn: 0 slot: 1(0x1) flag: 0x2c lock: 0 ckix: 0
ncol: 17 nnew: 12 size: 0
col  1: [ 9]  5f 53 59 53 53 4d 55 31 24
col  2: [ 2]  c1 02
col  3: [ 2]  c1 03
col  4: [ 2]  c1 0a
col  5: [ 5]  c4 5a 12 5a 14
col  6: [ 1]  80
col  7: [ 4]  c3 08 5f 3d
col  8: [ 4]  c3 02 38 52
col  9: [ 1]  80
col 10: [ 2]  c1 04
col 11: [ 2]  c1 02
col 16: [ 2]  c1 02
Block after image is corrupt:
buffer tsn: 0 rdba: 0x00400012 (1/18)
scn: 0x0000.0551610e seq: 0x01 flg: 0x04 tail: 0x610e0201
frmt: 0x02 chkval: 0x65f8 type: 0x02=KTU UNDO BLOCK

这里信息比较多:
1.dump redo部分得到file 1 block 18块可能异常
2.ora-600部分可以得出数据库在执行undo$对象update的回滚操作时候报错
3.通过ora-607信息得到update undo$记录对应的数据块是file 1 block 106(dba 0x00400069),在相同数据库版本数据库中查询.也就是说undo$这个回滚段回滚的时候出现错误.

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
Enter value for file_id: 1
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 1
Enter value for block_id: 106
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 106 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
SYS
UNDO$
TABLE              SYSTEM

4.发现dba 0x00400012发现坏块是file 1 block 18,查询坏块对象为

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
Enter value for file_id: 1
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 1
Enter value for block_id: 18
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 18 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
SYS
SYSTEM
ROLLBACK           SYSTEM

通过这里的分析,大概可以确定这次故障的原因:
因为ROLLBACK(file 1 block 18)坏块,redo 恢复undo 出现异常出现ORA-607,使得undo和redo不一致从而出现ORA-00600[4194],导致undo$(file 1 block 106)中的一条update事务不能被正常提交或者回滚,从而使得该数据库不能被正常打开.
针对这个库因为ROLLBACK异常,使用隐含参数无法屏蔽该回滚段,因为这个数据量非常小,我们选择了挖数据文件.如果数据量比较大,可以通过bbed尝试提交undo$(file 1 block 106)数据块中事务,看人品是否能够正常启动.

ORA-00600[4194]故障解决

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-00600[4194]故障解决

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

朋友数据库因为断电,导致数据库正常启动片刻之后,自动down掉
一、alert日志

Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 2
Autotune of undo retention is turned on.
IMODE=BR
ILAT =18
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.1.0.
System parameters with non-default values:
  processes                = 150
  __shared_pool_size       = 58720256
  __large_pool_size        = 4194304
  __java_pool_size         = 4194304
  __streams_pool_size      = 4194304
  nls_date_format          = yyyy-mm-dd hh24:mi:ss
  sga_target               = 335544320
  control_files            = /u02/ezhou/control01.ctl
  db_block_size            = 8192
  compatible               = 10.2.0.1.0
  log_archive_dest         = /u02/arch
  log_archive_max_processes= 10
  db_file_multiblock_read_count= 16
  fast_start_mttr_target   = 300
  undo_management          = AUTO
  undo_tablespace          = UNDOTBS1
  remote_login_passwordfile= EXCLUSIVE
  db_domain                =
  dispatchers              = (PROTOCOL=TCP) (SERVICE=ezhouXDB)
  job_queue_processes      = 10
  background_dump_dest     = /u01/pp/oracle/admin/ezhou/bdump
  user_dump_dest           = /u01/pp/oracle/admin/ezhou/udump
  core_dump_dest           = /u01/pp/oracle/admin/ezhou/cdump
  audit_file_dest          = /u01/pp/oracle/admin/ezhou/adump
  db_name                  = ezhou
  open_cursors             = 400
  sql_trace                = TRUE
  pga_aggregate_target     = 94371840
MMAN started with pid=4, OS id=5539
PMON started with pid=2, OS id=5535
DBW0 started with pid=5, OS id=5541
LGWR started with pid=6, OS id=5543
SMON started with pid=8, OS id=5547
CJQ0 started with pid=10, OS id=5577
RECO started with pid=9, OS id=5575
Sat Dec 10 17:15:40 2011
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'
MMNL started with pid=12, OS id=5581
MMON started with pid=11, OS id=5579
Sat Dec 10 17:15:40 2011
starting up 1 shared server(s) ...
PSP0 started with pid=3, OS id=5537
CKPT started with pid=7, OS id=5545
Sat Dec 10 17:15:42 2011
ALTER DATABASE   MOUNT
Sat Dec 10 17:15:46 2011
Setting recovery target incarnation to 3
Sat Dec 10 17:15:47 2011
Successful mount of redo thread 1, with mount id 4055654398
Sat Dec 10 17:15:47 2011
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Sat Dec 10 17:15:47 2011
ALTER DATABASE OPEN
Sat Dec 10 17:15:47 2011
Beginning crash recovery of 1 threads
Sat Dec 10 17:15:47 2011
Started redo scan
Sat Dec 10 17:15:48 2011
Completed redo scan
 319 redo blocks read, 98 data blocks need recovery
Sat Dec 10 17:15:50 2011
Started redo application at
 Thread 1: logseq 24, block 3
Sat Dec 10 17:15:50 2011
Recovery of Online Redo Log: Thread 1 Group 3 Seq 24 Reading mem 0
  Mem# 0 errs 0: /u02/ezhou/redo03.log
Sat Dec 10 17:15:50 2011
Completed redo application
Sat Dec 10 17:15:51 2011
Completed crash recovery at
 Thread 1: logseq 24, block 322, scn 6168722
 98 data blocks read, 98 data blocks written, 319 redo blocks read
Sat Dec 10 17:15:51 2011
LGWR: STARTING ARCH PROCESSES
ARC1 started with pid=17, OS id=5645
ARC0 started with pid=16, OS id=5643
ARC3 started with pid=19, OS id=5649
ARC4 started with pid=20, OS id=5651
ARC2 started with pid=18, OS id=5647
ARC6 started with pid=22, OS id=5655
ARC7 started with pid=23, OS id=5657
ARC5 started with pid=21, OS id=5653
ARC8 started with pid=24, OS id=5659
Sat Dec 10 17:15:52 2011
ARC0: Archival started
ARC1: Archival started
ARC2: Archival started
ARC3: Archival started
ARC4: Archival started
ARC5: Archival started
ARC6: Archival started
ARC7: Archival started
ARC8: Archival started
ARC9: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC9 started with pid=25, OS id=5661
Sat Dec 10 17:15:52 2011
Thread 1 advanced to log sequence 25
Sat Dec 10 17:15:53 2011
ARC2: STARTING ARCH PROCESSES
Sat Dec 10 17:15:53 2011
ARC6: Becoming the 'no FAL' ARCH
ARC6: Becoming the 'no SRL' ARCH
Sat Dec 10 17:15:53 2011
ARC3: Becoming the heartbeat ARCH
Sat Dec 10 17:15:53 2011
Thread 1 opened at log sequence 25
  Current log# 1 seq# 25 mem# 0: /u02/ezhou/redo01.log
  Current log# 1 seq# 25 mem# 1: /u02/ezhou/redo01a.rdo
Successful open of redo thread 1
Sat Dec 10 17:15:53 2011
SMON: enabling cache recovery
Sat Dec 10 17:15:54 2011
ARCa: Archival started
ARC2: STARTING ARCH PROCESSES COMPLETE
ARCa started with pid=26, OS id=5663
Sat Dec 10 17:15:57 2011
Successfully onlined Undo Tablespace 1.
Sat Dec 10 17:15:57 2011
SMON: enabling tx recovery
Sat Dec 10 17:15:57 2011
Database Characterset is AL32UTF8
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=27, OS id=5666
Sat Dec 10 17:16:13 2011
Errors in file /u01/pp/oracle/admin/ezhou/bdump/ezhou_smon_5547.trc:
ORA-00600: internal error code, arguments: [4194], [30], [27], [], [], [], [], []
Sat Dec 10 17:16:17 2011
Completed: ALTER DATABASE OPEN
Sat Dec 10 17:16:27 2011
Doing block recovery for file 2 block 4124
Block recovery from logseq 25, block 68 to scn 6168829
Sat Dec 10 17:16:27 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 25 Reading mem 0
  Mem# 0 errs 0: /u02/ezhou/redo01.log
  Mem# 1 errs 0: /u02/ezhou/redo01a.rdo
Block recovery stopped at EOT rba 25.126.16
Block recovery completed at rba 25.126.16, scn 0.6168829
Doing block recovery for file 2 block 73
Block recovery from logseq 25, block 68 to scn 6168786
Sat Dec 10 17:16:28 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 25 Reading mem 0
  Mem# 0 errs 0: /u02/ezhou/redo01.log
  Mem# 1 errs 0: /u02/ezhou/redo01a.rdo
Block recovery completed at rba 25.69.16, scn 0.6168789
Sat Dec 10 17:16:28 2011
Errors in file /u01/pp/oracle/admin/ezhou/bdump/ezhou_smon_5547.trc:
ORA-01595: error freeing extent (2) of rollback segment (5))
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [30], [27], [], [], [], [], []
Sat Dec 10 17:16:30 2011
Errors in file /u01/pp/oracle/admin/ezhou/bdump/ezhou_j002_5690.trc:
ORA-00600: internal error code, arguments: [4194], [30], [27], [], [], [], [], []
Sat Dec 10 17:16:37 2011
Doing block recovery for file 2 block 4124
Block recovery from logseq 25, block 68 to scn 6168829
Sat Dec 10 17:16:37 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 25 Reading mem 0
  Mem# 0 errs 0: /u02/ezhou/redo01.log
  Mem# 1 errs 0: /u02/ezhou/redo01a.rdo
Block recovery completed at rba 25.126.16, scn 0.6168830
Doing block recovery for file 2 block 73
Block recovery from logseq 25, block 68 to scn 6168841
Sat Dec 10 17:16:37 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 25 Reading mem 0
  Mem# 0 errs 0: /u02/ezhou/redo01.log
  Mem# 1 errs 0: /u02/ezhou/redo01a.rdo
Block recovery completed at rba 25.149.16, scn 0.6168843
Sat Dec 10 17:16:37 2011
Errors in file /u01/pp/oracle/admin/ezhou/bdump/ezhou_j002_5690.trc:
ORA-12012: error on auto execute of job 8886
ORA-00607: Internal error occurred while making a change to a data block
Sat Dec 10 17:16:41 2011
Errors in file /u01/pp/oracle/admin/ezhou/bdump/ezhou_j003_5692.trc:
ORA-00600: internal error code, arguments: [4194], [30], [27], [], [], [], [], []
Sat Dec 10 17:16:42 2011
DEBUG: Replaying xcb 0x32a2b17c, pmd 0x32bdbd24 for failed op 8
Doing block recovery for file 2 block 4124
Block recovery from logseq 25, block 68 to scn 6168829
Sat Dec 10 17:16:42 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 25 Reading mem 0
  Mem# 0 errs 0: /u02/ezhou/redo01.log
  Mem# 1 errs 0: /u02/ezhou/redo01a.rdo
Block recovery completed at rba 25.126.16, scn 0.6168830
Sat Dec 10 17:16:43 2011
Errors in file /u01/pp/oracle/admin/ezhou/bdump/ezhou_j003_5692.trc:
ORA-00600: internal error code, arguments: [4194], [30], [27], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [30], [27], [], [], [], [], []
Sat Dec 10 17:16:46 2011
Errors in file /u01/pp/oracle/admin/ezhou/bdump/ezhou_j003_5692.trc:
ORA-00600: internal error code, arguments: [4194], [30], [27], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [30], [27], [], [], [], [], []
Sat Dec 10 17:17:46 2011
DEBUG: Replaying xcb 0x32a2b17c, pmd 0x32bdbd24 for failed op 8
Doing block recovery for file 2 block 4124
Block recovery from logseq 25, block 68 to scn 6168829
Sat Dec 10 17:17:46 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 25 Reading mem 0
  Mem# 0 errs 0: /u02/ezhou/redo01.log
  Mem# 1 errs 0: /u02/ezhou/redo01a.rdo
Block recovery completed at rba 25.126.16, scn 0.6168830
Sat Dec 10 17:17:48 2011
Errors in file /u01/pp/oracle/admin/ezhou/bdump/ezhou_pmon_5535.trc:
ORA-00600: internal error code, arguments: [4194], [30], [27], [], [], [], [], []
Sat Dec 10 17:17:49 2011
Errors in file /u01/pp/oracle/admin/ezhou/bdump/ezhou_pmon_5535.trc:
ORA-00600: internal error code, arguments: [4194], [30], [27], [], [], [], [], []
PMON: terminating instance due to error 472
Instance terminated by PMON, pid = 5535

二、MOS记录

ERROR:
  ORA-600 [4194] [a] [b]
VERSIONS:
  versions 6.0 to 10.1
DESCRIPTION:
  A mismatch has been detected between Redo records and rollback (Undo)
  records.
  We are validating the Undo record number relating to the change being
  applied against the maximum undo record number recorded in the undo block.
  This error is reported when the validation fails.
ARGUMENTS:
  Arg [a] Maximum Undo record number in Undo block
  Arg [b] Undo record number from Redo block

三、解决办法
1、修改参数
undo_management= MANUAL
undo_tablespace= SYSTEM
2、打开数据库,删除当前undo空间,重建新undo空间
3、修改参数
undo_management= AUTO
undo_tablespace= UNDOTBSNEW
4、重新启动数据库