raid强制上线后数据库无法启动故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:raid强制上线后数据库无法启动故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

由于raid掉盘过多,强制raid上线,然后启动数据库报以下错误

Mon Apr 19 23:19:28 2021
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Started redo scan
Completed redo scan
 read 106750 KB redo, 9080 data blocks need recovery
Mon Apr 19 23:19:45 2021
Slave exiting with ORA-1115 exception
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p000_3277.trc:
ORA-01115: IO error reading block from file 9 (block # 339)
ORA-01110: data file 9: '/u01/app/oracle/oradata/orcl/dev02.dbf'
ORA-27072: File I/O error
Additional information: 4
Additional information: 326
Additional information: 24576
ORA-27072: File I/O error
Additional information: 4
Additional information: 326
Additional information: 24576
ORA-27072: File I/O error
Additional information: 4
Additional information: 326
Additional information: 24576
ORA-27072: File I/O error
Additional information: 4
Additional information: 326
Additional information: 24576
ORA-27072: File I/O error
Additional information: 4
Additional information: 326
Additional information: 24576
ORA-27072: File I/O error
Additional information: 4
Additional information: 326
Additional information: 24576
ORA-27072: File I/O error
Additional information: 4
Additional information: 326
Additional information: 24576
ORA-27072: File I/O error
Additional information: 4
Additional information: 326
Additional information: 24576
ORA-27072: File I
Mon Apr 19 23:19:45 2021
Aborting crash recovery due to slave death, attempting serial crash recovery
Beginning crash recovery of 1 threads
Started redo scan
Completed redo scan
 read 106750 KB redo, 9080 data blocks need recovery
Aborting crash recovery due to error 1115
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_3275.trc:
ORA-01115: IO error reading block from file 9 (block # 329)
ORA-01110: data file 9: '/u01/app/oracle/oradata/orcl/dev02.dbf'
ORA-1115 signalled during: ALTER DATABASE OPEN...

错误提示比较明显IO error,结合客户强行上线raid的操作,比较明显是由于底层io问题导致该错误,直接对此文件dbv检查

[oracle@database orcl]$ dbv file=dev02.dbf

DBVERIFY: Release 11.2.0.4.0 - Production on Mon Apr 19 23:59:03 2021

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /u01/app/oracle/oradata/orcl/dev02.dbf

DBV-00600: Fatal Error - [28] [27061] [0] [0]

对于此类情况,通过工具进行处理

DUL> copy file from  /u01/app/oracle/oradata/orcl/dev02.dbf to /oradata/dev02.dbf
 
starting copy datafile '/u01/app/oracle/oradata/orcl/dev02.dbf' to '/oradata/dev02.dbf'
read data error from file '/u01/app/oracle/oradata/orcl/dev02.dbf'.error message:Input/output error
read block# error: 303
read data error from file '/u01/app/oracle/oradata/orcl/dev02.dbf'.error message:Input/output error
read block# error: 304
read data error from file '/u01/app/oracle/oradata/orcl/dev02.dbf'.error message:Input/output error
read block# error: 329
datafile copy completed with 2 block error.

dbv校验文件

[oracle@database oradata]$ dbv file=dev02.dbf 

DBVERIFY: Release 11.2.0.4.0 - Production on Tue Apr 20 00:28:31 2021

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /oradata/dev02.dbf
Page 303 is marked corrupt
Corrupt block relative dba: 0x0240012f (file 9, block 303)
Completely zero block found during dbv: 

Page 304 is marked corrupt
Corrupt block relative dba: 0x02400130 (file 9, block 304)
Completely zero block found during dbv: 

Page 329 is marked corrupt
Corrupt block relative dba: 0x02400149 (file 9, block 329)
Completely zero block found during dbv: 



DBVERIFY - Verification complete

Total Pages Examined         : 3932160
Total Pages Processed (Data) : 3213723
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 714294
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 4139
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 1
Total Pages Marked Corrupt   : 3
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 85078875 (6.85078875)

通过对io error的文件进行处理,最终损坏三个block,最大限度抢救数据.使用被恢复出来的文件,尝试open库遭遇以下错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [6], [85035771], [6],
[85084136], [12583040], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [2662], [6], [85035770], [6],
[85084136], [12583040], [], [], [], [], [], []
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [6], [85035764], [6],
[85084136], [12583040], [], [], [], [], [], []
Process ID: 6733
Session ID: 570 Serial number: 3

ora-600 2662这个错误比较明显,处理文件头scn,继续open库

SQL> alter database open ;
alter database open 
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 6840
Session ID: 570 Serial number: 3

查看alert日志信息

Tue Apr 20 01:22:27 2021
alter database open upgrade
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Started redo scan
Completed redo scan
 read 1 KB redo, 3 data blocks need recovery
Started redo application at
 Thread 1: logseq 1, block 3
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/orcl/redo01.log
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 1, block 5, scn 25854859541
 3 data blocks read, 3 data blocks written, 1 redo k-bytes read
Tue Apr 20 01:22:28 2021
Thread 1 advanced to log sequence 2 (thread open)
Thread 1 opened at log sequence 2
  Current log# 2 seq# 2 mem# 0: /u01/app/oracle/oradata/orcl/redo02.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Apr 20 01:22:28 2021
SMON: enabling cache recovery
[6840] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:5902014 end:5905574 diff:3560 (35 seconds)
Dictionary check beginning
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_6824.trc  (incident=63970):
ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_63970/orcl_smon_6824_i63970.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORACLE Instance orcl (pid = 14) - Error 600 encountered while recovering transaction (24, 2) on object 89023.
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_6824.trc:
ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], [], [], [], [], []
Tue Apr 20 01:22:38 2021
ORACLE Instance orcl (pid = 14) - Error 600 encountered while recovering transaction (63, 3) on object 89023.
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_6824.trc:
ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_6824.trc  (incident=63974):
ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_63974/orcl_smon_6824_i63974.trc
Tue Apr 20 01:22:55 2021
PMON (ospid: 6798): terminating the instance due to error 474

这个错误是比较常见的错误,参考:ORACLE Instance XFF (pid = 18) – Error 600 encountered while recovering transaction ,通过处理之后,数据库open成功

SQL> startup mount pfile='/tmp/pfile';
ORACLE instance started.

Total System Global Area 1603411968 bytes
Fixed Size                  2253664 bytes
Variable Size            1023413408 bytes
Database Buffers          570425344 bytes
Redo Buffers                7319552 bytes
Database mounted.
SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.

后续安排逻辑导出,导入新库

ORA-00333 故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-00333 故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库启动报错

SQL> startup
ORACLE instance started.
Total System Global Area 1241513984 bytes
Fixed Size                  1219136 bytes
Variable Size             218105280 bytes
Database Buffers         1006632960 bytes
Redo Buffers               15556608 bytes
Database mounted.
ORA-00333: redo log read error block 48641 count 8192

数据库启动报ORA-00333错误,官方解释为读redo log发生错误.

00333, 00000, "redo log read error block %s count %s"
// *Cause:  An IO error occurred while reading the log described in the
//          accompanying error.
// *Action: Restore accessibility to file, or get another copy of the file.

alert日志

Sat Apr 14 00:39:13 2018
 alter database open
Sat Apr 14 00:39:13 2018
Beginning crash recovery of 1 threads
 parallel recovery started with 7 processes
Sat Apr 14 00:39:13 2018
Started redo scan
Sat Apr 14 00:39:14 2018
Errors in file /oracle/admin/oa/udump/oa_ora_5659.trc:
ORA-00333: redo log read error block 54785 count 2048
ORA-00312: online log 1 thread 1: '/oracle/oradata/oa/redo01.log'
ORA-27072: File I/O error
Linux Error: 5: Input/output error
Additional information: 4
Additional information: 54785
Additional information: 957952
Sat Apr 14 00:39:14 2018
Errors in file /oracle/admin/oa/udump/oa_ora_5659.trc:
ORA-00333: redo log read error block 48641 count 8192
ORA-00312: online log 1 thread 1: '/oracle/oradata/oa/redo01.log'
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
Linux Error: 5: Input/output error
Additional information: 4
Additional information: 54785
Additional information: 957952
Sat Apr 14 00:39:14 2018
Aborting crash recovery due to error 333
Sat Apr 14 00:39:14 2018
Errors in file /oracle/admin/oa/udump/oa_ora_5659.trc:
ORA-00333: redo log read error block 48641 count 8192
ORA-333 signalled during:  alter database open...

由于硬件异常,数据库在启动的时候读取redo异常,从而使得数据库无法正常启动

检查系统日志

Apr 14 11:14:58 oa kernel: Info fld=0x0, Current sda: sense key Hardware Error
Apr 14 11:14:59 oa kernel: Additional sense: Internal target failure
Apr 14 11:14:59 oa kernel: end_request: I/O error, dev sda, sector 190500041
Apr 14 11:14:59 oa kernel: SCSI error : <0 0 0 0> return code = 0x8000002
Apr 14 11:14:59 oa kernel: Info fld=0x0, Current sda: sense key Hardware Error
Apr 14 11:14:59 oa kernel: Additional sense: Internal target failure
Apr 14 11:14:59 oa kernel: end_request: I/O error, dev sda, sector 190500049
Apr 14 11:14:59 oa kernel: SCSI error : <0 0 0 0> return code = 0x8000002
Apr 14 11:14:59 oa kernel: Info fld=0x0, Current sda: sense key Hardware Error
Apr 14 11:14:59 oa kernel: Additional sense: Internal target failure
Apr 14 11:14:59 oa kernel: end_request: I/O error, dev sda, sector 190500057
Apr 14 11:14:59 oa kernel: SCSI error : <0 0 0 0> return code = 0x8000002
Apr 14 11:14:59 oa kernel: Info fld=0x0, Current sda: sense key Hardware Error
Apr 14 11:14:59 oa kernel: Additional sense: Internal target failure
Apr 14 11:14:59 oa kernel: end_request: I/O error, dev sda, sector 190500065
Apr 14 11:14:59 oa kernel: SCSI error : <0 0 0 0> return code = 0x8000002

大量类似I/O error, dev sda, sector错误,很可能是由于硬件方面异常导致.

损坏redo为当前redo
redo


针对这样的情况,由于是硬件故障,先要通过dbv或者rman检查其他数据文件是否正常,如果有数据文件不能读,那需要对数据文件进行特殊处理.本次恢复的中,客户相对比较幸运,所有数据文件全部可以正常访问,只是当前redo异常,通过隐含参数强制拉库,然后导出数据,重建库解决.类似文章:又一起存储故障导致ORA-00333 ORA-00312恢复

又一起存储故障导致ORA-00333 ORA-00312恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:又一起存储故障导致ORA-00333 ORA-00312恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库启动报ORA-00333 ORA-00312错误,无法正常open数据库

Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc0_4724.trc:
ORA-00333: redo log read error block 63489 count 2048
ORA-00312: online log 2 thread 1: 'F:\ORADATA\SZCG\REDO02.LOG'
ORA-27091: unable to queue I/O
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc0_4724.trc:
ORA-00333: redo log read error block 63489 count 2048
Thu Aug 07 10:42:03  2014
ARC0: All Archive destinations made inactive due to error 333
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_1856.trc:
ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止
ORA-00340: 处理联机日志  (用于线程 ) 时出现 I/O 错误
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_6548.trc:
ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止
ORA-00340: 处理联机日志  (用于线程 ) 时出现 I/O 错误
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_8104.trc:
ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止
ORA-00340: 处理联机日志  (用于线程 ) 时出现 I/O 错误
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_lgwr_884.trc:
ORA-00340: IO error processing online log 3 of thread 1
ORA-00345: redo log write error block 65238 count 13
ORA-00312: online log 3 thread 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: async read/write failed
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 10:42:03  2014
LGWR: terminating instance due to error 340
Thu Aug 07 10:42:05  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_8104.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 340
ORA-00340: IO error processing online log  of thread
Thu Aug 07 10:42:05  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_1856.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 340
ORA-00340: IO error processing online log  of thread
Thu Aug 07 10:42:05  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_6548.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 340
ORA-00340: IO error processing online log  of thread
Thu Aug 07 17:40:05  2014
ALTER DATABASE OPEN
Thu Aug 07 17:40:05  2014
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Thu Aug 07 17:40:06  2014
Started redo scan
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27091: 无法将 I/O 排队
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 17:40:06  2014
Aborting crash recovery due to error 333
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-333 signalled during: ALTER DATABASE OPEN...

进一步检查发现在7月6日系统就已经报io异常

Sun Jul 06 10:05:23  2014
ARC0: All Archive destinations made inactive due to error 333
Sun Jul 06 10:06:07  2014
KCF: write/open error block=0xd03 online=1
     file=3 F:\ORADATA\SZCG\SYSAUX01.DBF
     error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。'
Automatic datafile offline due to write error on
file 3: F:\ORADATA\SZCG\SYSAUX01.DBF
Sun Jul 06 10:06:23  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc1_2676.trc:
ORA-00333: redo log read error block 63489 count 2048
ORA-00312: online log 2 thread 1: 'F:\ORADATA\SZCG\REDO02.LOG'
ORA-27091: unable to queue I/O
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 10:36:54  2014
ARC1: All Archive destinations made inactive due to error 333
Thu Aug 07 10:37:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_m000_5832.trc:
ORA-01135: file 3 accessed for DML/query is offline
ORA-01110: data file 3: 'F:\ORADATA\SZCG\SYSAUX01.DBF'

检查硬件发现raid一块盘完全损坏,另外一块盘也处于告警状态,保护现场拷贝文件过程中发现redo02,redo03,sysaux无法拷贝,使用rman检查发现
sysaux-block


因为redo完全损坏,使用工具跳过坏块,拷贝相关有坏块文件到其他目录,重命名相关文件尝试启动数据库,依然报ORA-00333 ORA-00312

Started redo scan
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27091: 无法将 I/O 排队
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 17:40:06  2014
Aborting crash recovery due to error 333
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-333 signalled during: ALTER DATABASE OPEN...

设置隐含参数_allow_resetlogs_corruption,尝试强制拉库

Started redo scan
Fri Aug 08 12:13:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
Fri Aug 08 12:13:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27091: 无法将 I/O 排队
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Fri Aug 08 12:13:25  2014
Aborting crash recovery due to error 333
Fri Aug 08 12:13:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-333 signalled during: ALTER DATABASE OPEN...
Fri Aug 08 12:13:45  2014
ALTER DATABASE RECOVER  database until cancel
Fri Aug 08 12:13:45  2014
Media Recovery Start
 parallel recovery started with 15 processes
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
Fri Aug 08 12:13:55  2014
ALTER DATABASE RECOVER    CANCEL
Fri Aug 08 12:13:59  2014
ORA-1547 signalled during: ALTER DATABASE RECOVER    CANCEL  ...
Fri Aug 08 12:13:59  2014
ALTER DATABASE RECOVER CANCEL
ORA-1112 signalled during: ALTER DATABASE RECOVER CANCEL ...
Fri Aug 08 12:14:12  2014
alter database open resetlogs
Fri Aug 08 12:14:13  2014
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
ORA-1245 signalled during: alter database open resetlogs...
Fri Aug 08 12:54:11  2014
alter tablespace sysaux offline
Fri Aug 08 12:54:11  2014
ORA-1109 signalled during: alter tablespace sysaux offline...
Fri Aug 08 13:05:30  2014
alter database open
Fri Aug 08 13:05:30  2014
ORA-1589 signalled during: alter database open...

在offline过程中,数据库检查到sysaux数据文件为offline状态,当表空间只有一个数据文件,而且该数据文件为offline,数据库将会尝试offline sysaux表空间,但是发现该表空间文件非正常scn,无法offline 表空间,导致resetlogs操作失败。这里是操作失误应该先online相关数据文件,然后再进行resetlogs操作

Sat Aug 09 11:56:03  2014
alter database datafile 3 online
Sat Aug 09 11:56:04  2014
Completed: alter database datafile 3 online
Sat Aug 09 11:56:08  2014
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
Sat Aug 09 11:56:18  2014
ARCH: Encountered disk I/O error 19502
Sat Aug 09 11:56:18  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512)
ORA-27072: 文件 I/O 错误
OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 1) 函数不正确。
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512)
Sat Aug 09 11:56:18  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512)
ORA-27072: 文件 I/O 错误
OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 1) 函数不正确。
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512)
ARCH: I/O error 19502 archiving log 3 to 'F:\ARCHIVE\ARC01745_0814618167.001'
Sat Aug 09 11:56:18  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-00265: 要求实例恢复, 无法设置 ARCHIVELOG 模式
Archive all online redo logfiles failed:265
RESETLOGS after incomplete recovery UNTIL CHANGE 77983856
Resetting resetlogs activation ID 3562192628 (0xd452bef4)
Online log F:\ORADATA\SZCG\REDO01.LOG: Thread 1 Group 1 was previously cleared
Online log F:\ORADATA\SZCG\REDO02.LOG: Thread 1 Group 2 was previously cleared
Online log D:\REDO04.LOG: Thread 1 Group 4 was previously cleared
Sat Aug 09 11:56:22  2014
Setting recovery target incarnation to 3
Sat Aug 09 11:56:23  2014
Assigning activation ID 3602586269 (0xd6bb1a9d)
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=33, OS id=5900
Sat Aug 09 11:56:23  2014
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=34, OS id=5776
Sat Aug 09 11:56:24  2014
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: F:\ORADATA\SZCG\REDO01.LOG
Successful open of redo thread 1
Sat Aug 09 11:56:24  2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Aug 09 11:56:24  2014
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
Sat Aug 09 11:56:24  2014
ARC0: Becoming the heartbeat ARCH
Sat Aug 09 11:56:24  2014
SMON: enabling cache recovery
Sat Aug 09 11:56:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-00600: 内部错误代码, 参数: [2662], [0], [77983864], [0], [77992379], [8388617], [], []
Sat Aug 09 11:56:26  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-00600: 内部错误代码, 参数: [2662], [0], [77983864], [0], [77992379], [8388617], [], []
Sat Aug 09 11:56:26  2014
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 4516
ORA-1092 signalled during: alter database open resetlogs...

ORA-600 2662这个错误很熟悉,直接推SCN,数据库open,但是报ORA-600 4194

Sat Aug 09 12:01:28  2014
SMON: enabling cache recovery
Dictionary check complete
Sat Aug 09 12:01:32  2014
SMON: enabling tx recovery
Sat Aug 09 12:01:32  2014
Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=34, OS id=6116
Sat Aug 09 12:01:34  2014
LOGSTDBY: Validating controlfile with logical metadata
Sat Aug 09 12:01:34  2014
LOGSTDBY: Validation complete
Sat Aug 09 12:01:34  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_smon_920.trc:
ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], []
Sat Aug 09 12:01:36  2014
Doing block recovery for file 2 block 319
Resuming block recovery (PMON) for file 2 block 319
Block recovery from logseq 2, block 56 to scn 1073742003
Sat Aug 09 12:01:36  2014
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: F:\ORADATA\SZCG\REDO02.LOG
Block recovery stopped at EOT rba 2.79.16
Block recovery completed at rba 2.79.16, scn 0.1073742002
Doing block recovery for file 2 block 153
Resuming block recovery (PMON) for file 2 block 153
Block recovery from logseq 2, block 56 to scn 1073741986
Sat Aug 09 12:01:36  2014
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: F:\ORADATA\SZCG\REDO02.LOG
Block recovery completed at rba 2.66.16, scn 0.1073741988
Sat Aug 09 12:01:36  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_smon_920.trc:
ORA-01595: error freeing extent (4) of rollback segment (10))
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], []
Sat Aug 09 12:01:36  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5272.trc:
ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], []
Sat Aug 09 12:01:36  2014
Completed: alter database open

尝试重建undo表空间并切换undo_tabspace到新undo表空间解决,因为数据库在恢复过程中使用了隐含参数强制拉库,不能保证数据一致性,强烈建议逻辑方式重建数据库
在本次故障中,所幸的是只有redo和sysaux文件损坏,如果是业务数据文件或者system数据文件损坏,恢复的后果可能更加麻烦,丢失数据可能更加多。再次说明:数据库备份非常重要,数据的安全性不能完全寄希望于硬件之上