resetlogs失败故障恢复-ORA-01555

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:resetlogs失败故障恢复-ORA-01555

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户数据库resetlogs报错

Tue Dec 19 15:21:23 2023
ALTER DATABASE   MOUNT
Successful mount of redo thread 1, with mount id 1683789043
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Tue Dec 19 15:22:01 2023
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
ORA-1248 signalled during: alter database open resetlogs...
Tue Dec 19 16:16:26 2023
alter database datafile 83 offline
Completed: alter database datafile 83 offline
Tue Dec 19 16:19:13 2023
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
Archived Log entry 50 added for thread 1 sequence 3657135 ID 0x5d907698 dest 1:
Tue Dec 19 16:20:01 2023
Errors in file /oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_94696.trc:
ORA-00333: 重做日志读取块 8806400 计数 16384 出错
ORA-00312: 联机日志 2 线程 1: '/data/oradata/orcl/redo2.log'
ORA-27072: 文件 I/O 错误
Linux-x86_64 Error: 25: Inappropriate ioctl for device
Additional information: 4
Additional information: 8806400
Additional information: 4325376
Errors in file /oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_94696.trc:
ORA-00333: 重做日志读取块 8806400 计数 16384 出错
ARCH: All Archive destinations made inactive due to error 333
ARCH: Closing local archive destination LOG_ARCHIVE_DEST_1: '/data/arch/1_3657136_874715183.dbf' (error 333) (orcl)
Committing creation of archivelog '/data/arch/1_3657136_874715183.dbf' (error 333)
Tue Dec 19 16:20:46 2023
Archived Log entry 51 added for thread 1 sequence 3657132 ID 0x5d907698 dest 1:
Tue Dec 19 16:21:28 2023
Archived Log entry 52 added for thread 1 sequence 3657133 ID 0x5d907698 dest 1:
Tue Dec 19 16:22:13 2023
Archived Log entry 53 added for thread 1 sequence 3657134 ID 0x5d907698 dest 1:
RESETLOGS after incomplete recovery UNTIL CHANGE 161052517347
Resetting resetlogs activation ID 1569748632 (0x5d907698)
Tue Dec 19 16:23:43 2023
Setting recovery target incarnation to 3
Tue Dec 19 16:23:43 2023
Assigning activation ID 1683789043 (0x645c94f3)
LGWR: STARTING ARCH PROCESSES
Tue Dec 19 16:23:43 2023
ARC0 started with pid=40, OS id=5391 
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Thread 1 advanced to log sequence 2 (thread open)
Tue Dec 19 16:23:44 2023
ARC1 started with pid=41, OS id=5393 
Tue Dec 19 16:23:44 2023
ARC2 started with pid=42, OS id=5395 
ARC1: Archival started
Tue Dec 19 16:23:44 2023
ARC3 started with pid=43, OS id=5397 
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Thread 1 opened at log sequence 2
  Current log# 2 seq# 2 mem# 0: /data/oradata/orcl/redo2.log
Successful open of redo thread 1
Tue Dec 19 16:23:44 2023
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Dec 19 16:23:44 2023
SMON: enabling cache recovery
Tue Dec 19 16:23:44 2023
NSA2 started with pid=44, OS id=5399 
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0025.7f7d42df):
select ctime, mtime, stime from obj$ where obj# = :1
Errors in file /oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_94696.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 27 (名称为 "_SYSSMU27_4233559991$") 过小
Errors in file /oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_94696.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 27 (名称为 "_SYSSMU27_4233559991$") 过小
Error 704 happened during db open, shutting down database
USER (ospid: 94696): terminating the instance due to error 704
Instance terminated by USER, pid = 94696
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (94696) as a result of ORA-1092

通过以上信息,可以的出来以下结论:
1. 客户的硬件或者文件系统可能有问题,通过系统日志进一步确认底层异常

Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Sense Key : Medium Error [current] 
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Add. Sense: Unrecovered read error
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 47 bc ff c0 00 01 00 00
Dec 19 08:28:38 tdb2 kernel: end_request: critical medium error, dev sdb, sector 1203568576
Dec 19 08:28:38 tdb2 kernel: end_request: critical medium error, dev dm-3, sector 1203568576
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Sense Key : Medium Error [current] 
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Add. Sense: Unrecovered read error
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 47 bd 00 c0 00 01 00 00
Dec 19 08:28:38 tdb2 kernel: end_request: critical medium error, dev sdb, sector 1203568832
Dec 19 08:28:38 tdb2 kernel: end_request: critical medium error, dev dm-3, sector 1203568832
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Sense Key : Medium Error [current] 
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Add. Sense: Unrecovered read error
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 47 bd 00 80 00 00 08 00
Dec 19 08:28:38 tdb2 kernel: end_request: critical medium error, dev sdb, sector 1203568768
Dec 19 08:28:38 tdb2 kernel: end_request: critical medium error, dev dm-3, sector 1203568768
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Sense Key : Medium Error [current] 
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Add. Sense: Unrecovered read error
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 9b 1a 28 20 00 01 00 00
Dec 19 16:20:01 tdb2 kernel: end_request: critical medium error, dev sdb, sector 2602182688
Dec 19 16:20:01 tdb2 kernel: end_request: critical medium error, dev dm-3, sector 2602182688
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Sense Key : Medium Error [current] 
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Add. Sense: Unrecovered read error
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 9b 1a 29 20 00 01 00 00
Dec 19 16:20:01 tdb2 kernel: end_request: critical medium error, dev sdb, sector 2602182944
Dec 19 16:20:01 tdb2 kernel: end_request: critical medium error, dev dm-3, sector 2602182944
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Sense Key : Medium Error [current] 
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Add. Sense: Unrecovered read error
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 9b 1a 29 00 00 00 08 00
Dec 19 16:20:01 tdb2 kernel: end_request: critical medium error, dev sdb, sector 2602182912
Dec 19 16:20:01 tdb2 kernel: end_request: critical medium error, dev dm-3, sector 2602182912

2. 数据库在强制拉库的时候,很可能是屏蔽了一致性,导致文件头scn过小
3. 在resetlogs之前,先offline了83号文件,这个将导致该文件的reseltogs scn和其他文件不一致,通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)
20231229095533



这个库由于客户在resetlogs之前offline了数据文件,导致一些麻烦,先使用Oracle Recovery Tools修改resetlogs scn
20231229100250

然后重建ctl,修改scn,打开数据库
20231229102556

hcheck检测字典一切正常

HCheck Version 07MAY18 on 26-12月-2023 18:44:20
----------------------------------------------
Catalog Version 11.2.0.1.0 (1102000100)
db_name: ORCL
                                   Catalog       Fixed           
Procedure Name                     Version    Vs Release    Timestamp      Result
------------------------------ ... ---------- -- ---------- -------------- ------
.- LobNotInObj                 ... 1102000100 <=  *All Rel* 12/26 18:44:20 
PASS
.- MissingOIDOnObjCol          ... 1102000100 <=  *All Rel* 12/26 18:44:20 
PASS
.- SourceNotInObj              ... 1102000100 <=  *All Rel* 12/26 18:44:20 
PASS
.- IndIndparMismatch           ... 1102000100 <= 1102000100 12/26 18:44:21 
PASS
.- InvCorrAudit                ... 1102000100 <= 1102000100 12/26 18:44:21 
PASS
.- OversizedFiles              ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- PoorDefaultStorage          ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- PoorStorage                 ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- PartSubPartMismatch         ... 1102000100 <= 1102000100 12/26 18:44:21 
PASS
.- TabPartCountMismatch        ... 1102000100 <=  *All Rel* 12/26 18:44:21 

*** 2023-12-26 18:44:21.507
PASS
.- OrphanedTabComPart          ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- MissingSum$                 ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- MissingDir$                 ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- DuplicateDataobj            ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- ObjSynMissing               ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- ObjSeqMissing               ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedUndo                ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedIndex               ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedIndexPartition      ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedIndexSubPartition   ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedTable               ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedTablePartition      ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedTableSubPartition   ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- MissingPartCol              ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedSeg$                ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedIndPartObj#         ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- DuplicateBlockUse           ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- FetUet                      ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- Uet0Check                   ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- ExtentlessSeg               ... 1102000100 <= 1102000100 12/26 18:44:22 
PASS
.- SeglessUET                  ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadInd$                     ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadTab$                     ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadIcolDepCnt               ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- ObjIndDobj                  ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- TrgAfterUpgrade             ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- ObjType0                    ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadOwner                    ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- StmtAuditOnCommit           ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadPublicObjects            ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadSegFreelist              ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadDepends                  ... 1102000100 <=  *All Rel* 12/26 18:44:22 

*** 2023-12-26 18:44:22.571
PASS
.- CheckDual                   ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- ObjectNames                 ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadCboHiLo                  ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- ChkIotTs                    ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- NoSegmentIndex              ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- BadNextObject               ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- DroppedROTS                 ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- FilBlkZero                  ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- DbmsSchemaCopy              ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- OrphanedObjError            ... 1102000100 >  1102000000 12/26 18:44:23 
PASS
.- ObjNotLob                   ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- MaxControlfSeq              ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- SegNotInDeferredStg         ... 1102000100 >  1102000000 12/26 18:44:23 
PASS
.- SystemNotRfile1             ... 1102000100 >   902000000 12/26 18:44:23 

*** 2023-12-26 18:44:23.779
PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- OrphanTrigger               ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- ObjNotTrigger               ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
---------------------------------------
26-12月-2023 18:44:23  Elapsed: 3 secs
---------------------------------------
Found 0 potential problem(s) and 0 warning(s)

然后增加temp,导出数据数据,完成本次数据库救援

ORA-00333 ORA-01595 恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-00333 ORA-01595 恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户反馈数据库异常,查看日志发现asm和db均发生hang住情况(由于环境原因部分日志没有拷贝出来),基于现有情况,无法直接恢复,通过一些工具把asm磁盘组中的数据文件拷贝到文件系统,经过检测无坏块
20220705233329


修改相关路径,尝试recover库

Tue Jul 05 15:05:54 2022
ALTER DATABASE RECOVER  datafile 1  
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 2 Group 4 Seq 29973 Reading mem 0
  Mem# 0: E:\ORADATA\GROUP_4.266.822672441
Recovery of Online Redo Log: Thread 1 Group 2 Seq 38422 Reading mem 0
  Mem# 0: E:\ORADATA\GROUP_2.262.822672137
Incomplete read from log member 'E:\ORADATA\GROUP_2.262.822672137'. Trying next member.
Media Recovery failed with error 333
ORA-283 signalled during: ALTER DATABASE RECOVER  datafile 1  ...

ORA-00333


错误信息比较明显,在读入redo进行恢复的时候遭遇“ORA-00333: 重做日志读取块 11557 计数 731 出错”错误,从而无法继续恢复.这次故障运气比较好,通过分析v$datafile和v$datafile_header关系
20220705233008

进行一些操作,绕过redo block 11557,顺利recover成功,并且open库

ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Tue Jul 05 15:17:46 2022
Parallel Media Recovery started with 32 slaves
Tue Jul 05 15:17:46 2022
Recovery of Online Redo Log: Thread 2 Group 4 Seq 29973 Reading mem 0
  Mem# 0: E:\ORADATA\GROUP_4.266.822672441
Recovery of Online Redo Log: Thread 1 Group 2 Seq 38422 Reading mem 0
  Mem# 0: E:\ORADATA\GROUP_2.262.822672137
Completed: ALTER DATABASE RECOVER  database  

20220705232246


通过分析alert日志发现有ORA-600 4194错误

QMNC started with pid=58, OS id=15980 
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Tue Jul 05 15:18:24 2022
Tue Jul 05 15:18:24 2022
Block recovery from logseq 38423, block 152 to scn 16218380250500
Recovery of Online Redo Log: Thread 1 Group 1 Seq 38423 Reading mem 0
  Mem# 0: E:\ORADATA\GROUP_1.261.822672135
Block recovery stopped at EOT rba 38423.154.16
Block recovery completed at rba 38423.154.16, scn 3776.583740804
Block recovery from logseq 38423, block 152 to scn 16218380250497
Recovery of Online Redo Log: Thread 1 Group 1 Seq 38423 Reading mem 0
  Mem# 0: E:\ORADATA\GROUP_1.261.822672135
Block recovery completed at rba 38423.154.16, scn 3776.583740804
Errors in file F:\APP\ADMINISTRATOR\diag\rdbms\xff\xff1\trace\xff1_smon_5660.trc:
ORA-01595: 释放区 (2) 回退段 (8) 时出错
ORA-00600: 内部错误代码, 参数: [4194], [], [
                                      
Completed: alter database open 

这比较简单,对于异常的undo进行处理即可,然后使用hcheck检查字典一致性

SQL> @e:/oradata/txt/11.txt
HCheck Version 07MAY18 on 05-7月 -2022 16:30:18
----------------------------------------------
Catalog Version 11.2.0.3.0 (1102000300)
db_name: xff

                                   Catalog       Fixed
Procedure Name                     Version    Vs Release    Timestamp
Result
------------------------------ ... ---------- -- ---------- --------------
------
.- LobNotInObj                 ... 1102000300 <=  *All Rel* 07/05 16:30:18 PASS
.- MissingOIDOnObjCol          ... 1102000300 <=  *All Rel* 07/05 16:30:19 PASS
.- SourceNotInObj              ... 1102000300 <=  *All Rel* 07/05 16:30:19 PASS
.- OversizedFiles              ... 1102000300 <=  *All Rel* 07/05 16:30:19 PASS
.- PoorDefaultStorage          ... 1102000300 <=  *All Rel* 07/05 16:30:19 PASS
.- PoorStorage                 ... 1102000300 <=  *All Rel* 07/05 16:30:19 PASS
.- TabPartCountMismatch        ... 1102000300 <=  *All Rel* 07/05 16:30:20 PASS
.- OrphanedTabComPart          ... 1102000300 <=  *All Rel* 07/05 16:30:20 PASS
.- MissingSum$                 ... 1102000300 <=  *All Rel* 07/05 16:30:20 PASS
.- MissingDir$                 ... 1102000300 <=  *All Rel* 07/05 16:30:20 PASS
.- DuplicateDataobj            ... 1102000300 <=  *All Rel* 07/05 16:30:20 PASS
.- ObjSynMissing               ... 1102000300 <=  *All Rel* 07/05 16:30:20 PASS
.- ObjSeqMissing               ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedUndo                ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedIndex               ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedIndexPartition      ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedIndexSubPartition   ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedTable               ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedTablePartition      ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedTableSubPartition   ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- MissingPartCol              ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedSeg$                ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedIndPartObj#         ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- DuplicateBlockUse           ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- FetUet                      ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- Uet0Check                   ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- SeglessUET                  ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- BadInd$                     ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- BadTab$                     ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- BadIcolDepCnt               ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- ObjIndDobj                  ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- TrgAfterUpgrade             ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- ObjType0                    ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- BadOwner                    ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- StmtAuditOnCommit           ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- BadPublicObjects            ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- BadSegFreelist              ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- BadDepends                  ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- CheckDual                   ... 1102000300 <=  *All Rel* 07/05 16:30:23 PASS
.- ObjectNames                 ... 1102000300 <=  *All Rel* 07/05 16:30:23 WARN

HCKW-0018: OBJECT name clashes with SCHEMA name (Doc ID 2363142.1)
Schema=BSHRP INDEX=XFF.XFF

.- BadCboHiLo                  ... 1102000300 <=  *All Rel* 07/05 16:30:23 PASS
.- ChkIotTs                    ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- NoSegmentIndex              ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- BadNextObject               ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- DroppedROTS                 ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- FilBlkZero                  ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- DbmsSchemaCopy              ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- OrphanedObjError            ... 1102000300 >  1102000000 07/05 16:30:24 PASS
.- ObjNotLob                   ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- MaxControlfSeq              ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- SegNotInDeferredStg         ... 1102000300 >  1102000000 07/05 16:30:25 PASS
.- SystemNotRfile1             ... 1102000300 >   902000000 07/05 16:30:25 PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000300 <=  *All Rel* 07/05 16:30:25 PASS
.- OrphanTrigger               ... 1102000300 <=  *All Rel* 07/05 16:30:25 PASS
.- ObjNotTrigger               ... 1102000300 <=  *All Rel* 07/05 16:30:25 PASS
---------------------------------------
05-7月 -2022 16:30:25  Elapsed: 7 secs
---------------------------------------
Found 0 potential problem(s) and 1 warning(s)
Contact Oracle Support with the output and trace file
to check if the above needs attention or not

PL/SQL 过程已成功完成。

有一个SCHEMA和对象名一样,这个不影响属于正常情况(客户创建了一个用户叫做XFF,然后有创建了一个XFF的对象),该数据库恢复至此基本上晚上,业务可以直接运行,不用做逻辑迁移

ORA-00333 故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-00333 故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库启动报错

SQL> startup
ORACLE instance started.
Total System Global Area 1241513984 bytes
Fixed Size                  1219136 bytes
Variable Size             218105280 bytes
Database Buffers         1006632960 bytes
Redo Buffers               15556608 bytes
Database mounted.
ORA-00333: redo log read error block 48641 count 8192

数据库启动报ORA-00333错误,官方解释为读redo log发生错误.

00333, 00000, "redo log read error block %s count %s"
// *Cause:  An IO error occurred while reading the log described in the
//          accompanying error.
// *Action: Restore accessibility to file, or get another copy of the file.

alert日志

Sat Apr 14 00:39:13 2018
 alter database open
Sat Apr 14 00:39:13 2018
Beginning crash recovery of 1 threads
 parallel recovery started with 7 processes
Sat Apr 14 00:39:13 2018
Started redo scan
Sat Apr 14 00:39:14 2018
Errors in file /oracle/admin/oa/udump/oa_ora_5659.trc:
ORA-00333: redo log read error block 54785 count 2048
ORA-00312: online log 1 thread 1: '/oracle/oradata/oa/redo01.log'
ORA-27072: File I/O error
Linux Error: 5: Input/output error
Additional information: 4
Additional information: 54785
Additional information: 957952
Sat Apr 14 00:39:14 2018
Errors in file /oracle/admin/oa/udump/oa_ora_5659.trc:
ORA-00333: redo log read error block 48641 count 8192
ORA-00312: online log 1 thread 1: '/oracle/oradata/oa/redo01.log'
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
Linux Error: 5: Input/output error
Additional information: 4
Additional information: 54785
Additional information: 957952
Sat Apr 14 00:39:14 2018
Aborting crash recovery due to error 333
Sat Apr 14 00:39:14 2018
Errors in file /oracle/admin/oa/udump/oa_ora_5659.trc:
ORA-00333: redo log read error block 48641 count 8192
ORA-333 signalled during:  alter database open...

由于硬件异常,数据库在启动的时候读取redo异常,从而使得数据库无法正常启动

检查系统日志

Apr 14 11:14:58 oa kernel: Info fld=0x0, Current sda: sense key Hardware Error
Apr 14 11:14:59 oa kernel: Additional sense: Internal target failure
Apr 14 11:14:59 oa kernel: end_request: I/O error, dev sda, sector 190500041
Apr 14 11:14:59 oa kernel: SCSI error : <0 0 0 0> return code = 0x8000002
Apr 14 11:14:59 oa kernel: Info fld=0x0, Current sda: sense key Hardware Error
Apr 14 11:14:59 oa kernel: Additional sense: Internal target failure
Apr 14 11:14:59 oa kernel: end_request: I/O error, dev sda, sector 190500049
Apr 14 11:14:59 oa kernel: SCSI error : <0 0 0 0> return code = 0x8000002
Apr 14 11:14:59 oa kernel: Info fld=0x0, Current sda: sense key Hardware Error
Apr 14 11:14:59 oa kernel: Additional sense: Internal target failure
Apr 14 11:14:59 oa kernel: end_request: I/O error, dev sda, sector 190500057
Apr 14 11:14:59 oa kernel: SCSI error : <0 0 0 0> return code = 0x8000002
Apr 14 11:14:59 oa kernel: Info fld=0x0, Current sda: sense key Hardware Error
Apr 14 11:14:59 oa kernel: Additional sense: Internal target failure
Apr 14 11:14:59 oa kernel: end_request: I/O error, dev sda, sector 190500065
Apr 14 11:14:59 oa kernel: SCSI error : <0 0 0 0> return code = 0x8000002

大量类似I/O error, dev sda, sector错误,很可能是由于硬件方面异常导致.

损坏redo为当前redo
redo


针对这样的情况,由于是硬件故障,先要通过dbv或者rman检查其他数据文件是否正常,如果有数据文件不能读,那需要对数据文件进行特殊处理.本次恢复的中,客户相对比较幸运,所有数据文件全部可以正常访问,只是当前redo异常,通过隐含参数强制拉库,然后导出数据,重建库解决.类似文章:又一起存储故障导致ORA-00333 ORA-00312恢复

重建控制文件丢失数据文件导致悲剧

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:重建控制文件丢失数据文件导致悲剧

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在Oracle职业生涯中,恢复过生产环境数据库也有几百个.对于Oracle恢复我还是相当的自信,今天因为自己的一时过于自信,对于环境错了错误的判断,简单问题复杂化,差点变成悲剧
数据库最初故障

Thu Sep 25 09:27:26 2014
MMON started with pid=15, OS id=1968
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
starting up 1 shared server(s) ...
ORACLE_BASE from environment = F:\oracle
Thu Sep 25 09:27:26 2014
ALTER DATABASE   MOUNT
Thu Sep 25 09:27:26 2014
MMNL started with pid=16, OS id=5976
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_4624.trc:
ORA-00202: ????: ''F:\ORACLE\ORADATA\ORCL\CONTROL01.CTL
ORA-27070: ????/????
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
Thu Sep 25 09:28:31 2014
ORA-204 signalled during: ALTER DATABASE   MOUNT...

因为硬件或者系统层面问题,导致控制文件无法正常访问

重建控制文件

Fri Sep 26 12:28:44 2014
Successful mount of redo thread 1, with mount id 1387065723
Completed: CREATE CONTROLFILE REUSE DATABASE "orcl" RESETLOGS ARCHIVELOG
    MAXLOGFILES 5
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 2
    MAXLOGHISTORY 226
LOGFILE
  GROUP 1 'F:\oracle\oradata\orcl\REDO01.LOG'  SIZE 50M,  --redo log ????
  GROUP 2 'F:\oracle\oradata\orcl\REDO02.LOG'  SIZE 50M,  --redo log ????
  GROUP 3 'F:\oracle\oradata\orcl\REDO03.LOG'  SIZE 50M  --redo log ????
-- STANDBY LOGFILE
DATAFILE
  'F:\oracle\oradata\orcl\SYSAUX01.DBF',  --sysaux???????
  'F:\oracle\oradata\orcl\SYSTEM01.DBF',
  'F:\oracle\oradata\orcl\USERS01.DBF', --user????????
  'F:\oracle\oradata\orcl\UNDOTBS01.DBF' --undo???????
CHARACTER SET ZHS16GBK
Fri Sep 26 12:29:55 2014
alter database open resetlogs
ORA-1194 signalled during: alter database open resetlogs...

埋下了雷,创建控制文件中未全部列举出来所有数据文件

进行不完全恢复,尝试resetlogs库发现redo异常

Fri Sep 26 14:13:24 2014
ALTER DATABASE   MOUNT
Fri Sep 26 14:13:24 2014
MMNL started with pid=16, OS id=9024
Successful mount of redo thread 1, with mount id 1387037444
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Fri Sep 26 14:14:08 2014
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
Fri Sep 26 14:15:16 2014
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3720.trc:
ORA-00333: 重做日志读取块 2049 计数 6143 出错
ORA-00312: 联机日志 1 线程 1: 'F:\ORACLE\ORADATA\ORCL\REDO01.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
Fri Sep 26 14:16:24 2014
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3720.trc:
ORA-00333: 重做日志读取块 1 计数 8191 出错
ORA-00312: 联机日志 1 线程 1: 'F:\ORACLE\ORADATA\ORCL\REDO01.LOG'
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3720.trc:
ORA-00333: 重做日志读取块 1 计数 8191 出错
ARCH: All Archive destinations made inactive due to error 333

使用隐含参数尝试拉库,报ORA-600[2662]

Fri Sep 26 14:16:45 2014
SMON: enabling cache recovery
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3720.trc  (incident=57761):
ORA-00600: 内部错误代码, 参数: [2662], [0], [38221304], [0], [38352371], [4194545], [], [], [], [], [], []
Incident details in: f:\oracle\diag\rdbms\orcl\orcl\incident\incdir_57761\orcl_ora_3720_i57761.trc
Fri Sep 26 14:16:45 2014
ARC3 started with pid=23, OS id=9692
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3720.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [2662], [0], [38221304], [0], [38352371], [4194545], [], [], [], [], [], []
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3720.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [2662], [0], [38221304], [0], [38352371], [4194545], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 3720): terminating the instance due to error 704
Instance terminated by USER, pid = 3720
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (3720) as a result of ORA-1092

数据库在未使用所有数据文件的情况下,进行了resetlogs操作,悲剧的本质已经注定,我的失误是没有评估好现状,还继续在错误的道路上越走越远.

我开始接手该库现况

Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Fri Sep 26 14:18:55 2014
alter database open
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_8968.trc:
ORA-01113: 文件 1 需要介质恢复
ORA-01110: 数据文件 1: 'F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF'
ORA-1113 signalled during: alter database open...
Fri Sep 26 14:19:31 2014
alter database open
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_8968.trc:
ORA-01113: 文件 1 需要介质恢复
ORA-01110: 数据文件 1: 'F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF'
ORA-1113 signalled during: alter database open ...
Fri Sep 26 14:22:26 2014
ALTER DATABASE RECOVER  database
Media Recovery Start
 started logmerger process
Fri Sep 26 14:22:26 2014
Media Recovery failed with error 16433
Recovery Slave PR00 previously exited with exception 283
ORA-283 signalled during: ALTER DATABASE RECOVER  database  ...
Fri Sep 26 14:24:25 2014
ALTER DATABASE RECOVER  datafile 'F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF'
Media Recovery Start
Media Recovery failed with error 16433
ORA-283 signalled during: ALTER DATABASE RECOVER  datafile 'F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF'  ...
Fri Sep 26 14:28:47 2014
alter database open read write
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_8968.trc:
ORA-01113: 文件 1 需要介质恢复
ORA-01110: 数据文件 1: 'F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF'
ORA-1113 signalled during: alter database open read write...
Fri Sep 26 14:31:48 2014
ALTER DATABASE RECOVER  datafile 'F:\oracle\oradata\orcl\SYSTEM01.DBF'
Media Recovery Start
Media Recovery failed with error 16433
ORA-283 signalled during: ALTER DATABASE RECOVER  datafile 'F:\oracle\oradata\orcl\SYSTEM01.DBF'  ...

提示ORA-01110: 数据文件 1需要恢复,尝试recover操作

尝试recover操作

连接到:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> recover database ;
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode.
SQL> alter database backup controlfile to trace as 'd:\ctl.txt';
alter database backup controlfile to trace as 'd:\ctl.txt'
*
第 1 行出现错误:
ORA-16433: 必须以读/写模式打开数据库。
SQL> recover database using backup controlfile;
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode.

重建控制文件

SQL> shutdown immediate;
ORA-01109: 数据库未打开
已经卸载数据库。
ORACLE 例程已经关闭。
SQL> STARTUP NOMOUNT
ORACLE 例程已经启动。
Total System Global Area  970895360 bytes
Fixed Size                  1375452 bytes
Variable Size             603980580 bytes
Database Buffers          360710144 bytes
Redo Buffers                4829184 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE orcl NORESETLOGS FORCE LOGGING ARCHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 2921
  7  LOGFILE
  8    GROUP 1 'F:\ORACLE\ORADATA\ORCL\REDO01.LOG'  SIZE 50M,
  9    GROUP 2 'F:\ORACLE\ORADATA\ORCL\REDO02.LOG'  SIZE 50M,
 10    GROUP 3 'F:\ORACLE\ORADATA\ORCL\REDO03.LOG'  SIZE 50M
 11  DATAFILE
 12    'F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF',
 13    'F:\ORACLE\ORADATA\ORCL\SYSAUX01.DBF',
 14    'F:\ORACLE\ORADATA\ORCL\UNDOTBS01.DBF',
 15    'F:\ORACLE\ORADATA\ORCL\USERS01.DBF'
 16  CHARACTER SET ZHS16GBK
 17  ;
控制文件已创建。

这一步严重发错,在恢复前未认真看alert日志,太依赖v$datafile查询出来结果,导致重建控制文件丢失数据文件,埋下大雷。根据前面alert日志报错ORA-600 2662,决定一并处理该问题,然后进行恢复

SQL> shutdown immediate;
ORA-01109: ??????
已经卸载数据库。
ORACLE 例程已经关闭。
SQL> startup pfile='d:\pfile.txt'  mount;
ORACLE 例程已经启动。
Total System Global Area  970895360 bytes
Fixed Size                  1375452 bytes
Variable Size             603980580 bytes
Database Buffers          360710144 bytes
Redo Buffers                4829184 bytes
数据库装载完毕。
SQL> recover database;
完成介质恢复。
SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [4194], [], [

数据库报ORA-600 4194,直接修改undo_management=manual,然后尝试启动数据库

SQL> conn / as sysdba
已连接到空闲例程。
SQL> startup pfile='d:\pfile.txt'
ORACLE 例程已经启动。
Total System Global Area  970895360 bytes
Fixed Size                  1375452 bytes
Variable Size             603980580 bytes
Database Buffers          360710144 bytes
Redo Buffers                4829184 bytes
数据库装载完毕。
数据库已经打开。
SQL> select name from v$datafile;
NAME
--------------------------------------------------------------------------------
F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF
F:\ORACLE\ORADATA\ORCL\SYSAUX01.DBF
F:\ORACLE\ORADATA\ORCL\UNDOTBS01.DBF
F:\ORACLE\ORADATA\ORCL\USERS01.DBF
F:\ORACLE\PRODUCT\11.2.0\DBHOME_1\DATABASE\MISSING00005
F:\ORACLE\PRODUCT\11.2.0\DBHOME_1\DATABASE\MISSING00006
已选择6行。
SQL> alter database rename file 'F:\ORACLE\PRODUCT\11.2.0\DBHOME_1\DATABASE\MISSING00005'
2   to 'F:\oracle\oradata\SOURCE_DATA1.DBF';
数据库已更改。
SQL> alter database rename file 'F:\ORACLE\PRODUCT\11.2.0\DBHOME_1\DATABASE\MISSING00006'
2   to 'F:\oracle\oradata\SOURCE_idx1.DBF';
数据库已更改。
SQL> shutdown immediate;
数据库已经关闭。
已经卸载数据库。
ORACLE 例程已经关闭。
SQL> startup mount pfile='d:\pfile.txt'
ORACLE 例程已经启动。
Total System Global Area  970895360 bytes
Fixed Size                  1375452 bytes
Variable Size             603980580 bytes
Database Buffers          360710144 bytes
Redo Buffers                4829184 bytes
数据库装载完毕。
SQL> alter datafile 5 online;
alter datafile 5 online
      *
第 1 行出现错误:
ORA-00940: 无效的 ALTER 命令
SQL> alter database datafile 5 online;
数据库已更改。
SQL> alter database datafile 6 online;
数据库已更改。
SQL> recover database until cancel;
ORA-00283: recovery session canceled due to errors
ORA-19909: datafile 5 belongs to an orphan incarnation
ORA-01110: data file 5: 'F:\ORACLE\ORADATA\SOURCE_DATA1.DBF'
SQL> alter database open resetlogs;
alter database open resetlogs
*
第 1 行出现错误:
ORA-01139: RESETLOGS 选项仅在不完全数据库恢复后有效
SQL> alter database datafile 6 offline;
数据库已更改。
SQL> alter database datafile 5 offline;
数据库已更改。
SQL> recover database until cancel;
完成介质恢复。
SQL> alter database datafile 6 online;
数据库已更改。
SQL> alter database datafile 5 online;
数据库已更改。
SQL> alter database open resetlogs;
数据库已更改。

还好结合一些隐含参数侥幸恢复成功,差点到了要使用bbed的程度

这次的恢复告诉我:Oracle数据库恢复千万比大意,需要认真分析alert日志和咨询客户做了那些操作,不然可能导致万劫不复之禁地

又一起存储故障导致ORA-00333 ORA-00312恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:又一起存储故障导致ORA-00333 ORA-00312恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库启动报ORA-00333 ORA-00312错误,无法正常open数据库

Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc0_4724.trc:
ORA-00333: redo log read error block 63489 count 2048
ORA-00312: online log 2 thread 1: 'F:\ORADATA\SZCG\REDO02.LOG'
ORA-27091: unable to queue I/O
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc0_4724.trc:
ORA-00333: redo log read error block 63489 count 2048
Thu Aug 07 10:42:03  2014
ARC0: All Archive destinations made inactive due to error 333
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_1856.trc:
ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止
ORA-00340: 处理联机日志  (用于线程 ) 时出现 I/O 错误
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_6548.trc:
ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止
ORA-00340: 处理联机日志  (用于线程 ) 时出现 I/O 错误
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_8104.trc:
ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止
ORA-00340: 处理联机日志  (用于线程 ) 时出现 I/O 错误
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_lgwr_884.trc:
ORA-00340: IO error processing online log 3 of thread 1
ORA-00345: redo log write error block 65238 count 13
ORA-00312: online log 3 thread 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: async read/write failed
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 10:42:03  2014
LGWR: terminating instance due to error 340
Thu Aug 07 10:42:05  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_8104.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 340
ORA-00340: IO error processing online log  of thread
Thu Aug 07 10:42:05  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_1856.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 340
ORA-00340: IO error processing online log  of thread
Thu Aug 07 10:42:05  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_6548.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 340
ORA-00340: IO error processing online log  of thread
Thu Aug 07 17:40:05  2014
ALTER DATABASE OPEN
Thu Aug 07 17:40:05  2014
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Thu Aug 07 17:40:06  2014
Started redo scan
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27091: 无法将 I/O 排队
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 17:40:06  2014
Aborting crash recovery due to error 333
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-333 signalled during: ALTER DATABASE OPEN...

进一步检查发现在7月6日系统就已经报io异常

Sun Jul 06 10:05:23  2014
ARC0: All Archive destinations made inactive due to error 333
Sun Jul 06 10:06:07  2014
KCF: write/open error block=0xd03 online=1
     file=3 F:\ORADATA\SZCG\SYSAUX01.DBF
     error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。'
Automatic datafile offline due to write error on
file 3: F:\ORADATA\SZCG\SYSAUX01.DBF
Sun Jul 06 10:06:23  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc1_2676.trc:
ORA-00333: redo log read error block 63489 count 2048
ORA-00312: online log 2 thread 1: 'F:\ORADATA\SZCG\REDO02.LOG'
ORA-27091: unable to queue I/O
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 10:36:54  2014
ARC1: All Archive destinations made inactive due to error 333
Thu Aug 07 10:37:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_m000_5832.trc:
ORA-01135: file 3 accessed for DML/query is offline
ORA-01110: data file 3: 'F:\ORADATA\SZCG\SYSAUX01.DBF'

检查硬件发现raid一块盘完全损坏,另外一块盘也处于告警状态,保护现场拷贝文件过程中发现redo02,redo03,sysaux无法拷贝,使用rman检查发现
sysaux-block


因为redo完全损坏,使用工具跳过坏块,拷贝相关有坏块文件到其他目录,重命名相关文件尝试启动数据库,依然报ORA-00333 ORA-00312

Started redo scan
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27091: 无法将 I/O 排队
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 17:40:06  2014
Aborting crash recovery due to error 333
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-333 signalled during: ALTER DATABASE OPEN...

设置隐含参数_allow_resetlogs_corruption,尝试强制拉库

Started redo scan
Fri Aug 08 12:13:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
Fri Aug 08 12:13:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27091: 无法将 I/O 排队
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Fri Aug 08 12:13:25  2014
Aborting crash recovery due to error 333
Fri Aug 08 12:13:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-333 signalled during: ALTER DATABASE OPEN...
Fri Aug 08 12:13:45  2014
ALTER DATABASE RECOVER  database until cancel
Fri Aug 08 12:13:45  2014
Media Recovery Start
 parallel recovery started with 15 processes
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
Fri Aug 08 12:13:55  2014
ALTER DATABASE RECOVER    CANCEL
Fri Aug 08 12:13:59  2014
ORA-1547 signalled during: ALTER DATABASE RECOVER    CANCEL  ...
Fri Aug 08 12:13:59  2014
ALTER DATABASE RECOVER CANCEL
ORA-1112 signalled during: ALTER DATABASE RECOVER CANCEL ...
Fri Aug 08 12:14:12  2014
alter database open resetlogs
Fri Aug 08 12:14:13  2014
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
ORA-1245 signalled during: alter database open resetlogs...
Fri Aug 08 12:54:11  2014
alter tablespace sysaux offline
Fri Aug 08 12:54:11  2014
ORA-1109 signalled during: alter tablespace sysaux offline...
Fri Aug 08 13:05:30  2014
alter database open
Fri Aug 08 13:05:30  2014
ORA-1589 signalled during: alter database open...

在offline过程中,数据库检查到sysaux数据文件为offline状态,当表空间只有一个数据文件,而且该数据文件为offline,数据库将会尝试offline sysaux表空间,但是发现该表空间文件非正常scn,无法offline 表空间,导致resetlogs操作失败。这里是操作失误应该先online相关数据文件,然后再进行resetlogs操作

Sat Aug 09 11:56:03  2014
alter database datafile 3 online
Sat Aug 09 11:56:04  2014
Completed: alter database datafile 3 online
Sat Aug 09 11:56:08  2014
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
Sat Aug 09 11:56:18  2014
ARCH: Encountered disk I/O error 19502
Sat Aug 09 11:56:18  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512)
ORA-27072: 文件 I/O 错误
OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 1) 函数不正确。
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512)
Sat Aug 09 11:56:18  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512)
ORA-27072: 文件 I/O 错误
OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 1) 函数不正确。
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512)
ARCH: I/O error 19502 archiving log 3 to 'F:\ARCHIVE\ARC01745_0814618167.001'
Sat Aug 09 11:56:18  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-00265: 要求实例恢复, 无法设置 ARCHIVELOG 模式
Archive all online redo logfiles failed:265
RESETLOGS after incomplete recovery UNTIL CHANGE 77983856
Resetting resetlogs activation ID 3562192628 (0xd452bef4)
Online log F:\ORADATA\SZCG\REDO01.LOG: Thread 1 Group 1 was previously cleared
Online log F:\ORADATA\SZCG\REDO02.LOG: Thread 1 Group 2 was previously cleared
Online log D:\REDO04.LOG: Thread 1 Group 4 was previously cleared
Sat Aug 09 11:56:22  2014
Setting recovery target incarnation to 3
Sat Aug 09 11:56:23  2014
Assigning activation ID 3602586269 (0xd6bb1a9d)
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=33, OS id=5900
Sat Aug 09 11:56:23  2014
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=34, OS id=5776
Sat Aug 09 11:56:24  2014
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: F:\ORADATA\SZCG\REDO01.LOG
Successful open of redo thread 1
Sat Aug 09 11:56:24  2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Aug 09 11:56:24  2014
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
Sat Aug 09 11:56:24  2014
ARC0: Becoming the heartbeat ARCH
Sat Aug 09 11:56:24  2014
SMON: enabling cache recovery
Sat Aug 09 11:56:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-00600: 内部错误代码, 参数: [2662], [0], [77983864], [0], [77992379], [8388617], [], []
Sat Aug 09 11:56:26  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-00600: 内部错误代码, 参数: [2662], [0], [77983864], [0], [77992379], [8388617], [], []
Sat Aug 09 11:56:26  2014
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 4516
ORA-1092 signalled during: alter database open resetlogs...

ORA-600 2662这个错误很熟悉,直接推SCN,数据库open,但是报ORA-600 4194

Sat Aug 09 12:01:28  2014
SMON: enabling cache recovery
Dictionary check complete
Sat Aug 09 12:01:32  2014
SMON: enabling tx recovery
Sat Aug 09 12:01:32  2014
Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=34, OS id=6116
Sat Aug 09 12:01:34  2014
LOGSTDBY: Validating controlfile with logical metadata
Sat Aug 09 12:01:34  2014
LOGSTDBY: Validation complete
Sat Aug 09 12:01:34  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_smon_920.trc:
ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], []
Sat Aug 09 12:01:36  2014
Doing block recovery for file 2 block 319
Resuming block recovery (PMON) for file 2 block 319
Block recovery from logseq 2, block 56 to scn 1073742003
Sat Aug 09 12:01:36  2014
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: F:\ORADATA\SZCG\REDO02.LOG
Block recovery stopped at EOT rba 2.79.16
Block recovery completed at rba 2.79.16, scn 0.1073742002
Doing block recovery for file 2 block 153
Resuming block recovery (PMON) for file 2 block 153
Block recovery from logseq 2, block 56 to scn 1073741986
Sat Aug 09 12:01:36  2014
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: F:\ORADATA\SZCG\REDO02.LOG
Block recovery completed at rba 2.66.16, scn 0.1073741988
Sat Aug 09 12:01:36  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_smon_920.trc:
ORA-01595: error freeing extent (4) of rollback segment (10))
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], []
Sat Aug 09 12:01:36  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5272.trc:
ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], []
Sat Aug 09 12:01:36  2014
Completed: alter database open

尝试重建undo表空间并切换undo_tabspace到新undo表空间解决,因为数据库在恢复过程中使用了隐含参数强制拉库,不能保证数据一致性,强烈建议逻辑方式重建数据库
在本次故障中,所幸的是只有redo和sysaux文件损坏,如果是业务数据文件或者system数据文件损坏,恢复的后果可能更加麻烦,丢失数据可能更加多。再次说明:数据库备份非常重要,数据的安全性不能完全寄希望于硬件之上