admin – 第41页 – Database SOS

记录一次system表空间坏块(ORA-01578)数据库恢复

半夜朋友打来求救电话,说xx医院his系统因为存储异常导致system坏块无法正常启动,因为是win平台无法使用bbed,无法修复system 坏块,请求技术支持
dbv检查system文件报坏块

对应具体地址为:file 1 block 39041和66738

判断控制文件异常
通过数据库恢复检查脚本(Oracle Database Recovery Check)脚本检测数据库发现控制文件明显异常(checkpoint scn)

尝试恢复数据库

因此对该库进行了不完全恢复,然后尝试resetlogs打开数据库,数据库报ORA-600 2662错误

Fri Aug 29 02:35:08 2014
alter database open resetlogs
Fri Aug 29 02:35:11 2014
RESETLOGS after complete recovery through change 451371288
Resetting resetlogs activation ID 1232269761 (0x4972f1c1)
Fri Aug 29 02:35:15 2014
Setting recovery target incarnation to 3
Fri Aug 29 02:35:15 2014
Assigning activation ID 1384652231 (0x52881dc7)
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=17, OS id=1084
Fri Aug 29 02:35:15 2014
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=18, OS id=2836
Fri Aug 29 02:35:15 2014
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: Z:\ORACLE\PRODUCT\10.2.0\ORCL\REDO01.LOG
Successful open of redo thread 1
Fri Aug 29 02:35:15 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Aug 29 02:35:15 2014
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
Fri Aug 29 02:35:15 2014
ARC0: Becoming the heartbeat ARCH
Fri Aug 29 02:35:15 2014
SMON: enabling cache recovery
Fri Aug 29 02:35:16 2014
Errors in file d:\oracle\product\10.2.0\admin\orcl\udump\orcl_ora_4824.trc:
ORA-00600: 内部错误代码, 参数: [2662], [0], [451371311], [0], [451374534], [8388977], [], []
Fri Aug 29 02:35:16 2014
Errors in file d:\oracle\product\10.2.0\admin\orcl\udump\orcl_ora_4824.trc:
ORA-00600: 内部错误代码, 参数: [2662], [0], [451371311], [0], [451374534], [8388977], [], []
Fri Aug 29 02:35:16 2014
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Fri Aug 29 02:35:17 2014
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl_smon_2928.trc:
ORA-00600: ??????, ??: [], [], [], [], [], [], [], []
Instance terminated by USER, pid = 4824
ORA-1092 signalled during: alter database open resetlogs...

ORA-600 2662 该错误解决思路很明显,推进scn,数据库报ORA-01578

Fri Aug 29 02:42:47 2014
SMON: enabling cache recovery
Fri Aug 29 02:42:47 2014
Successfully onlined Undo Tablespace 1.
Dictionary check beginning
Dictionary check complete
Fri Aug 29 02:42:49 2014
SMON: enabling tx recovery
Fri Aug 29 02:42:49 2014
Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 16
replication_dependency_tracking turned off (no async multimaster replication found)
Fri Aug 29 02:42:50 2014
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl_smon_4804.trc:
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01578: ORACLE 数据块损坏 (文件号 1, 块号 39041)
ORA-01110: 数据文件 1: 'Z:\ORACLE\PRODUCT\10.2.0\ORCL\SYSTEM01.DBF'
Fri Aug 29 02:42:50 2014
LOGSTDBY: Validating controlfile with logical metadata
Fri Aug 29 02:42:51 2014
LOGSTDBY: Validation complete
ORA-604 signalled during: alter database open...

使用event跳过坏块，启动数据库成功

Fri Aug 29 02:48:59 2014
SMON: enabling cache recovery
Fri Aug 29 02:49:00 2014
Successfully onlined Undo Tablespace 1.
Fri Aug 29 02:49:00 2014
SMON: enabling tx recovery
Fri Aug 29 02:49:00 2014
Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 16
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=34, OS id=3096
Fri Aug 29 02:49:01 2014
db_recovery_file_dest_size of 4096 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Fri Aug 29 02:49:01 2014
Completed: alter database open

查询坏块对象

因为这些对象均不是核心对象,直接进行truncate然后插入老数据

后续还有大量错误修复

ORA-12012: error on auto execute of job 1
ORA-08102: index key not found, obj# 239, file 1, block 1674 (2)
ORA-00600: 内部错误代码, 参数: [kcbz_check_objd_typ], [0], [0], [1], [], [], [], []
ORA-00600: internal error code, arguments: [6749], [3], [12606796], [173], [], [], [], []
ORA-00600: 内部错误代码, 参数: [13013], [52898], [52895], [38288618], [44], [38288618], [17], []
ORA-00600: 内部错误代码, 参数: [13013], [5001], [52895], [38286476], [5], [38286476], [17], []

再次说明,很多时候数据库恢复不要看成多神秘,就是几个参数搞定,更加不要神化有坏块就bbed修复,当然非常极端,使用N中工具,N种尝试的也存在.做好备份重于一切

ORACLE 8.1.7 数据库ORA-600 4000故障恢复

在数据库的恢复过程中遇到ORA-600 4000错误挺多的，但是在oracle 8i(8.1.7)中遇到此类问题，还是第一次，做个记忆，供参考：
数据库故障起因：因为存储异常，导致当前redo损坏，并_allow_resetlogs_corruption参数尝试打开数据库

Media Recovery Log
kcrrga: Warning.  Log sequence in archive filename wrapped
to fix length as indicated by %S in LOG_ARCHIVE_FORMAT.
Old log archive with same name might be overwritten.
ORA-279 signalled during: ALTER DATABASE RECOVER  database using backup cont...
Wed Aug 20 23:01:43 2014
ALTER DATABASE RECOVER    CANCEL
Media Recovery Cancelled
Completed: ALTER DATABASE RECOVER    CANCEL
Wed Aug 20 23:01:50 2014
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 262618871
Wed Aug 20 23:01:50 2014
Thread 1 opened at log sequence 1
  Current log# 3 seq# 1 mem# 0: F:\REDO01.LOG
Successful open of redo thread 1.
Wed Aug 20 23:01:50 2014
SMON: enabling cache recovery
Wed Aug 20 23:01:50 2014
Errors in file C:\oracle\admin\YCFD\udump\ORA00320.TRC:
ORA-00600: ??????????: [4000], [3], [], [], [], [], [], []
SMON: disabling cache recovery
Wed Aug 20 23:01:51 2014
ORA-704 signalled during: alter database open resetlogs

数据库遭遇ORA-600 4000错误，数据库无法打开，分析对应trace日志

Dump file C:\oracle\admin\YCFD\udump\ORA00320.TRC
Wed Aug 20 23:01:50 2014
ORACLE V8.1.7.0.0 - Production vsnsta=0
vsnsql=e vsnxtr=3
Windows 2000 Version 5.2 Service Pack 2, CPU type 586
Oracle8i Release 8.1.7.0.0 - Production
JServer Release 8.1.7.0.0 - Production
Windows 2000 Version 5.2 Service Pack 2, CPU type 586
Instance name: ycfd
Redo thread mounted by this instance: 1
Oracle process number: 8
Windows thread id: 320, image: ORACLE.EXE
*** SESSION ID:(7.1) 2014-08-20 23:01:50.838
*** 2014-08-20 23:01:50.838
ksedmp: internal or fatal error
ORA-00600: ??????????: [4000], [3], [], [], [], [], [], []
Current SQL statement for this session:
select ctime, mtime, stime from obj$ where obj# = :1
----- Call Stack Trace -----

这里可以看出来，是因为数据库在启动之时需要执行select ctime, mtime, stime from obj$ where obj# = :1语句，但是由于每种原因出现ORA-600 4000导致数据库无法正常启动，继续分析日志

lock header dump:  0x0040003e
 Object id on Block? Y
 seg/obj: 0x12  csc: 0x00.fb5c5c5  itc: 1  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   xid:  0x0003.012.0002ae94    uba: 0x00801f5b.5389.11  --U-    1  fsc 0x0000.0fb5c5c6
SQL> select checkpoint_change# from v$database;
263570122

此处比较明显,通过xid可以知道三号回滚段中对应一个事务出现问题：
1. 该block为file 1 bock 62，object_id为 18（obj$)上有一个事务
2. 该事务的scn为263,570,886>database scn(263570122)导致该故障发生
3. 当数据库访问到file 1 block 62的时候，发现有一个事务，而该事务的scn大于数据库scn，从而出现ORA-600[4000]
解决该问题有几种方法
1. 修改block 62，人工提交该事务
2. 修改数据库scn，让数据库scn大于itl scn
解决block 62 事务问题后出现如下错误

Wed Aug 20 23:03:55 2014
SMON: enabling cache recovery
Wed Aug 20 23:03:55 2014
Dictionary check beginning
Dictionary check complete
Wed Aug 20 23:03:55 2014
SMON: enabling tx recovery
Wed Aug 20 23:03:56 2014
Errors in file C:\oracle\admin\YCFD\bdump\ycfdSMON.TRC:
ORA-00600: internal error code, arguments: [4193], [21173], [21181], [], [], [], [], []
Recovery of Online Redo Log: Thread 1 Group 1 Seq 2 Reading mem 0
  Mem# 0 errs 0: F:\REDO03.LOG
SMON: terminating instance due to error 600
Instance terminated by SMON, pid = 2468

数据库出现ORA-600 4193,这个是常见错误,因为redo记录和undo记录不匹配导致,可以直接使用_corrupted_rollback_segments/_offline_rollback_segments屏蔽回滚段跳过

Wed Aug 20 23:08:10 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
SMON: about to recover undo segment 1
SMON: mark undo segment 1 as needs recovery
SMON: about to recover undo segment 2
SMON: mark undo segment 2 as needs recovery
SMON: about to recover undo segment 3
SMON: mark undo segment 3 as needs recovery
SMON: about to recover undo segment 4
SMON: mark undo segment 4 as needs recovery
SMON: about to recover undo segment 5
SMON: mark undo segment 5 as needs recovery
SMON: about to recover undo segment 6
SMON: mark undo segment 6 as needs recovery
SMON: about to recover undo segment 7
SMON: mark undo segment 7 as needs recovery
SMON: about to recover undo segment 1
SMON: mark undo segment 1 as needs recovery
SMON: about to recover undo segment 2
SMON: mark undo segment 2 as needs recovery
SMON: about to recover undo segment 3
SMON: mark undo segment 3 as needs recovery
SMON: about to recover undo segment 4
SMON: mark undo segment 4 as needs recovery
SMON: about to recover undo segment 5
SMON: mark undo segment 5 as needs recovery
SMON: about to recover undo segment 6
SMON: mark undo segment 6 as needs recovery
SMON: about to recover undo segment 7
SMON: mark undo segment 7 as needs recovery
Wed Aug 20 23:08:15 2014
Completed: alter database open

其他类似文章：
ORA-600[4194]/[4193]解决
 通过bbed解决ORA-600 4000案例
 通过bbed解决ORA-00600[4000]案例
 记录一次ORA-600 4000数据库故障恢复

一起ORA-600 3020故障恢复的大体思路

recover database 报ORA-600 3020

Recovery of Online Redo Log: Thread 1 Group 2 Seq 5729 Reading mem 0
  Mem# 0: E:\ORACLE\ORADATA\YYGDB\REDO02.LOG
Tue Aug 19 19:37:29 2014
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_pr0s_4296.trc  (incident=39403):
ORA-00600: internal error code, arguments: [3020], [3], [240], [12583152], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 240)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 3: 'E:\ORACLE\ORADATA\YYGDB\UNDOTBS01.DBF'
ORA-10560: block type 'KTU SMU HEADER BLOCK'
Incident details in: d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_39403\yygdb_pr0s_4296_i39403.trc
ORA-00600: internal error code, arguments: [3020], [2], [90586], [8479194], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 90586)
ORA-10564: tablespace SYSAUX
ORA-01110: data file 2: 'E:\ORACLE\ORADATA\YYGDB\SYSAUX01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 6087
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_12460.trc  (incident=39147):
ORA-00600: internal error code, arguments: [3020], [3], [240], [12583152], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 240)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 3: 'E:\ORACLE\ORADATA\YYGD
Incident details in: d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_39147\yygdb_ora_12460_i39147.trc
Tue Aug 19 19:37:31 2014
Trace dumping is performing id=[cdmp_20140819193731]
Tue Aug 19 19:37:32 2014
Recovery Slave PR0S previously exited with an exception
Shutting down recovery slaves due to error 10877
Media Recovery failed with error 10877
ORA-283 signalled during: ALTER DATABASE RECOVER  database  ...

使用allow 1 corruption跳3020错误继续恢复

Tue Aug 19 19:38:53 2014
ALTER DATABASE RECOVER  database allow 1 corruption
Media Recovery Start
Fast Parallel Media Recovery enabled
 ALLOW CORRUPTION option must use serial recovery
Warning: Datafile 10 (D:\ORACLE\PRODUCT\11.1.0\DB_1\ORADATA\SAMPLE\LAYOUT_DB.DBF) is offline during full
database recovery and will not be recovered
Recovery of Online Redo Log: Thread 1 Group 2 Seq 5729 Reading mem 0
  Mem# 0: E:\ORACLE\ORADATA\YYGDB\REDO02.LOG
CORRUPTING BLOCK 240 OF FILE 3 AND CONTINUING RECOVERY
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_12460.trc:
ORA-10567: Redo is inconsistent with data block (file# 3, block# 240)
ORA-10564: tablespace UNDOTBS1
ORA-01110: 数据文件 3: 'E:\ORACLE\ORADATA\YYGDB\UNDOTBS01.DBF'
ORA-10560: block type 'KTU SMU HEADER BLOCK'
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_12460.trc  (incident=39148):
ORA-00600: 内部错误代码, 参数: [3020], [2], [90586], [8479194], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 90586)
ORA-10564: tablespace SYSAUX
ORA-01110: 数据文件 2: 'E:\ORACLE\ORADATA\YYGDB\SYSAUX01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 6087
Incident details in: d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_39148\yygdb_ora_12460_i39148.trc
Media Recovery failed with error 600
ORA-283 signalled during: ALTER DATABASE RECOVER  database allow 1 corruption  ...
Tue Aug 19 19:38:56 2014
Trace dumping is performing id=[cdmp_20140819193856]
Tue Aug 19 19:38:59 2014
Sweep Incident[39148]: completed
Tue Aug 19 19:39:05 2014
ALTER DATABASE RECOVER  database allow 1 corruption
Media Recovery Start
Fast Parallel Media Recovery enabled
 ALLOW CORRUPTION option must use serial recovery
Warning: Datafile 10 (D:\ORACLE\PRODUCT\11.1.0\DB_1\ORADATA\SAMPLE\LAYOUT_DB.DBF) is offline during full
database recovery and will not be recovered
Recovery of Online Redo Log: Thread 1 Group 2 Seq 5729 Reading mem 0
  Mem# 0: E:\ORACLE\ORADATA\YYGDB\REDO02.LOG
CORRUPTING BLOCK 90586 OF FILE 2 AND CONTINUING RECOVERY
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_12460.trc:
ORA-10567: Redo is inconsistent with data block (file# 2, block# 90586)
ORA-10564: tablespace SYSAUX
ORA-01110: 数据文件 2: 'E:\ORACLE\ORADATA\YYGDB\SYSAUX01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 6087
Completed: ALTER DATABASE RECOVER  database allow 1 corruption

继续open数据库报ORA-01578错误,数据库无法open

Thread 1 opened at log sequence 5730
  Current log# 3 seq# 5730 mem# 0: E:\ORACLE\ORADATA\YYGDB\REDO03.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Aug 19 19:39:34 2014
SMON: enabling cache recovery
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_12460.trc  (incident=39149):
ORA-01578: ORACLE 数据块损坏 (文件号 3, 块号 240)
ORA-01110: 数据文件 3: 'E:\ORACLE\ORADATA\YYGDB\UNDOTBS01.DBF'
Incident details in: d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_39149\yygdb_ora_12460_i39149.trc
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_12460.trc  (incident=39150):
ORA-00353: 日志损坏接近块 520 更改 101455257 时间 08/18/2014 08:22:54
ORA-00312: 联机日志 1 线程 1: 'E:\ORACLE\ORADATA\YYGDB\REDO01.LOG'
ORA-01578: ORACLE 数据块损坏 (文件号 3, 块号 240)
ORA-01110: 数据文件 3: 'E:\ORACLE\ORADATA\YYGDB\UNDOTBS01.DBF'
Incident details in: d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_39150\yygdb_ora_12460_i39150.trc
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_39149\yygdb_ora_12460_i39149.trc:
ORA-00354: 损坏重做日志块标头
ORA-00353: 日志损坏接近块 520 更改 101455257 时间 08/18/2014 08:22:54
ORA-00312: 联机日志 1 线程 1: 'E:\ORACLE\ORADATA\YYGDB\REDO01.LOG'
ORA-01578: ORACLE 数据块损坏 (文件号 3, 块号 240)
ORA-01110: 数据文件 3: 'E:\ORACLE\ORADATA\YYGDB\UNDOTBS01.DBF'
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_12460.trc  (incident=39151):
Error 1578 happened during db open, shutting down database
USER (ospid: 12460): terminating the instance due to error 1578
Tue Aug 19 19:39:41 2014
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_dbw3_18508.trc  (incident=38659):
ORA-01578: ORACLE data block corrupted (file # , block # )
Tue Aug 19 19:39:41 2014
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_dbw5_12160.trc  (incident=38675):
ORA-01578: ORACLE data block corrupted (file # , block # )
Tue Aug 19 19:39:42 2014
Instance terminated by USER, pid = 12460
ORA-1092 signalled during: alter database open...
ORA-1092 : opiodr aborting process unknown ospid (5084_12460)

由于undo 表空间有坏块，导致数据库open失败,尝试修改undo_management= “MANUAL”，继续启动数据库

Tue Aug 19 19:50:06 2014
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 16 processes
Started redo scan
Completed redo scan
 3 redo blocks read, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 5731, block 2, scn 101497289
Recovery of Online Redo Log: Thread 1 Group 1 Seq 5731 Reading mem 0
  Mem# 0: E:\ORACLE\ORADATA\YYGDB\REDO01.LOG
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 5731, block 5, scn 101517294
 0 data blocks read, 0 data blocks written, 3 redo blocks read
Tue Aug 19 19:50:08 2014
Thread 1 advanced to log sequence 5732 (thread open)
Thread 1 opened at log sequence 5732
  Current log# 2 seq# 5732 mem# 0: E:\ORACLE\ORADATA\YYGDB\REDO02.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Aug 19 19:50:08 2014
SMON: enabling cache recovery
Dictionary check beginning
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
File #3 is offline, but is part of an online tablespace.
data file 3: 'E:\ORACLE\ORADATA\YYGDB\UNDOTBS01.DBF'
File #10 is offline, but is part of an online tablespace.
data file 10: 'D:\ORACLE\PRODUCT\11.1.0\DB_1\ORADATA\SAMPLE\LAYOUT_DB.DBF'
File #11 is offline, but is part of an online tablespace.
data file 11: 'D:\ORACLE\PRODUCT\11.1.0\DB_1\ORADATA\SAMPLE\LAYOUT.DBF'
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
*********************************************************************
WARNING: The following temporary tablespaces contain no files.
         This condition can occur when a backup controlfile has
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
         ALTER TABLESPACE <tablespace_name> ADD TEMPFILE
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan : on 4 X 8 NUMA system
**********************************************************
WARNING: Files may exists in db_recovery_file_dest
that are not known to the database. Use the RMAN command
CATALOG RECOVERY AREA to re-catalog any such files.
If files cannot be cataloged, then manually delete them
using OS command.
One of the following events caused this:
1. A backup controlfile was restored.
2. A standby controlfile was restored.
3. The controlfile was re-created.
4. db_recovery_file_dest had previously been enabled and
   then disabled.
**********************************************************
Hex dump of (file 1, block 7065) in trace file
d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_14296.trc
Corrupt block relative dba: 0x00401b99 (file 1, block 7065)
Fractured block found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x00401b99
 last change scn: 0x0000.060c1f83 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xac3b0601
 check value in block header: 0x2e13
 computed block checksum: 0xa4ac
Reread of rdba: 0x00401b99 (file 1, block 7065) found same corrupted data
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_14296.trc  (incident=42814):
ORA-01578: ORACLE 数据块损坏 (文件号 1, 块号 7065)
ORA-01110: 数据文件 1: 'E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF'
Incident details in: d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_42814\yygdb_ora_14296_i42814.trc
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_14296.trc  (incident=42815):
ORA-01578: ORACLE 数据块损坏 (文件号 1, 块号 7065)
ORA-01110: 数据文件 1: 'E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF'
Incident details in: d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_42815\yygdb_ora_14296_i42815.trc
Tue Aug 19 19:50:12 2014
Trace dumping is performing id=[cdmp_20140819195012]
Tue Aug 19 19:50:12 2014
Sweep Incident[42814]: completed
Hex dump of (file 1, block 7065) in trace file
d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_42814\yygdb_m000_11592_i42814_a.trc
Corrupt block relative dba: 0x00401b99 (file 1, block 7065)
Fractured block found during validation
Data in bad block:
 type: 6 format: 2 rdba: 0x00401b99
 last change scn: 0x0000.060c1f83 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xac3b0601
 check value in block header: 0x2e13
 computed block checksum: 0xa4ac
Reread of blocknum=7065, file=E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF. found same corrupt data
Reread of blocknum=7065, file=E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF. found same corrupt data
Reread of blocknum=7065, file=E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF. found same corrupt data
Reread of blocknum=7065, file=E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF. found same corrupt data
Reread of blocknum=7065, file=E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF. found same corrupt data
Hex dump of (file 1, block 7065) in trace file
d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_14296.trc
Corrupt block relative dba: 0x00401b99 (file 1, block 7065)
Fractured block found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x00401b99
 last change scn: 0x0000.060c1f83 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xac3b0601
 check value in block header: 0x2e13
 computed block checksum: 0xa4ac
Reread of rdba: 0x00401b99 (file 1, block 7065) found same corrupted data
Corrupt Block Found
         TSN = 0, TSNAME = SYSTEM
         RFN = 1, BLK = 7065, RDBA = 4201369
         OBJN = 1164, OBJD = 1164, OBJECT = SYS_FBA_BARRIERSCN, SUBOBJECT =
         SEGMENT OWNER = SYS, SEGMENT TYPE = Table Segment
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_14296.trc  (incident=42816):
ORA-01578: ORACLE 数据块损坏 (文件号 1, 块号 7065)
ORA-01110: 数据文件 1: 'E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF'
Incident details in: d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_42816\yygdb_ora_14296_i42816.trc
Trace dumping is performing id=[cdmp_20140819195014]
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_14296.trc  (incident=42817):
Starting background process FBDA
Tue Aug 19 19:50:18 2014
FBDA started with pid=86, OS id=17700
replication_dependency_tracking turned off (no async multimaster replication found)
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_fbda_17700.trc  (incident=42910):
ORA-01578: ORACLE data block corrupted (file # 1, block # 7065)
ORA-01110: data file 1: 'E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF'
Trace dumping is performing id=[cdmp_20140819195018]
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_fbda_17700.trc  (incident=42911):
ORA-01578: ORACLE data block corrupted (file # 1, block # 7065)
ORA-01110: data file 1: 'E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF'
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
ORA-604 signalled during: alter database open...

数据库不完全open成功，报了604错误，通过分析undo$，直接使用_offline_rollback_segments屏蔽了status$=5的回滚段,数据库open正常,因为system有大量坏块,幸运的是使用exp导出来几个业务用户的表数据全部OK.
数据库备份重于一切,别寄希望数据库非常规恢复

又一起存储故障导致ORA-00333 ORA-00312恢复

数据库启动报ORA-00333 ORA-00312错误，无法正常open数据库

Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc0_4724.trc:
ORA-00333: redo log read error block 63489 count 2048
ORA-00312: online log 2 thread 1: 'F:\ORADATA\SZCG\REDO02.LOG'
ORA-27091: unable to queue I/O
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc0_4724.trc:
ORA-00333: redo log read error block 63489 count 2048
Thu Aug 07 10:42:03  2014
ARC0: All Archive destinations made inactive due to error 333
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_1856.trc:
ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止
ORA-00340: 处理联机日志  (用于线程 ) 时出现 I/O 错误
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_6548.trc:
ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止
ORA-00340: 处理联机日志  (用于线程 ) 时出现 I/O 错误
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_8104.trc:
ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止
ORA-00340: 处理联机日志  (用于线程 ) 时出现 I/O 错误
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_lgwr_884.trc:
ORA-00340: IO error processing online log 3 of thread 1
ORA-00345: redo log write error block 65238 count 13
ORA-00312: online log 3 thread 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: async read/write failed
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 10:42:03  2014
LGWR: terminating instance due to error 340
Thu Aug 07 10:42:05  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_8104.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 340
ORA-00340: IO error processing online log  of thread
Thu Aug 07 10:42:05  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_1856.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 340
ORA-00340: IO error processing online log  of thread
Thu Aug 07 10:42:05  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_6548.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 340
ORA-00340: IO error processing online log  of thread
Thu Aug 07 17:40:05  2014
ALTER DATABASE OPEN
Thu Aug 07 17:40:05  2014
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Thu Aug 07 17:40:06  2014
Started redo scan
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27091: 无法将 I/O 排队
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 17:40:06  2014
Aborting crash recovery due to error 333
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-333 signalled during: ALTER DATABASE OPEN...

进一步检查发现在7月6日系统就已经报io异常

Sun Jul 06 10:05:23  2014
ARC0: All Archive destinations made inactive due to error 333
Sun Jul 06 10:06:07  2014
KCF: write/open error block=0xd03 online=1
     file=3 F:\ORADATA\SZCG\SYSAUX01.DBF
     error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。'
Automatic datafile offline due to write error on
file 3: F:\ORADATA\SZCG\SYSAUX01.DBF
Sun Jul 06 10:06:23  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc1_2676.trc:
ORA-00333: redo log read error block 63489 count 2048
ORA-00312: online log 2 thread 1: 'F:\ORADATA\SZCG\REDO02.LOG'
ORA-27091: unable to queue I/O
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 10:36:54  2014
ARC1: All Archive destinations made inactive due to error 333
Thu Aug 07 10:37:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_m000_5832.trc:
ORA-01135: file 3 accessed for DML/query is offline
ORA-01110: data file 3: 'F:\ORADATA\SZCG\SYSAUX01.DBF'

检查硬件发现raid一块盘完全损坏，另外一块盘也处于告警状态，保护现场拷贝文件过程中发现redo02,redo03,sysaux无法拷贝,使用rman检查发现
sysaux-block

因为redo完全损坏，使用工具跳过坏块，拷贝相关有坏块文件到其他目录，重命名相关文件尝试启动数据库，依然报ORA-00333 ORA-00312

Started redo scan
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27091: 无法将 I/O 排队
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Thu Aug 07 17:40:06  2014
Aborting crash recovery due to error 333
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-333 signalled during: ALTER DATABASE OPEN...

设置隐含参数_allow_resetlogs_corruption,尝试强制拉库

Started redo scan
Fri Aug 08 12:13:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
Fri Aug 08 12:13:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27091: 无法将 I/O 排队
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。
Fri Aug 08 12:13:25  2014
Aborting crash recovery due to error 333
Fri Aug 08 12:13:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-333 signalled during: ALTER DATABASE OPEN...
Fri Aug 08 12:13:45  2014
ALTER DATABASE RECOVER  database until cancel
Fri Aug 08 12:13:45  2014
Media Recovery Start
 parallel recovery started with 15 processes
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
Fri Aug 08 12:13:55  2014
ALTER DATABASE RECOVER    CANCEL
Fri Aug 08 12:13:59  2014
ORA-1547 signalled during: ALTER DATABASE RECOVER    CANCEL  ...
Fri Aug 08 12:13:59  2014
ALTER DATABASE RECOVER CANCEL
ORA-1112 signalled during: ALTER DATABASE RECOVER CANCEL ...
Fri Aug 08 12:14:12  2014
alter database open resetlogs
Fri Aug 08 12:14:13  2014
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
ORA-1245 signalled during: alter database open resetlogs...
Fri Aug 08 12:54:11  2014
alter tablespace sysaux offline
Fri Aug 08 12:54:11  2014
ORA-1109 signalled during: alter tablespace sysaux offline...
Fri Aug 08 13:05:30  2014
alter database open
Fri Aug 08 13:05:30  2014
ORA-1589 signalled during: alter database open...

在offline过程中，数据库检查到sysaux数据文件为offline状态，当表空间只有一个数据文件，而且该数据文件为offline，数据库将会尝试offline sysaux表空间，但是发现该表空间文件非正常scn，无法offline 表空间，导致resetlogs操作失败。这里是操作失误应该先online相关数据文件，然后再进行resetlogs操作

Sat Aug 09 11:56:03  2014
alter database datafile 3 online
Sat Aug 09 11:56:04  2014
Completed: alter database datafile 3 online
Sat Aug 09 11:56:08  2014
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
Sat Aug 09 11:56:18  2014
ARCH: Encountered disk I/O error 19502
Sat Aug 09 11:56:18  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512)
ORA-27072: 文件 I/O 错误
OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 1) 函数不正确。
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512)
Sat Aug 09 11:56:18  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512)
ORA-27072: 文件 I/O 错误
OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 1) 函数不正确。
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512)
ARCH: I/O error 19502 archiving log 3 to 'F:\ARCHIVE\ARC01745_0814618167.001'
Sat Aug 09 11:56:18  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-00265: 要求实例恢复, 无法设置 ARCHIVELOG 模式
Archive all online redo logfiles failed:265
RESETLOGS after incomplete recovery UNTIL CHANGE 77983856
Resetting resetlogs activation ID 3562192628 (0xd452bef4)
Online log F:\ORADATA\SZCG\REDO01.LOG: Thread 1 Group 1 was previously cleared
Online log F:\ORADATA\SZCG\REDO02.LOG: Thread 1 Group 2 was previously cleared
Online log D:\REDO04.LOG: Thread 1 Group 4 was previously cleared
Sat Aug 09 11:56:22  2014
Setting recovery target incarnation to 3
Sat Aug 09 11:56:23  2014
Assigning activation ID 3602586269 (0xd6bb1a9d)
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=33, OS id=5900
Sat Aug 09 11:56:23  2014
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=34, OS id=5776
Sat Aug 09 11:56:24  2014
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: F:\ORADATA\SZCG\REDO01.LOG
Successful open of redo thread 1
Sat Aug 09 11:56:24  2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Aug 09 11:56:24  2014
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
Sat Aug 09 11:56:24  2014
ARC0: Becoming the heartbeat ARCH
Sat Aug 09 11:56:24  2014
SMON: enabling cache recovery
Sat Aug 09 11:56:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-00600: 内部错误代码, 参数: [2662], [0], [77983864], [0], [77992379], [8388617], [], []
Sat Aug 09 11:56:26  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-00600: 内部错误代码, 参数: [2662], [0], [77983864], [0], [77992379], [8388617], [], []
Sat Aug 09 11:56:26  2014
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 4516
ORA-1092 signalled during: alter database open resetlogs...

ORA-600 2662这个错误很熟悉，直接推SCN,数据库open,但是报ORA-600 4194

Sat Aug 09 12:01:28  2014
SMON: enabling cache recovery
Dictionary check complete
Sat Aug 09 12:01:32  2014
SMON: enabling tx recovery
Sat Aug 09 12:01:32  2014
Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=34, OS id=6116
Sat Aug 09 12:01:34  2014
LOGSTDBY: Validating controlfile with logical metadata
Sat Aug 09 12:01:34  2014
LOGSTDBY: Validation complete
Sat Aug 09 12:01:34  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_smon_920.trc:
ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], []
Sat Aug 09 12:01:36  2014
Doing block recovery for file 2 block 319
Resuming block recovery (PMON) for file 2 block 319
Block recovery from logseq 2, block 56 to scn 1073742003
Sat Aug 09 12:01:36  2014
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: F:\ORADATA\SZCG\REDO02.LOG
Block recovery stopped at EOT rba 2.79.16
Block recovery completed at rba 2.79.16, scn 0.1073742002
Doing block recovery for file 2 block 153
Resuming block recovery (PMON) for file 2 block 153
Block recovery from logseq 2, block 56 to scn 1073741986
Sat Aug 09 12:01:36  2014
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: F:\ORADATA\SZCG\REDO02.LOG
Block recovery completed at rba 2.66.16, scn 0.1073741988
Sat Aug 09 12:01:36  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_smon_920.trc:
ORA-01595: error freeing extent (4) of rollback segment (10))
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], []
Sat Aug 09 12:01:36  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5272.trc:
ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], []
Sat Aug 09 12:01:36  2014
Completed: alter database open

尝试重建undo表空间并切换undo_tabspace到新undo表空间解决，因为数据库在恢复过程中使用了隐含参数强制拉库,不能保证数据一致性,强烈建议逻辑方式重建数据库
在本次故障中，所幸的是只有redo和sysaux文件损坏，如果是业务数据文件或者system数据文件损坏，恢复的后果可能更加麻烦，丢失数据可能更加多。再次说明：数据库备份非常重要,数据的安全性不能完全寄希望于硬件之上

某集团ebs数据库redo undo丢失导致悲剧

某集团的ebs系统因磁盘空间不足把redo和undo存放到raid 0之上，而且该库无任何备份。最终悲剧发生了,raid 0异常导致redo undo全部丢失,数据库无法正常启动(我接手之时数据库已经resetlogs过,但是未成功)

Sun Jul 27 11:31:27 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Sun Jul 27 11:31:27 2014
Database Characterset is ZHS16GBK
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_663670.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 41 cannot be read at this time
ORA-01110: data file 41: '/prod/oracle/PROD/logdata/undo/undo2.dbf'
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 663670
ORA-1092 signalled during: ALTER DATABASE OPEN...

查询相关文件状态发现,undo表空间文件丢失,被offline处理
df_status
因为以前alert日志被清理,通过这里大概猜测是offline丢失的undo文件,然后resetlogs了数据库,现在处理方式为
使用_corrupted_rollback_segments屏蔽回滚段,然后尝试启动数据库

Tue Jul 29 11:40:39 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Tue Jul 29 11:40:39 2014
Database Characterset is ZHS16GBK
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_585786.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 585786
ORA-1092 signalled during: alter database open...

该错误是由于数据库启动需要找到对应的回滚段,但是由于undo异常导致该回滚段无法找到，因此出现该错误，解决方法是通过修改数据scn，让其不找回滚段,从而屏蔽该错误.数据库启动后,删除undo重新创建新undo

Tue Jul 29 15:59:22 2014
drop tablespace undo2 including contents and datafiles
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01122: database file 41 failed verification check
ORA-01110: data file 41: '/prod/oracle/PROD/logdata/undo/undo2.dbf'
ORA-01565: error in identifying file '/prod/oracle/PROD/logdata/undo/undo2.dbf'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo2.dbf
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01122: database file 42 failed verification check
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
ORA-01565: error in identifying file '/prod/oracle/PROD/logdata/undo/undo1.dbf'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo2.dbf
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo1.dbf
Tue Jul 29 15:59:23 2014
Completed: drop tablespace undo2 including contents and datafiles
Tue Jul 29 15:59:56 2014
create undo tablespace undotbs1 datafile '/prod/oracle/PROD/logdata/undo_new01.dbf' size 100M autoextend on next 128M maxsize 30G
Tue Jul 29 15:59:57 2014
Completed: create undo tablespace undotbs1 datafile '/prod/oracle/PROD/logdata/undo_new01.dbf' size 100M autoextend on next 128M maxsize 30G
Tue Jul 29 16:00:03 2014
alter tablespace undotbs1 add datafile '/prod/oracle/PROD/logdata/undo_new02.dbf' size 100M autoextend on next 128M maxsize 30G
Completed: alter tablespace undotbs1 add datafile '/prod/oracle/PROD/logdata/undo_new02.dbf' size 100M autoextend on next 128M maxsize 30G

业务运行过程中,数据库报大量ORA-600 4097,ORA-600 kdsgrp1,ORA-600 kcfrbd_3错误

Tue Jul 29 16:07:03 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_950484.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:07:06 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_950484.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Tue Jul 29 16:10:06 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_917702.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:10:07 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_917702.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Tue Jul 29 16:12:45 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_m000_880692.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:21:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:21:37 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:21:56 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:22:18 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:22:28 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1105950.trc:
ORA-00600: 内部错误代码, 参数: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:22:33 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1159232.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [42], [61235], [1], [12800], [12800], [], []

出现该错误有几个原因和解决方法：
ORA-600 kdsgrp1 是因为相关坏块引起(tab,index,memory,cr block等),结合日志分析对象异常原因,根据具体情况确定对象然后选择合适处理方案(具体参考NOTE:1332252.1)
ORA-600 4097 由于数据库异常关闭然后open,创建回滚段,可能触发bug导致该问题(虽然说在当前版本修复,但是实际处理我确实按照NOTE:1030620.6解决)
ORA-600 kcfrbd_3 有事务的block被访问之后,根据回滚槽信息定位到相关回滚段,而正好新建的回滚段信息又和以前的名字编号一致,从而反馈出来是数据文件大小不够,从而出现该错误(具体参考NOTE:601798.1)
最终该数据库虽然恢复了,抢救了大量数据,但是对于ebs系统来说,丢失redo和undo数据的损失还是巨大的.再次温馨提示:数据库的redo,undo也很重要,数据库的备份更加重要

ORACLE 8.1.7 数据库ORA-600 4194故障恢复

一个817数据库报ORA-600 4194 无法正常启动

Fri Jul 25 10:49:47 2014
Database mounted in Exclusive Mode.
Completed: ALTER DATABASE   MOUNT
Fri Jul 25 10:49:58 2014
ALTER DATABASE RECOVER  database
Fri Jul 25 10:49:58 2014
Media Recovery Start
Media Recovery Log
Recovery of Online Redo Log: Thread 1 Group 2 Seq 3320 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO02.LOG
Media Recovery Complete
Completed: ALTER DATABASE RECOVER  database
Fri Jul 25 10:50:09 2014
alter database open
Beginning crash recovery of 1 threads
Fri Jul 25 10:50:09 2014
Thread recovery: start rolling forward thread 1
Recovery of Online Redo Log: Thread 1 Group 2 Seq 3320 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO02.LOG
Fri Jul 25 10:50:09 2014
Thread recovery: finish rolling forward thread 1
Thread recovery: 0 data blocks read, 0 data blocks written, 3 redo blocks read
Crash recovery completed successfully
Fri Jul 25 10:50:09 2014
Thread 1 advanced to log sequence 3321
Thread 1 opened at log sequence 3321
  Current log# 3 seq# 3321 mem# 0: D:\ORACLE\ORADATA\ORCL\REDO01.LOG
Successful open of redo thread 1.
Fri Jul 25 10:50:09 2014
SMON: enabling cache recovery
Fri Jul 25 10:50:10 2014
Errors in file D:\oracle\admin\ORCL\udump\ORA03216.TRC:
ORA-00600: ??????????: [4194], [12], [37], [], [], [], [], []
Fri Jul 25 10:50:10 2014
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3321 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO01.LOG
Fri Jul 25 10:50:10 2014
SMON: disabling cache recovery
Fri Jul 25 10:50:10 2014
ORA-600 signalled during: alter database open

ORA-600 4194这个错误在数据库异常恢复中非常常见，因为库不是很重要,因此就是直接屏蔽掉故障回滚段,然后强制拉库,该库的恢复过程中,也直接使用隐含参数屏蔽回滚段
_corrupted_rollback_segments= RBS0, RBS1, RBS2, RBS3, RBS4, RBS5, RBS6, RBS_HDSYS,数据库依然无法open,进一步分析trace文件

Fri Jul 25 11:26:07 2014
ORACLE V8.1.7.0.0 - Production vsnsta=0
vsnsql=e vsnxtr=3
Windows 2000 Version 5.2 Service Pack 2, CPU type 586
Oracle8i Release 8.1.7.0.0 - Production
JServer Release 8.1.7.0.0 - Production
Windows 2000 Version 5.2 Service Pack 2, CPU type 586
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 14
Windows thread id: 3648, image: ORACLE.EXE
*** SESSION ID:(11.1) 2014-07-25 11:26:07.843
*** 2014-07-25 11:26:07.843
ksedmp: internal or fatal error
ORA-00600: ??????????: [4194], [12], [37], [], [], [], [], []
Current SQL statement for this session:
update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,xactsqn=:8,
scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12 where us#=:1
----- Call Stack Trace -----

这里很明显看出来，数据库是在open过程中，update undo$表遭遇到ORA-600 4194,因为该过程需要使用系统回滚段，但是由于其所对应的undo和redo信息不一致，所以无法正常启动数据库.继续读trace文件

  Extent Control Header
  -----------------------------------------------------------------
  Extent Header:: spare1: 0      space2: 0      #extents: 5      #blocks: 49
                  last map  0x00000000  #maps: 0      offset: 4128
      Highwater::  0x00400006  ext#: 0      blk#: 3      ext size: 9
  #blocks in seg. hdr's freelists: 0
  #blocks below: 0
  mapblk  0x00000000  offset: 0
                   Unlocked
     Map Header:: next  0x00000000  #extents: 5    obj#: 0      flag: 0x40000000
  Extent Map
  -----------------------------------------------------------------
   0x00400003  length: 9
   0x0040000c  length: 10
   0x0040008f  length: 10
   0x00400099  length: 10
   0x004000a3  length: 10
  TRN CTL:: seq: 0x003c chd: 0x004e ctl: 0x0050 inc: 0x00000000 nfb: 0x0000
            mgc: 0x8002 xts: 0x0068 flg: 0x0001 opt: 2147483646 (0x7ffffffe)
            uba: 0x00400006.003c.25 scn: 0x0000.009a4009
Version: 0x01
  FREE BLOCK POOL::
    uba: 0x00000000.003c.24 ext: 0x0  spc: 0x196
    uba: 0x00000000.001f.14 ext: 0x1  spc: 0x16f6
    uba: 0x00000000.0018.02 ext: 0x4  spc: 0x1f1a
    uba: 0x00000000.0000.00 ext: 0x0  spc: 0x0
    uba: 0x00000000.0000.00 ext: 0x0  spc: 0x0
  TRN TBL::

通过这里可以看出来,数据库在启动的时候，使用system undo的block为为0x00400006,使用bbed清除掉该uba记录,让数据库启动的时候重新分配system undo block给数据库执行update undo$使用,数据库open成功

BBED> m /x 0x00000000
 File: D:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF (0)
 Block: 2                Offsets: 4188 to 4192           Dba:0x00000000
------------------------------------------------------------------------
 00000000 3c002400 00009601 00000000 1f001400 0100f616 00000000 18000200
BBED> m /x 0x0000
 File: D:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF (0)
 Block: 2                Offsets: 4028 to 4032           Dba:0x00000000
------------------------------------------------------------------------
 00000000 00000000 3c005000 02800100 68000000 feffff7f 06004000 3c002400

Sat Jul 26 12:09:21 2014
Thread recovery: start rolling forward thread 1
Recovery of Online Redo Log: Thread 1 Group 2 Seq 3326 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO02.LOG
Sat Jul 26 12:09:21 2014
Thread recovery: finish rolling forward thread 1
Thread recovery: 0 data blocks read, 0 data blocks written, 3 redo blocks read
Crash recovery completed successfully
Sat Jul 26 12:09:22 2014
Thread 1 advanced to log sequence 3327
Thread 1 opened at log sequence 3327
  Current log# 3 seq# 3327 mem# 0: D:\ORACLE\ORADATA\ORCL\REDO01.LOG
Successful open of redo thread 1.
Sat Jul 26 12:09:22 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Sat Jul 26 12:09:39 2014
Completed: alter database open

数据库启动ORA-08103故障恢复

数据库在open过程报ORA-08103错误导致数据库无法正确启动

Fri Jul 18 22:02:51 2014
SMON: enabling tx recovery
Fri Jul 18 22:02:51 2014
Errors in file d:\oracle\product\10.2.0\admin\kemu3\udump\kemu3_ora_29788.trc:
ORA-00604: ?? SQL ?? 1 ????
ORA-08103: ??????
Fri Jul 18 22:02:51 2014
Database Characterset is ZHS16GBK
Fri Jul 18 22:02:51 2014
Errors in file d:\oracle\product\10.2.0\admin\kemu3\bdump\kemu3_smon_29704.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-08103: object no longer exists
Fri Jul 18 22:02:51 2014
Errors in file d:\oracle\product\10.2.0\admin\kemu3\bdump\kemu3_smon_29704.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-08103: object no longer exists
Fri Jul 18 22:02:51 2014
Errors in file d:\oracle\product\10.2.0\admin\kemu3\bdump\kemu3_smon_29704.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-08103: object no longer exists
Fri Jul 18 22:02:52 2014
Errors in file d:\oracle\product\10.2.0\admin\kemu3\bdump\kemu3_smon_29704.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-08103: object no longer exists
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=18, OS id=29876
Fri Jul 18 22:02:53 2014
Errors in file d:\oracle\product\10.2.0\admin\kemu3\bdump\kemu3_smon_29704.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-08103: object no longer exists
Fri Jul 18 22:02:54 2014
ORA-604 signalled during: alter database open...

对数据库启动过程做10046

PARSING IN CURSOR #22 len=210 dep=2 uid=0 oct=3 lid=0 tim=20960424464 hv=864012087 ad='3063f0b4'
select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#, sample_size, minimum,
maximum, distcnt, lowval, hival, density, col#, spare1, spare2, avgcln from hist_head$ where obj#=:1 and intcol#=:2
END OF STMT
EXEC #22:c=0,e=80,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=3,tim=20960424461
WAIT #22: nam='db file sequential read' ela= 5452 file#=1 block#=60213 blocks=1 obj#=4586 tim=20960429962
FETCH #22:c=0,e=5967,p=1,cr=1,cu=0,mis=0,r=0,dep=2,og=3,tim=20960430462
*** KEWUXS - encountered error: (ORA-00604: 递归 SQL 级别 2 出现错误
ORA-08103: 对象不再存在
)
*** kewrwdbi_1: Error=13515 encountered during run_once
BINDS #21:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=0a217744  bln=22  avl=01  flg=05
  value=0
 Bind#1
  oacdty=01 mxl=32(20) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=0a217718  bln=32  avl=20  flg=05
  value="WRI$_ADV_DEFINITIONS"
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=0a2176f4  bln=24  avl=02  flg=05
  value=1

这里很明显数据库启动过程,由于hist_head$的file 1 block 60213中的object_id 与 data_object_id 不匹配,从而出现ORA-08103错误,导致数据库无法正常启动,这里的故障的对象为hist_head$,非oracle核心对象,因此直接标记该block 为坏块(模拟普通ORA-08103并解决,模拟极端ORA-08103并解决,rman制造坏块,bbed修复坏块,bbed破坏数据文件),然后启动数据库,备份hist_head$表数据,然后truncate hist_head$,再插入hist_head$,整体完工.
在数据库open过程中,如果遇到ora-8103错误,导致数据库无法正常open,可以对其做10046定位到故障block和对象,然后判断对象是否数据库启动必须的对象,甚至是bootstarp$中对象,然后采取不同的处理方法.

数据文件的三个创建SCN一点点探讨

在给一个朋友数据库恢复的过程中语句该库大量删除表空间,然后创建表空,由于在创建控制文件的时候,列出来不正确文件,导致出现v$datafile_header.error出现WRONG FILE CREATE错误.通过试验重现了该错误,并且进一步测试如果真的需要历史数据文件,该如何狸猫换太子(本实验为了进一步理解数据文件创建scn相关信息)
创建xifenfei表空间,然后删除表空间,但不删除数据文件,然后创建重名表空间

SQL> select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') today,'www.xifenfei.com' xifenfei from dual;
TODAY               XIFENFEI
------------------- ----------------
2014-07-16 15:54:26 www.xifenfei.com
SQL> create tablespace xifenfei datafile '/u01/app/oracle/oradata/ORCL/xifenfei_old.dbf' size 10m;
Tablespace created.
SQL> select file#,name from v$datafile;
     FILE# NAME
---------- --------------------------------------------------
         1 /u01/app/oracle/oradata/ORCL/system01.dbf
         2 /u01/app/oracle/oradata/ORCL/sysaux01.dbf
         3 /u01/app/oracle/oradata/ORCL/undotbs01.dbf
         4 /u01/app/oracle/oradata/ORCL/users01.dbf
         5 /u01/app/oracle/oradata/ORCL/xifenfei_old.dbf
SQL> select file#,CREATION_CHANGE#,to_char(CREATION_TIME,'yyyy-mm-dd hh24:mi:ss') CREATION_TIME from v$datafile;
     FILE# CREATION_CHANGE# CREATION_TIME
---------- ---------------- -------------------
         1               18 2014-07-14 21:53:05
         2             2338 2014-07-14 21:53:42
         3             3130 2014-07-14 21:53:51
         4            15268 2014-07-14 21:54:25
         5           593520 2014-07-16 16:00:54
SQL> select file#,CREATION_CHANGE#,to_char(CREATION_TIME,'yyyy-mm-dd hh24:mi:ss') CREATION_TIME from v$datafile_header;
     FILE# CREATION_CHANGE# CREATION_TIME
---------- ---------------- -------------------
         1               18 2014-07-14 21:53:05
         2             2338 2014-07-14 21:53:42
         3             3130 2014-07-14 21:53:51
         4            15268 2014-07-14 21:54:25
         5           593520 2014-07-16 16:00:54
SQL> drop tablespace xifenfei;
Tablespace dropped.
SQL>  create tablespace xifenfei datafile '/u01/app/oracle/oradata/ORCL/xifenfei_new.dbf' size 10m;
Tablespace created.
SQL> select file#,CREATION_CHANGE#,to_char(CREATION_TIME,'yyyy-mm-dd hh24:mi:ss') CREATION_TIME from v$datafile;
     FILE# CREATION_CHANGE# CREATION_TIME
---------- ---------------- -------------------
         1               18 2014-07-14 21:53:05
         2             2338 2014-07-14 21:53:42
         3             3130 2014-07-14 21:53:51
         4            15268 2014-07-14 21:54:25
         5           593613 2014-07-16 16:02:45
SQL> select file#,CREATION_CHANGE#,to_char(CREATION_TIME,'yyyy-mm-dd hh24:mi:ss') CREATION_TIME from v$datafile_header;
     FILE# CREATION_CHANGE# CREATION_TIME
---------- ---------------- -------------------
         1               18 2014-07-14 21:53:05
         2             2338 2014-07-14 21:53:42
         3             3130 2014-07-14 21:53:51
         4            15268 2014-07-14 21:54:25
         5           593613 2014-07-16 16:02:45

rename xifenfei表空间数据文件到老数据文件

SQL> alter database datafile 5 offline drop;
Database altered.
SQL> alter database rename file '/u01/app/oracle/oradata/ORCL/xifenfei_new.dbf'
 2   to '/u01/app/oracle/oradata/ORCL/xifenfei_old.dbf';
Database altered.
SQL> alter database datafile 5 online;
alter database datafile 5 online
*
ERROR at line 1:
ORA-01122: database file 5 failed verification check
ORA-01110: data file 5: '/u01/app/oracle/oradata/ORCL/xifenfei_old.dbf'
ORA-01203: wrong incarnation of this file - wrong creation SCN
SQL> select file#,CREATION_CHANGE#,to_char(CREATION_TIME,'yyyy-mm-dd hh24:mi:ss') CREATION_TIME from v$datafile;
     FILE# CREATION_CHANGE# CREATION_TIME
---------- ---------------- -------------------
         1               18 2014-07-14 21:53:05
         2             2338 2014-07-14 21:53:42
         3             3130 2014-07-14 21:53:51
         4            15268 2014-07-14 21:54:25
         5           593613 2014-07-16 16:02:45
SQL> select file#,CREATION_CHANGE#,to_char(CREATION_TIME,'yyyy-mm-dd hh24:mi:ss') CREATION_TIME from v$datafile_header;
     FILE# CREATION_CHANGE# CREATION_TIME
---------- ---------------- -------------------
         1               18 2014-07-14 21:53:05
         2             2338 2014-07-14 21:53:42
         3             3130 2014-07-14 21:53:51
         4            15268 2014-07-14 21:54:25
         5           593520 2014-07-16 16:00:54
SQL> select file#,error from v$datafile_header;
     FILE# ERROR
---------- -----------------------------------------------------------------
         1
         2
         3
         4
         5 WRONG FILE CREATE

至此今天数据库恢复的故障已经模拟出来,就是因为数据文件头的scn和控制文件中scn不一致,从而出现了v$datafile_header.error报WRONG FILE CREATE的现象.

因为控制文件中数据文件scn和数据文件头scn不一致,因此通过重建控制文件来实现两者scn一致

SQL> alter database backup controlfile to trace as '/tmp/ctl';
Database altered.
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP NOMOUNT
ORACLE instance started.
Total System Global Area  718225408 bytes
Fixed Size                  2292432 bytes
Variable Size             373294384 bytes
Database Buffers          339738624 bytes
Redo Buffers                2899968 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "ORCL" NORESETLOGS  NOARCHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 292
  7  LOGFILE
  8    GROUP 1 '/u01/app/oracle/oradata/ORCL/redo01.log'  SIZE 50M BLOCKSIZE 512,
  9    GROUP 2 '/u01/app/oracle/oradata/ORCL/redo02.log'  SIZE 50M BLOCKSIZE 512,
 10    GROUP 3 '/u01/app/oracle/oradata/ORCL/redo03.log'  SIZE 50M BLOCKSIZE 512
 11  DATAFILE
 12    '/u01/app/oracle/oradata/ORCL/system01.dbf',
 13    '/u01/app/oracle/oradata/ORCL/sysaux01.dbf',
 14    '/u01/app/oracle/oradata/ORCL/undotbs01.dbf',
 15    '/u01/app/oracle/oradata/ORCL/users01.dbf',
 16    '/u01/app/oracle/oradata/ORCL/xifenfei_old.dbf'
 17  CHARACTER SET ZHS16GBK
 18  ;
Control file created.
SQL> select file#,CREATION_CHANGE#,to_char(CREATION_TIME,'yyyy-mm-dd hh24:mi:ss') CREATION_TIME from v$datafile_header;
     FILE# CREATION_CHANGE# CREATION_TIME
---------- ---------------- -------------------
         1               18 2014-07-14 21:53:05
         2             2338 2014-07-14 21:53:42
         3             3130 2014-07-14 21:53:51
         4            15268 2014-07-14 21:54:25
         5           593520 2014-07-16 16:00:54
SQL> select file#,CREATION_CHANGE#,to_char(CREATION_TIME,'yyyy-mm-dd hh24:mi:ss') CREATION_TIME from v$datafile;
     FILE# CREATION_CHANGE# CREATION_TIME
---------- ---------------- -------------------
         1               18 2014-07-14 21:53:05
         2             2338 2014-07-14 21:53:42
         3             3130 2014-07-14 21:53:51
         4            15268 2014-07-14 21:54:25
         5           593520 2014-07-16 16:00:54
SQL> select file#,error from v$datafile_header;
     FILE# ERROR
---------- -----------------------------------------------------------------
         1
         2
         3
         4
         5

通过重建控制文件消除了v$datafile_header.error报WRONG FILE CREATE错误,继续尝试online文件

SQL> recover datafile 5;
Media recovery complete.
SQL> alter database datafile 5 online;
Database altered.
SQL> select file#,name from v$datafile;
     FILE# NAME
---------- --------------------------------------------------
         1 /u01/app/oracle/oradata/ORCL/system01.dbf
         2 /u01/app/oracle/oradata/ORCL/sysaux01.dbf
         3 /u01/app/oracle/oradata/ORCL/undotbs01.dbf
         4 /u01/app/oracle/oradata/ORCL/users01.dbf
         5 /u01/app/oracle/oradata/ORCL/xifenfei_old.dbf
SQL> alter database open;
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-01177: data file does not match dictionary - probably old incarnation
ORA-01110: data file 5: '/u01/app/oracle/oradata/ORCL/xifenfei_old.dbf'
Process ID: 7437
Session ID: 7 Serial number: 5

出现这个错误,是由于数据库中,还有file$中也记录了数据文件创建scn,而这个scn现在和数据文件头和控制文件中的scn不相等,因此无法启动数据库成功.现在需要做的就是在数据库未启动状态下修改file$中的数据文件创建scn相关值,让其和数据文件头(控制文件中记录)一致

使用第三方工具定位file$记录

1|2|89600|0|1|4194302|1280|0|18||4194306|0x004000e9|0
2|2|70400|1|2|4194302|1280|0|2338||8388610|0x004000e9|1
3|2|25600|2|3|4194302|640|0|3130||12582914|0x004000e9|2
4|2|640|4|4|4194302|160|0|15268||16777218|0x004000e9|3
5|2|1280|7|5|0|0|0|593613||20971522|0x004000e9|4
6|1|3840|||0|0|0|586295||25165826|0x004000e9|5
7|1|3840|||3932160|1280|0|587030||29360130|0x004000e9|6
对应file$结构确定每列含义,以及确定需要修改的列
每行倒数第二列为rdba地址,可以通过转换为file and block,这里对应的就是file 1 block 233
每行最后一列为该条记录在该rdba中的记录顺序

使用工具修改593613为593520,使得file$中的scn与现在控制文件和数据文件头一致,具体参考bbed修改数据内容

修改好file$中数据文件创建scn后,尝试继续操作

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 5 needs media recovery
ORA-01110: data file 5: '/u01/app/oracle/oradata/ORCL/xifenfei_old.dbf'
SQL> recover datafile 5;
Media recovery complete.
SQL> alter database open;
Database altered.
SQL> select file#,name from v$datafile;
     FILE# NAME
---------- --------------------------------------------------
         1 /u01/app/oracle/oradata/ORCL/system01.dbf
         2 /u01/app/oracle/oradata/ORCL/sysaux01.dbf
         3 /u01/app/oracle/oradata/ORCL/undotbs01.dbf
         4 /u01/app/oracle/oradata/ORCL/users01.dbf
         5 /u01/app/oracle/oradata/ORCL/xifenfei_old.dbf

通过这里的简单测试,发现几个问题
1.v$datafile_header.error报WRONG FILE CREATE错误不一定就是数据文件异常,而其本质是数据文件头scn和控制文件中scn不一致
2.数据文件online需要file$,v$datafile_header,v$datafile中关于数据文件创建scn都一致
3.通过该分析,证明在一些极端情况下,考虑考虑该替换思路实现删除数据文件重新加入数据库

分享一次ORA-01113 ORA-01110故障处理过程

数据库报ORA-01113 ORA-01110无法正常open

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'

alert日志

Wed Jul 16 17:17:56 2014
ALTER DATABASE   MOUNT
Successful mount of redo thread 1, with mount id 1380870980
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Wed Jul 16 17:19:01 2014
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_m000_3540.trc  (incident=367575):
ORA-00700: soft internal error, arguments: [dbgrmblcp_corrupt_page], [D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\metadata\HM_FINDING.ams],
[1245], [], [], [], [], [], [], [], [], []
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_367575\orcl_m000_3540_i367575.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_m000_3540.trc  (incident=367576):
ORA-00600: internal error code, arguments: [dbgrmblgp_get_page_1], [1245], [0], [80], [], [], [], [], [], [], [], []
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_367576\orcl_m000_3540_i367576.trc
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_m000_3540.trc:
ORA-51106: check failed to complete due to an error.  See error below
ORA-00600: internal error code, arguments: [dbgrmblgp_get_page_1], [1245], [0], [80], [], [], [], [], [], [], [], []
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'

这里出现ORA-00700[dbgrmblcp_corrupt_page]与ORA-00600[dbgrmblgp_get_page_1]是Bug 10321285 : ORA-600 [DBGRMBLCP_CORRUPT_PAGE]

尝试recover database 遭遇ORA-00283 ORA-16433

SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode.

alert日志信息

Wed Jul 16 17:36:33 2014
ALTER DATABASE RECOVER  database
Media Recovery Start
 started logmerger process
Wed Jul 16 17:36:33 2014
Media Recovery failed with error 16433
Slave exiting with ORA-283 exception
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_pr00_3868.trc:
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode.
Recovery Slave PR00 previously exited with exception 283
ORA-283 signalled during: ALTER DATABASE RECOVER  database  ...
Wed Jul 16 17:36:45 2014
alter database open
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_3236.trc:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'
ORA-1113 signalled during: alter database open...

重建控制文件后恢复,遭遇ORA-600 2662错误

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup nomount
ORACLE instance started.
Total System Global Area      10255212544 bytes
Fixed Size                        2264088 bytes
Variable Size                  4529849320 bytes
Database Buffers               5704253440 bytes
Redo Buffers                     18845696 bytes
SQL> @d:\ctl.sql
Control file created.
SQL> recover database using backup controlfile;
ORA-00279: change 13953689551797 generated at 07/16/2014 17:53:41 needed for
thread 1
ORA-00289: suggestion :
D:\APP\ADMINISTRATOR\FAST_RECOVERY_AREA\ORCL\ARCHIVELOG\2014_07_16\O1_MF_1_1_%U_
.ARC
ORA-00280: change 13953689551797 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
d:\app\administrator\oradata\orcl\redo01.log
Log applied.
Media recovery complete.
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [3248], [3635774399],
[3249], [3580470066], [4194545], [], [], [], [], [], []
Process ID: 3932
Session ID: 200 Serial number: 1

因为该数据库是11.2.0.3未打上scn patch,直接推进scn(可以使用event,隐含参数,oradebug,bbed等),数据库就正常open

ORA-00600[17182],ORA-00600[25027],ORA-00600[kghfrempty:ds]故障处理

数据库不能启动(或者启动后马上crash),alert日志报错ORA-00600[17182],ORA-00600[25027],ORA-00600[kghfrempty:ds]等错误

Tue Jul 08 23:36:06 2014
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 32 processes
Started redo scan
Completed redo scan
 read 1409 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 2855, block 3
Recovery of Online Redo Log: Thread 1 Group 5 Seq 2855 Reading mem 0
  Mem# 0: /backup/oradata/ztmdb/redo05.log
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 2855, block 2822, scn 104627804
 0 data blocks read, 0 data blocks written, 1409 redo k-bytes read
Tue Jul 08 23:36:12 2014
Thread 1 advanced to log sequence 2856 (thread open)
Thread 1 opened at log sequence 2856
  Current log# 1 seq# 2856 mem# 0: /backup/oradata/ztmdb/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Jul 08 23:36:13 2014
SMON: enabling cache recovery
[15126] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:10340864 end:10340964 diff:100 (1 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Errors in file /home/oracle/diag/rdbms/ztmdb/ztmdb/trace/ztmdb_smon_15100.trc  (incident=62061):
ORA-00600: internal error code, arguments: [17182], [0x2B6DFD23D7A0], [], [], [], [], [], [], [], [], [], []
Incident details in: /home/oracle/diag/rdbms/ztmdb/ztmdb/incident/incdir_62061/ztmdb_smon_15100_i62061.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
No Resource Manager plan active
Errors in file /home/oracle/diag/rdbms/ztmdb/ztmdb/trace/ztmdb_ora_15126.trc  (incident=62093):
ORA-00600: internal error code, arguments: [25027], [0], [875836979], [], [], [], [], [], [], [], [], []
Incident details in: /home/oracle/diag/rdbms/ztmdb/ztmdb/incident/incdir_62093/ztmdb_ora_15126_i62093.trc
Block recovery from logseq 2856, block 98 to scn 104627948
Recovery of Online Redo Log: Thread 1 Group 1 Seq 2856 Reading mem 0
  Mem# 0: /backup/oradata/ztmdb/redo01.log
Block recovery completed at rba 2856.99.16, scn 0.104627949
ORACLE Instance ztmdb (pid = 32) - Error 600 encountered while recovering transaction (14, 2) on object 15113.
Errors in file /home/oracle/diag/rdbms/ztmdb/ztmdb/trace/ztmdb_smon_15100.trc  (incident=62066):
ORA-00600: internal error code, arguments: [kghfrempty:ds], [0x2B6DFD23D790], [], [], [], [], [], [], [], [], [], []
ORA-07445: exception encountered: core dump [kghrst()+1835] [SIGSEGV] [ADDR:0x0] [PC:0x97EE385] [SI_KERNEL(general_protection)] []
ORA-00600: internal error code, arguments: [17182], [0x2B6DFD23D7A0], [], [], [], [], [], [], [], [], [], []
Incident details in: /home/oracle/diag/rdbms/ztmdb/ztmdb/incident/incdir_62066/ztmdb_smon_15100_i62066.trc
PMON (ospid: 14978): terminating the instance due to error 474
Tue Jul 08 23:36:26 2014
System state dump requested by (instance=1, osid=14978 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /home/oracle/diag/rdbms/ztmdb/ztmdb/trace/ztmdb_diag_14996_20140708233626.trc
Dumping diagnostic data in directory=[cdmp_20140708233626], requested by (instance=1, osid=14978 (PMON)), summary=[abnormal instance termination].
Tue Jul 08 23:36:27 2014
Instance terminated by PMON, pid = 14978

这里的错误比较明显,数据库进行完前滚后,smon进行回滚发现有部分block CORRUPTED 导致该问题,解决该问题思路就是:
1.设置event 屏蔽回滚
2.如果1不行,设置回滚段相关隐含参数,屏蔽回滚
3.逻辑方式重建数据库

补充知识点
ORA-600 [25027]

ERROR:
  Format: ORA-600 [25027] [a] [b]
VERSIONS:
  versions 9.2 and above
ARGUMENTS:
  Arg [a]  Tablespace Number (TSN)
  Arg [b]  Decimal Relative Data Block Address (RDBA)

作者：admin