Another ORA-600 kcbzpbuf_1 Recovery Case

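Author: 惜分飞 © All rights reserved. [May not be reproduced in any form without the author's consent; the author reserves the right to pursue legal action.]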

The database suddenly raised ORA-600 [kdddgb1] and ORA-600 [kcl_snd_cur_2], and the instance crashed.

Tue May 09 22:29:40 2023
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_338012.trc  (incident=962050):
ORA-00600: internal error code, arguments: [kdddgb1], [0], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_962050/orcl1_ora_338012_i962050.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 09 22:29:43 2023
Hex dump of (file 75, block 1154926) in trace file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_217928.trc
Corrupt block relative dba: 0x12d19f6e (file 75, block 1154926)
Bad header found during preparing block for transfer
Data in bad block:
 type: 0 format: 2 rdba: 0x1affe051
 last change scn: 0x0009.a2266e65 seq: 0x2 flg: 0x10
 spare1: 0x83 spare2: 0x36 spare3: 0x3700
 consistency value in tail: 0x6e650002
 check value in block header: 0x0
 block checksum disabled
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_217928.trc  (incident=960186):
ORA-00600: internal error code, arguments: [kcl_snd_cur_2], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_960186/orcl1_lms3_217928_i960186.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 09 22:29:43 2023
Sweep [inc][962050]: completed
Sweep [inc][960186]: completed
Sweep [inc2][962050]: completed
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_217928.trc:
ORA-00600: internal error code, arguments: [kcl_snd_cur_2], [], [], [], [], [], [], [], [], [], [], []
LMS3 (ospid: 217928): terminating the instance due to error 484
System state dump requested by (instance=1, osid=217928 (LMS3)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_217897_20230509222949.trc
Tue May 09 22:29:52 2023
ORA-1092 : opitsk aborting process
Tue May 09 22:29:53 2023
ORA-1092 : opitsk aborting process
Tue May 09 22:29:54 2023
Instance terminated by LMS3, pid = 217928
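
The relative DBA values in this dump can be checked by hand. The following is a minimal sketch, assuming a smallfile tablespace where the upper 10 bits of the 32-bit rdba hold the relative file number and the lower 22 bits hold the block number. The helper name decode_rdba is illustrative, not an Oracle API.

```python
def decode_rdba(rdba):
    """Split a 32-bit relative DBA into (relative file#, block#).

    Assumption: smallfile tablespace, 10 bits of file number, 22 bits of block number.
    """
    return rdba >> 22, rdba & 0x3FFFFF


# Address the buffer is supposed to hold ("Corrupt block relative dba" line).
expected = decode_rdba(0x12D19F6E)
# Address actually stored in the block header ("Data in bad block: rdba" line).
found = decode_rdba(0x1AFFE051)

print("expected file/block:", expected)
print("found    file/block:", found)
print("header matches buffer address:", expected == found)
```

The rdba stored in the header decodes to a different file and block than the buffer's own address, which is consistent with the "Bad header found" message.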

The other running instance then performed instance recovery, hit ORA-600 [kcbzpbuf_1], and crashed as well. Every later startup attempt failed with the same error.

Wed May 10 08:17:07 2023
Hex dump of (file 75, block 1154926) in trace file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbw9_134621.trc
Corrupt block relative dba: 0x12d19f6e (file 75, block 1154926)
Bad header found during preparing block for write
Data in bad block:
 type: 0 format: 2 rdba: 0x1affe051
 last change scn: 0x0009.a2266e65 seq: 0x2 flg: 0x34
 spare1: 0x83 spare2: 0x36 spare3: 0x3700
 consistency value in tail: 0x6e650002
 check value in block header: 0xf894
 computed block checksum: 0x0
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbw9_134621.trc  (incident=2240402):
ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_2240402/orcl1_dbw9_134621_i2240402.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbw9_134621.trc:
ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], [], [], [], [], []
DBW9 (ospid: 134621): terminating the instance due to error 471
Wed May 10 08:17:08 2023
System state dump requested by (instance=1, osid=134621 (DBW9)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_134555_20230510081708.trc
Instance terminated by DBW9, pid = 134621
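
The header fields in these dumps can be cross-checked against the tail value. This is a minimal sketch, assuming the commonly described layout in which the 4-byte tail is built from the low 16 bits of the SCN base, the block type, and the sequence number. Both function names are illustrative.

```python
def expected_tail(scn_base, block_type, seq):
    """Tail value implied by the header: low 16 bits of SCN base, then type, then seq."""
    return ((scn_base & 0xFFFF) << 16) | ((block_type & 0xFF) << 8) | (seq & 0xFF)


def scn_to_number(wrap, base):
    """Combine the wrap.base notation used in dumps into a single SCN number."""
    return (wrap << 32) | base


# Values from "Data in bad block" above:
# type: 0, last change scn: 0x0009.a2266e65, seq: 0x2, tail: 0x6e650002
tail = expected_tail(0xA2266E65, 0x00, 0x02)
print(hex(tail), "matches dumped tail:", tail == 0x6E650002)
print("last change SCN as one number:", scn_to_number(0x0009, 0xA2266E65))
```

In this dump the tail agrees with the header, which suggests the block was not torn by a partial write; what is wrong is the header content itself (type 0 and an rdba belonging to another address).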

A direct attempt to recover datafile 75 failed with ORA-03113.

SQL> recover datafile 75;
ORA-03113: end-of-file on communication channel
Process ID: 281304
Session ID: 14161 Serial number: 1503

Checking file 75 with dbv found 15 logically corrupt blocks.

[oracle@oradb21 ~]$ dbv userid=xxx/xxx file=+datadg/orcl/datafile/xifenfei01.377.1130539753

DBVERIFY: Release 11.2.0.4.0 - Production on Wed May 10 08:29:44 2023

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = +datadg/orcl/datafile/xifenfei01.377.1130539753
Block Checking: DBA = 314866909, Block Type = KTB-managed data block
data header at 0x7f852b573064
kdbchk: row locked by non-existent transaction
        table=0   slot=13
        lockid=101   ktbbhitc=2
Page 294109 failed with check code 6101
Block Checking: DBA = 314866928, Block Type = KTB-managed data block
data header at 0x7f852b599064
kdbchk: row locked by non-existent transaction
        table=0   slot=18
        lockid=101   ktbbhitc=2
Page 294128 failed with check code 6101
Block Checking: DBA = 315415269, Block Type = KTB-managed data block
data header at 0x7f852b583064
kdbchk: the amount of space used is not equal to block size
        used=7470 fsc=0 avsp=625 dtl=8088
Page 842469 failed with check code 6110
Block Checking: DBA = 315415302, Block Type = KTB-managed data block
data header at 0x7f852b3c3064
kdbchk: row locked by non-existent transaction
        table=0   slot=13
        lockid=101   ktbbhitc=2
Page 842502 failed with check code 6101
Block Checking: DBA = 315415350, Block Type = KTB-managed data block
data header at 0x7f852b423064
kdbchk: row locked by non-existent transaction
        table=0   slot=14
        lockid=101   ktbbhitc=2
Page 842550 failed with check code 6101
Block Checking: DBA = 315415351, Block Type = KTB-managed data block
data header at 0x7f852b425064
kdbchk: row locked by non-existent transaction
        table=0   slot=10
        lockid=101   ktbbhitc=2
Page 842551 failed with check code 6101
Block Checking: DBA = 315415397, Block Type = KTB-managed data block
data header at 0x7f852b481064
kdbchk: row locked by non-existent transaction
        table=0   slot=14
        lockid=101   ktbbhitc=2
Page 842597 failed with check code 6101
Block Checking: DBA = 315415414, Block Type = KTB-managed data block
data header at 0x7f852b4a3064
kdbchk: row locked by non-existent transaction
        table=0   slot=14
        lockid=101   ktbbhitc=2
Page 842614 failed with check code 6101
Block Checking: DBA = 315665300, Block Type = KTB-managed data block
data header at 0x7f852b2dd0ac
kdbchk: the amount of space used is not equal to block size
        used=7191 fsc=0 avsp=832 dtl=8016
Page 1092500 failed with check code 6110
Block Checking: DBA = 315665302, Block Type = KTB-managed data block
data header at 0x7f852b2e10ac
kdbchk: row locked by non-existent transaction
        table=0   slot=14
        lockid=101   ktbbhitc=5
Page 1092502 failed with check code 6101
Block Checking: DBA = 315665316, Block Type = KTB-managed data block
data header at 0x7f852b2fd0ac
kdbchk: the amount of space used is not equal to block size
        used=7140 fsc=0 avsp=883 dtl=8016
Page 1092516 failed with check code 6110
Block Checking: DBA = 315665491, Block Type = KTB-managed data block
data header at 0x7f852f4170c4
kdbchk: row locked by non-existent transaction
        table=0   slot=3
        lockid=101   ktbbhitc=6
Page 1092691 failed with check code 6101
Block Checking: DBA = 315727518, Block Type = KTB-managed data block
data header at 0x7f852b4f50c4
kdbchk: row locked by non-existent transaction
        table=0   slot=8
        lockid=101   ktbbhitc=6
Page 1154718 failed with check code 6101
Block Checking: DBA = 315727614, Block Type = KTB-managed data block
data header at 0x7f852b5b50ac
kdbchk: row locked by non-existent transaction
        table=0   slot=15
        lockid=101   ktbbhitc=5
Page 1154814 failed with check code 6101
Block Checking: DBA = 315727646, Block Type = KTB-managed data block
data header at 0x7f852b3f30ac
kdbchk: row locked by non-existent transaction
        table=0   slot=3
        lockid=101   ktbbhitc=5
Page 1154846 failed with check code 6101


DBVERIFY - Verification complete

Total Pages Examined         : 1835008
Total Pages Processed (Data) : 250749
Total Pages Failing   (Data) : 15
Total Pages Processed (Index): 74532
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1244181
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 265546
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 2720428335 (9.2720428335)
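
With output this long it helps to group the failing pages by check code. The sketch below is one way to do that; it assumes only the "Page N failed with check code C" line format shown above, and summarize_dbv is a name of my own choosing. In the full output above, 12 pages fail with code 6101 and 3 with code 6110.

```python
import re

PAGE_RE = re.compile(r"Page (\d+) failed with check code (\d+)")


def summarize_dbv(text):
    """Group failing page numbers by dbv check code."""
    by_code = {}
    for page, code in PAGE_RE.findall(text):
        by_code.setdefault(int(code), []).append(int(page))
    return by_code


# A short excerpt of the output above; in practice, read the whole dbv log.
sample = """\
Page 294109 failed with check code 6101
Page 842469 failed with check code 6110
Page 842502 failed with check code 6101
"""

for code, pages in sorted(summarize_dbv(sample).items()):
    print(code, len(pages), pages)
```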

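After some work on the corrupt blocks, the database opened successfully. I have handled a similar case before; see "ORA-600 kcbzpbuf_1 Failure Recovery".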

SQL> alter database open;

Database altered.

The alert log then reported transaction recovery errors.

ORACLE Instance orcl1 (pid = 34) - Error 1578 encountered while recovering transaction (697, 6) on object 170692.
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_smon_301450.trc:
ORA-01578: ORACLE data block corrupted (file # 75, block # 1154926)
ORA-01110: data file 75: '+DATADG/orcl/datafile/xifenfei01.377.1130539753'
Archived Log entry 9299 added for thread 1 sequence 4781 ID 0x5f4a1865 dest 1:
Wed May 10 08:24:03 2023
NOTE: dependency between database orcl and diskgroup resource ora.ARCHDG.dg is established
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Wed May 10 08:24:04 2023
Starting background process EMNC
Wed May 10 08:24:04 2023
EMNC started with pid=49, OS id=305303 
Archived Log entry 9300 added for thread 2 sequence 4530 ID 0x5f4a1865 dest 1:
ARC2: Archiving disabled thread 2 sequence 4531
Archived Log entry 9301 added for thread 2 sequence 4531 ID 0x5f4a1865 dest 1:
Wed May 10 08:24:13 2023
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_p000_305307.trc  (incident=2560578):
ORA-01578: ORACLE data block corrupted (file # 75, block # 1154926)
ORA-01110: data file 75: '+DATADG/orcl/datafile/xifenfei01.377.1130539753'
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_2560578/orcl1_p000_305307_i2560578.trc
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_p000_305307.trc  (incident=2560579):
ORA-01578: ORACLE data block corrupted (file # , block # )
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_2560579/orcl1_p000_305307_i2560579.trc
Wed May 10 08:24:15 2023
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_smon_301450.trc  (incident=2560427):
ORA-01578: ORACLE data block corrupted (file # 75, block # 1154926)
ORA-01110: data file 75: '+DATADG/orcl/datafile/xifenfei01.377.1130539753'
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_2560427/orcl1_smon_301450_i2560427.trc
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_smon_301450.trc  (incident=2560432):
ORA-01578: ORACLE data block corrupted (file # 75, block # 1154926)
ORA-01110: data file 75: '+DATADG/orcl/datafile/xifenfei01.377.1130539753'
ORACLE Instance orcl1 (pid = 34) - Error 1578 encountered while recovering transaction (717, 20) on object 170692.
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_smon_301450.trc:
ORA-01578: ORACLE data block corrupted (file # 75, block # 1154926)
ORA-01110: data file 75: '+DATADG/orcl/datafile/xifenfei01.377.1130539753'
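
To see which objects need attention, the SMON messages can be pulled out of the alert log. This is a minimal sketch that relies only on the message format shown above. Reading the pair in parentheses as undo segment number and slot is my assumption, and failed_transactions is an illustrative name.

```python
import re

TXN_RE = re.compile(
    r"Error (\d+) encountered while recovering transaction "
    r"\((\d+), (\d+)\) on object (\d+)"
)


def failed_transactions(alert_text):
    """Return (ora_error, usn, slot, object_id) for each failed transaction recovery.

    Assumption: the pair in parentheses is (undo segment number, slot).
    """
    return [tuple(int(g) for g in m.groups()) for m in TXN_RE.finditer(alert_text)]


sample = """\
ORACLE Instance orcl1 (pid = 34) - Error 1578 encountered while recovering transaction (697, 6) on object 170692.
ORACLE Instance orcl1 (pid = 34) - Error 1578 encountered while recovering transaction (717, 20) on object 170692.
"""

rows = failed_transactions(sample)
objects = sorted({obj for _err, _usn, _slot, obj in rows})
print(rows)
print("object ids to look up in dba_objects:", objects)
```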

Handle the failed transactions and identify the affected table.

SQL> select owner,object_name,object_type from dba_objects where object_id=170692;

OWNER
--------------------------------------------------------------------------------
OBJECT_NAME
--------------------------------------------------------------------------------
OBJECT_TYPE
---------------------------------------------------------
XFF
T_XIFENFEI
TABLE

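An RMAN check showed that the logically corrupt blocks also belong to this table (all 15 bad blocks are in it). The table was rebuilt, discarding the damaged data, and that completed the recovery.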

ORA-600 kcbzpbuf_1 Failure Recovery

Database startup failed with ORA-03113.

SQL> startup;
ORACLE instance started.

Total System Global Area 5.1310E+10 bytes
Fixed Size                  2265224 bytes
Variable Size            1.8119E+10 bytes
Database Buffers         3.3152E+10 bytes
Redo Buffers               36069376 bytes
Database mounted.

ORA-03113: end-of-file on communication channel
Process ID: 117892
Session ID: 568 Serial number: 3

The alert log showed an ORA-600 [kcbzpbuf_1] error.

Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 4 Seq 4744 Reading mem 0
  Mem# 0: /home/oradata/redo04.log
Recovery of Online Redo Log: Thread 1 Group 1 Seq 4745 Reading mem 0
  Mem# 0: /home/oradata/redo01.log
Wed Jan 11 14:44:35 2023
Hex dump of (file 87, block 3143379) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_dbw0_116740.trc
Corrupt block relative dba: 0x15eff6d3 (file 87, block 3143379)
Bad header found during preparing block for write
Data in bad block:
 type: 0 format: 2 rdba: 0x00000000
 last change scn: 0x0b7e.593518d5 seq: 0x1 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x18d50001
 check value in block header: 0x342b
 computed block checksum: 0x0
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_dbw0_116740.trc  (incident=553128):
ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_553128/orcl_dbw0_116740_i553128.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_dbw0_116740.trc:
ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], [], [], [], [], []
DBW0 (ospid: 116740): terminating the instance due to error 471
Wed Jan 11 14:44:36 2023
System state dump requested by (instance=1, osid=116740 (DBW0)), summary=[abnormal instance termination].
Instance terminated by DBW0, pid = 116740

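The cause is fairly clear: while redo was being applied, the redo did not match the block in the datafile, which produced the message "Corrupt block relative dba: 0x15eff6d3 (file 87, block 3143379)". After the block was repaired with bbed, the database recovered successfully.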

RMAN> recover database;

Starting recover at 2023-01-11 14:53:44
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 2023-01-11 14:53:45

The database opened successfully.

SQL> alter database open;

Database altered.

The alert log then reported errors of the form "ORACLE Instance orcl (pid = 14) - Error 1578 encountered while recovering transaction".

Thread 1 opened at log sequence 4745
  Current log# 1 seq# 4745 mem# 0: /home/oradata/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Jan 11 14:54:10 2023
SMON: enabling cache recovery
[108954] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:2313624 end:2313634 diff:10 (0 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_110633.trc  (incident=577160):
ORA-01578: ORACLE data block corrupted (file # 87, block # 3143379)
ORA-01110: data file 87: '/home/oradata/xifenfei04.dbf'
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Wed Jan 11 14:54:10 2023
QMNC started with pid=80, OS id=114315
Completed: alter database open
Wed Jan 11 14:54:10 2023
db_recovery_file_dest_size of 4182 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
ORACLE Instance orcl (pid = 14) - Error 1578 encountered while recovering transaction (10, 0) on object 156475.
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_110633.trc:
ORA-01578: ORACLE data block corrupted (file # 87, block # 3143379)
ORA-01110: data file 87: '/home/oradata/xifenfei04.dbf'

Analysis of the affected object confirmed that it was a recycle bin object, so the recycle bin was purged.

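The database has run normally since then (no further errors in the alert log). The recovery is complete, the business data can be used directly, and no data was lost.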

A Case of Assorted ORA-600 Errors Caused by Corrupt Blocks and Improper Recovery

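A friend asked me to help with a database that would not open. One look at the alert log left me stunned: it held just about every kind of ORA-600 error, and the database had clearly been put through a great deal.
On restart after the failure the database could not start. The main symptom was corrupt blocks, which triggered a variety of ORA-600 and related errors.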

Mon Mar 02 16:09:27 2015
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 23 processes
Started redo scan
Completed redo scan
 read 962 KB redo, 256 data blocks need recovery
Started redo application at
 Thread 1: logseq 726, block 37343
Recovery of Online Redo Log: Thread 1 Group 3 Seq 726 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/oa/redo03.log
Mon Mar 02 16:09:27 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 1673 OF FILE 3
Completed redo application of 0.27MB
Mon Mar 02 16:09:27 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 3104 OF FILE 3
Mon Mar 02 16:09:27 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 3613 OF FILE 3
Mon Mar 02 16:09:28 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 272 OF FILE 3
Mon Mar 02 16:09:28 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 2512 OF FILE 3
Hex dump of (file 2, block 92889) in trace file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_dbw2_4158.trc
Corrupt block relative dba: 0x00816ad9 (file 2, block 92889)
Bad header found during preparing block for write
Data in bad block:
 type: 0 format: 0 rdba: 0x6ad90000
 last change scn: 0x0000.00c6a052 seq: 0x1 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0x5d7e
 consistency value in tail: 0xa0520001
 check value in block header: 0x0
 block checksum disabled
Mon Mar 02 16:09:28 2015
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p007_4196.trc  (incident=3833):
ORA-00600: internal error code, arguments: [4502], [1], [], [], [], [], [], [], [], [], [], []
Mon Mar 02 16:09:28 2015
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p013_4208.trc  (incident=3881):
ORA-00600: internal error code, arguments: [2037], [4259067], [4244307968], [159], [243], [0], [2162032704], [100728832], [], [], [], []
Slave exiting with ORA-1172 exception
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p009_4200.trc:
ORA-01172: recovery of thread 1 stuck at block 3613 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p001_4184.trc:
ORA-01172: recovery of thread 1 stuck at block 2512 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p021_4224.trc:
ORA-10388: parallel query server interrupt (failure)
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p021_4224.trc:
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_dbw2_4158.trc  (incident=3697):
ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oa/oa/incident/incdir_3697/oa_dbw2_4158_i3697.trc
Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0xD2DDB7, kcbs_shrink_pool()+705] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_mman_4152.trc  (incident=3673):
ORA-07445: exception encountered: core dump [kcbs_shrink_pool()+705] [SIGSEGV] [ADDR:0x0] [PC:0xD2DDB7] [SI_KERNEL(general_protection)] []
Incident details in: /u01/app/oracle/diag/rdbms/oa/oa/incident/incdir_3673/oa_mman_4152_i3673.trc
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_dbw2_4158.trc:
Mon Mar 02 16:09:34 2015
Instance terminated by DBW2, pid = 4158

After the second restart a new error appeared: ORA-00600 [17182].

Mon Mar 02 16:39:50 2015
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p002_4321.trc  (incident=4993):
ORA-00600: internal error code, arguments: [17182], [0x7F548C2BDBA8], [], [], [], [], [], [], [], [], [], []

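After some recovery attempts, the log reported the errors below.
They mainly reflect an incomplete recovery, along with a series of messages that were probably caused by the redo logs having been renamed or by a damaged redo log header.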

Beginning crash recovery of 1 threads
Started redo scan
Completed redo scan
 read 962 KB redo, 256 data blocks need recovery
Started redo application at
 Thread 1: logseq 726, block 37343
Recovery of Online Redo Log: Thread 1 Group 3 Seq 726 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/oa/redo03.log
RECOVERY OF THREAD 1 STUCK AT BLOCK 1673 OF FILE 3
Aborting crash recovery due to error 1172
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:
ORA-01172: recovery of thread 1 stuck at block 1673 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:
ORA-01172: recovery of thread 1 stuck at block 1673 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
ORA-1172 signalled during: alter  database open...
Tue Mar 03 11:17:59 2015
Sweep [inc][17178]: completed
Sweep [inc][17177]: completed
Sweep [inc2][17178]: completed
Tue Mar 03 11:18:00 2015
ALTER DATABASE RECOVER  database until cancel
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 24 slaves
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT
Tue Mar 03 11:18:06 2015
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_pr00_6701.trc:
ORA-00266: name of archived log file needed
ORA-266 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER CANCEL
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_pr00_6701.trc:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/app/oracle/oradata/oa/system01.dbf'
Slave exiting with ORA-1547 exception
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_pr00_6701.trc:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/app/oracle/oradata/oa/system01.dbf'
ORA-10879 signalled during: ALTER DATABASE RECOVER CANCEL ...
Tue Mar 03 11:18:06 2015
Checker run found 4 new persistent data failures
Tue Mar 03 11:18:13 2015
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 12986989
Resetting resetlogs activation ID 3278679642 (0xc36cae5a)
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 1 of thread 1 is not current copy
ORA-00312: online log 1 thread 1: '/u01/app/oracle/oradata/oa/redo01.log'
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:

After more tinkering, _allow_resetlogs_corruption=TRUE was set, and the database then raised ORA-600 [2662].

Tue Mar 03 11:19:26 2015
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6864.trc  (incident=18195):
ORA-00600: internal error code, arguments: [2662], [0], [13007002], [0], [13016626], [4194545], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oa/oa/incident/incdir_18195/oa_ora_6864_i18195.trc
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6864.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [0], [13007002], [0], [13016626], [4194545], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 6864): terminating the instance due to error 704
Instance terminated by USER, pid = 6864
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (6864) as a result of ORA-1092
Tue Mar 03 11:19:29 2015
ORA-1092 : opitsk aborting process

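After still more tinkering, the undo datafile had evidently been taken offline and could no longer be accessed, so the database raised ORA-00704 and ORA-00376.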

Wed Mar 04 21:10:58 2015
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17074.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/app/oracle/oradata/oa/undotbs01.dbf'
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17074.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/app/oracle/oradata/oa/undotbs01.dbf'
Error 704 happened during db open, shutting down database
USER (ospid: 17074): terminating the instance due to error 704
Instance terminated by USER, pid = 17074
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (17074) as a result of ORA-1092
Wed Mar 04 21:11:00 2015
ORA-1092 : opitsk aborting process

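The output of the Oracle Database Recovery Check script is in the attachment (xifenfei_db_recover_20150304). It shows that the undo datafile, after whatever had been done to it, had a comparatively high SCN and was also offline.
After the database SCN was adjusted by a series of methods (bbed, hidden parameters, and so on) and the database was forced open, the following error was reported: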

Wed Mar 04 22:50:23 2015
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: 3nkd3g3ju5ph1, SCN: 0x0000.4000003e):
select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 17807): terminating the instance due to error 704
Instance terminated by USER, pid = 17807
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (17807) as a result of ORA-1092

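From experience, I suspected that the file header SCN was not high enough and that delayed block cleanout was the cause, so I raised the SCN further and tried again. The result was still ORA-00704/ORA-00604/ORA-01555.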

Wed Mar 04 22:50:23 2015
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: 3nkd3g3ju5ph1, SCN: 0x0000.4000003e):
select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 17807): terminating the instance due to error 704
Instance terminated by USER, pid = 17807
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (17807) as a result of ORA-1092

From experience, manipulating the SCN was unlikely to solve this problem, so I traced the startup with event 10046 and errorstack, which showed:

PARSING IN CURSOR #3 len=202 dep=2 uid=0 oct=3 lid=0 tim=1425481940448439 hv=3819099649 ad='64ff91af8' sqlid='3nkd3g3ju5ph1'
select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null
END OF STMT
PARSE #3:c=1000,e=334,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,plh=0,tim=1425481940448439
BINDS #3:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5b3253a6f0  bln=22  avl=01  flg=05
  value=0
 Bind#1
  oacdty=01 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=7f5b3253a6b8  bln=32  avl=06  flg=05
  value="PROPS$"
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5b3253a688  bln=24  avl=02  flg=05
  value=1
EXEC #3:c=0,e=640,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,plh=2853959010,tim=1425481940449147
WAIT #3: nam='db file sequential read' ela= 5 file#=1 block#=345 blocks=1 obj#=37 tim=1425481940449186
WAIT #3: nam='db file sequential read' ela= 4 file#=1 block#=44528 blocks=1 obj#=37 tim=1425481940449221
WAIT #3: nam='db file sequential read' ela= 3 file#=1 block#=5505 blocks=1 obj#=37 tim=1425481940449247
*** 2015-03-04 23:12:20.450
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
----- Error Stack Dump -----
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
----- Current SQL Statement for this session (sql_id=g64r07v2jn8nq) -----
SELECT NULL FROM PROPS$ WHERE NAME='BOOTSTRAP_UPGRADE_ERROR'

This shows that during startup the database needs to execute SELECT NULL FROM PROPS$ WHERE NAME='BOOTSTRAP_UPGRADE_ERROR', and that this statement recursively calls select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null. Given that, I used some methods to keep the database from running the SELECT NULL FROM PROPS$ WHERE NAME='BOOTSTRAP_UPGRADE_ERROR' query at startup, and the database did open successfully.
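
When reading a 10046 trace like this one, it can help to list the single-block reads that precede the error. This is a minimal sketch that assumes only the WAIT line format shown in the trace above; single_block_reads is an illustrative name.

```python
import re

WAIT_RE = re.compile(
    r"WAIT #\d+: nam='db file sequential read' ela= *(\d+) "
    r"file#=(\d+) block#=(\d+) blocks=(\d+) obj#=(-?\d+)"
)


def single_block_reads(trace_text):
    """Return (file#, block#, obj#) for each 'db file sequential read' wait, in trace order."""
    return [(int(f), int(b), int(o)) for _ela, f, b, _n, o in WAIT_RE.findall(trace_text)]


sample = """\
WAIT #3: nam='db file sequential read' ela= 5 file#=1 block#=345 blocks=1 obj#=37 tim=1425481940449186
WAIT #3: nam='db file sequential read' ela= 4 file#=1 block#=44528 blocks=1 obj#=37 tim=1425481940449221
WAIT #3: nam='db file sequential read' ela= 3 file#=1 block#=5505 blocks=1 obj#=37 tim=1425481940449247
"""

reads = single_block_reads(sample)
print(reads)
print("last block read before the error stack:", reads[-1])
```

The last entry is only a candidate for the block whose consistent read needed the missing undo; the trace does not state that directly.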

Background notes
ORA-600 [4502] [a]

Arg [a] ITL entry with a lock count
Meaning: During ITL cleanout we clear all row locks but the ITL entry
	 still thinks there is an uncleared lock. Ie: ITL has a locked
	 row but there are no locked rows in the block

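In short: during ITL cleanout all row locks in the block have been cleared, yet the ITL entry still records a locked row, which raises ORA-600 [4502]. Apart from bugs, the main cause is block corruption.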

ORA-600 [2037] [a] [b] {c} [d] [e] [f] [g]

Arg [a] Relative Data Block Address (RDBA) that the redo vector is for
Arg [b] The Block format
Arg {c} RDBA in the block itself
Arg [d] The block type
Arg [e] The sequence number
Arg [f] Flags, if set
Arg [g] The return value from the block head/tail checker.
DESCRIPTION:
  During recovery we are examining a block to ensure that it is not
  corrupt prior to applying any change vectors.
  The block has failed this check and this exception is raised

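In short: during recovery, a block is checked for corruption before any change vectors are applied to it. If the check fails, ORA-600 [2037] is raised. Apart from bugs, the main cause is block corruption.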

ORA-600 [kcbzpbuf_1],[a],[b]

Arg [a] Corruption reason
Arg [b] Calculate checksum flag
Corruption reason:
#define KCBH_GOOD    0                                     /* block is valid */
#define KCBH_ZERO    1             /* block header was entirely zero on disk */
#define KCBH_BROKEN  2      /* corruption could be from a partial disk write */
#define KCBH_CHKVAL  3               /* The check value for the block failed */
#define KCBH_CORRUPT 4     /* this is the wrong block or is not a data block */
#define KCBH_ZERONG  5               /* all zero block and it is not allowed */
Calculate checksum flag:
The possible values are 1 (Generate Checksum - db_block_checksum is enabled - default value)
                        0 (do not generate checksum - db_block_checksum=false)

kcbzpbuf_1 is the source-code function in which the error is raised.
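
The two arguments can be looked up mechanically. The sketch below copies the reason codes from the #define list above and applies them to an alert-log line; explain_kcbzpbuf_1 and the dictionary names are illustrative.

```python
import re

# Reason codes copied from the #define list above.
KCBH_REASONS = {
    0: "KCBH_GOOD: block is valid",
    1: "KCBH_ZERO: block header was entirely zero on disk",
    2: "KCBH_BROKEN: corruption could be from a partial disk write",
    3: "KCBH_CHKVAL: the check value for the block failed",
    4: "KCBH_CORRUPT: this is the wrong block or is not a data block",
    5: "KCBH_ZERONG: all zero block and it is not allowed",
}
CHECKSUM_FLAGS = {1: "generate checksum", 0: "do not generate checksum"}

ARGS_RE = re.compile(r"\[kcbzpbuf_1\], \[(\d+)\], \[(\d+)\]")


def explain_kcbzpbuf_1(line):
    """Return (reason text, checksum flag text) for a kcbzpbuf_1 line, or None if absent."""
    m = ARGS_RE.search(line)
    if m is None:
        return None
    reason, flag = int(m.group(1)), int(m.group(2))
    return (
        KCBH_REASONS.get(reason, "unknown reason"),
        CHECKSUM_FLAGS.get(flag, "unknown flag"),
    )


line = "ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], [], [], [], [], []"
print(explain_kcbzpbuf_1(line))
```

All three cases in these posts report [4], [1], that is, KCBH_CORRUPT with checksum generation requested.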

ORA-600 [17182] [a] [b] {c} [d] [e]

DESCRIPTION:
  Oracle has detected that the magic number in a memory chunk header has been overwritten.
  This is a heap (in memory) corruption and there is no underlying data corruption.
  The error may occur in the one of the process specific heaps
  (the Call heap, PGA heap, or session heap) or in the shared heap (SGA).

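Oracle detected that the magic number in a memory chunk header had been overwritten, although the underlying data is not corrupted. In most cases this is related to data block or memory corruption.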

ORA-600 [4552] [a] [b] {c} [d] [e]

DESCRIPTION:
  This assertion is raised because we are trying to unlock the rows in a
  block, but receive an incorrect block type.
  The second argument is the block type received.

Oracle tried to unlock rows in a block but received an incorrect block type; Arg [b] is the block type received.

ORA-600 [2662] [a] [b] {c} [d] [e]

DESCRIPTION:
  A data block SCN is ahead of the current SCN.
  The ORA-600 [2662] occurs when an SCN is compared to the dependent SCN
  stored in a UGA variable.
  If the SCN is less than the dependent SCN then we signal the ORA-600 [2662]
  internal error.
ARGUMENTS:
  Arg [a]  Current SCN WRAP
  Arg [b]  Current SCN BASE
  Arg {c}  dependent SCN WRAP
  Arg [d]  dependent SCN BASE
  Arg [e]  Where present this is the DBA where the dependent SCN came from.

In essence, the error occurs because the current SCN (taken from the file headers) is lower than the dependent SCN of some block.
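
Applied to the ORA-600 [2662] line from this case, the argument list can be decoded as follows. This is a minimal sketch based on the argument description above, assuming a smallfile DBA layout (10-bit file number, 22-bit block number); explain_2662 is an illustrative name.

```python
import re


def explain_2662(line):
    """Decode an ORA-600 [2662] argument list into SCNs and a file/block address."""
    args = [int(a) for a in re.findall(r"\[(\d+)\]", line)]
    if len(args) < 5 or args[0] != 2662:
        return None
    current = (args[1] << 32) | args[2]      # Arg [a] wrap, Arg [b] base
    dependent = (args[3] << 32) | args[4]    # Arg {c} wrap, Arg [d] base
    info = {
        "current_scn": current,
        "dependent_scn": dependent,
        "gap": dependent - current,
    }
    if len(args) > 5:
        dba = args[5]                        # Arg [e], assumed smallfile layout
        info["file"], info["block"] = dba >> 22, dba & 0x3FFFFF
    return info


line = ("ORA-00600: internal error code, arguments: [2662], [0], [13007002], "
        "[0], [13016626], [4194545], [], [], [], [], [], []")
print(explain_2662(line))
```

Here the dependent SCN is ahead of the current SCN by 9624, and the DBA points at file 1, block 241.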