ORA-01555 – Database SOS

Oracle 19C 报ORA-704 ORA-01555故障处理

异常断电导致数据库无法启动,尝试对数据文件进行recover操作,报ORA-00283 ORA-00742 ORA-00312错误,由于redo写丢失无法正常应用

D:\check_db>sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on 星期日 7月 30 07:49:19 2023
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


连接到:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> recover datafile 1;
ORA-00283: 恢复会话因错误而取消
ORA-00742: 日志读取在线程 1 序列 9274 块 18057 中检测到写入丢失情况
ORA-00312: 联机日志 1 线程 1: 'D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO01.LOG'

屏蔽数据一致性,尝试强制打开库,报ORA-00604,ORA-00704,ORA-01555错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
第 1 行出现错误:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 9 with name
"_SYSSMU9_4165470211$" too small
进程 ID: 4036
会话 ID: 2277 序列号: 40707

alert日志对应错误

2023-07-30T06:54:43.457383+08:00
.... (PID:5836): Clearing online redo logfile 1 complete
.... (PID:5836): Clearing online redo logfile 2 complete
.... (PID:5836): Clearing online redo logfile 3 complete
Resetting resetlogs activation ID 3572089731 (0xd4e9c383)
Online log D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO01.LOG: Thread 1 Group 1 was previously cleared
Online log D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO02.LOG: Thread 1 Group 2 was previously cleared
Online log D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO03.LOG: Thread 1 Group 3 was previously cleared
2023-07-30T06:54:43.863676+08:00
Setting recovery target incarnation to 2
2023-07-30T06:54:44.816771+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
Endian type of dictionary set to little
2023-07-30T06:54:44.957395+08:00
Assigning activation ID 3664275149 (0xda6866cd)
2023-07-30T06:54:44.957395+08:00
TT00 (PID:4640): Gap Manager starting
2023-07-30T06:54:45.004305+08:00
Redo log for group 1, sequence 1 is not located on DAX storage
2023-07-30T06:54:46.176153+08:00
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO01.LOG
Successful open of redo thread 1
2023-07-30T06:54:46.191771+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
stopping change tracking
2023-07-30T06:54:46.223036+08:00
TT03 (PID:1816): Sleep 5 seconds and then try to clear SRLs in 2 time(s)
2023-07-30T06:54:46.332398+08:00
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 
0x0000000017b852a7
):
2023-07-30T06:54:46.332398+08:00
select ctime, mtime, stime from obj$ where obj# = :1
2023-07-30T06:54:46.332398+08:00
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_5836.trc:
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 9 (名称为 "_SYSSMU9_4165470211$") 过小
2023-07-30T06:54:46.332398+08:00
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_5836.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 9 (名称为 "_SYSSMU9_4165470211$") 过小
2023-07-30T06:54:46.348028+08:00
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_5836.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 9 (名称为 "_SYSSMU9_4165470211$") 过小
Error 704 happened during db open, shutting down database
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_5836.trc  (incident=474502):
ORA-00603: ORACLE 服务器会话因致命错误而终止
ORA-01092: ORACLE 实例终止。强制断开连接
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 9 (名称为 "_SYSSMU9_4165470211$") 过小
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\incident\incdir_474502\xff_ora_5836_i474502.trc
2023-07-30T06:54:47.785549+08:00
opiodr aborting process unknown ospid (5836) as a result of ORA-603
2023-07-30T06:54:47.816792+08:00
ORA-603 : opitsk aborting process
License high water mark = 6
USER (ospid: (prelim)): terminating the instance due to ORA error

这类错误比较常见,参考以前类似恢复:
在数据库open过程中常遇到ORA-01555汇总
 数据库open过程遭遇ORA-1555对应sql语句补充
 Oracle Recovery Tools恢复—ORA-00704 ORA-01555故障
 使用_allow_resetlogs_corruption导致ORA-00704/ORA-01555故障
对于本次故障,通过Oracle Recovery Tools工具快速处理

open数据库成功

SQL> alter database open;

数据库已更改。

SQL>
SQL>
SQL> select status,count(1) from v$datafile group by status;

STATUS           COUNT(1)
-------------- ----------
SYSTEM                  1
ONLINE                 61

ORA-00600: internal error code, arguments: [16513], [1403] 恢复

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：ORA-00600: internal error code, arguments: [16513], [1403] 恢复

接到客户请求,存储异常断电之后,一个40多T的经分数据库无法正常启动,通过各方一系列操作之后,数据库依旧无法open,报错信息为:ORA-00600: internal error code, arguments: [16513], [1403], [28]

Sun Feb 14 00:20:09 BEIST 2021
SMON: enabling cache recovery
Sun Feb 14 00:20:09 BEIST 2021
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0f27.13cd4fc3):
Sun Feb 14 00:20:09 BEIST 2021
select ctime, mtime, stime from obj$ where obj# = :1
Sun Feb 14 00:20:09 BEIST 2021
Errors in file /oracle10g/db/admin/xifenfei/udump/xifenfei1_ora_177254.trc:
ORA-00600: internal error code, arguments: [16513], [1403], [28], [], [], [], [], []
Sun Feb 14 00:20:10 BEIST 2021
Errors in file /oracle10g/db/admin/xifenfei/udump/xifenfei1_ora_177254.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16513], [1403], [28], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 177254
ORA-1092 signalled during: ALTER DATABASE OPEN...

通过对启动过程进行跟踪

=====================
PARSING IN CURSOR #5 len=52 dep=1 uid=0 oct=3 lid=0 tim=194171381576991 hv=429618617 ad='afcee60'
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #5:c=0,e=257,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=194171381576990
BINDS #5:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=1104b8ed0  bln=22  avl=02  flg=05
  value=28
EXEC #5:c=0,e=422,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=194171381577472
WAIT #5: nam='db file sequential read' ela= 274 file#=1 block#=110 blocks=1 obj#=36 tim=194171381577809
WAIT #5: nam='db file sequential read' ela= 257 file#=1 block#=78943 blocks=1 obj#=36 tim=194171381578123
WAIT #5: nam='db file sequential read' ela= 253 file#=1 block#=111 blocks=1 obj#=36 tim=194171381578416
WAIT #5: nam='db file sequential read' ela= 226 file#=1 block#=62 blocks=1 obj#=18 tim=194171381578692
=====================
PARSING IN CURSOR #6 len=142 dep=2 uid=0 oct=3 lid=0 tim=194171381579134 hv=361892850 ad='df87eb0'
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,
DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
END OF STMT
PARSE #6:c=0,e=368,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=3,tim=194171381579133
BINDS #6:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=1105af9a8  bln=22  avl=02  flg=05
  value=92
EXEC #6:c=0,e=531,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=3,tim=194171381579736
WAIT #6: nam='db file sequential read' ela= 232 file#=1 block#=102 blocks=1 obj#=34 tim=194171381580011
WAIT #6: nam='db file sequential read' ela= 251 file#=1 block#=54 blocks=1 obj#=15 tim=194171381580307
FETCH #6:c=0,e=615,p=2,cr=2,cu=0,mis=0,r=1,dep=2,og=3,tim=194171381580369
STAT #6 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=2 pw=0 time=584 us)'
STAT #6 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=1 pw=0 time=282 us)'
WAIT #5: nam='db file sequential read' ela= 260 file#=4290 block#=365090 blocks=1 obj#=0 tim=194171381580721
FETCH #5:c=10000,e=3327,p=7,cr=7,cu=0,mis=0,r=0,dep=1,og=4,tim=194171381580817
*** 2021-02-14 02:32:40.430
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [16513], [1403], [28], [], [], [], [], []
Current SQL statement for this session:
alter database open
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              70000090FC00AE4 ? 000000000 ?
ksedmp+0290          bl       ksedst               104C1CA58 ?
ksfdmp+02d8          bl       03F4BD0C             
kgeriv+0108          bl       _ptrgl               
kgesiv+0080          bl       kgeriv               7000008F6DB5890 ? 00000000C ?
                                                   7000008F6DB5890 ? 0FFFFFDC3 ?
                                                   0FFFFFFBF ?
ksesic2+0060         bl       kgesiv               5FFFEC580 ? 500000000 ?
                                                   FFFFFFFFFFEC590 ? 0F5FAF808 ?
                                                   7000008F5FAF7E8 ?
kqdpts+0158          bl       ksesic2              408100004081 ? 000000000 ?
                                                   00000057B ? 000000000 ?
                                                   00000001C ?
                                                   A006150571B05EF0 ?
                                                   1100CF0A8 ? 000000001 ?
kqrlfc+0274          bl       kqdpts               000000000 ?
kqlbplc+00b4         bl       03F4E6A8             
kqlblfc+0230         bl       kqlbplc              00000009D ?
adbdrv+1a9c          bl       03F4B66C             
opiexe+2db4          bl       adbdrv               
opiosq0+1ac8         bl       opiexe               1103B418C ? 000000000 ?
                                                   FFFFFFFFFFF9008 ?
kpooprx+016c         bl       opiosq0              300F1E1D4 ? 000000000 ?
                                                   000000000 ? A4FFFFFFFF9798 ?
                                                   000000000 ?
kpoal8+03cc          bl       kpooprx              FFFFFFFFFFFB814 ?
                                                   FFFFFFFFFFFB5C0 ?
                                                   1300000013 ? 100000001 ?
                                                   000000000 ? A40000000000A4 ?
                                                   000000000 ? 1103B4378 ?
opiodr+0b2c          bl       _ptrgl               
ttcpip+1020          bl       _ptrgl               
opitsk+117c          bl       01FBD04C             
opiino+09d0          bl       opitsk               1EFFFFD7E0 ? 000000000 ?
opiodr+0b2c          bl       _ptrgl               
opidrv+04a4          bl       opiodr               3C102B1398 ? 404C6FF30 ?
                                                   FFFFFFFFFFFF7A0 ? 0102B1390 ?
sou2o+0090           bl       opidrv               3C02705B3C ? 440663000 ?
                                                   FFFFFFFFFFFF7A0 ?
opimai_real+01bc     bl       01FB9BF4             
main+0098            bl       opimai_real          000000000 ? 000000000 ?
__start+0098         bl       main                 000000000 ? 000000000 ?
 
--------------------- Binary Stack Dump ---------------------

报错比较明显,数据库在查询obj$取obj#=28的数据的时候无法正常获取到该数据,从而报错ORA-600 16513错误.通过对相关block进行分析,发现有事务异常

BBED>  p ktbbh.ktbbhitl[1]
struct ktbbhitl[1], 24 bytes                @44
   struct ktbitxid, 8 bytes                 @44
      ub2 kxidusn                           @44       0x5c00
      ub2 kxidslt                           @46       0x0a00
      ub4 kxidsqn                           @48       0x48b4fc01
   struct ktbituba, 8 bytes                 @52
      ub4 kubadba                           @52       0x22928531
      ub2 kubaseq                           @56       0xe383
      ub1 kubarec                           @58       0x01
   ub2 ktbitflg                             @60       0x0120 (NONE)
   union _ktbitun, 2 bytes                  @62
      b2 _ktbitfsc                          @62       0
      ub2 _ktbitwrp                         @62       0x0000
   ub4 ktbitbas                             @64       0x95c2ff13

通过一些技巧处理规避掉该事务,然后启动库报熟悉的ORA-01555错误

相对比较简单参考(在数据库open过程中常遇到ORA-01555汇总),数据库顺利open成功,完成春节后第一个大库的恢复

Exadata火线救援：10TB级数据恢复—强制拉库篇

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：Exadata火线救援：10TB级数据恢复—强制拉库篇

这个库的恢复有一些历史故事(【力荐】Exadata火线救援：10TB级数据修复经典案例详解！)：xx运营商x2的1/4配置的oracle exadata机器，跑了近6年，最近有一个cell节点主机异常,在rebalance过程中,只有两个节点的cell其中一个节点坏了一个硬盘导致.导致asm diskgroup无法正常mount,最后该运营商运维三方通过amdu把该一体机中的数据文件全部抽出来,然后在恢复过程中出现大量错误无法解决,请求我们支持
数据库open过程报ORA-01555错误

Thu Jul  14 00:01:04 2016
alter database open
Thu Jul  14 00:01:04 2016
Thread 1 advanced to log sequence 2 (thread open)
Thread 1 opened at log sequence 2
  Current log# 2 seq# 2 mem# 0: /data/amdu/redo/DATA_EC_260.f
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Jul  14 00:01:05 2016
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0b26.9f080238):
select ctime, mtime, stime from obj$ where obj# = :1
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_59546.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 83 with name &quot;_SYSSMU83_1078760807$&quot; too small
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_59546.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 83 with name &quot;_SYSSMU83_1078760807$&quot; too small
Error 704 happened during db open, shutting down database
USER (ospid: 59546): terminating the instance due to error 704
Instance terminated by USER, pid = 59546
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (59546) as a result of ORA-1092

这个错误比较常见，可以通过推scn就可以解决,由于已经安装了scn patch,通过oradebug推scn解决该问题.

ORA-600 4194
这个ora 600 4194相对比较特殊,在SMON: enabling cache recovery之后立马报出来,然后实例直接open失败.

Thu Jul  14 00:06:15 2016
alter database open
Thu Jul  14 00:06:15 2016
Thread 1 advanced to log sequence 3 (thread open)
Thread 1 opened at log sequence 3
  Current log# 3 seq# 3 mem# 0: /data/amdu/redo/DATA_EC_263.f
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Jul  14 00:06:15 2016
SMON: enabling cache recovery
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_60038.trc  (incident=1080450):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Block recovery from logseq 3, block 3 to scn 12269096776739
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0
  Mem# 0: /data/amdu/redo/DATA_EC_263.f
Block recovery stopped at EOT rba 3.5.16
Block recovery completed at rba 3.5.16, scn 2856.2670179361
Block recovery from logseq 3, block 3 to scn 12269096776736
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0
  Mem# 0: /data/amdu/redo/DATA_EC_263.f
Block recovery completed at rba 3.5.16, scn 2856.2670179361
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_60038.trc:
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_60038.trc:
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 60038): terminating the instance due to error 600
Instance terminated by USER, pid = 60038
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (60038) as a result of ORA-1092

trace文件分析
打开数据库报ORA-600[4194]错误,对启动过程进行10046跟踪并且分析trace文件发现

PARSING IN CURSOR #140375370511672 len=148 dep=1 uid=0 oct=6 lid=0 tim=3501342849457766 hv=3540833987
ad='a47df47a8' sqlid='5ansr7r9htpq3'
update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,xactsqn=:8,scnbas=:9,
scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1
END OF STMT
PARSE #140375370511672:c=27996,e=28041,p=66,cr=224,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=3501342849457765
BINDS #140375370511672:
 Bind#0
  oacdty=01 mxl=32(20) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=178 siz=32 off=0
  kxsbbbfp=a47e093ca  bln=32  avl=20  flg=09
  value=&quot;_SYSSMU1_2856534670$&quot;
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b92b0  bln=24  avl=03  flg=05
  value=1024
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b9280  bln=24  avl=03  flg=05
  value=128
 Bind#3
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b9248  bln=24  avl=02  flg=05
  value=5
 Bind#4
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b9218  bln=24  avl=02  flg=05
  value=1
 Bind#5
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b91e8  bln=24  avl=03  flg=05
  value=3398
 Bind#6
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b91b8  bln=24  avl=05  flg=05
  value=1485261
 Bind#7
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b9180  bln=24  avl=06  flg=05
  value=1946693999
 Bind#8
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b8ec8  bln=24  avl=03  flg=05
  value=2847
 Bind#9
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b8e98  bln=24  avl=02  flg=05
  value=1
 Bind#10
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b8e68  bln=24  avl=02  flg=05
  value=2
 Bind#11
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b8e38  bln=24  avl=02  flg=05
  value=2
 Bind#12
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b92e0  bln=22  avl=02  flg=05
  value=1
WAIT #140375370511672: nam='db file sequential read' ela= 21 file#=1 block#=179020 blocks=1 obj#=0 tim=3501342849459353
*** 2016-07-14 03:14:09.548
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []

很明显数据库是在update undo$的时候,读取到file 1 block 179020的时候报错,继续分析trace文件

Error 600 in redo application callback
Dump of change vector:
TYP:0 CLS:16 AFN:1 DBA:0x0042bb4c OBJ:4294967295 SCN:0x0b26.a88e815e SEQ:1 OP:5.1 ENC:0 RBL:0
ktudb redo: siz: 272 spc: 4906 flg: 0x0012 seq: 0x0f4c rec: 0x0d
            xid:  0x0000.01b.00000b63
ktubl redo: slt: 27 rci: 0 opc: 11.1 [objn: 15 objd: 15 tsn: 0]
Undo type:  Regular undo        Begin trans    Last buffer split:  No
Temp Object:  No
Tablespace Undo:  No
             0x00000000  prev ctl uba: 0x0042bb4c.0f4c.0c
prev ctl max cmt scn:  0x0b23.ccb9eb89  prev tx cmt scn:  0x0b23.ccb9ebac
txn start scn:  0xffff.ffffffff  logon user: 0  prev brb: 4373321  prev bcl: 0 BuExt idx: 0 flg2: 0
KDO undo record:
KTB Redo
op: 0x04  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: L  itl: xid:  0x0000.01c.00000b65 uba: 0x0042bb4a.0f4c.1d
                      flg: C---    lkc:  0     scn: 0x0b25.7391ee5c
KDO Op code: URP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x004000e1  hdba: 0x004000e0
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 1(0x1) flag: 0x2c lock: 0 ckix: 0
ncol: 17 nnew: 12 size: 0
col  1: [20]  5f 53 59 53 53 4d 55 31 5f 32 38 35 36 35 33 34 36 37 30 24
col  2: [ 2]  c1 02
col  3: [ 3]  c2 0b 19
col  4: [ 3]  c2 02 1d
col  5: [ 6]  c5 14 2f 46 28 64
col  6: [ 3]  c2 1d 30
col  7: [ 5]  c4 02 31 35 3e
col  8: [ 3]  c2 22 63
col  9: [ 2]  c1 02
col 10: [ 2]  c1 04
col 11: [ 2]  c1 03
col 16: [ 2]  c1 03
Block after image is corrupt:
buffer tsn: 0 rdba: 0x0042bb4c (1/179020)
scn: 0x0b26.a88e815e seq: 0x01 flg: 0x04 tail: 0x815e0201
frmt: 0x02 chkval: 0xf022 type: 0x02=KTU UNDO BLOCK
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x000000094E07A000 to 0x000000094E07C000
94E07A000 0000A202 0042BB4C A88E815E 04010B26  [....L.B.^...&amp;...]
94E07A010 0000F022 004E0000 00000B5B 1D1D0F4C  [&quot;.....N.[...L...]

我们知道在file 1 block 179020的时候redo和undo信息不匹配,出现了上述的ORA 600 4194的错误.进一步分析

Block image after block recovery:
buffer tsn: 0 rdba: 0x00400080 (1/128)
scn: 0x0b26.6389e19d seq: 0x01 flg: 0x04 tail: 0xe19d0e01
frmt: 0x02 chkval: 0x7c95 type: 0x0e=KTU UNDO HEADER W/UNLIMITED EXTENTS
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x000000094DAB2000 to 0x000000094DAB4000
94DAB2000 0000A20E 00400080 6389E19D 04010B26  [......@....c&amp;...]
94DAB2010 00007C95 00000000 00000000 00000000  [.|..............]
94DAB2020 00000000 00000015 000002FF 00001020  [............ ...]
94DAB2030 0000000D 0000004C 00000080 0042BB4C  [....L.......L.B.]
  Extent Control Header
  -----------------------------------------------------------------
  Extent Header:: spare1: 0      spare2: 0      #extents: 21     #blocks: 767
                  last map  0x00000000  #maps: 0      offset: 4128
      Highwater::  0x0042bb4c  ext#: 13     blk#: 76     ext size: 128
  #blocks in seg. hdr's freelists: 0
  #blocks below: 0
  mapblk  0x00000000  offset: 13
                   Unlocked
     Map Header:: next  0x00000000  #extents: 21   obj#: 0      flag: 0x40000000
  Extent Map
  -----------------------------------------------------------------
   0x00400081  length: 7
   0x00413a38  length: 8
   0x00400088  length: 8
   0x00413a30  length: 8
   0x0042b888  length: 8
   0x0042b890  length: 8
   0x0042b898  length: 8
   0x0042b8a0  length: 8
   0x0042b8a8  length: 8
   0x0042b8b0  length: 8
   0x0042b8b8  length: 8
   0x0042b8c0  length: 8
   0x0042ba80  length: 128
   0x0042bb00  length: 128
   0x0042bc00  length: 128
   0x0042bc80  length: 128
   0x0042bb80  length: 128
   0x00400210  length: 8
   0x00400218  length: 8
   0x00400220  length: 8
   0x00400228  length: 8
  TRN CTL:: seq: 0x0f4c chd: 0x001b ctl: 0x0043 inc: 0x00000000 nfb: 0x0001
            mgc: 0x8002 xts: 0x0068 flg: 0x0001 opt: 2147483646 (0x7ffffffe)
            uba: 0x0042bb4c.0f4c.0c scn: 0x0b23.ccb9eb89
Version: 0x01
  FREE BLOCK POOL::
    uba: 0x0042bb4c.0f4c.0c ext: 0xd  spc: 0x132a
    uba: 0x00000000.0f4c.0c ext: 0xd  spc: 0x12f6
    uba: 0x00000000.0f4c.01 ext: 0xd  spc: 0x1ec8
    uba: 0x00000000.0f4c.04 ext: 0xd  spc: 0x1b86
    uba: 0x00000000.0f4c.09 ext: 0xd  spc: 0x162c
  TRN TBL::
  index  state cflags  wrap#    uel         scn            dba            parent-xid    nub     stmt_num
  ------------------------------------------------------------------------------------------------
   0x00    9    0x00  0x0b62  0x0011  0x0b25.2dacaf61  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x01    9    0x00  0x0b64  0x0024  0x0b24.a6a2cf7b  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x02    9    0x00  0x0b65  0x0036  0x0b25.7391eda0  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x03    9    0x00  0x0b4f  0x0007  0x0b24.337bf49b  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x04    9    0x00  0x0b64  0x0051  0x0b23.ff22c637  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x05    9    0x00  0x0b64  0x0022  0x0b26.4393eb1e  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x06    9    0x00  0x0b66  0x0058  0x0b24.335c794d  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x07    9    0x00  0x0b4f  0x001d  0x0b24.4e05f2af  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x08    9    0x00  0x0b65  0x005e  0x0b23.ff22c618  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x09    9    0x00  0x0b5f  0x0035  0x0b24.337bf3d9  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x0a    9    0x00  0x0b64  0x004f  0x0b25.7391ee5f  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x0b    9    0x00  0x0b64  0x0040  0x0b24.335c7bd7  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x0c    9    0x00  0x0b65  0x0002  0x0b25.7391e929  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x0d    9    0x00  0x0b4f  0x0033  0x0b24.a6a2caa5  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x0e    9    0x00  0x0b65  0x0008  0x0b23.ff22c616  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x0f    9    0x00  0x0b61  0x0038  0x0b26.6389e195  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x10    9    0x00  0x0b4f  0x002a  0x0b24.bcff18b3  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x11    9    0x00  0x0b5c  0x0059  0x0b25.2dacaf69  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x12    9    0x00  0x0b65  0x0026  0x0b25.2dacb0a8  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x13    9    0x00  0x0b66  0x0021  0x0b24.a6a2caaa  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x14    9    0x00  0x0b62  0x0009  0x0b24.337bf3d7  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x15    9    0x00  0x0b63  0x0031  0x0b25.1b4e13ba  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x16    9    0x00  0x0b66  0x003b  0x0b25.2dacee5d  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x17    9    0x00  0x0b63  0x0034  0x0b26.6389e199  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x18    9    0x00  0x0b5d  0x002f  0x0b24.bcff18af  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x19    9    0x00  0x0b5b  0x004d  0x0b24.d60f78e3  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x1a    9    0x00  0x0b60  0x005b  0x0b25.1b4e13be  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x1b    9    0x00  0x0b62  0x003e  0x0b23.ccb9ebac  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x1c    9    0x00  0x0b65  0x000a  0x0b25.7391ee5c  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x1d    9    0x00  0x0b64  0x002c  0x0b24.4e05f2b1  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x1e    9    0x00  0x0b64  0x0045  0x0b24.33255ae9  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x1f    9    0x00  0x0b64  0x0015  0x0b25.1b4e13b5  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x20    9    0x00  0x0b63  0x0050  0x0b24.335c79a4  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x21    9    0x00  0x0b5d  0x0001  0x0b24.a6a2caf0  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x22    9    0x00  0x0b65  0x000f  0x0b26.4393eb20  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x23    9    0x00  0x0b62  0x0042  0x0b24.337bf3d2  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x24    9    0x00  0x0b65  0x003c  0x0b24.a6a2d137  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x25    9    0x00  0x0b62  0x0020  0x0b24.335c795d  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x26    9    0x00  0x0b63  0x0052  0x0b25.2dacee48  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x27    9    0x00  0x0b4e  0x003a  0x0b25.7391ee58  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x28    9    0x00  0x0b65  0x0049  0x0b25.2dacb089  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x29    9    0x00  0x0b61  0x0030  0x0b24.bcff18bb  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x2a    9    0x00  0x0b65  0x0057  0x0b24.bcff18b5  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x2b    9    0x00  0x0b64  0x0054  0x0b25.2dacee55  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x2c    9    0x00  0x0b61  0x000d  0x0b24.4e05f2b3  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x2d    9    0x00  0x0b64  0x000e  0x0b23.ff22c611  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x2e    9    0x00  0x0b65  0x0053  0x0b25.7391e78a  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x2f    9    0x00  0x0b66  0x0010  0x0b24.bcff18b1  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x30    9    0x00  0x0b63  0x0019  0x0b24.d60f78e1  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x31    9    0x00  0x0b65  0x001a  0x0b25.1b4e13bc  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x32    9    0x00  0x0b65  0x0037  0x0b24.a6a2d369  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x33    9    0x00  0x0b4f  0x0013  0x0b24.a6a2caa7  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x34    9    0x00  0x0b63  0x0043  0x0b26.6389e19b  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x35    9    0x00  0x0b65  0x0046  0x0b24.337bf409  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x36    9    0x00  0x0b52  0x0061  0x0b25.7391eda2  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x37    9    0x00  0x0b64  0x0018  0x0b24.bcff18ad  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x38    9    0x00  0x0b65  0x0017  0x0b26.6389e197  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x39    9    0x00  0x0b62  0x0006  0x0b24.335c7947  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x3a    9    0x00  0x0b64  0x001c  0x0b25.7391ee5a  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x3b    9    0x00  0x0b4d  0x002e  0x0b25.7391e730  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x3c    9    0x00  0x0b65  0x0032  0x0b24.a6a2d144  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x3d    9    0x00  0x0b65  0x0016  0x0b25.2dacee59  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x3e    9    0x00  0x0b63  0x002d  0x0b23.ff22c60b  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x3f    9    0x00  0x0b63  0x005f  0x0b24.335c7b57  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x40    9    0x00  0x0b65  0x0044  0x0b24.335c7bd9  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x41    9    0x00  0x0b65  0x0029  0x0b24.bcff18b9  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x42    9    0x00  0x0b61  0x0014  0x0b24.337bf3d4  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x43    9    0x00  0x0b5b  0xffff  0x0b26.6389e19d  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x44    9    0x00  0x0b4c  0x004b  0x0b24.335c7bdb  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x45    9    0x00  0x0b61  0x0039  0x0b24.335c7945  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x46    9    0x00  0x0b65  0x005a  0x0b24.337bf421  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x47    9    0x00  0x0b65  0x0027  0x0b25.7391eda8  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x48    9    0x00  0x0b62  0x004e  0x0b24.335c7953  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x49    9    0x00  0x0b63  0x0012  0x0b25.2dacb09f  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x4a    9    0x00  0x0b63  0x0023  0x0b24.335c7bf8  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x4b    9    0x00  0x0b5b  0x004a  0x0b24.335c7bf3  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x4c    9    0x00  0x0b63  0x003f  0x0b24.335c7b55  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x4d    9    0x00  0x0b61  0x001f  0x0b25.1b4e13b3  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x4e    9    0x00  0x0b5a  0x0025  0x0b24.335c795b  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x4f    9    0x00  0x0b5f  0x005c  0x0b25.8ff588bb  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x50    9    0x00  0x0b63  0x004c  0x0b24.335c79a7  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x51    9    0x00  0x0b64  0x005d  0x0b24.0c84cbf4  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x52    9    0x00  0x0b65  0x002b  0x0b25.2dacee49  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x53    9    0x00  0x0b64  0x000c  0x0b25.7391e927  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x54    9    0x00  0x0b63  0x003d  0x0b25.2dacee56  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x55    9    0x00  0x0b64  0x0005  0x0b26.4393eada  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x56    9    0x00  0x0b64  0x0055  0x0b26.4393ead3  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x57    9    0x00  0x0b60  0x0041  0x0b24.bcff18b7  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x58    9    0x00  0x0b65  0x0060  0x0b24.335c794f  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x59    9    0x00  0x0b60  0x0028  0x0b25.2dacaf74  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x5a    9    0x00  0x0b61  0x0003  0x0b24.337bf499  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x5b    9    0x00  0x0b62  0x0000  0x0b25.1b4e13c0  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x5c    9    0x00  0x0b62  0x0056  0x0b25.950a6e4b  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x5d    9    0x00  0x0b63  0x001e  0x0b24.33255ae7  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x5e    9    0x00  0x0b5f  0x0004  0x0b23.ff22c635  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x5f    9    0x00  0x0b64  0x000b  0x0b24.335c7bd5  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x60    9    0x00  0x0b61  0x0048  0x0b24.335c7950  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x61    9    0x00  0x0b65  0x0047  0x0b25.7391eda6  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
KQRCMT: Write failed with error=600 po=0xa47e092c0 cid=3
diagnostics : cid=3 hash=35e74caf flag=2a
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []

到这里我们基本上明白了报错的file 1 block 179020是由FREE BLOCK POOL分配出来的,现在解决给问题的思路就是直接使用bbed分配一个新块即可.

ORA-600 6711
数据库无法正常open报ORA 600 6711错误

Thu Jul  14 04:04:28 2016
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 32 processes
Started redo scan
Completed redo scan
 read 1 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 2, block 3
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: /data/amdu/redo/DATA_EC_260.f
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 2, block 5, scn 12269633687653
 0 data blocks read, 0 data blocks written, 1 redo k-bytes read
Thu Jul  14 04:04:29 2016
Thread 1 advanced to log sequence 3 (thread open)
Thread 1 opened at log sequence 3
  Current log# 3 seq# 3 mem# 0: /data/amdu/redo/DATA_EC_263.f
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
After successful startup of the database, please remove
the parameters _allow_error_simulation and _smu_debug_mode
and restart the database
Thu Jul  14 04:04:29 2016
SMON: enabling cache recovery
Undo initialization finished serial:0 start:270626334 end:270626544 diff:210 (2 seconds)
Dictionary check beginning
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
Corrected file 19 plugged in read-only status in control file
Corrected file 81 plugged in read-only status in control file
Corrected file 88 plugged in read-only status in control file
Corrected file 93 plugged in read-only status in control file
Corrected file 128 plugged in read-only status in control file
Corrected file 130 plugged in read-only status in control file
Corrected file 131 plugged in read-only status in control file
Corrected file 163 plugged in read-only status in control file
Corrected file 181 plugged in read-only status in control file
Corrected file 184 plugged in read-only status in control file
Corrected file 186 plugged in read-only status in control file
Corrected file 191 plugged in read-only status in control file
Corrected file 214 plugged in read-only status in control file
Corrected file 220 plugged in read-only status in control file
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
*********************************************************************
WARNING: The following temporary tablespaces contain no files.
         This condition can occur when a backup controlfile has
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
         ALTER TABLESPACE &lt;tablespace_name&gt; ADD TEMPFILE
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
Updating character set in controlfile to ZHS16GBK
WARNING: event 8105 is set. This event disables failed
         online index [re]build cleanup
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Thu Jul  14 04:04:30 2016
QMNC started with pid=57, OS id=3549
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_3401.trc  (incident=1440450):
ORA-00600: internal error code, arguments: [6711], [4293062], [1], [4318348], [0], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_1440450/xifenfei_ora_3401_i1440450.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Error 600 in kwqmnpartition(), aborting txn
Thu Jul  14 04:04:32 2016
Dumping diagnostic data in directory=[cdmp_20801214040432], requested by (instance=1, osid=3401), summary=[incident=1440450].
Thu Jul  14 04:04:32 2016
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_mmon_3329.trc  (incident=1440178):
ORA-00600: internal error code, arguments: [6711], [4293062], [1], [4318348], [0], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_1440178/xifenfei_mmon_3329_i1440178.trc
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_3401.trc  (incident=1440451):
ORA-00600: internal error code, arguments: [6711], [4293062], [1], [4318348], [0], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_1440451/xifenfei_ora_3401_i1440451.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_mmon_3329.trc  (incident=1440179):
ORA-00600: internal error code, arguments: [6711], [4293062], [1], [4318348], [0], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_1440179/xifenfei_mmon_3329_i1440179.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORA-600 signalled during: alter database open...

数据库正常open失败,但是可以upgrade启动.根据对trace文件的分析,定位到问题在HISTGRM$表上面,进一步分析该表结构为

CREATE TABLE SYS.HISTGRM$
(
  OBJ#      NUMBER,
  COL#      NUMBER,
  ROW#      NUMBER,
  BUCKET    NUMBER,
  ENDPOINT  NUMBER,
  INTCOL#   NUMBER,
  EPVALUE   VARCHAR2(1000 BYTE),
  SPARE1    NUMBER,
  SPARE2    NUMBER
)
CLUSTER SYS.C_OBJ#_INTCOL#(OBJ#, INTCOL#);

这个和ORA-600 6711错误相匹配了

ERROR:
ORA-600 [6711] [a] [b]  [d]
VERSIONS:
versions 6.0 to 12.1
DESCRIPTION:
This error is generated when we find more blocks on a cluster key
chain than are supposed to be there.
Usually this indicates that the chain contains a loop within itself. We
cannot have more than 65535 blocks in a chain.
ARGUMENTS:
Arg [a] beginning DBA
Arg [b] table slot number (in table index)
Arg {c} dba of key next in chain
Arg [d] row slot of key next in chain

需要处理该错误,也就是需要处理CLUSTER SYS.C_OBJ#_INTCOL#,这个可以通过重建来实现,但是当重建之时发生

ORA-00600: internal error code, arguments: [kkoipt:invalid aptyp], [0], [0], [], [], [], [], [], [], [], [], []
ORA-08102: index key not found, obj# 39, file 1, block 1374829 (2)

这里错误比较明显obj#=39为obj$的i_obj4的index的记录和表不匹配,导致任何ddl无法执行,因此如果要处理C_OBJ#_INTCOL#就必须要先处理i_obj4的问题.通过一些技巧重建i_obj4,然后重建C_OBJ#_INTCOL#,数据库终于可以正常打开.由于大量数据字典不一致,exp/expdp导出依旧有问题,通过dblink直接拉数据到新库,完成本次恢复
补充说明：1. 在这个库的恢复过程中,我们还使用了大量的event和隐含参数,因为比较常规而且不涉及核心环节,因为未列举出来;2. 由于当时操作记录未能够保留日志因此相关操作步骤无法贴出来,本文只能提供恢复处理思路
再次提醒各位数据库做好备份,做好巡检工作,哪怕是强大的Oracle exadata也禁不起无备份折腾,数据重于一切

ORA-01555 ORA-600 kdiulk:kcbz_objdchk ORA-600 kdBlkCheckError等错误恢复

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：ORA-01555 ORA-600 kdiulk:kcbz_objdchk ORA-600 kdBlkCheckError等错误恢复

数据库启动ORA-00704,0RA-00604,ORA-01555导致数据库无法启动

Tue May 31 17:32:42 2016
SMON: enabling cache recovery
SUCCESS: diskgroup RECOVERY was mounted
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0004.3af84bee):
select ctime, mtime, stime from obj$ where obj# = :1
Archived Log entry 5 added for thread 1 sequence 10 ID 0x86a261e7 dest 1:
Errors in file /opt/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_12779.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 7 with name "_SYSSMU7_1592079335$" too small
Errors in file /opt/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_12779.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 7 with name "_SYSSMU7_1592079335$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 12779): terminating the instance due to error 704

通过bbed修改事务之后启动数据库

Tue May 31 17:35:49 2016
SMON: enabling tx recovery
*********************************************************************
WARNING: The following temporary tablespaces contain no files.
         This condition can occur when a backup controlfile has
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
         ALTER TABLESPACE <tablespace_name> ADD TEMPFILE
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
Updating character set in controlfile to AL32UTF8
Tue May 31 17:35:50 2016
Errors in file /opt/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_p021_13862.trc  (incident=166002):
ORA-00600: 内部错误代码, 参数: [kdiulk:kcbz_objdchk], [0], [0], [1], [], [], [], [], [], [], [], []
Tue May 31 17:35:50 2016
Errors in file /opt/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_p010_13818.trc  (incident=165914):
ORA-00600: 内部错误代码, 参数: [kdiulk:kcbz_objdchk], [0], [0], [1], [], [], [], [], [], [], [], []
Tue May 31 17:35:50 2016
Errors in file /opt/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_p004_13794.trc  (incident=165866):
ORA-00600: 内部错误代码, 参数: [kdiulk:kcbz_objdchk], [0], [0], [1], [], [], [], [], [], [], [], []
Tue May 31 17:35:50 2016
Errors in file /opt/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_p011_13822.trc  (incident=165922):
ORA-00600: 内部错误代码, 参数: [kdiulk:kcbz_objdchk], [0], [0], [1], [], [], [], [], [], [], [], []
Tue May 31 17:35:50 2016
Errors in file /opt/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_p016_13842.trc  (incident=165962):
ORA-00600: 内部错误代码, 参数: [kdiulk:kcbz_objdchk], [0], [0], [1], [], [], [], [], [], [], [], []

ORA-600 [kdiulk:kcbz_objdchk] trace文件

*** SESSION ID:(3.5) 2016-05-31 17:35:50.068
OBJD MISMATCH typ=6, seg.obj=-2, diskobj=222225, dsflg=0, dsobj=285890, tid=285890, cls=1
ORA-00600: 内部错误代码, 参数: [kdiulk:kcbz_objdchk], [0], [0], [1], [], [], [], [], [], [], [], []
Parallel Transaction recovery server caught exception 600
begin Parallel Recovery Context Dump
nsi: 48, nsactive: 48
, nirsi: 1, nidti: 1, ndt: 1, rescan: 0, ptrs: 48
[ktprsi] wdone: 50
[ktpritp 378651b8] ktprsi:
37903b60 37903b78 37903b90 37903ba8 37903bc0 37903bd8 37903bf0 37903c08 37903c20 37903c38 37903c50
37903c68 37903c80 37903c98 37903cb0 37903cc8 37903ce0 37903cf8 37903d10 37903d28 37903d40 37903d58
37903d70 37903d88 37903da0 37903db8 37903dd0 37903de8 37903e00 37903e18 37903e30 37903e48 37903e60
37903e78 37903e90 37903ea8 37903ec0 37903ed8 37903ef0 37903f08 37903f20 37903f38 37903f50 37903f68
37903f80 37903f98 37903fb0 37903fc8
[ktprht] nhb: 47, nfl: 20247, flg: 2
*** 2016-05-31 17:36:08.584
[ktprhb] nfl: 1, nelem: 97, flg: 0, sqn: 1
flist: 37698940
nhe: [ktprhe 32] sqn: -1297235803
[kturur] uoff: -1797708320, sqn: 4
uba: 0x098004cd.07e4.0b
*-----------------------------
* Rec #0xb  slt: 0x07  objn: 123986(0x0001e452)  objd: 285891  tblspc: 10(0x0000000a)
*       Layer:  10 (Index)   opc: 22   rci 0x0a
Undo type:  Regular undo   Last buffer split:  No
Temp Object:  No
Tablespace Undo:  No
rdba: 0x00000000

这里基本上可以确定是由于undo index中的dataobj#和block中的dataobj# 不匹配.在数据库undo回滚之时出现该错误.可以通过跳过undo回滚,然后重建对象

Tue May 31 17:36:06 2016
Simulated error on redo application.
Block recovery from logseq 12, block 959 to scn 20401094719
Recovery of Online Redo Log: Thread 1 Group 3 Seq 12 Reading mem 0
  Mem# 0: +DATA/xifenfei/onlinelog/group_3.263.802446627
Block recovery completed at rba 12.1012.16, scn 4.3221225536
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Simulated error for redo application done.
Errors in file /opt/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_p009_13814.trc  (incident=165906):
ORA-00600: 内部错误代码, 参数: [kdBlkCheckError], [26], [950417], [18025], [], [], [], [], [], [], [], []

这些错误是由于数据库block逻辑异常导致,错过参数含义
在10g中ORA-600 kddummy_blkchk 在11g中ORA-600 kdBlkCheckError

ARGUMENTS:
Arg [a] Absolute file number
Arg [b] Bock number
Arg  Internal error code returned from kcbchk() which indicates the problem encountered.
See Note 46389.1 for details of block check codes.

根据QREF kddummy_blkchk / kdBlkCheckError Check Codes Listing (Full) (Doc ID 1264040.1)分析
这里的18025是代码的KCBTEMAP_EC_START + KTS4_EC_SBFREE部分异常，主要表现在Incorrect firstfree or nfree 可以通过设置一些参数进行屏蔽

在恢复过程中还有其他错误

ORA-600 encountered when generating server alert SMG-4128
ORA-00600: internal error code, arguments: [ktcpoptx:!cmt top lvl], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [4406], [0x1026B65348], [0x000000000], [2], [6215], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [ktcpoptx:!cmt top lvl], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
ORACLE Instance xifenfei (pid = 15) - Error 600 encountered while recovering transaction (10, 7) on object 123986.
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kturbleurec1], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kewrose_1], [600],
  [ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.

通过整体分析错误主要是由于undo异常导致,通过设置_corrupted_rollback_segments设置db_block_checking等相关参数,清理SMON_SCN_TIME等操作数据库没有其他异常报错,让其通过逻辑方式重建库

在数据库open过程中常遇到ORA-01555汇总

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：在数据库open过程中常遇到ORA-01555汇总

在数据库open的过程中，select ctime, mtime, stime from obj$ where obj# = :1语句报ORA-01555错误,数据库无法正常open
一般情况下会报某个回滚段,但是这里ORA-01555: snapshot too old: rollback segment number 0 with name “SYSTEM” too small这里直接报了system(系统回滚段),属于少见情况

Fri Jun 26 11:47:31 2015
SMON: enabling cache recovery
Fri Jun 26 11:47:31 2015
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0b41.37629378):
Fri Jun 26 11:47:31 2015
select ctime, mtime, stime from obj$ where obj# = :1
Fri Jun 26 11:47:31 2015
Errors in file /orabin/admin/doocrm/udump/doocrm_ora_5046722.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 0 with name "SYSTEM" too small
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 5046722
ORA-1092 signalled during: alter database open resetlogs...

在数据库open的过程中,select increment$,minvalue,maxvalue,cycle#,order$,cache,highwater,audit$,flags from seq$ where obj#=:1语句上报ORA-01555,导致数据库open失败

在数据库open过程中,select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1语句导致数据库open失败.

ARC0: Becoming the 'no SRL' ARCH
Sun Jun 28 16:08:22 2015
ARC1: Becoming the heartbeat ARCH
Sun Jun 28 16:08:22 2015
SMON: enabling cache recovery
Sun Jun 28 16:08:22 2015
ORA-01555 caused by SQL statement below (SQL ID: 7bd391hat42zk, Query Duration=0 sec, SCN: 0x0d27.0a1ce29d):
Sun Jun 28 16:08:22 2015
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,
   DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
Sun Jun 28 16:08:22 2015
Errors in file /oracle/app/oracle/admin/ibsscrm/udump/xxxx_ora_30212428.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 0 with name "SYSTEM" too small
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 30212428
ORA-1092 signalled during: alter database open...

出现这类问题,一般是由于obj$,seq$,undo$等基表上对象scn大于数据库当前scn或者是由于这些表上有事务未提交,出现上述两种情况,数据库需要找对应的undo的回滚段中记录,而此时对应的回滚段异常(或者是由于redo未进行正常前滚,导致上述对象或者回滚段记录不正常),从而出现类似情况,一般出现此类情况,可以通过10046定位到block，然后故障原因采用bbed修改scn或者bbed提交事务来解决此类问题.
最近两年的恢复中又遇到一些ora-00704 ora-00604 ora-01555的错误,导致数据库无法正常open,对其进行补充,具体参见:数据库open过程遭遇ORA-1555对应sql语句补充

某集团ebs数据库redo undo丢失导致悲剧

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：某集团ebs数据库redo undo丢失导致悲剧

某集团的ebs系统因磁盘空间不足把redo和undo存放到raid 0之上，而且该库无任何备份。最终悲剧发生了,raid 0异常导致redo undo全部丢失,数据库无法正常启动(我接手之时数据库已经resetlogs过,但是未成功)

Sun Jul 27 11:31:27 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Sun Jul 27 11:31:27 2014
Database Characterset is ZHS16GBK
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_663670.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 41 cannot be read at this time
ORA-01110: data file 41: '/prod/oracle/PROD/logdata/undo/undo2.dbf'
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 663670
ORA-1092 signalled during: ALTER DATABASE OPEN...

查询相关文件状态发现,undo表空间文件丢失,被offline处理
df_status
因为以前alert日志被清理,通过这里大概猜测是offline丢失的undo文件,然后resetlogs了数据库,现在处理方式为
使用_corrupted_rollback_segments屏蔽回滚段,然后尝试启动数据库

Tue Jul 29 11:40:39 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Tue Jul 29 11:40:39 2014
Database Characterset is ZHS16GBK
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_585786.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 585786
ORA-1092 signalled during: alter database open...

该错误是由于数据库启动需要找到对应的回滚段,但是由于undo异常导致该回滚段无法找到，因此出现该错误，解决方法是通过修改数据scn，让其不找回滚段,从而屏蔽该错误.数据库启动后,删除undo重新创建新undo

Tue Jul 29 15:59:22 2014
drop tablespace undo2 including contents and datafiles
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01122: database file 41 failed verification check
ORA-01110: data file 41: '/prod/oracle/PROD/logdata/undo/undo2.dbf'
ORA-01565: error in identifying file '/prod/oracle/PROD/logdata/undo/undo2.dbf'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo2.dbf
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01122: database file 42 failed verification check
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
ORA-01565: error in identifying file '/prod/oracle/PROD/logdata/undo/undo1.dbf'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo2.dbf
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo1.dbf
Tue Jul 29 15:59:23 2014
Completed: drop tablespace undo2 including contents and datafiles
Tue Jul 29 15:59:56 2014
create undo tablespace undotbs1 datafile '/prod/oracle/PROD/logdata/undo_new01.dbf' size 100M autoextend on next 128M maxsize 30G
Tue Jul 29 15:59:57 2014
Completed: create undo tablespace undotbs1 datafile '/prod/oracle/PROD/logdata/undo_new01.dbf' size 100M autoextend on next 128M maxsize 30G
Tue Jul 29 16:00:03 2014
alter tablespace undotbs1 add datafile '/prod/oracle/PROD/logdata/undo_new02.dbf' size 100M autoextend on next 128M maxsize 30G
Completed: alter tablespace undotbs1 add datafile '/prod/oracle/PROD/logdata/undo_new02.dbf' size 100M autoextend on next 128M maxsize 30G

业务运行过程中,数据库报大量ORA-600 4097,ORA-600 kdsgrp1,ORA-600 kcfrbd_3错误

Tue Jul 29 16:07:03 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_950484.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:07:06 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_950484.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Tue Jul 29 16:10:06 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_917702.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:10:07 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_917702.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Tue Jul 29 16:12:45 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_m000_880692.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:21:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:21:37 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:21:56 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:22:18 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:22:28 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1105950.trc:
ORA-00600: 内部错误代码, 参数: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:22:33 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1159232.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [42], [61235], [1], [12800], [12800], [], []

出现该错误有几个原因和解决方法：
ORA-600 kdsgrp1 是因为相关坏块引起(tab,index,memory,cr block等),结合日志分析对象异常原因,根据具体情况确定对象然后选择合适处理方案(具体参考NOTE:1332252.1)
ORA-600 4097 由于数据库异常关闭然后open,创建回滚段,可能触发bug导致该问题(虽然说在当前版本修复,但是实际处理我确实按照NOTE:1030620.6解决)
ORA-600 kcfrbd_3 有事务的block被访问之后,根据回滚槽信息定位到相关回滚段,而正好新建的回滚段信息又和以前的名字编号一致,从而反馈出来是数据文件大小不够,从而出现该错误(具体参考NOTE:601798.1)
最终该数据库虽然恢复了,抢救了大量数据,但是对于ebs系统来说,丢失redo和undo数据的损失还是巨大的.再次温馨提示:数据库的redo,undo也很重要,数据库的备份更加重要

记录一次ORA-00316 ORA-00312 redo异常恢复

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：记录一次ORA-00316 ORA-00312 redo异常恢复

正常运行的数据库报突然报ORA-00316: log 1 of thread 1, type 286 in header is not log file,异常终止

Fri Feb 21 08:44:42 2014
Thread 1 advanced to log sequence 591 (LGWR switch)
  Current log# 1 seq# 591 mem# 0: J:\ORADATA\ORCL\REDO01.LOG
Fri Feb 21 15:31:20 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_lgwr_10312.trc:
ORA-00316: log 1 of thread 1, type 286 in header is not log file
ORA-00312: online log 1 thread 1: 'J:\ORADATA\ORCL\REDO01.LOG'
Fri Feb 21 15:31:20 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_lgwr_10312.trc:
ORA-00316: log 1 of thread 1, type 286 in header is not log file
ORA-00312: online log 1 thread 1: 'J:\ORADATA\ORCL\REDO01.LOG'
Fri Feb 21 15:31:20 2014
LGWR: terminating instance due to error 316
Fri Feb 21 15:31:20 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_j001_11328.trc:
ORA-00316: log  of thread , type  in header is not log file
Fri Feb 21 15:31:20 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_j000_14116.trc:
ORA-00316: 日志  (用于线程 ) 标头中的类型  不是日志文件
Fri Feb 21 15:31:20 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_dbw1_8964.trc:
ORA-00316: log  of thread , type  in header is not log file
Fri Feb 21 15:31:22 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_dbw0_10592.trc:
ORA-00316: log  of thread , type  in header is not log file
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) 系统找不到指定的文件。
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) 系统找不到指定的文件。
Fri Feb 21 15:31:41 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_smon_11112.trc:
ORA-00316: log  of thread , type  in header is not log file
Fri Feb 21 15:31:41 2014
Instance terminated by LGWR, pid = 10312

数据库启动报ORA-00316错误

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00316: log 1 of thread 1, type 286 in header is not log file
ORA-00312: online log 1 thread 1: 'J:\ORADATA\ORCL\REDO01.LOG'

alert日志信息,报ORA-00316 ORA-00312

Sun Feb 23 13:54:08 2014
Started redo scan
Sun Feb 23 13:54:08 2014
Errors in file e:\oracle\product\10.2.0\admin\ora10g\udump\orcl_ora_5544.trc:
ORA-00316: log 1 of thread 1, type 286 in header is not log file
ORA-00312: online log 1 thread 1: 'J:\ORADATA\ORCL\REDO01.LOG'
Sun Feb 23 13:54:08 2014
Aborting crash recovery due to error 316
Sun Feb 23 13:54:08 2014
Errors in file e:\oracle\product\10.2.0\admin\ora10g\udump\orcl_ora_5544.trc:
ORA-00316: log 1 of thread 1, type 286 in header is not log file
ORA-00312: online log 1 thread 1: 'J:\ORADATA\ORCL\REDO01.LOG'
ORA-316 signalled during: alter database open...

通过dump redo header可以看出来redo header完全混乱了,里面很多数据文件内容在里面,初步估计系统或者硬件有问题(不稳定)导致该问题

LOG FILE #1:
  (name #3) J:\ORADATA\ORCL\REDO01.LOG
 Thread 1 redo log links: forward: 2 backward: 0
 siz: 0x7d000 seq: 0x0000024f hws: 0x1 bsz: 512 nab: 0xffffffff flg: 0xa dup: 1
 Archive links: fwrd: 0 back: 0 Prev scn: 0x0000.4f4a1951
 Low scn: 0x0000.4f4e5400 02/21/2014 08:44:42
 Next scn: 0xffff.ffffffff 01/01/1988 00:00:00
 FILE HEADER:
	Software vsn=4280360459=0xff211e0b, Compatibility Vsn=103886850=0x6313002
	Db ID=842019892=0x32303434, Db Name='01a0001'
	Activation ID=1107558657=0x42040101
	Control Seq=842019892=0x32303434, File size=808464688=0x30303130
	File Number=2417, Blksiz=2013738803, File Type=286 UNKNOWN
 descrip:"00-440201|广东省xxxxxx研究院|00014402010037"
 thread: 767 nab: 0x534e4906 seq: 0xbcfebcbd hws: 0xffffffff eot: 55 dis: 66
 reset logs count: 0x7545245 scn: 0x0101.1e097178
 Low scn: 0x4537.2022002c 01/15/2022 20:59:37
 Next scn: 0x3734.46463135 04/30/2017 17:34:49
 Enabled scn: 0x4345.37333038 10/22/2015 02:04:16
 Thread closed scn: 0x3939.43303143 08/25/2024 00:11:31
 Log format vsn: 0x46343835 Disk cksum: 0x1d09 Calc cksum: 0x70ed
 Terminal Recovery Stop scn: 0x3734.35304141
 Terminal Recovery Stamp  03/17/2014 19:59:48
 Most recent redo scn: 0x3636.31303134
 Largest LWN: 758788710 blocks
 Miscellaneous flags: 0x41444534
 Thread internal enable indicator: thr: 263902399, seq: 808464496 scn: 0x3032.30343431

因为当前redo完全损坏,尝试不完全恢复并结合隐含参数(_allow_resetlogs_corruption)拉库,出现错误ORA-00704 ORA-00604 ORA-01555

Sun Feb 23 14:03:37 2014
SMON: enabling cache recovery
Sun Feb 23 14:03:39 2014
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0000.4f4e5405):
Sun Feb 23 14:03:39 2014
select ctime, mtime, stime from obj$ where obj# = :1
Sun Feb 23 14:03:39 2014
Errors in file e:\oracle\product\10.2.0\admin\ora10g\udump\orcl_ora_5504.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 11 with name "_SYSSMU11$" too small
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704

通过修改scn,让数据库顺利open

使用_allow_resetlogs_corruption导致ORA-00704/ORA-01555故障

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：使用_allow_resetlogs_corruption导致ORA-00704/ORA-01555故障

以前写过一篇乱用_allow_resetlogs_corruption参数导致悲剧的文章,昨天晚上又遇到一个朋友不谨慎使用_allow_resetlogs_corruption导致ORA-00704/ORA-01555故障
环境描述
系统环境:solaris
数据库版本:10.2.0.5.7
数据存储方式:ASM
数据量:15T以上
补充事宜:数据库SCN距离headroom只有54天

报ORA-00020错误，实例crash
数据库因为超过了系统的进程数，出现dbwn进程写数据文件异常

Sun Aug 25 16:00:41 CST 2013
Errors in file /opt/oracle/admin/orcl/bdump/orcl_dbw0_7490.trc:
ORA-01148: 无法刷新数据文件 22 的文件大小
ORA-01110: 数据文件 22: '+DATA/orcl/datafile/index_jh.dbf'
ORA-00020: 超出最大进程数 ()
Sun Aug 25 16:00:41 CST 2013
Errors in file /opt/oracle/admin/orcl/bdump/orcl_dbw0_7490.trc:
ORA-01242: 数据文件出现介质故障: 数据库处于 NOARCHIVELOG 模式
ORA-01110: 数据文件 22: '+DATA/orcl/datafile/index_jh.dbf'
Sun Aug 25 16:00:41 CST 2013
DBW0: terminating instance due to error 1242
Termination issued to instance processes. Waiting for the processes to exit
Sun Aug 25 16:00:51 CST 2013
Instance termination failed to kill one or more processes
Instance terminated by DBW0, pid = 7490

ORA-00600[kcbtema_10]
实例恢复出现ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []

Sun Aug 25 19:19:23 CST 2013
ALTER DATABASE OPEN
Sun Aug 25 19:19:38 CST 2013
Beginning crash recovery of 1 threads
 parallel recovery started with 16 processes
Sun Aug 25 19:19:40 CST 2013
Started redo scan
Sun Aug 25 19:20:07 CST 2013
Completed redo scan
 12016413 redo blocks read, 93405 data blocks need recovery
Sun Aug 25 19:20:19 CST 2013
Started redo application at
 Thread 1: logseq 53681, block 1091966
Sun Aug 25 19:20:19 CST 2013
Recovery of Online Redo Log: Thread 1 Group 1 Seq 53681 Reading mem 0
  Mem# 0: +DATA/orcl/onlinelog/redo_1_1.log
  Mem# 1: +DATA/orcl/onlinelog/redo_1_2.log
Sun Aug 25 19:20:21 CST 2013
Errors in file /opt/oracle/admin/orcl/bdump/orcl_p011_16944.trc:
ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []
Sun Aug 25 19:20:23 CST 2013
Errors in file /opt/oracle/admin/orcl/bdump/orcl_p011_16944.trc:
ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []
Sun Aug 25 19:20:23 CST 2013
Aborting crash recovery due to slave death, attempting serial crash recovery
Sun Aug 25 19:20:23 CST 2013
Beginning crash recovery of 1 threads
Sun Aug 25 19:20:23 CST 2013
Started redo scan
Sun Aug 25 19:20:47 CST 2013
Completed redo scan
 12016413 redo blocks read, 93405 data blocks need recovery
Sun Aug 25 19:20:54 CST 2013
Started redo application at
 Thread 1: logseq 53681, block 1091966
Sun Aug 25 19:20:54 CST 2013
Recovery of Online Redo Log: Thread 1 Group 1 Seq 53681 Reading mem 0
  Mem# 0: +DATA/orcl/onlinelog/redo_1_1.log
  Mem# 1: +DATA/orcl/onlinelog/redo_1_2.log
Sun Aug 25 19:20:54 CST 2013
Errors in file /opt/oracle/admin/orcl/udump/orcl_ora_16751.trc:
ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []
Sun Aug 25 19:20:56 CST 2013
Aborting crash recovery due to error 600
Sun Aug 25 19:20:56 CST 2013
Errors in file /opt/oracle/admin/orcl/udump/orcl_ora_16751.trc:
ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...

使用隐含参数

ALTER SYSTEM SET _allow_resetlogs_corruption=TRUE SCOPE=SPFILE;

报ORA-00704/ORA-01555
因为在前面的恢复中进行了不完全恢复，因此这里加入隐含参数，然后尝试resetlogs,然后报如下错误

Sun Aug 25 20:11:54 CST 2013
alter database open resetlogs
Sun Aug 25 20:12:10 CST 2013
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 13429649847189
Resetting resetlogs activation ID 1312390734 (0x4e397e4e)
Sun Aug 25 20:16:25 CST 2013
Setting recovery target incarnation to 2
Sun Aug 25 20:16:42 CST 2013
************************************************************
Warning: The SCN headroom for this database is only 54 days!
************************************************************
Sun Aug 25 20:16:43 CST 2013
Assigning activation ID 1352200163 (0x5098efe3)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: +DATA/orcl/onlinelog/redo_1_1.log
  Current log# 1 seq# 1 mem# 1: +DATA/orcl/onlinelog/redo_1_2.log
Successful open of redo thread 1
Sun Aug 25 20:16:43 CST 2013
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sun Aug 25 20:16:52 CST 2013
SMON: enabling cache recovery
Sun Aug 25 20:16:52 CST 2013
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0c36.d582339b):
Sun Aug 25 20:16:52 CST 2013
select ctime, mtime, stime from obj$ where obj# = :1
Sun Aug 25 20:16:52 CST 2013
Errors in file /opt/oracle/admin/orcl/udump/orcl_ora_2859.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 143 (名称为 "_SYSSMU143$") 过小
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Termination issued to instance processes. Waiting for the processes to exit
Sun Aug 25 20:17:02 CST 2013
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 2859
ORA-1092 signalled during: alter database open resetlogs...

数据库当前SCN

SQL > select CHECKPOINT_CHANGE# from v$database;
CHECKPOINT_CHANGE#
------------------
    13429649947222
SQL > select distinct CHECKPOINT_CHANGE# from v$datafile_header;
CHECKPOINT_CHANGE#
------------------
    13429649947222

解决方法
因为该数据库版本为10.2.0.5.7,已经包含了scn patch,因此不能使用event或者隐含参数来修改scn,而且该库容量15T以上(asm),因此也无法使用bbed修改数据文件头,最后决定使用ordebug来解决该问题
使用oradebug DUMPvar SGA kcsgscn_
使用oradebug poke

sqlplus / as sysdba
startup mount
oradebug setmypid
oradebug DUMPvar SGA kcsgscn_
oradebug poke
recover database;
alter database open;

事后总结
查询MOS,发现ORA-00600[kcbtema_10] Raised During Recovery Operations (Doc ID 472282.1)

--故障原因
The cause of this problem has been identified and verified in unpublished Bug 5184359 ORA-600 [KCBTEMA_10].
Due to this bug, during recovery, the class designation of a data block has changed.
--处理方法
SQL>startup mount
SQL>recover database;
SQL>alter database open;

因为MOS上给的解决思路在该数据库中已经无法尝试，不能确定该方法一定可行，但是对于本次的恢复过程中，没有任何直接recover database操作(只有一次不完全恢复)确实让人有无限的遗憾和可惜。对于本次应该先查询MOS，尝试该种方法，慎重使用_allow_resetlogs_corruption参数

resetlogs失败故障恢复-ORA-01555

Oracle 19C 报ORA-704 ORA-01555故障处理

ORA-00600: internal error code, arguments: [16513], [1403] 恢复

ORA-600 16513故障恢复

Exadata火线救援：10TB级数据恢复—强制拉库篇

ORA-01555 ORA-600 kdiulk:kcbz_objdchk ORA-600 kdBlkCheckError等错误恢复

在数据库open过程中常遇到ORA-01555汇总

某集团ebs数据库redo undo丢失导致悲剧

记录一次ORA-00316 ORA-00312 redo异常恢复

使用_allow_resetlogs_corruption导致ORA-00704/ORA-01555故障

标签归档：ORA-01555