ORA-00704 ORA-00604 ORA-01406故障分析

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-00704 ORA-00604 ORA-01406故障分析

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

根据客户反馈系统运行的是9.2.0.8版本,但是服务器上面安装有10.2.0.1服务端,由于使用过10.2启动过数据库导致现在9.2.0.8无法启动.
数据库启动报错

Starting up ORACLE RDBMS Version: 9.2.0.8.0.
System parameters with non-default values:
…………
  db_cache_size            = 209715200
  compatible               = 9.2.0.0.0
  db_file_multiblock_read_count= 16
  fast_start_mttr_target   = 300
…………
Thread 1 advanced to log sequence 7517
Thread 1 opened at log sequence 7517
  Current log# 1 seq# 7517 mem# 0: F:\ORACLE\ORADATA\ORACLE\REDO01.LOG
Successful open of redo thread 1
Tue Apr 25 21:16:20 2017
SMON: enabling cache recovery
Tue Apr 25 21:16:20 2017
Errors in file f:\oracle\admin\oracle\udump\oracle_ora_3908.trc:
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 层 1 出现错误
ORA-01406: 读取的列值被截断
Tue Apr 25 21:16:20 2017
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Tue Apr 25 21:16:20 2017
Errors in file f:\oracle\admin\oracle\bdump\oracle_pmon_2124.trc:
ORA-00704: bootstrap process failure
Tue Apr 25 21:16:20 2017
Errors in file f:\oracle\admin\oracle\bdump\oracle_reco_2556.trc:
ORA-00704: bootstrap process failure
Tue Apr 25 21:16:20 2017
Errors in file f:\oracle\admin\oracle\bdump\oracle_smon_628.trc:
ORA-00704: bootstrap process failure
Tue Apr 25 21:16:21 2017
Errors in file f:\oracle\admin\oracle\bdump\oracle_ckpt_2212.trc:
ORA-00704: bootstrap process failure
Tue Apr 25 21:16:21 2017
Errors in file f:\oracle\admin\oracle\bdump\oracle_lgwr_2756.trc:
ORA-00704: bootstrap process failure
Tue Apr 25 21:16:21 2017
Errors in file f:\oracle\admin\oracle\bdump\oracle_dbw0_1756.trc:
ORA-00704: bootstrap process failure
Instance terminated by USER, pid = 3908
ORA-1092 signalled during: ALTER DATABASE OPEN...

错误比较明显bootstrap$的相关sql在递归的时候报错(ORA-01406)

我们分析alert日志

---9.2.0.1版本运行了很长时间
Mon Apr 13 20:44:29 2009
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
SCN scheme 2
Using log_archive_dest parameter default value
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 9.2.0.1.0.
--然后升级到9.2.0.8
Thu Jun 18 17:32:09 2015
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
SCN scheme 2
Using log_archive_dest parameter default value
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 9.2.0.8.0.
System parameters with non-default values:
…………
Thu Jun 18 17:32:18 2015
SMON: enabling cache recovery
Thu Jun 18 17:32:19 2015
Successfully onlined Undo Tablespace 1.
Thu Jun 18 17:32:19 2015
SMON: enabling tx recovery
Thu Jun 18 17:32:19 2015
Database Characterset is ZHS16GBK
Updating 9.2.0.1.0 NLS parameters in sys.props$
-- adding 9.2.0.8.0 NLS parameters.
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: alter database open
…………
Thu Jun 18 17:38:32 2015
Database Characterset is ZHS16GBK
Thu Jun 18 17:38:33 2015
ALTER SYSTEM SET _system_trig_enabled=FALSE SCOPE=MEMORY;
Thu Jun 18 17:38:33 2015
ALTER SYSTEM SET job_queue_processes=0 SCOPE=MEMORY;
Thu Jun 18 17:38:33 2015
ALTER SYSTEM SET aq_tm_processes=0 SCOPE=MEMORY;
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: ALTER DATABASE OPEN MIGRATE
--再升级到10.2.0.1
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.1.0.
System parameters with non-default values:
  processes                = 150
  timed_statistics         = TRUE
  sga_max_size             = 1610612736
  shared_pool_size         = 209715200
  large_pool_size          = 8388608
  java_pool_size           = 159383552
  streams_pool_size        = 50331648
  control_files            = F:\ORACLE\ORADATA\ORACLE\CONTROL01.CTL
  db_block_size            = 8192
  db_cache_size            = 209715200
  compatible               = 9.2.0.0.0
…………
Thu Jun 18 19:43:30 2015
Database Characterset is ZHS16GBK
Updating 9.2.0.8.0 NLS parameters in sys.props$
-- adding 10.2.0.1.0 NLS parameters.
…………
Thu Jun 18 19:43:44 2015
ALTER SYSTEM enable restricted session;
MMNL started with pid=12, OS id=5212
Thu Jun 18 19:43:44 2015
ALTER SYSTEM SET _system_trig_enabled=FALSE SCOPE=MEMORY;
Thu Jun 18 19:43:44 2015
ALTER SYSTEM SET aq_tm_processes=0 SCOPE=MEMORY;
Thu Jun 18 19:43:44 2015
ALTER SYSTEM SET resource_manager_plan='' SCOPE=MEMORY;
Thu Jun 18 19:43:44 2015
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: ALTER DATABASE OPEN MIGRATE

这里很明显数据库从2009年4月开始9.2.0.1版本开始运行,然后到2015年6月18日升级到9.2.0.8版本,紧接着升级到10.2.0.1(升级8.2.0.8是为升级10.2.0.1的中间过度操作).然后这个库一直使用10.2.0.1版本运行,这次重启不知道什么原因客户以为是9.2.0.8的数据库版本,然后不管怎么样也无法启动成功(这里不知道什么原因win 服务中使用的9.2.0.8的软件,估计被人误操作了).解决该问题,就是把服务切换成10.2.0.1版本数据库正常启动.
再次提醒大家,在oracle恢复的过程中,需要仔细分析日志,日志不会骗人,不要轻信客户的现场描述

ORA 600 3005恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA 600 3005恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库打开报ora-600 3005错误

D:\>sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on 星期二 3月 7 23:04:25 2017
Copyright (c) 1982, 2010, Oracle.  All rights reserved.
连接到:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> recover datafile 1;
完成介质恢复。
SQL> recover datafile 2;
完成介质恢复。
SQL> recover datafile 3;
完成介质恢复。
SQL> recover datafile 4;
完成介质恢复。
SQL> recover datafile 5;
完成介质恢复。
SQL> recover datafile 6;
完成介质恢复。
SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [3005], [1], [8242], [29937], [0], [0], [], [],
[], [], [], []

查询数据库信息

SQL> SELECT status,
  2  checkpoint_change#,
  3  checkpoint_time,FUZZY,
  4  count(*) ROW_NUM
  5  FROM v$datafile_header
  6  GROUP BY status, checkpoint_change#, checkpoint_time,fuzzy
  7  ORDER BY status, checkpoint_change#, checkpoint_time;
STATUS         CHECKPOINT_CHANGE# CHECKPOINT_TIM FUZZY     ROW_NUM
-------------- ------------------ -------------- ------ ----------
ONLINE                  227036249 06-3月 -17     NO              5
ONLINE                  227036252 06-3月 -17     NO              1
SQL> set numw 16
SQL> SELECT status,
  2  checkpoint_change#,
  3  checkpoint_time,last_change#,
  4  count(*) ROW_NUM
  5  FROM v$datafile
  6  GROUP BY status, checkpoint_change#, checkpoint_time,last_change#
  7  ORDER BY status, checkpoint_change#, checkpoint_time;
STATUS         CHECKPOINT_CHANGE# CHECKPOINT_TIM     LAST_CHANGE#
-------------- ------------------ -------------- ----------------
         ROW_NUM
----------------
ONLINE                  227036249 06-3月 -17
               4
ONLINE                  227036252 06-3月 -17
               1
SYSTEM                  227036249 06-3月 -17
               1

mos上关于ora-600 3005描述

VERSIONS:
versions 10.2 and later
DESCRIPTION:
Raised during pass one of the two pass recovery processing, which
reads and merges open redo threads into a hash table of blocks
that need recovery.
During examination of the the change vectors of online redologs, this
error is raised if no online redo log could be opened to cover the start RBA.
ARGUMENTS:
Arg [a] Thread
Arg [b] Redo Log File Sequence
Arg {c} Redo Log File Block Number
Arg [d] SCN Wrap
Arg [e] SCN Base

根据官方描述,出现该错误的原因是由于在数据库启动的过程中,通过控制文件读取的redo信息不匹配,从而出现该问题,通过重建控制文件可以绕过去该问题

SQL> shutdown immediate;
ORA-01109: 数据库未打开
已经卸载数据库。
ORACLE 例程已经关闭。
SQL> startup nomount pfile='d:/pfile.txt'
ORACLE 例程已经启动。
Total System Global Area      10288615424 bytes
Fixed Size                        2184672 bytes
Variable Size                  7482640928 bytes
Database Buffers               2785017856 bytes
Redo Buffers                     18771968 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "ORACLEDO" NORESETLOGS  NOARCHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 2336
  7  LOGFILE
  8    GROUP 1 'D:\APP\ADMINISTRATOR\ORADATA\ORACLEDOC\REDO01.LOG'  SIZE 50M BLOCKSIZE 512,
  9    GROUP 2 'D:\APP\ADMINISTRATOR\ORADATA\ORACLEDOC\REDO02.LOG'  SIZE 50M BLOCKSIZE 512,
 10    GROUP 3 'D:\APP\ADMINISTRATOR\ORADATA\ORACLEDOC\REDO03.LOG'  SIZE 50M BLOCKSIZE 512
 11  DATAFILE
 12    'D:\APP\ADMINISTRATOR\ORADATA\ORACLEDOC\SYSTEM01.DBF',
 13    'D:\APP\ADMINISTRATOR\ORADATA\ORACLEDOC\SYSAUX01.DBF',
 14    'D:\APP\ADMINISTRATOR\ORADATA\ORACLEDOC\UNDOTBS01.DBF',
 15    'D:\APP\ADMINISTRATOR\ORADATA\ORACLEDOC\USERS01.DBF',
 16    'D:\APP\ADMINISTRATOR\ORADATA\ORACLEDOC\XIFENFEI01.DBF',
 17    'D:\APP\ADMINISTRATOR\ORADATA\ORACLEDOC\XIFENFEI0102.DBF'
 18  CHARACTER SET AL32UTF8
 19  ;
控制文件已创建。
SQL> recover database;
完成介质恢复。
SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [],
[], [], [], []
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [],
[], [], [], []
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [],
[], [], [], []
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [],
[], [], [], []
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [],
[], [], [], []
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [],
[], [], [], []
进程 ID: 4036
会话 ID: 96 序列号: 1

这个错误就比较熟悉了,按照undo异常方案处理即可
补充说明
ora-600 3005的错误可能需要internal 帐号才能够查询到准确描述和处理方法,其实在这个库的运行最后crash之前,就已经报了控制文件异常,然后库crash掉了.

Mon Mar 06 10:16:37 2017
Thread 1 advanced to log sequence 8242 (LGWR switch)
  Current log# 1 seq# 8242 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORACLEDOC\REDO01.LOG
Mon Mar 06 11:06:31 2017
********************* ATTENTION: ********************
 The controlfile header block returned by the OS
 has a sequence number that is too old.
 The controlfile might be corrupted.
 PLEASE DO NOT ATTEMPT TO START UP THE INSTANCE
 without following the steps below.
 RE-STARTING THE INSTANCE CAN CAUSE SERIOUS DAMAGE
 TO THE DATABASE, if the controlfile is truly corrupted.
 In order to re-start the instance safely,
 please do the following:
 (1) Save all copies of the controlfile for later
     analysis and contact your OS vendor and Oracle support.
 (2) Mount the instance and issue:
     ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
 (3) Unmount the instance.
 (4) Use the script in the trace file to
     RE-CREATE THE CONTROLFILE and open the database.
*****************************************************
MMON (ospid: 3320): terminating the instance
Mon Mar 06 11:06:32 2017
opiodr aborting process unknown ospid (1528) as a result of ORA-1092
Mon Mar 06 11:06:32 2017
ORA-1092 : opitsk aborting process
Mon Mar 06 11:06:32 2017
opiodr aborting process unknown ospid (2852) as a result of ORA-1092
Mon Mar 06 11:06:32 2017
ORA-1092 : opitsk aborting process
Mon Mar 06 11:06:33 2017
opiodr aborting process unknown ospid (3836) as a result of ORA-1092
Mon Mar 06 11:06:33 2017
ORA-1092 : opitsk aborting process
Instance terminated by MMON, pid = 3320

KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type]

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type]

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在oracle asm的使用过程中由于操作系统层面的错误操作导致asm disk 被破坏,这里列举了几种破坏之后的kfed报错现象(KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type])
asm mount 磁盘组报错(ORA-15040 ORA-15042)

SQL> alter diskgroup DATA mount;
alter diskgroup DATA mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "2" is missing from group number "2"

asm alert日志报错(ORA-15335 ORA-15066 ORA-15196等)

ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0002" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483651] [48] [0 != 1]

kfed查看磁盘头报错
文件文件头(不光是disk header的4k,可能是连续的几个au,甚至更多)可能彻底损坏,一般kfed 读取都会看到KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type]之类错误

[oracle@fcomtaep2 disks]$ kfed read ASMRECO03
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
7FC18D899400 00000000 00000000 00000000 00000000 [................]
  Repeat 27 times
7FC18D8995C0 FEEE0001 0001FFFF FFFF0000 00000FFF [................]
7FC18D8995D0 00000000 00000000 00000000 00000000 [................]
  Repeat 1 times
7FC18D8995F0 00000000 00000000 00000000 AA550000 [..............U.]
7FC18D899600 20494645 54524150 00010000 0000005C [EFI PART....\...] <==== **** Here ******
7FC18D899610 BD82BBB3 00000000 00000001 00000000 [................]
7FC18D899620 0FFFFFFF 00000000 00000022 00000000 [........".......]
7FC18D899630 0FFFFFDE 00000000 FD8857E5 42D7B49B [.........W.....B]
7FC18D899640 0901FA87 6B3DB5AA 00000002 00000000 [......=k........]
7FC18D899650 00000080 00000080 FE48EB77 00000000 [........w.H.....]
7FC18D899660 00000000 00000000 00000000 00000000 [................]
  Repeat 25 times
7FC18D899800 EBD0A0A2 4433B9E5 B668C087 C79926B7 [......3D..h..&..]
7FC18D899810 5381F6DF 4626F988 0E4F468D D78D3B28 [...S..&F.FO.(;..]
7FC18D899820 000007A1 00000000 0FFFF85F 00000000 [........_.......]
7FC18D899830 00000000 00000000 00720070 006D0069 [........p.r.i.m.]
7FC18D899840 00720061 00000079 00000000 00000000 [a.r.y...........]
7FC18D899850 00000000 00000000 00000000 00000000 [................]
 Repeat 186 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

“EFI PART”是分区的元数据,一般是被分区导致asm disk损坏.

[ebernal@dbaasm new2]$ kfed read emcpowerl | head -25
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
2ABD671E9400 00000000 00000000 00000000 00000000 [................]
  Repeat 31 times
2ABD671E9600 4542414C 454E4F4C 00000001 00000000 [LABELONE........]
2ABD671E9610 E4E1DDB1 00000020 324D564C 31303020 [.... ...LVM2 001] <==== **** Here ******
2ABD671E9620 50365A77 71327874 34303156 4B4E6136 [wZ6Ptx2qV1046aNK]
2ABD671E9630 35395159 5147634C 487A5A38 63575A37 [YQ95LcGQ8ZzH7ZWc]
2ABD671E9640 00000000 00000019 00030000 00000000 [................]
2ABD671E9650 00000000 00000000 00000000 00000000 [................]
2ABD671E9660 00000000 00000000 00001000 00000000 [................]
2ABD671E9670 0002F000 00000000 00000000 00000000 [................]
2ABD671E9680 00000000 00000000 00000000 00000000 [................]
  Repeat 215 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

“LVM2 001” 是逻辑卷的名字,该asm disk很可能被做为lvm管理而被破坏

[ebernal@dbaasm tars]$ kfed read rhdisk16
kfbh.endian:                         65 ; 0x000: 0x41
kfbh.hard:                           73 ; 0x001: 0x49
kfbh.type:                           88 ; 0x002: *** Unknown Enum ***
kfbh.datfmt:                         32 ; 0x003: 0x20
kfbh.block.blk:              1111709260 ; 0x004: blk=1111709260
kfbh.block.obj:              1634861056 ; 0x008: file=131072
kfbh.check:                         119 ; 0x00c: 0x00000077
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
2B6FE2AC1400 20584941 4243564C 61720000 00000077  [AIX LVCB..raw...] <==== **** Here ******
2B6FE2AC1410 00000000 00000000 00000000 00000000  [................]
2B6FE2AC1420 00000000 00000000 30300000 38306430  [..........000d08]
2B6FE2AC1430 30306131 34643030 30303030 31303030  [1a0000d400000001]
2B6FE2AC1440 61006533 766C6D73 7461645F 00003161  [3e.asmlv_data1..]
2B6FE2AC1450 00000000 00000000 00000000 00000000  [................]
        Repeat 2 times
2B6FE2AC1480 54000000 4D206575 20207961 31312037  [...Tue May  7 11]
2B6FE2AC1490 3A33343A 32203633 0A333130 00000000  [:43:36 2013.....]
2B6FE2AC14A0 65755400 79614D20 20372020 343A3131  [.Tue May  7 11:4]
2B6FE2AC14B0 34323A38 31303220 00000A33 44000000  [8:24 2013......D]
2B6FE2AC14C0 41313830 30303444 6D6D7900 02007900  [081AD400.ymm.y..]
2B6FE2AC14D0 0100E40C 656E6F4E 00000000 00000000  [....None........]
2B6FE2AC14E0 00000000 00000000 00000000 00000000  [................]
        Repeat 14 times
2B6FE2AC15D0 00000000 00000000 65310000 61653934  [..........1e49ea]
2B6FE2AC15E0 342E3862 00000000 00000000 00000000  [b8.4............]
2B6FE2AC15F0 00000000 00000000 00000000 00000000  [................]
  Repeat 224 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][88]

这里的“AIX LVCB..raw” 是AIX OS volume 的元数据库,也就是说,asm disk 被作为了aix os层面破坏

[oracle@dbep2 disks]$ kfed read asm-disk3
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
06000000 00000000 00000000 00000000 00000000 [................]
  Repeat 25 times
0602100 51e2b7f6 00ed4e00 00000000 00000001  [...Q.N..........]
0602120 00000000 0000000b 00000100 0000003c [............<...]
0602140 00000242 0000007b 5d8468e7 6147782a [B...{....h.]*xGa]
0602160 d17851a2 327552e2 00000000 00000000 [.Qx..Ru2........]
0602200 00000000 00000000 3130752f 91a4f000 [......../u01....] <==== **** Here ******
0602220 ff8808e4 d5104cff 000000ac 00000100 [.....L..........]
0602240 00000000 00000000 00000000 09d18000 [................]
  Repeat 254 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][88]

这里的/u01很可能表明该asm disk被文件系统覆盖

对于asm disk的各种破坏情况,如果是normal/high冗余,那么asm dg没有问题,可以考虑通过删除异常盘,然后重新加入;如果是外部冗余遭遇到asm disk 被破坏,一般asm disk 会dismount,而且无法正常mount,如果有备份的磁盘头,可以尝试还原磁盘头,mount 磁盘组,然后只读方式迁移数据;如果没有备份磁盘头或者还原之后也无法mount,可能需要通过一些额外的方式处理比如通过工具在asm dismount状态下恢复数据文件,甚至通过对asm block/oracle block碎片重组的方式恢复数据.参考相关文章:
oracle asm系列文章汇总
pvid=yes导致asm无法mount
asm disk header 彻底损坏恢复
分区无法识别导致asm diskgroup无法mount
oracle asm disk格式化恢复—格式化为ext4文件系统
oracle asm disk格式化恢复—格式化为ntfs文件系统
asm disk误设置pvid导致asm diskgroup无法mount恢复
分享oracleasm createdisk重新创建asm disk后数据0丢失恢复案例
ORA-15042: ASM disk “N” is missing from group number “M” 故障恢复
如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

解决CON$ ORA-600 kdsgrp1错误

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:解决CON$ ORA-600 kdsgrp1错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库报ORA 600 kdsgrp1错误
数据库报ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []错

Thread 1 advanced to log sequence 23861 (LGWR switch)
  Current log# 7 seq# 23861 mem# 0: /oradata/easdb/redo07.log
Tue Nov 15 10:00:42 2016
Errors in file /u01/oracle/diag/rdbms/easdb/easdb/trace/easdb_dw00_3165.trc  (incident=908262):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/easdb/easdb/incident/incdir_908262/easdb_dw00_3165_i908262.trc
Tue Nov 15 10:00:55 2016
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue Nov 15 10:00:56 2016
Errors in file /u01/oracle/diag/rdbms/easdb/easdb/trace/easdb_dw00_3165.trc  (incident=908263):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
ORA-06512: at "SYS.KUPW$WORKER", line 1751
ORA-06512: at line 2
Incident details in: /u01/oracle/diag/rdbms/easdb/easdb/incident/incdir_908263/easdb_dw00_3165_i908263.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
DW00 terminating with fatal err=600, pid=40, wid=1, job SYSTEM.
Tue Nov 15 10:01:01 2016
Thread 1 advanced to log sequence 23862 (LGWR switch)
  Current log# 2 seq# 23862 mem# 0: /oradata/easdb/redo02.log
Tue Nov 15 10:01:23 2016
Errors in file /u01/oracle/diag/rdbms/easdb/easdb/trace/easdb_dm00_3163.trc  (incident=908254):
ORA-31671: Worker process DW00 had an unhandled exception.
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
ORA-06512: at "SYS.KUPW$WORKER", line 1751
ORA-06512: at line 2
Incident details in: /u01/oracle/diag/rdbms/easdb/easdb/incident/incdir_908254/easdb_dm00_3163_i908254.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue Nov 15 10:01:26 2016
Tue Nov 15 10:01:28 2016
Thread 1 advanced to log sequence 23863 (LGWR switch)
  Current log# 4 seq# 23863 mem# 0: /oradata/easdb/redo04.log

trace文件中信息

*** 2016-11-15 10:00:35.977
* kdsgrp1-1: *************************************************
            row 0x004459e6.26 continuation at
            0x004459e6.26 file# 1 block# 285158 slot 38 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 38 ..... nrows: 208
kdsgrp - dump CR block dba=0x004459e6
Block header dump:  0x004459e6
 Object id on Block? Y
 seg/obj: 0x1c  csc: 0x01.c712f743  itc: 3  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000b.015.0036d715  0x00c01bba.0fbd.02  C---    0  scn 0x0001.c6b4cb1a
0x02   0x000c.004.00044d36  0x04c0dd93.3eec.33  C---    0  scn 0x0001.c6d2c65b
0x03   0x000d.008.00008eb9  0x04c0777a.10e3.02  --U-    2  fsc 0x0056.c7346f21

确定报错对象和确认异常

SQL> select object_name from dba_objects where object_id=28;
OBJECT_NAME
---------------------------------------------------------
CON$
SQL> ANALYZE TABLE sys.CON$ VALIDATE STRUCTURE CASCADE online;
ANALYZE TABLE sys.CON$ VALIDATE STRUCTURE CASCADE online
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file
SQL> SET LINES 122
SQL> COL INDEX_OWNER FOR A20
SQL> COL INDEX_NAME FOR A30
SQL> COL TABLE_OWNER FOR A20
SQL> COL COLUMN_NAME FOR A25
SQL> SELECT TABLE_OWNER,INDEX_NAME,COLUMN_NAME,COLUMN_POSITION
2  FROM Dba_Ind_Columns
3  WHERE table_name = upper('&TABLE_NAME') order by TABLE_OWNER,INDEX_OWNER,INDEX_NAME,COLUMN_POSITION;
Enter value for table_name: CON$
old   3:  WHERE table_name = upper('&TABLE_NAME') order by TABLE_OWNER,INDEX_OWNER,INDEX_NAME,COLUMN_POSITION
new   3:  WHERE table_name = upper('CON$') order by TABLE_OWNER,INDEX_OWNER,INDEX_NAME,COLUMN_POSITION
TABLE_OWNER	     INDEX_NAME 		    COLUMN_NAME 	      COLUMN_POSITION
-------------------- ------------------------------ ------------------------- ---------------
SYS		     I_CON1			    OWNER#				    1
SYS		     I_CON1			    NAME				    2
SYS		     I_CON2			    CON#				    1
SQL> select owner#,name from con$
2    minus
3   select /*+ full(t) */owner#,name from con$ t;
no rows selected
SQL> select /*+ full(t) */owner#,name from con$ t
2    minus
3   select owner#,name from con$  ;
no rows selected
SQL> select /*+ full(t) */ con# from con$ t
2    minus
3   select con# from con$ ;
no rows selected
SQL> select con# from con$
2    minus
3   select /*+ full(t) */ con# from con$ t   ;
      CON#
----------
   1037224
   1037225
   1037386
   1037387
   1037388
   ……
   1037846
62 rows selected.

通过上述分析,可以确定是由于CON$和I_CON2数据不一致,而且是index的数据比表中多了62条.针对这样情况,考虑通过重建index来解决.

尝试rebuild index

SQL> alter index I_CON2 rebuild online;
alter index I_CON2 rebuild online
*
ERROR at line 1:
ORA-00701: object necessary for warmstarting database cannot be altered
SQL>
SQL>
SQL>
SQL>
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup upgrade
ORACLE instance started.
Total System Global Area 2421825536 bytes
Fixed Size                  2215744 bytes
Variable Size            1828716736 bytes
Database Buffers          570425344 bytes
Redo Buffers               20467712 bytes
Database mounted.
Database opened.
SQL> alter index I_CON2 rebuild;
alter index I_CON2 rebuild
*
ERROR at line 1:
ORA-00701: object necessary for warmstarting database cannot be altered

因为是数据库核心index,无法直接rebuild解决,只能通过bootstrap$核心index(I_OBJ1,I_USER1,I_FILE#_BLOCK#,I_IND1,I_TS#,I_CDEF1等)异常恢复—ORA-00701错误解决 方式解决

File #N is offline, but is part of an online tablespace

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:File #N is offline, but is part of an online tablespace

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在看一个客户的数据库恢复日志的时候发现类似警告(File #N is offline, but is part of an online tablespace.),以前没有注意,这次通过试验来重现该部分内容
创建表空间

SQL> create tablespace readonly datafile '/home/oracle/.oradata/test/readonly01.dbf' size 128M;
Tablespace created.
SQL> alter tablespace  readonly add datafile '/home/oracle/.oradata/test/readonly02.dbf' size 128M;
Tablespace altered.

写入数据

SQL> create table t_readonly tablespace readonly as select * from dba_objects;
Table created.

read only 表空间

SQL> select count(*) from t_readonly;
  COUNT(*)
----------
     72226
SQL> alter system switch logfile;
System altered.
SQL> /
System altered.
SQL> /
System altered.
SQL>  alter tablespace readonly  read only;
Tablespace altered.
SQL> alter system switch logfile;
System altered.
SQL> /
System altered.
SQL> /
System altered.
SQL> /
System altered.

备份数据库

[oracle@localhost ~]$ rman target /
Recovery Manager: Release 11.2.0.1.0 - Production on Tue Nov 1 21:15:51 2016
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
connected to target database: TEST (DBID=2210907828)
RMAN> backup database format '/home/oracle/full_%U.rman';
Starting backup at 01-NOV-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=197 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/home/oracle/.oradata/test/system01.dbf
input datafile file number=00002 name=/home/oracle/.oradata/test/sysaux01.dbf
input datafile file number=00005 name=/home/oracle/.oradata/test/test01.dbf
input datafile file number=00006 name=/home/oracle/.oradata/test/test02.dbf
input datafile file number=00007 name=/home/oracle/.oradata/test/readonly01.dbf
input datafile file number=00008 name=/home/oracle/.oradata/test/readonly02.dbf
input datafile file number=00003 name=/home/oracle/.oradata/test/undotbs01.dbf
input datafile file number=00004 name=/home/oracle/.oradata/test/users01.dbf
channel ORA_DISK_1: starting piece 1 at 01-NOV-16
channel ORA_DISK_1: finished piece 1 at 01-NOV-16
piece handle=/home/oracle/full_03rjrp0t_1_1.rman tag=TAG20161101T211613 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:25
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 01-NOV-16
channel ORA_DISK_1: finished piece 1 at 01-NOV-16
piece handle=/home/oracle/full_04rjrp1m_1_1.rman tag=TAG20161101T211613 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 01-NOV-16
RMAN> sql 'alter system archive log current';
backup  as compressed backupset archivelog  all format '/home/oracle/arch_%T_%U.rman'  delete input;
sql statement: alter system archive log current
RMAN>
Starting backup at 01-NOV-16
current log archived
using channel ORA_DISK_1
channel ORA_DISK_1: starting compressed archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=494 RECID=1 STAMP=926802386
input archived log thread=1 sequence=495 RECID=2 STAMP=926802386
input archived log thread=1 sequence=496 RECID=3 STAMP=926802389
input archived log thread=1 sequence=497 RECID=4 STAMP=926802693
input archived log thread=1 sequence=498 RECID=5 STAMP=926802693
input archived log thread=1 sequence=499 RECID=6 STAMP=926802696
input archived log thread=1 sequence=500 RECID=7 STAMP=926802787
input archived log thread=1 sequence=501 RECID=8 STAMP=926802789
input archived log thread=1 sequence=502 RECID=9 STAMP=926802792
input archived log thread=1 sequence=503 RECID=10 STAMP=926802793
input archived log thread=1 sequence=504 RECID=11 STAMP=926802812
input archived log thread=1 sequence=505 RECID=12 STAMP=926802813
input archived log thread=1 sequence=506 RECID=13 STAMP=926802816
input archived log thread=1 sequence=507 RECID=14 STAMP=926803076
input archived log thread=1 sequence=508 RECID=15 STAMP=926803077
channel ORA_DISK_1: starting piece 1 at 01-NOV-16
channel ORA_DISK_1: finished piece 1 at 01-NOV-16
piece handle=/home/oracle/arch_20161101_05rjrp45_1_1.rman tag=TAG20161101T211757 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
channel ORA_DISK_1: deleting archived log(s)
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_494_d1k4tkot_.arc RECID=1 STAMP=926802386
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_495_d1k4tln7_.arc RECID=2 STAMP=926802386
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_496_d1k4tot5_.arc RECID=3 STAMP=926802389
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_497_d1k544w3_.arc RECID=4 STAMP=926802693
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_498_d1k545wc_.arc RECID=5 STAMP=926802693
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_499_d1k548bm_.arc RECID=6 STAMP=926802696
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_500_d1k5734v_.arc RECID=7 STAMP=926802787
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_501_d1k5752s_.arc RECID=8 STAMP=926802789
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_502_d1k578c2_.arc RECID=9 STAMP=926802792
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_503_d1k579hy_.arc RECID=10 STAMP=926802793
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_504_d1k57w6s_.arc RECID=11 STAMP=926802812
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_505_d1k57xj1_.arc RECID=12 STAMP=926802813
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_506_d1k580hj_.arc RECID=13 STAMP=926802816
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_507_d1k5j4q0_.arc RECID=14 STAMP=926803076
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_508_d1k5j4yq_.arc RECID=15 STAMP=926803077
Finished backup at 01-NOV-16
RMAN> backup  format '/home/oracle/ctl_%T_%U.rman' current controlfile;
Starting backup at 01-NOV-16
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
channel ORA_DISK_1: starting piece 1 at 01-NOV-16
channel ORA_DISK_1: finished piece 1 at 01-NOV-16
piece handle=/home/oracle/ctl_20161101_06rjrp75_1_1.rman tag=TAG20161101T211933 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 01-NOV-16

清理环境还原数据库

RMAN> shutdown immediate;
database closed
database dismounted
Oracle instance shut down
RMAN> exit
Recovery Manager complete.
[oracle@localhost .oradata]$ mv test  test_20161101
[oracle@localhost .oradata]$ mkdir test

还原数据库

[oracle@localhost ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Tue Nov 1 21:21:09 2016
Copyright (c) 1982, 2009, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup nomount;
ORACLE instance started.
Total System Global Area 2421825536 bytes
Fixed Size                  2215744 bytes
Variable Size            1795162304 bytes
Database Buffers          603979776 bytes
Redo Buffers               20467712 bytes
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@localhost ~]$ rman target /
Recovery Manager: Release 11.2.0.1.0 - Production on Tue Nov 1 21:21:22 2016
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
connected to target database: TEST (not mounted)
RMAN> restore controlfile from '/home/oracle/ctl_20161101_06rjrp75_1_1.rman';
Starting restore at 01-NOV-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=63 device type=DISK
channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
output file name=/home/oracle/.oradata/test/control01.ctl
Finished restore at 01-NOV-16
RMAN> alter database mount;
database mounted
released channel: ORA_DISK_1
RMAN> restore database;
Starting restore at 01-NOV-16
Starting implicit crosscheck backup at 01-NOV-16
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=63 device type=DISK
Crosschecked 3 objects
Finished implicit crosscheck backup at 01-NOV-16
Starting implicit crosscheck copy at 01-NOV-16
using channel ORA_DISK_1
Finished implicit crosscheck copy at 01-NOV-16
searching for all files in the recovery area
cataloging files...
no files cataloged
using channel ORA_DISK_1
datafile 5 not processed because file is offline
datafile 6 not processed because file is offline
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00001 to /home/oracle/.oradata/test/system01.dbf
channel ORA_DISK_1: restoring datafile 00002 to /home/oracle/.oradata/test/sysaux01.dbf
channel ORA_DISK_1: restoring datafile 00003 to /home/oracle/.oradata/test/undotbs01.dbf
channel ORA_DISK_1: restoring datafile 00004 to /home/oracle/.oradata/test/users01.dbf
channel ORA_DISK_1: restoring datafile 00007 to /home/oracle/.oradata/test/readonly01.dbf
channel ORA_DISK_1: restoring datafile 00008 to /home/oracle/.oradata/test/readonly02.dbf
channel ORA_DISK_1: reading from backup piece /home/oracle/full_03rjrp0t_1_1.rman
channel ORA_DISK_1: piece handle=/home/oracle/full_03rjrp0t_1_1.rman tag=TAG20161101T211613
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:15
Finished restore at 01-NOV-16
RMAN> exit
Recovery Manager complete.

检查数据库文件
read_only1


通过Oracle Database Recovery Check很明显发现,这里看到文件状态是read only的.
recover database

[oracle@localhost ~]$ rman target /
Recovery Manager: Release 11.2.0.1.0 - Production on Tue Nov 1 21:28:14 2016
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
connected to target database: TEST (DBID=2210907828, not open)
RMAN> recover database;
Starting recover at 01-NOV-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=63 device type=DISK
datafile 7 not processed because file is read-only    <<<<=====注意
datafile 8 not processed because file is read-only    <<<<=====注意
starting media recovery
channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=507
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=508
channel ORA_DISK_1: reading from backup piece /home/oracle/arch_20161101_05rjrp45_1_1.rman
channel ORA_DISK_1: piece handle=/home/oracle/arch_20161101_05rjrp45_1_1.rman tag=TAG20161101T211757
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_507_d1k63pj8_.arc thread=1 sequence=507
channel default: deleting archived log(s)
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_507_d1k63pj8_.arc RECID=16 STAMP=926803702
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_508_d1k63pww_.arc thread=1 sequence=508
channel default: deleting archived log(s)
archived log file name=/opt/oracle/flash_recovery_area/TEST/archivelog/2016_11_01/o1_mf_1_508_d1k63pww_.arc RECID=17 STAMP=926803702
unable to find archived log
archived log thread=1 sequence=509
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 11/01/2016 21:28:24
RMAN-06054: media recovery requesting unknown archived log for thread 1 with sequence 509 and starting SCN of 8933048
RMAN> exit

resetlogs 打开数据库

SQL> set numw 16
SQL> SELECT status,
checkpoint_change#,
  2    3  checkpoint_time,last_change#,
count(*) ROW_NUM
FROM v$datafile
GROUP BY status, checkpoint_change#, checkpoint_time,last_change#
ORDER BY status, checkpoint_change#, checkpoint_time;  4    5    6    7
STATUS  CHECKPOINT_CHANGE# CHECKPOIN     LAST_CHANGE#          ROW_NUM
------- ------------------ --------- ---------------- ----------------
ONLINE             8932792 01-NOV-16                                 2
ONLINE             8933048 01-NOV-16                                 3
SYSTEM             8933048 01-NOV-16                                 1
SQL> SELECT status,
  2  checkpoint_change#,
  3  checkpoint_time,FUZZY,
  4  count(*) ROW_NUM
  5  FROM v$datafile_header
GROUP BY status, checkpoint_change#, checkpoint_time,fuzzy
ORDER BY status, checkpoint_change#, checkpoint_time;
  6    7
STATUS  CHECKPOINT_CHANGE# CHECKPOIN FUZ          ROW_NUM
------- ------------------ --------- --- ----------------
ONLINE             8932792 01-NOV-16 NO                 2
ONLINE             8933048 01-NOV-16 NO                 4
SQL> alter database open resetlogs;
Database altered.

alert日志信息

Tue Nov 01 21:29:56 2016
alter database open resetlogs
Errors in file /opt/oracle/diag/rdbms/test/test/trace/test_ora_14479.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/home/oracle/.oradata/test/redo01.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /opt/oracle/diag/rdbms/test/test/trace/test_ora_14479.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/home/oracle/.oradata/test/redo01.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /opt/oracle/diag/rdbms/test/test/trace/test_ora_14479.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/home/oracle/.oradata/test/redo02.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /opt/oracle/diag/rdbms/test/test/trace/test_ora_14479.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/home/oracle/.oradata/test/redo02.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /opt/oracle/diag/rdbms/test/test/trace/test_ora_14479.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/home/oracle/.oradata/test/redo03.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /opt/oracle/diag/rdbms/test/test/trace/test_ora_14479.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/home/oracle/.oradata/test/redo03.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
RESETLOGS after incomplete recovery UNTIL CHANGE 8933048
Resetting resetlogs activation ID 2210869684 (0x83c731b4)
Errors in file /opt/oracle/diag/rdbms/test/test/trace/test_ora_14479.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/home/oracle/.oradata/test/redo01.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /opt/oracle/diag/rdbms/test/test/trace/test_ora_14479.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/home/oracle/.oradata/test/redo01.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /opt/oracle/diag/rdbms/test/test/trace/test_ora_14479.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/home/oracle/.oradata/test/redo02.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /opt/oracle/diag/rdbms/test/test/trace/test_ora_14479.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/home/oracle/.oradata/test/redo02.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /opt/oracle/diag/rdbms/test/test/trace/test_ora_14479.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/home/oracle/.oradata/test/redo03.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /opt/oracle/diag/rdbms/test/test/trace/test_ora_14479.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/home/oracle/.oradata/test/redo03.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Tue Nov 01 21:29:58 2016
Setting recovery target incarnation to 3
Tue Nov 01 21:29:58 2016
Assigning activation ID 2224900353 (0x849d4901)
LGWR: STARTING ARCH PROCESSES
Tue Nov 01 21:29:58 2016
ARC0 started with pid=20, OS id=14486
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Tue Nov 01 21:29:59 2016
ARC1 started with pid=21, OS id=14488
Tue Nov 01 21:29:59 2016
ARC2 started with pid=22, OS id=14490
ARC1: Archival started
Tue Nov 01 21:29:59 2016
ARC3 started with pid=23, OS id=14492
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /home/oracle/.oradata/test/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Nov 01 21:29:59 2016
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Dictionary check beginning
File #7 is offline, but is part of an online tablespace.
data file 7: '/home/oracle/.oradata/test/readonly01.dbf'
Successfuly brought file #7 online.
File #8 is offline, but is part of an online tablespace.
data file 8: '/home/oracle/.oradata/test/readonly02.dbf'
Successfuly brought file #8 online.
Tue Nov 01 21:29:59 2016
Errors in file /opt/oracle/diag/rdbms/test/test/trace/test_dbw0_14226.trc:
ORA-01157: cannot identify/lock data file 201 - see DBWR trace file
ORA-01110: data file 201: '/home/oracle/.oradata/test/temp01.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /opt/oracle/diag/rdbms/test/test/trace/test_dbw0_14226.trc:
ORA-01186: file 201 failed verification tests
ORA-01157: cannot identify/lock data file 201 - see DBWR trace file
ORA-01110: data file 201: '/home/oracle/.oradata/test/temp01.dbf'
File 201 not verified due to error ORA-01157
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Re-creating tempfile /home/oracle/.oradata/test/temp01.dbf
Database Characterset is AL32UTF8
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Tue Nov 01 21:30:00 2016
QMNC started with pid=24, OS id=14494
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Completed: alter database open resetlogs

这里看到了我们期待的警告
File #7 is offline, but is part of an online tablespace.
data file 7: ‘/home/oracle/.oradata/test/readonly01.dbf’
Successfuly brought file #7 online.
File #8 is offline, but is part of an online tablespace.
data file 8: ‘/home/oracle/.oradata/test/readonly02.dbf’
Successfuly brought file #8 online.
结论:如果数据库的表空间之前是read only,然后resetlogs操作就会有类似提示(File #N is offline, but is part of an online tablespace).这样的整个恢复过程不影响read only表空间中的数据

ORA-600 kcrfr_update_nab_2 故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-600 kcrfr_update_nab_2 故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

由于控制器掉线导致数据库启动报ORA-600 kcrfr_update_nab_2错误,导致无法正常open
数据库版本信息

ORACLE V10.2.0.4.0 - 64bit Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows Server 2003 Version V5.2 Service Pack 2
CPU                 : 12 - type 8664, 2 Physical Cores
Process Affinity    : 0x0000000000000000
Memory (Avail/Total): Ph:22579M/32754M, Ph+PgF:24594M/33845M

ORA-600 kcrfr_update_nab_2报错

Mon Oct 24 17:42:57 2016
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Mon Oct 24 17:42:58 2016
ALTER DATABASE OPEN
Mon Oct 24 17:43:14 2016
Beginning crash recovery of 1 threads
 parallel recovery started with 11 processes
Mon Oct 24 17:43:14 2016
Started redo scan
Mon Oct 24 17:43:16 2016
Errors in file d:\oracle\product\10.2.0\admin\spcsjkdb\udump\spcsjkdb_ora_10108.trc:
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x7FFC22A2150], [2], [], [], [], [], []
Mon Oct 24 17:43:18 2016
Aborting crash recovery due to error 600
Mon Oct 24 17:43:18 2016
Errors in file d:\oracle\product\10.2.0\admin\spcsjkdb\udump\spcsjkdb_ora_10108.trc:
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x7FFC22A2150], [2], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...

trace文件信息

*** 2016-10-24 17:43:14.515
*** ACTION NAME:() 2016-10-24 17:43:14.515
*** MODULE NAME:(sqlplus.exe) 2016-10-24 17:43:14.515
*** SERVICE NAME:() 2016-10-24 17:43:14.515
*** SESSION ID:(356.3) 2016-10-24 17:43:14.515
Successfully allocated 11 recovery slaves
Using 101 overflow buffers per recovery slave
Thread 1 checkpoint: logseq 33251, block 2, scn 14624215134369
  cache-low rba: logseq 33251, block 2463324
    on-disk rba: logseq 33251, block 2803965, scn 14624216078841
  start recovery at logseq 33251, block 2463324, scn 0
*** 2016-10-24 17:43:16.406
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x7FFC22A2150], [2], [], [], [], [], []
Current SQL statement for this session:
ALTER DATABASE OPEN
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp+663           CALL???  ksedst+55            003C878B8 000000000 012B863E8
                                                   000000000
ksfdmp+19            CALL???  ksedmp+663           000000003 015572A70 007222698
                                                   003CACC80
kgerinv+158          CALL???  ksfdmp+19            015572430 000000000 0FFFFFFFF
                                                   000000000
kgeasnmierr+62       CALL???  kgerinv+158          000000000 000000000 000000000
                                                   004FD788F
kcrfr_update_nab+18  CALL???  kgeasnmierr+62       00BDA1170 000000000 000000000
6                                                  000000002
kcrfr_read+1078      CALL???  kcrfr_update_nab+18  007222698 00001E650 015572430
                              6                    0072229B8
kcrfrgv+8134         CALL???  kcrfr_read+1078      000000000 0051525D7 000000000
                                                   0051525D7
kcratr1+488          CALL???  kcrfrgv+8134         007222698 000000000 000000000
                                                   000000000
kcratr+412           CALL???  kcratr1+488          012B891C8 012B890A4 00727FFB8
                                                   00BEA7FF0
kctrec+1910          CALL???  kcratr+412           012B891C8 012B91E18 000000000
                                                   012B91E48
kcvcrv+3585          CALL???  kctrec+1910          012B92C58 000000000 00726DF00
                                                   00726BDB0
kcfopd+1007          CALL???  kcvcrv+3585          012B93350 000000000 000000000
                                                   000000000
adbdrv+55820         CALL???  kcfopd+1007          000000000 000000000 000000000
                                                   000000000
opiexe+13897         CALL???  adbdrv+55820         000000023 000000003 000000102
                                                   000000000
opiosq0+3558         CALL???  opiexe+13897         000000004 000000000 012B9B238
                                                   4155474E414C5F45
kpooprx+339          CALL???  opiosq0+3558         000000003 00000000E 012B9B3C8
                                                   0000000A4
kpoal8+894           CALL???  kpooprx+339          015587550 000000018 0041AE700
                                                   000000001
opiodr+1136          CALL???  kpoal8+894           00000005E 000000017 012B9E868
                                                   0072F5100
ttcpip+5146          CALL???  opiodr+1136          00000005E 000000017 012B9E868
                                                   2D8C00000000
opitsk+1818          CALL???  ttcpip+5146          015587550 000000000 000000000
                                                   000000000
opiino+1129          CALL???  opitsk+1818          00000001E 000000000 000000000
                                                   000000000
opiodr+1136          CALL???  opiino+1129          00000003C 000000004 012B9FB20
                                                   000000000
opidrv+815           CALL???  opiodr+1136          00000003C 000000004 012B9FB20
                                                   000000000
sou2o+52             CALL???  opidrv+815           00000003C 000000004 012B9FB20
                                                   7FF7FC48580
opimai_real+131      CALL???  sou2o+52             000000000 012B9FC40
                                                   7FFFFF7F258 077EF4D1C
opimai+96            CALL???  opimai_real+131      7FF7FC48580 7FFFFF7E000
                                                   0001F0003 000000000
OracleThreadStart+6  CALL???  opimai+96            012B9FEF0 01289FF3C 012B9FCC0
40                                                 7FF7FC48580
0000000077D6B6DA     CALL???  OracleThreadStart+6  01289FF3C 000000000 000000000
                              40                   012B9FFA8

官方描述
The assert ORA-600: [kcrfr_update_nab_2] is a direct result of a lost write in the current on line log that we are attempting to resolve.So, this confirms the theory that this is a OS/hardware lost write issue not an internal oracle bug. In fact the assert ORA-600: [kcrfr_update_nab_2] is how we detect a lost log write.
Bug 5692594
Hdr: 5692594 10.2.0.1 RDBMS 10.2.0.1 RECOVERY PRODID-5 PORTID-226 ORA-600
Abstract: AFTER DATABASE CRASHED DOESN’T OPEN ORA-600 [KCRFR_UPDATE_NAB_2]
Status: 95,Closed, Vendor OS Problem
Bug 6655116
Hdr: 6655116 10.2.0.3 RDBMS 10.2.0.3 RECOVERY PRODID-5 PORTID-23
Abstract: INSTANCES CRASH WITH ORA-600 [KCRFR_UPDATE_NAB_2] AFTER DISK FAILURE
根据官方的描述,结合故障情况,基本上可以确定是由于硬件异常导致Oracle写丢失,从而除非oracle相关bug导致数据库无法正常启动

ORA-600 [kcrfr_update_nab_2] [a] [b]
VERSIONS:
versions 10.2 to 11.1
DESCRIPTION:
Failure of upgrade of recovery node (RN) enqueue to SSX mode
ARGUMENTS:
Arg [a] State Object for redo nab enqueue for resilvering
Arg [b] Redo nab enqueue mode
FUNCTIONALITY:
Kernel Cache Redo File Read
IMPACT:
INSTANCE FAILURE

处理方法
1.如果有备份,利用备份进行不完全恢复,跳过最后异常的redo,数据库resetlogs打开
2.如果没有备份,尝试使用历史的控制文件进行不完全恢复,或者直接跳过数据库一致性打开库.
3.互联网有人解决删除redo第二组成员数据库open成功(http://blog.itpub.net/16976507/viewspace-1266952/)
redo


存储精简卷导致asm磁盘组异常

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:存储精简卷导致asm磁盘组异常

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有朋友在一个存储空间给asm使用,发生空间不足,然后使用另外一个存储中的lun给asm的data磁盘组增加asm disk,运行了大概1天之后,asm磁盘组直接dismount,数据库crash.然后就无法正常mount.包括这个存储上的几个其他磁盘组也无法正常mount.
数据库异常日志

Sun Oct 23 08:43:59 2016
SUCCESS: diskgroup DATA was dismounted
SUCCESS: diskgroup DATA was dismounted
Sun Oct 23 08:44:00 2016
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lmon_79128.trc:
ORA-00202: control file: '+DATA/orcl/controlfile/current.278.892363163'
ORA-15078: ASM diskgroup was forcibly dismounted
Sun Oct 23 08:44:00 2016
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lgwr_79174.trc:
ORA-00345: redo log write error block 15924 count 2
ORA-00312: online log 2 thread 1: '+DATA/orcl/onlinelog/group_2.274.892363167'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lgwr_79174.trc:
ORA-00202: control file: '+DATA/orcl/controlfile/current.278.892363163'
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lgwr_79174.trc:
ORA-00204: error in reading (block 1, # blocks 1) of control file
ORA-00202: control file: '+DATA/orcl/controlfile/current.278.892363163'
ORA-15078: ASM diskgroup was forcibly dismounted
Sun Oct 23 08:44:00 2016
LGWR (ospid: 79174): terminating the instance due to error 204
Sun Oct 23 08:44:00 2016
opiodr aborting process unknown ospid (79742) as a result of ORA-1092
Sun Oct 23 08:44:01 2016
ORA-1092 : opitsk aborting process
Sun Oct 23 08:44:01 2016
ORA-1092 : opitsk aborting process
System state dump requested by (instance=1, osid=79174 (LGWR)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_79118.trc
Instance terminated by LGWR, pid = 79174

很明显,数据库异常是由于asm diskgroup dismount,因此分析asm 日志

asm 日志

Sun Oct 23 07:00:31 2016
Time drift detected. Please check VKTM trace file for more details.
Sun Oct 23 08:43:55 2016
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_8755.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 1048576
WARNING: Write Failed. group:1 disk:2 AU:1222738 offset:0 size:1048576
ERROR: failed to copy file +DATA.524, extent 15030
ERROR: ORA-15080 thrown in ARB0 for group number 1
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_8755.trc:
ORA-15080: synchronous I/O operation to a disk failed
Sun Oct 23 08:43:55 2016
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 1/0xec689cdd (DATA)
NOTE: ASM did background COD recovery for group 1/0xec689cdd (DATA)
NOTE: starting rebalance of group 1/0xec689cdd (DATA) at power 1
Starting background process ARB0
Sun Oct 23 08:43:56 2016
ARB0 started with pid=24, OS id=103554
NOTE: assigning ARB0 to group 1/0xec689cdd (DATA) with 1 parallel I/O
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_103554.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 1048576
WARNING: Write Failed. group:1 disk:2 AU:1222738 offset:0 size:1048576
ERROR: failed to copy file +DATA.256, extent 6570
ERROR: ORA-15080 thrown in ARB0 for group number 1
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_103554.trc:
ORA-15080: synchronous I/O operation to a disk failed
NOTE: stopping process ARB0
Sun Oct 23 08:43:58 2016
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_dbw0_8521.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 4096
WARNING: Write Failed. group:1 disk:3 AU:6789 offset:24576 size:4096
NOTE: cache initiating offline of disk 3 group DATA
NOTE: process _dbw0_+asm1 (8521) initiating offline of disk 3.3915934787 (DATA_0003) with mask 0x7e in group 1
Sun Oct 23 08:43:58 2016
WARNING: Disk 3 (DATA_0003) in group 1 mode 0x7f is now being offlined
WARNING: Disk 3 (DATA_0003) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
NOTE: initiating PST update: grp = 1, dsk = 3/0xe9686c43, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 14 for pid 14, osid 8521
ERROR: Disk 3 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
Sun Oct 23 08:43:58 2016
NOTE: cache dismounting (not clean) group 1/0xEC689CDD (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 103577, image: oracle@node1 (B000)
WARNING: Disk 3 (DATA_0003) in group 1 mode 0x7f offline is being aborted
WARNING: Offline of disk 3 (DATA_0003) in group 1 and mode 0x7f failed on ASM inst 1
NOTE: halting all I/Os to diskgroup 1 (DATA)
Sun Oct 23 08:43:59 2016
NOTE: LGWR doing non-clean dismount of group 1 (DATA)
NOTE: LGWR sync ABA=160.10145 last written ABA 160.10145

错误信息很明显,由于Write Failed导致asm diskgroup dismount.

系统日志

Oct 23 08:43:55 node1 kernel: sd 6:0:12:1: [sdd]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:55 node1 kernel: sd 6:0:12:1: [sdd]  Sense Key : Data Protect [current]
Oct 23 08:43:55 node1 kernel: sd 6:0:12:1: [sdd]  Add. Sense: Space allocation failed write protect
Oct 23 08:43:55 node1 kernel: sd 6:0:12:1: [sdd] CDB: Write(16): 8a 00 00 00 00 02 e7 18 37 f9 00 00 00 07 00 00
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev sdd, sector 12467058681
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467058681
Oct 23 08:43:55 node1 kernel: sd 8:0:6:1: [sdh]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:55 node1 kernel: sd 8:0:6:1: [sdh]  Sense Key : Data Protect [current]
Oct 23 08:43:55 node1 kernel: sd 8:0:6:1: [sdh]  Add. Sense: Space allocation failed write protect
Oct 23 08:43:55 node1 kernel: sd 8:0:6:1: [sdh] CDB: Write(16): 8a 00 00 00 00 02 e7 18
Oct 23 08:43:55 node1 kernel: sd 6:0:4:1: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:55 node1 kernel: sd 6:0:4:1: [sdb]  Sense Key : Data Protect [current]
Oct 23 08:43:55 node1 kernel: sd 6:0:4:1: [sdb]   33Add. Sense: Space allocation failed write protect
Oct 23 08:43:55 node1 kernel: sd 6:0:4:1: [sdb] CDB: Write(16): 8a 00 00 00 00 02 e7 18 30 00 00 00 03 f9 00 00
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev sdb, sector 12467056640
Oct 23 08:43:55 node1 kernel: f9 00 00 04 00
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467056640
Oct 23 08:43:55 node1 kernel: 00 00
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev sdh, sector 12467057657
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467057657
Oct 23 08:43:57 node1 kernel: sd 8:0:6:1: [sdh]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:57 node1 kernel: sd 8:0:6:1: [sdh]  Sense Key : Data Protect [current]
Oct 23 08:43:57 node1 kernel: sd 8:0:6:1: [sdh]  Add. Sense: Space allocation failed write protect
Oct 23 08:43:57 node1 kernel: sd 8:0:6:1: [sdh] CDB: Write(16): 8a 00 00 00 00 02 e7 18 37 f9 00 00 00 07 00 00
Oct 23 08:43:57 node1 kernel: end_request: critical space allocation error, dev sdh, sector 12467058681
Oct 23 08:43:57 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467058681
Oct 23 08:43:57 node1 kernel: sd 8:0:12:1: [sdj]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:57 node1 kernel: sd 8:0:12:1: [sdj]  Sense Key : Data Protect [current]
Oct 23 08:43:57 node1 kernel: sd 8:0:12:1: [sdj]  Add. Sense: Space allocation failed write protect
Oct 23 08:43:57 node1 kernel: sd 8:0:12:1: [sdj] CDB: Write(16): 8a 00 00 00 00 02 e7 18 30 00 00 00 03 f9 00 00
Oct 23 08:43:57 node1 kernel: end_request: critical space allocation error, dev sdj, sector 12467056640
Oct 23 08:43:57 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467056640
Oct 23 08:43:57 node1 kernel: sd 6:0:4:1: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:57 node1 kernel: sd 6:0:4:1: [sdb]  Sense Key : Data Protect [current]
Oct 23 08:43:57 node1 kernel: sd 6:0:4:1: [sdb]  Add. Sense: Space allocation failed write protect
Oct 23 08:43:57 node1 kernel: sd 6:0:4:1: [sdb] CDB: Write(16): 8a 00 00 00 00 02 e7 18 33 f9 00 00 04 00 00 00
Oct 23 08:43:58 node1 kernel: sd 6:0:4:1: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:58 node1 kernel: sd 6:0:4:1: [sdb]  Sense Key : Data Protect [current]
Oct 23 08:43:58 node1 kernel: sd 6:0:4:1: [sdb]  Add. Sense: Space allocation failed write protect
Oct 23 08:43:58 node1 kernel: sd 6:0:4:1: [sdb] CDB: Write(16): 8a 00 00 00 00 03 3b 7e 78 30 00 00 00 08 00 00
Oct 23 10:50:59 node1 init: oracle-ohasd main process (6150) killed by TERM signal

错误信息为:critical space allocation error,严重空间分配错误.也就是linux在分配空间之时发生错误.在换而言之,由于分配空间错误导致asm 磁盘组dismount.

查看多路径信息

[root@node1 ~]# multipath -ll
36000d31003190c000000000000000003 dm-3 COMPELNT,Compellent Vol
size=80T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 6:0:9:1  sdd 8:48  active ready running
  `- 8:0:9:1  sdi 8:128 active ready running
delldisk2 (36000d310031908000000000000000003) dm-4 COMPELNT,Compellent Vol
size=8.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 6:0:12:1 sde 8:64  active ready running
  |- 8:0:6:1  sdh 8:112 active ready running
  |- 6:0:4:1  sdb 8:16  active ready running
  `- 8:0:12:1 sdj 8:144 active ready running
delldisk1 (36000d31003190a000000000000000007) dm-2 COMPELNT,Compellent Vol
size=12T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 6:0:1:1  sda 8:0   active ready running
  |- 8:0:2:1  sdf 8:80  active ready running
  |- 6:0:7:1  sdc 8:32  active ready running
  `- 8:0:3:1  sdg 8:96  active ready running

很明显报错的都是同一个lun(delldisk2),也就是存储空间使用完的存储.也就是说,由于delldisk2存储的空间使用尽了导致系统出现分配空间错误,从而导致asm 写失败,进而导致数据库异常.这种问题的本质其实就是存储给系统分配了8T,但是实际存储可以使用的空间不足8T,而os按照8T来使用从而出现该问题.专业名字叫做”存储精简卷”.因此各位在存储配置之时需要注意该问题.因为这种情况的出现一般只是写io异常,读依旧正常,因此不会丢失数据.

MON_MODS$表ORA-600 13013报错处理

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:MON_MODS$表ORA-600 13013报错处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有朋友反馈数据库启动运行一点时间之后,然后就自动crash,让我们帮忙找原因,通过分析是由于smon进程触发ORA-600 13013导致数据库异常
alert日志报错信息

Thu Aug  4 18:39:44 2016
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=33, OS id=22935
Thu Aug  4 18:39:44 2016
Completed: ALTER DATABASE OPEN
Thu Aug  4 18:39:44 2016
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Thu Aug  4 18:48:41 2016
Thread 1 advanced to log sequence 86746
  Current log# 3 seq# 86746 mem# 0: /opt/ora10/oradata/ora10g/redo03.log
Thu Aug  4 18:58:13 2016
Errors in file /opt/ora10/admin/ora10g/bdump/ora10g_smon_22449.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [482], [4198075], [40], [4198075], [17], []
Thu Aug  4 18:58:56 2016
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 8 out of maximum 100 non-fatal internal errors.
Thu Aug  4 18:59:06 2016
Errors in file /opt/ora10/admin/ora10g/bdump/ora10g_smon_22449.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [482], [4198075], [40], [4198075], [17], []
Thu Aug  4 18:59:08 2016
Errors in file /opt/ora10/admin/ora10g/bdump/ora10g_pmon_22413.trc:
ORA-00474: SMON process terminated with error
Thu Aug  4 18:59:08 2016
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 22413

通过trace文件大概可以发现是由于ORA-600 13013错误导致数据库crash,而且这里有类似”SMON was doing flushing of monitored table stats”错误提示,根据经验,很可能是smon把表的dml操作收集信息相关.

ORA-600 [13013] 含义

ORA-600 [13013] [a] [b] {c} [d] [e] [f]
This format relates to Oracle Server 8.0.3 to 10.1
Arg [a] Passcount
Arg [b] Data Object number
Arg {c} Tablespace Relative DBA of block containing the row to be updated
Arg [d] Row Slot number
Arg [e] Relative DBA of block being updated (should be same as )
Arg [f] Code

根据这个错误信息,以及How to resolve ORA-00600 [13013], [5001] [ID 816784.1]中的描述

ORA-600 13013 对应对象

SQL> select object_name from dba_objects where object_id=482
OBJECT_NAME
--------------------------------------------------------------------------------
MON_MODS$

该对象正是和监控dml变化相关的表,smon会对其进行相关操作,以前写过一篇:MON_MODS$和MON_MODS_ALL$统计DML操作次数的文章
对于MON_MODS$表ORA-600 13013处理

SQL> analyze table mon_mods$ validate structure cascade;
analyze table mon_mods$ validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file
SQL> select index_name from dba_indexes where table_name='MON_MODS$';
INDEX_NAME
------------------------------
I_MON_MODS$_OBJ
SQL> ALTER INDEX I_MON_MODS$_OBJ REBUILD;
Index altered.
SQL> analyze table mon_mods$ validate structure cascade;
analyze table mon_mods$ validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file
SQL> CREATE TABLE MON_MODS_BAK AS SELECT * FROM MON_MODS$;
Table created.
SQL> SELECT COUNT(*) FROM MON_MODS$;
  COUNT(*)
----------
      1247
SQL> C/MON_MODS$/MON_MODS_BAK;
  1* SELECT COUNT(*) FROM MON_MODS_BAK
SQL> /
  COUNT(*)
----------
      1247
SQL> TRUNCATE TABLE MON_MODS$;
Table truncated.
SQL> INSERT INTO MON_MODS$ SELECT * fROM MON_MODS_BAK;
1247 rows created.
SQL> COMMIT;
Commit complete.
SQL>  analyze table mon_mods$ validate structure cascade;
Table analyzed.

自此关于MON_MODS$表相关的ORA-600 13013异常处理完全,当然也可以通过重建I_MON_MODS$_OBJ索引来解决,但是不能通过rebuild index解决.数据库也就不会因此而crash了.

ORA-600 4042 故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-600 4042 故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check) 检查结果
wrong_scn
通过上图可以知道file 2未能正常恢复(需要看日志分析原因),file 3以前就被offline,需要历史归档(非归档状态,所以这个先放着,后续再处理)

分析file 2 不成功原因

Wed Aug  3 15:21:11 2016
ALTER DATABASE RECOVER  datafile 2
Wed Aug  3 15:21:11 2016
Media Recovery Start
 parallel recovery started with 2 processes
Wed Aug  3 15:21:11 2016
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1916 Reading mem 0
  Mem# 0 errs 0: /home/oracle/orabase/oradata/ORACLE/redo01.log
Wed Aug  3 15:21:11 2016
Errors in file /u01/app/oracle/admin/oracle/bdump/oracle_p001_22017.trc:
ORA-00600: internal error code, arguments: [3020], [2], [41], [8388649], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 41)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 2: '/home/oracle/orabase/oradata/ORACLE/undotbs01.dbf'
ORA-10560: block type '0'
Wed Aug  3 15:21:13 2016
Errors in file /u01/app/oracle/admin/oracle/bdump/oracle_p001_22017.trc:
ORA-00600: internal error code, arguments: [3020], [2], [41], [8388649], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 41)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 2: '/home/oracle/orabase/oradata/ORACLE/undotbs01.dbf'
ORA-10560: block type '0'
Wed Aug  3 15:21:18 2016
Media Recovery failed with error 12801
ORA-283 signalled during: ALTER DATABASE RECOVER  datafile 2  ...

通过日志可以知道由于ORA-600 3020导致file 2不能正常的恢复.
处理file 2

SQL> recover  datafile 2 allow 1 corruption;
Media recovery complete.
Thu Aug  4 01:58:35 2016
ALTER DATABASE RECOVER  datafile 2 allow 1 corruption
Media Recovery Start
 ALLOW CORRUPTION option must use serial recovery
Thu Aug  4 01:58:35 2016
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1916 Reading mem 0
  Mem# 0 errs 0: /home/oracle/orabase/oradata/ORACLE/redo01.log
Thu Aug  4 01:58:35 2016
Media Recovery Complete (oracle)
Completed: ALTER DATABASE RECOVER  datafile 2 allow 1 corruption

尝试open数据库

SQL> alter database open ;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
Thu Aug  4 01:59:20 2016
alter database open
Thu Aug  4 01:59:21 2016
Beginning crash recovery of 1 threads
 parallel recovery started with 2 processes
Thu Aug  4 01:59:21 2016
Started redo scan
Thu Aug  4 01:59:21 2016
Completed redo scan
 1619 redo blocks read, 0 data blocks need recovery
Thu Aug  4 01:59:21 2016
Started redo application at
 Thread 1: logseq 1916, block 12724
Thu Aug  4 01:59:21 2016
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1916 Reading mem 0
  Mem# 0 errs 0: /home/oracle/orabase/oradata/ORACLE/redo01.log
Thu Aug  4 01:59:21 2016
Completed redo application
Thu Aug  4 01:59:21 2016
Completed crash recovery at
 Thread 1: logseq 1916, block 14343, scn 3303614971196
 0 data blocks read, 0 data blocks written, 1619 redo blocks read
Thu Aug  4 01:59:21 2016
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=18, OS id=5542
Thu Aug  4 01:59:21 2016
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=19, OS id=5544
Thu Aug  4 01:59:21 2016
Thread 1 advanced to log sequence 1917
Thread 1 opened at log sequence 1917
  Current log# 2 seq# 1917 mem# 0: /home/oracle/orabase/oradata/ORACLE/redo02.log
Successful open of redo thread 1
Thu Aug  4 01:59:21 2016
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Aug  4 01:59:21 2016
ARC1: STARTING ARCH PROCESSES
Thu Aug  4 01:59:21 2016
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Thu Aug  4 01:59:21 2016
SMON: enabling cache recovery
Thu Aug  4 01:59:21 2016
ARC2: Archival started
ARC1: STARTING ARCH PROCESSES COMPLETE
ARC1: Becoming the heartbeat ARCH
ARC2 started with pid=20, OS id=5546
Thu Aug  4 01:59:21 2016
db_recovery_file_dest_size of 2048 MB is 1.05% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Thu Aug  4 01:59:22 2016
Errors in file /u01/app/oracle/admin/oracle/udump/oracle_ora_5505.trc:
ORA-00600: internal error code, arguments: [4042], [0], [], [], [], [], [], []
Thu Aug  4 01:59:23 2016
Errors in file /u01/app/oracle/admin/oracle/udump/oracle_ora_5505.trc:
ORA-00600: internal error code, arguments: [4042], [0], [], [], [], [], [], []
Thu Aug  4 01:59:23 2016
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 5505
ORA-1092 signalled during: alter database open ...

由于ORA-600 4042错误导致数据库无法正常open.
分析ORA-600 4042

PARSING IN CURSOR #4 len=142 dep=1 uid=0 oct=3 lid=0 tim=1435788503594313 hv=361892850 ad='a7ab2db8'
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,
DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
END OF STMT
PARSE #4:c=0,e=11,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,tim=1435788503594311
BINDS #4:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=2aae75802218  bln=22  avl=02  flg=05
  value=3
EXEC #4:c=0,e=39,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,tim=1435788503594393
FETCH #4:c=0,e=8,p=0,cr=2,cu=0,mis=0,r=1,dep=1,og=3,tim=1435788503594412
STAT #4 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=0 pw=0 time=8 us)'
STAT #4 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=0 pw=0 time=3 us)'
WAIT #1: nam='db file sequential read' ela= 10 file#=2 block#=41 blocks=1 obj#=-1 tim=1435788503594468
Dump of buffer cache at level 4 for tsn=1, rdba=8388649
BH (0x95ff3c58) file#: 2 rdba: 0x00800029 (2/41) class: 21 ba: 0x95ef0000
  set: 3 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 0
  dbwrid: 0 obj: -1 objn: 0 tsn: 1 afn: 2
  hash: [a8b77880,a8b77880] lru: [95ff3dd0,a8e70338]
  ckptq: [NULL] fileq: [NULL] objq: [a43da110,a43da110]
  use: [a8e6e658,a8e6e658] wait: [NULL]
  st: XCURRENT md: SHR tch: 0
  flags: gotten_in_current_mode
  LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
  buffer tsn: 1 rdba: 0x00800029 (2/41)
  scn: 0x0000.00000000 seq: 0x01 flg: 0x01 tail: 0x00000001
  frmt: 0x02 chkval: 0x0000 type: 0x00=unknown
Hex dump of block: st=0, typ_found=0
Dump of memory from 0x0000000095EF0000 to 0x0000000095EF2000
095EF0000 0000A200 00800029 00000000 01010000  [....)...........]
095EF0010 00000000 00000000 00000000 00000000  [................]
        Repeat 509 times
095EF1FF0 00000000 00000000 00000000 00000001  [................]
Dump of memory from 0x0000000095EF0014 to 0x0000000095EF1FFC
095EF0010          00000000 00000000 00000000      [............]
095EF0020 00000000 00000000 00000000 00000000  [................]

这里可以发现,file 2 block 41的type为unknown,注意观察ORA-600 3020的错误,我们发现当时报的坏块也正好是该block.基本上可以确定由于前面的allow 1 corruption操作导致了后面的ORA-600 4042的错误.官方关于ORA-600[4042]解释
ORA-600-4042


通过修改undo$中的回滚段状态(参考:bbed修改undo$(回滚段)状态)
正常open数据库,修改file 3的scn并online数据文件

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.
Total System Global Area 1224736768 bytes
Fixed Size                  2020384 bytes
Variable Size             318770144 bytes
Database Buffers          889192448 bytes
Redo Buffers               14753792 bytes
Database mounted.
SQL>   SELECT thread#,
  2           a.sequence#,
  3           a.group#,
  4           TO_CHAR (first_change#, '9999999999999999') "SCN",
  5           a.status,
  6           MEMBER
  7      FROM v$log a, v$logfile b
  8     WHERE a.group# = B.GROUP#
  9  ORDER BY a.sequence# DESC;
   THREAD#  SEQUENCE#     GROUP# SCN
---------- ---------- ---------- ----------------------------------
STATUS
--------------------------------
MEMBER
--------------------------------------------------------------------------------
         1       1919          1     3303615011212
CURRENT
/home/oracle/orabase/oradata/ORACLE/redo01.log
         1       1918          3     3303614991206
INACTIVE
/home/oracle/orabase/oradata/ORACLE/redo03.log
   THREAD#  SEQUENCE#     GROUP# SCN
---------- ---------- ---------- ----------------------------------
STATUS
--------------------------------
MEMBER
--------------------------------------------------------------------------------
         1       1917          2     3303614971197
INACTIVE
/home/oracle/orabase/oradata/ORACLE/redo02.log
SQL> recover database using backup controlfile;
ORA-00279: change 3303615011452 generated at 08/04/2016 02:06:52 needed for
thread 1
ORA-00289: suggestion :
/u01/app/oracle/flash_recovery_area/ORACLE/archivelog/2016_08_04/o1_mf_1_1919_%u
_.arc
ORA-00280: change 3303615011452 for thread 1 is in sequence #1919
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/home/oracle/orabase/oradata/ORACLE/redo01.log
Log applied.
Media recovery complete.
SQL> alter database datafile 3 online;
Database altered.
SQL> alter database open resetlogs;
Database altered.
SQL>

至此该数据库基本上恢复完成,强烈建议使用逻辑方式导出导入重建库.

ORA-600 4194/ORA-600 4193/ORA-600 4137故障解决

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-600 4194/ORA-600 4193/ORA-600 4137故障解决

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

对于常见的undo异常错误,ORA-600 4193,ORA-600 4194,ORA-600 4137等错误的处理一般步骤.
适用版本

Oracle Database - Enterprise Edition - Version 9.2.0.1 to 11.2.0.4 [Release 9.2 to 11.2]
Information in this document applies to any platform.

报错现象

The following error is occurring in the alert.log right before the database crashes.
ORA-00600: internal error code, arguments: [4194], [#], [#], [], [], [], [], []
This error indicates that a mismatch has been detected between redo records and rollback (undo) records.
ARGUMENTS:
Arg [a] - Maximum Undo record number in Undo block
Arg [b] - Undo record number from Redo block
Since we are adding a new undo record to our undo block, we would expect that the new record number
 is equal to the maximum record number in the undo block plus one. Before Oracle can add
a new undo record to the undo block it validates that this is correct. If this validation fails,
 then an ORA-600 [4194] will be triggered.

报错原因

This also can be cause by the following defect
Bug 8240762 Abstract: Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] after SHRINK
Details:
Undo corruption may be caused after a shrink and the same undo block may be used
for two different transactions causing several internal errors like:
ORA-600 [4193] / ORA-600 [4194] for new transactions
ORA-600 [4137] for a transaction rollback

处理步骤

Best practice to create a new undo tablespace.
This method includes segment check.
Create pfile from spfile to edit
>create pfile from spfile;
1. Shutdown the instance
2. set the following parameters in the pfile
    undo_management = manual
    event = '10513 trace name context forever, level 2'
3. >startup restrict pfile=<initsid.ora>
4. >select tablespace_name, status, segment_name from dba_rollback_segs where status != 'OFFLINE';
This is critical - we are looking for all undo segments to be offline - System will always be online.
If any are 'PARTLY AVAILABLE' or 'NEEDS RECOVERY' - Please open an issue with Oracle Support or update the current SR.
If all offline then continue to the next step
5. Create new undo tablespace - example
>create undo tablespace <new undo tablespace> datafile <datafile> size 2000M;
6. Drop old undo tablespace
>drop tablespace <old undo tablespace> including contents and datafiles;
7. >shutdown immediate;
8 >startup nomount;  --> Using your Original spfile
9 modify the spfile with the new undo tablespace name
  Alter system set undo_tablespace = '<new tablespace created in step 5>' scope=spfile;
10. >shutdown immediate;
11. >startup;  --> Using spfile
The reason we create a new undo tablespace first is to use new undo segment numbers
 that are higher then the current segments being used.
This way when a transaction goes to do block clean-out
the reference to that undo segment does not exist and continues with the block clean-out.

参考:tep by step to resolve ORA-600 4194 4193 4197 on database crash (Doc ID 1428786.1)