Oracle 19c故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle 19c故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户找到我们,他们的oracle 19c数据库由于异常断电,导致启动异常,经过一系列恢复之后,依旧无法解决问题,请求我们给予支持.通过我们的Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check),获取数据库当前信息如下:
数据库版本为19C并且安装了19.5.0.0.191015 (30125133)补丁
20200310220453
20200310220748


数据库使用pdb
20200310220610

数据库启动成功后,一会就crash掉

2020-03-10T01:44:41.018032+08:00
Pluggable database RACBAK opened read write
2020-03-10T01:44:41.018996+08:00
Pluggable database RAC opened read write
2020-03-10T01:44:51.244050+08:00
Completed: ALTER PLUGGABLE DATABASE ALL OPEN
Starting background process CJQ0
Completed: ALTER DATABASE OPEN
2020-03-10T01:44:51.317085+08:00
CJQ0 started with pid=224, OS id=32581 
2020-03-10T01:44:56.067043+08:00
Errors in file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_j001_32588.trc  (incident=1095281) (PDBNAME=RAC):
ORA-00600: internal error code, arguments: [4193], [27733], [27754], [], [], [], [], [], [], [], [], []
RAC(4):Incident details in: /opt/oracle/diag/rdbms/XFF/XFF/incident/incdir_1095281/XFF_j001_32588_i1095281.trc
RAC(4):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-03-10T01:44:56.073112+08:00
RAC(4):*****************************************************************
RAC(4):An internal routine has requested a dump of selected redo.
RAC(4):This usually happens following a specific internal error, when
RAC(4):analysis of the redo logs will help Oracle Support with the
RAC(4):diagnosis.
RAC(4):It is recommended that you retain all the redo logs generated (by
RAC(4):all the instances) during the past 12 hours, in case additional
RAC(4):redo dumps are required to help with the diagnosis.
RAC(4):*****************************************************************
2020-03-10T01:44:56.079228+08:00
Errors in file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_j002_32590.trc  (incident=1095289) (PDBNAME=RAC):
ORA-00600: internal error code, arguments: [4193], [2633], [2638], [], [], [], [], [], [], [], [], []
RAC(4):Incident details in: /opt/oracle/diag/rdbms/XFF/XFF/incident/incdir_1095289/XFF_j002_32590_i1095289.trc
RAC(4):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-03-10T01:44:56.085068+08:00
RAC(4):*****************************************************************
RAC(4):An internal routine has requested a dump of selected redo.
RAC(4):This usually happens following a specific internal error, when
RAC(4):analysis of the redo logs will help Oracle Support with the
RAC(4):diagnosis.
RAC(4):It is recommended that you retain all the redo logs generated (by
RAC(4):all the instances) during the past 12 hours, in case additional
RAC(4):redo dumps are required to help with the diagnosis.
RAC(4):*****************************************************************
2020-03-10T01:44:56.115765+08:00
Errors in file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_j004_32594.trc  (incident=1095305) (PDBNAME=RAC):
ORA-00600: internal error code, arguments: [4193], [63532], [63537], [], [], [], [], [], [], [], [], []
RAC(4):Incident details in: /opt/oracle/diag/rdbms/XFF/XFF/incident/incdir_1095305/XFF_j004_32594_i1095305.trc
RAC(4):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-03-10T01:46:48.202213+08:00
RAC(4):Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
RAC(4):  Mem# 0: /opt/oracle/oradata/XFF/redo02.log
RAC(4):Block recovery completed at rba 0.0.0, scn 0x0000000d3675e48e
RAC(4):DDE: Problem Key 'ORA 600 [4193]' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
2020-03-10T01:46:48.384040+08:00
Errors in file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_clmn_31741.trc:
ORA-00600: internal error code, arguments: [4193], [27733], [27754], [], [], [], [], [], [], [], [], []
Errors in file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_clmn_31741.trc  (incident=1093505) (PDBNAME=CDB$ROOT):
ORA-501 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /opt/oracle/diag/rdbms/XFF/XFF/incident/incdir_1093505/XFF_clmn_31741_i1093505.trc
2020-03-10T01:46:49.264624+08:00
USER (ospid: 31741): terminating the instance due to ORA error 501
2020-03-10T01:46:49.280664+08:00
System state dump requested by (instance=1, osid=31741 (CLMN)), summary=[abnormal instance termination].
System State dumped to trace file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_diag_31759.trc
2020-03-10T01:46:53.156926+08:00
ORA-00501: CLMN process terminated with error
2020-03-10T01:46:53.157103+08:00
Errors in file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_diag_31759.trc:
ORA-00501: CLMN process terminated with error
2020-03-10T01:46:53.157211+08:00
Dumping diagnostic data in directory=[cdmp_20200310014649], requested by (instance=1, osid=31741 (CLMN)), 
summary=[abnormal instance termination].

通过报错信息判断,数据库open之后(特别是pdb 4 open之后),开始报ORA-600 4193错误.然后由于CLMN进程异常,最后数据库crash.对于这类故障,因为使用的pdb,而且是由于pdb的undo异常导致数据库启动之后crash,可以通过对于pdb进行特殊处理,从而实现数据库启动之后不再crash.

Oracle 极端恢复支持

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:Oracle 极端恢复支持

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

所有类似dul工具(包括原厂和三方工具)可以完成的Oracle数据库恢复工作,我们均可通过服务提供恢复支持,而且是恢复数据成功之后收费,恢复不成功不收费

  • 不需要运行Oracle数据库软件,直接读取数据库文件解析数据。
  • 支持ASM,能够直接从ASM磁盘中导出数据,即使相关的磁盘组不能成功mount
  • 支持从ASM中直接抽取出数据文件和其他任意存储在ASM中的文件(包括控制文件、日志文件和归档日志等),即使相关的磁盘组不能成功mount
  • 在ASM磁盘损坏严重的情况下,可以扫描ASM磁盘,提取出没有被覆盖的数据文件,然后再提取和恢复数据。
  • 支持的Oracle数据库版本包括7,8i,9i,10g,11g,12c,18c,19c
  • 支持多种平台的数据库,包括AIX、LINUX、HPUX、SOLARIS、WINDOWS等。能够在一个平台上导出其他平台的数据,比如在Windows 32位系统上,使用AIX系统上的数据文件导出数据。
  • 支持的数据类型包括:NUMBER, CHAR, VARCHAR2, NCHAR,NVARHCAR2, LONG, DATE, RAW, LONG RAW, BLOB, CLOB, TIMESTAMP (9i+) , BINARY FLOAT, BINARY DOUBLE (10g+),XMLTYPE
  • 全面支持LOB字段:
    • 支持CLOB、NCLOB和BLOB
    • CLOB支持Big Endian和Little Endian字节序
    • 支持LOB分区,子分区
    • 支持同一个表中,不同LOB列使用不同CHUNK SIZE的情况
    • CLOB数据可以导出到与其他列相同的文件中,或存储到单独的文件
    • LOB列在没有SYSTEM表空间的情况下仍然能够导出
    • LOB列在相关的lob index损坏的情况下依然能够导出
    • 能够恢复Oracle 11g及以上版本的SecureFile LOB(目前不支持去重和加密的SecureFile LOB)
  • 支持各种表,包括普通的HEAP表,IOT表和聚簇(CLUSTER)表
  • 支持IOT表:
    • 支持普通IOT表的导出
    • 支持压缩IOT表的导出
    • 支持IOT表溢出段
    • 支持IOT表分区(包括子分区)
    • 只能在有SYSTEM表空间时才能导出IOT表
  • 支持压缩表
  • 支持表被truncate后的数据恢复
  • 支持表被drop后的数据恢复
  • 在有SYSTEM表空间的情况下,自动获取数据字典信息
  • 支持在没有SYSTEM表空间和数据字典损坏的情况下恢复数据,在没有数据字典可用时,能够自动判断数据的类型
  • 支持10g及以上的大文件(BigFile)表空间
  • 全面支持64位系统,支持超过4G大小的数据文件。
  • 支持复制操作系统命令不能复制的坏文件
  • 支持同一个库中不同块大小的数据文件。
  • 支持多种字符集之间的转换,能够正确的转换CLOB、NCLOB、NVARCHAR2列类型的数据到指定的字符集。
  • 自动检测数据文件的表空间号和文件号
  • 导出的数据格式包括纯文本、exp dmp和expdp dmp文件三种。以纯文本导出时,能够自动生成建表的SQL语句和SQL*Loader导入所需的control文件
  • 模拟Oracle的dump块功能,能够dump数据文件中的数据块
  • 支持DESC表,以显示表的列定义
  • 支持列出表的分区和子分区
  • 支持对误删除数据的恢复,即使被删除数据的表中有LOB列,即使被删除数据的表中对应row directory中所有记录的offset都已经完全被Oracle清除
  • 支持表创建语句,存储过程,视图,函数,包,索引,约束等非数据恢复

ORA-00600: internal error code, arguments: [2252], [3987]

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-00600: internal error code, arguments: [2252], [3987]

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户数据库版本10.2.0.4,启动成功之后立马crash,让我们协助解决
version


Thu Jul  4 13:03:10 2019
Completed: ALTER DATABASE OPEN
Thu Jul  4 13:03:10 2019
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Thu Jul  4 13:04:01 2019
Errors in file /oracle/app/oracle/admin/orcl/bdump/orcl_reco_22268.trc:
ORA-00600: internal error code, arguments: [2252], [3987], [3375047096], [], [], [], [], []
Thu Jul  4 13:04:01 2019
Errors in file /oracle/app/oracle/admin/orcl/bdump/orcl_reco_22268.trc:
ORA-00600: internal error code, arguments: [2252], [3987], [3375047096], [], [], [], [], []
Thu Jul  4 13:04:02 2019
Errors in file /oracle/app/oracle/admin/orcl/bdump/orcl_reco_22268.trc:
ORA-00600: internal error code, arguments: [2252], [3987], [3375047096], [], [], [], [], []
Thu Jul  4 13:04:02 2019
RECO: terminating instance due to error 476
Instance terminated by RECO, pid = 22268

根据以往经验记录一次ORA-00600[2252]故障解决,很可能是scn过大引起.通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)检查scn相关信息
scn


从ORA-600 2252错误信息看,由于scn可能超过该数据库的天花板理论上而导致该问题,而reco进程主要是由于分布式事务引起,通过和客户确认,该库有通过dblink去访问11204版本oracle,而从2019年6月23日之后scn的算法发生了一些改变(SCN Compatibility问题汇总-2019年6月23日),导致数据库可以支持更大的scn,从而当低版本需要进行分布式事务操作之时,可能导致数据库异常.

处理方案:通过临时屏蔽分布式事务,让数据库临时正常工作;长期解决方案需要把数据库版本升级,避免scn引起相关问题

ORA-00470: LGWR process terminated with error

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-00470: LGWR process terminated with error

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户win 10.2.0.1数据库由于断电之后无法正常启动,报ORA-00470错误

SQL> startup mount;
ORACLE 例程已经启动。
Total System Global Area  293601280 bytes
Fixed Size                  1248600 bytes
Variable Size              92275368 bytes
Database Buffers          192937984 bytes
Redo Buffers                7139328 bytes
数据库装载完毕。
SQL> recoer database;
SQL> recover database;
完成介质恢复。
SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-00470: LGWR 进程因错误而终止

查看alert日志发现报ORA-600 kcrfwfl_nab错误,导致后台进程异常

Mon May 13 10:38:52 2019
alter database open
Mon May 13 10:38:52 2019
Thread 1 opened at log sequence 1699
  Current log# 3 seq# 1699 mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORACLE\REDO03.LOG
Successful open of redo thread 1
Mon May 13 10:38:52 2019
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon May 13 10:38:52 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_lgwr_2308.trc:
ORA-00600: internal error code, arguments: [kcrfwfl_nab], [4294967295], [102401], [], [], [], [], []
Mon May 13 10:38:53 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_lgwr_2308.trc:
ORA-00600: internal error code, arguments: [kcrfwfl_nab], [4294967295], [102401], [], [], [], [], []
Mon May 13 10:38:53 2019
LGWR: terminating instance due to error 470
Mon May 13 10:38:53 2019
ORA-470 signalled during: alter database open...
Mon May 13 10:38:53 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_pmon_2928.trc:
ORA-00470: LGWR process terminated with error
Mon May 13 10:38:54 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_reco_3224.trc:
ORA-00470: LGWR process terminated with error
Mon May 13 10:38:54 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_smon_3064.trc:
ORA-00470: LGWR process terminated with error
Mon May 13 10:38:54 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_ckpt_3640.trc:
ORA-00470: LGWR process terminated with error
Mon May 13 10:38:54 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_dbw0_1976.trc:
ORA-00470: LGWR process terminated with error
Mon May 13 10:38:54 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_mman_2356.trc:
ORA-00470: LGWR process terminated with error
Mon May 13 10:38:54 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_psp0_3876.trc:
ORA-00470: LGWR process terminated with error
Mon May 13 10:38:54 2019
Instance terminated by LGWR, pid = 2308

尝试启动数据库到upgrade模式,报ORA-600 2758错误

SQL> alter database open upgrade;
alter database open upgrade
*
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [2758], [3], [4294967295], [102400], [10], [],[], []

对应alert日志报错

Mon May 13 10:45:50 2019
Completed redo application
Mon May 13 10:45:50 2019
Completed crash recovery at
 Thread 1: logseq 1699, block 4294967295, scn 56807170
 0 data blocks read, 0 data blocks written, 0 redo blocks read
Mon May 13 10:45:50 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\udump\oracle_ora_3220.trc:
ORA-00600: 内部错误代码, 参数: [2758], [3], [4294967295], [102400], [10], [], [], []
Mon May 13 10:45:51 2019
Aborting crash recovery due to error 600
Mon May 13 10:45:51 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\udump\oracle_ora_3220.trc:
ORA-00600: 内部错误代码, 参数: [2758], [3], [4294967295], [102400], [10], [], [], []
ORA-600 signalled during: alter database open upgrade...

通过上述相关报错分析,以及ORA-600 kcrfwfl_nab和ORA-600 2758报错的相关资料查询,确定是由于redo和ctl损坏导致,通过强制拉库,恢复成功

SQL> recover database;
完成介质恢复。
SQL> alter database open resetlogs;
数据库已更改。

存储双活系统逻辑损坏数据库抢救恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:存储双活系统逻辑损坏数据库抢救恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

计划休假的前一夜晚上节点朋友求救电话,说xx医院核心his系统的Oracle数据库很多表报ORA-8103错误,业务无法正常办理.
ora-8103


通过dbv检查文件发现连续坏块
2


根据以往经验数据库出现类似这样的错误,很可能是底层问题,查看系统日志发现大量磁盘错误
3


该报错时间和应用反馈系统异常时间基本上匹配,初步怀疑是硬件或者os异常导致.因为客户数据库大量表表ORA-8103,而且有文件出现连片被置空,无法准确定位数据库损坏情况(置空值数据库级别的物理损坏,ora-8103是逻辑错误在表不被访问的情况下无法检查出来),考虑分析客户的硬件环境,备份容灾情况,分析选择最佳方案.
通过和客户沟通以及检查数据库的相关情况发现信息如下:
1)存储使用的是xx厂商的双活方案,这种存储级解决方案对于该故障来说没用,因为是lun的逻辑级别损坏,损坏数据同时同步到两套存储上.
2)数据库库容灾使用的是某厂家的cdp同步容灾,客户对cdp库进行分析,发现数据同步异常,基本上该方案也无法使用
3)数据库的备份情况:由于存放数据库备份的存储电池异常和有坏盘导致存储写io效率非常低,客户在3天之前停止掉了文件系统中的rman备份;有tsm的带库备份,结果检查发现竟无一次备份成功.
故障进一步扩大
针对客户情况,确定是节点2有明显异常,准备停掉节点2的数据库和集群,然后看下在节点1上是否有改善,结果发现把节点2的crs停掉之后,节点1的库直接crash,通过分析发现asm disk有一块盘磁盘前几M表直接置空(应该是在关闭crs之前就已经异常,只是因为磁盘头部分数据没有相关操作,因此没有触发相关问题),当一个节点关闭会去写磁盘头信息,asm发现异常直接dismount 节点1的磁盘组了,从而使得节点1的库异常.
4


5


现在的情况:
1)现在的asm 磁盘组异常(其中一个磁盘头前几M损坏),也就是说在原库基础上直接修复的概率基本上没有可能
2)cdp数据异常,不可用
3)在数据库相关服务器中找到一份4天之前的一次全备
恢复思路:
1.客户准备新空间,直接把4天之前的备份还原到本地文件系统中
2.通过底层工具对于有磁盘损坏的asm磁盘组进行分析,尝试恢复归档日志和redo(尽可能做到最大限度恢复数据)
3.通过备份还原4天之前的备份结合我们恢复的归档日志和redo尝试完全恢复数据
4.问题风险,就算归档日志和redo从损坏的asm 磁盘组中恢复出来,但是也有可能损坏,导致后面无法恢复到最新数据(造成数据丢失)
实际操作:
1. 由于客户在昨天晚上故障之后增加了一些undo数据文件,使得无法正常全库restore database(因为ctl中数据文件信息比备份集中多)
2. 后续由于10204 rac还原到单机出现ORA-600 kgeade_is_1错误
6


3. 数据库恢复完成之后,出现sqlplus 操作数据库正常,plsql dev和应用访问数据库报ora-27092的问题
7


最后运气不错,经过一系列努力,数据库open成功,应用也正常访问,最初生产环境中损坏的表现在查询也不再报ORA-8103,dbv检查异常文件也ok
8


9


再次提醒各位朋友:
1)你的数据库备份是否正常,建议定期做故障演练
2)选择合适数据库的容灾方案,建议定期检查或者演练
3)存储双活可以解决硬件故障问题,但是还要有适当的解决方案来规避存储逻辑错误风险.

数据库open过程遭遇ORA-1555对应sql语句补充

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:数据库open过程遭遇ORA-1555对应sql语句补充

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在2015年的在数据库open过程中常遇到ORA-01555汇总文章中写过oracle open过程中可能会遇到ORA-01555错误,对应的sql语句.最近的恢复中又遇到两个新的,对其进行补充
select rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,nvl(degree,1), nvl(instances,1) from tab$ where obj# = :1

Thu May 09 02:10:27 2019
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: bqbdby3c400p7, SCN: 0x0000.3e785fc7):
select rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,nvl(degree,1), nvl(instances,1) from tab$ where obj# = :1
Errors in file /home/u01/diag/rdbms/orcl/orcl/trace/orcl_ora_15929.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 91 with name "_SYSSMU91_1360910548$" too small
Errors in file /home/u01/diag/rdbms/orcl/orcl/trace/orcl_ora_15929.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 91 with name "_SYSSMU91_1360910548$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 15929): terminating the instance due to error 704
Instance terminated by USER, pid = 15929
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (15929) as a result of ORA-1092
Thu May 09 02:10:28 2019
ORA-1092 : opitsk aborting process

select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null

NSA2 started with pid=41, OS id=32571518
ORA-01555 caused by SQL statement below (SQL ID: 3nkd3g3ju5ph1, Query Duration=0 sec, SCN: 0x0005.e4bea784):
select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from
    obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null
Errors in file /u01/app/oracle/diag/rdbms/xifenfei_std/xifenfei/trace/xifenfei_ora_18939904.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 7 with name "_SYSSMU7_542380376$" too small
Errors in file /u01/app/oracle/diag/rdbms/xifenfei_std/xifenfei/trace/xifenfei_ora_18939904.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 7 with name "_SYSSMU7_542380376$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 18939904): terminating the instance due to error 704
Instance terminated by USER, pid = 18939904
ORA-1092 signalled during: alter database open RESETLOGS...
opiodr aborting process unknown ospid (18939904) as a result of ORA-1092

ORA-600 kcffo_online_pdb_check: fno_system 和 ORA-600 kcvfdb_pdb_set_clean_scn: cleanckpt错误

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-600 kcffo_online_pdb_check: fno_system 和 ORA-600 kcvfdb_pdb_set_clean_scn: cleanckpt错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在做18c模拟故障测试中,经过自己一系列折腾,主要遭遇了ORA-600 kcffo_online_pdb_check: fno_system 和 ORA-600 kcvfdb_pdb_set_clean_scn: cleanckpt错误,这些都是pdb特有的,主要是由于一些bug引起,在非pdb环境中不太可能遇到.其实这也就是说明由于pdb机制的引入,使得后续的数据库异常恢复中会更加复杂.
18c数据库open ORA-00603 ORA-01092 ORA-00600报错

[oracle@ora11g tmp]$ ss
SQL*Plus: Release 18.0.0.0.0 - Production on Sat Apr 20 21:18:09 2019
Version 18.3.0.0.0
Copyright (c) 1982, 2018, Oracle.  All rights reserved.
Connected to:
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
BANNER_FULL
--------------------------------------------------------------------------------
BANNER_LEGACY
--------------------------------------------------------------------------------
    CON_ID
----------
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
         0
BANNER
--------------------------------------------------------------------------------
BANNER_FULL
--------------------------------------------------------------------------------
BANNER_LEGACY
--------------------------------------------------------------------------------
    CON_ID
----------
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [],
[], [], []
Process ID: 55775
Session ID: 135 Serial number: 49652

alert日志信息

Database Characterset is AL32UTF8
2019-04-20T18:20:54.256841+08:00
No Resource Manager plan active
2019-04-20T18:20:56.751241+08:00
replication_dependency_tracking turned off (no async multimaster replication found)
2019-04-20T18:20:57.862516+08:00
Starting background process AQPC
2019-04-20T18:20:58.341991+08:00
AQPC started with pid=45, OS id=55830
2019-04-20T18:21:01.476252+08:00
PDB$SEED(2):Autotune of undo retention is turned on.
2019-04-20T18:21:01.738732+08:00
Pdb PDB$SEED hit error 1157 during open read only (2) and will be closed.
2019-04-20T18:21:01.755310+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc:
ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: '/u01/app/oracle/oradata/ORCL18C/pdbseed/system01.dbf'
PDB$SEED(2):JIT: pid 55775 requesting stop
PDB$SEED(2):Buffer Cache flush deferred for PDB 2
Could not open PDB$SEED error=1157
2019-04-20T18:21:01.887601+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc:
ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: '/u01/app/oracle/oradata/ORCL18C/pdbseed/system01.dbf'
2019-04-20T18:21:03.385503+08:00
PDB1(3):Autotune of undo retention is turned on.
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_p000_55808.trc  (incident=66865) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [3], [], [], [], [], [], [], [], [], [], []
2019-04-20T18:21:03.682428+08:00
PDB2(4):Autotune of undo retention is turned on.
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66865/orcl18c_p000_55808_i66865.trc
2019-04-20T18:21:12.863880+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2019-04-20T18:21:12.879921+08:00
Pdb PDB1 hit error 600 during open read write (5) and will be closed.
2019-04-20T18:21:12.880506+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_p000_55808.trc:
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [3], [], [], [], [], [], [], [], [], [], []
PDB1(3):JIT: pid 55808 requesting stop
2019-04-20T18:21:12.915407+08:00
Dumping diagnostic data in directory=[cdmp_20190420182112], requested by (instance=1, osid=55808 (P000)), summary=[incident=66865].
2019-04-20T18:21:12.989890+08:00
PDB1(3):Buffer Cache flush deferred for PDB 3
2019-04-20T18:21:13.004575+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_p001_55810.trc  (incident=66873) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [4], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66873/orcl18c_p001_55810_i66873.trc
2019-04-20T18:21:17.218642+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2019-04-20T18:21:17.222057+08:00
Pdb PDB2 hit error 600 during open read write (5) and will be closed.
2019-04-20T18:21:17.222236+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_p001_55810.trc:
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [4], [], [], [], [], [], [], [], [], [], []
2019-04-20T18:21:17.260941+08:00
Dumping diagnostic data in directory=[cdmp_20190420182117], requested by (instance=1, osid=55810 (P001)), summary=[incident=66873].
2019-04-20T18:21:17.262023+08:00
PDB2(4):JIT: pid 55810 requesting stop
PDB2(4):Buffer Cache flush deferred for PDB 4
2019-04-20T18:21:17.352939+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2019-04-20T18:21:17.483695+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc  (incident=66857) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [3], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66857/orcl18c_ora_55775_i66857.trc
2019-04-20T18:21:22.612339+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2019-04-20T18:21:26.062635+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc  (incident=66858) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66858/orcl18c_ora_55775_i66858.trc
2019-04-20T18:21:26.506644+08:00
Dumping diagnostic data in directory=[cdmp_20190420182126], requested by (instance=1, osid=55775), summary=[incident=66857].
2019-04-20T18:21:30.119381+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2019-04-20T18:21:30.119505+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc:
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
2019-04-20T18:21:30.119629+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc:
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
2019-04-20T18:21:30.119719+08:00
Error 600 happened during db open, shutting down database
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc  (incident=66859) (PDBNAME=CDB$ROOT):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66859/orcl18c_ora_55775_i66859.trc
2019-04-20T18:21:30.346811+08:00
Dumping diagnostic data in directory=[cdmp_20190420182130], requested by (instance=1, osid=55775), summary=[incident=66858].
2019-04-20T18:21:34.551418+08:00
opiodr aborting process unknown ospid (55775) as a result of ORA-603
2019-04-20T18:21:34.720885+08:00
ORA-603 : opitsk aborting process
License high water mark = 4
2019-04-20T18:21:34.754766+08:00
USER (ospid: 55775): terminating the instance due to ORA error 600
2019-04-20T18:21:35.839992+08:00
Instance terminated by USER, pid = 55775

alert日志提示文件不存在,实际上文件是存在的

[root@ora11g ~]#
[root@ora11g ~]# ls -l /u01/app/oracle/oradata/ORCL18C/pdbseed/system01.dbf
-rw-r-----. 1 oracle oinstall 283123712 4月  13 23:59 /u01/app/oracle/oradata/ORCL18C/pdbseed/system01.dbf

主要错误是ORA-600 kcffo_online_pdb_check: fno_system 查询mos发现主要是由于数据库bug导致,查询mos发现不少bug
KCFFO_ONLINE_PDB_CHECK FNO_SYSTEM]


官方解释,主要是由于kcffo_online_pdb函数执行异常导致

Function kcffo_online_pdb_check  Check if it ok to online the files in a
pluggable database. An error is  signalled if it is not ok. This routine will
grab a file enqueue for the relevant files and save it in the NULL-terminated
array fenqsp. The caller must release these after kcffo_online_pdb() or if an
error occurs.

通过人工修改文件状态,绕过该错误,cdb数据库open成功,但是pdb依旧无法正常open

SQL> alter database open;
Database altered.
SQL>  alter session set container=PDB1;
Session altered.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kcvfdb_pdb_set_clean_scn: cleanckpt], [3], [1739494], [38655308813], [2], [], [], [], [], [], [], []

alert日志

PDB1(3):alter database open
PDB1(3):Autotune of undo retention is turned on.
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_56178.trc  (incident=69185) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcvfdb_pdb_set_clean_scn: cleanckpt], [3], [1739494], [38655308813], [2], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_69185/orcl18c_ora_56178_i69185.trc
2019-04-20T18:31:41.761479+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2019-04-20T18:31:42.097465+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Pdb PDB1 hit error 600 during open read write (1) and will be closed.
2019-04-20T18:31:42.097847+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_56178.trc:
ORA-00600: internal error code, arguments: [kcvfdb_pdb_set_clean_scn: cleanckpt], [3], [1739494], [38655308813], [2], [], [], [], [], [], [], []
PDB1(3):JIT: pid 56178 requesting stop
2019-04-20T18:31:42.098808+08:00
Dumping diagnostic data in directory=[cdmp_20190420183142], requested by (instance=1, osid=56178), summary=[incident=69185].
2019-04-20T18:31:42.138818+08:00
PDB1(3):Buffer Cache flush deferred for PDB 3
PDB1(3):ORA-600 signalled during: alter database open...

主要错误是ORA-600 kcvfdb_pdb_set_clean_scn: cleanckpt,通过查询mos,依旧发现mos上有的主要可能的bug
kcvfdb_pdb_set_clean_scn cleanckpt


通过人工修改数据文件的checkpoint scn解决该问题,pdb open成功

ORACLE Instance XFF (pid = 18) – Error 600 encountered while recovering transaction

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORACLE Instance XFF (pid = 18) – Error 600 encountered while recovering transaction

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

分享一次由于一个表异常导致数据库报类似:ORACLE Instance XFF (pid = 18) – Error 600 encountered while recovering transaction故障的案例
一个10.2.0.4的数据库,正常运行的库突然出现如下错误

Sun Apr 07 11:07:12 2019
Thread 1 advanced to log sequence 602883 (LGWR switch)
  Current log# 3 seq# 602883 mem# 0: L:\ORADATA\XFF\REDO03.LOG
Sun Apr 07 11:10:38 2019
Thread 1 advanced to log sequence 602884 (LGWR switch)
  Current log# 1 seq# 602884 mem# 0: L:\ORADATA\XFF\REDO01.LOG
Sun Apr 07 11:11:56 2019
Errors in file c:\oracle\product\10.2.0\admin\XFF\udump\XFF_ora_22956.trc:
ORA-00600: 内部错误代码, 参数: [ktspgfb-1], [], [], [], [], [], [], []
Sun Apr 07 11:12:46 2019
Errors in file c:\oracle\product\10.2.0\admin\XFF\udump\XFF_ora_27408.trc:
ORA-00600: 内部错误代码, 参数: [kcbnew_3], [0], [1], [168354056], [], [], [], []
Sun Apr 07 11:13:57 2019
Errors in file c:\oracle\product\10.2.0\admin\XFF\udump\XFF_ora_6632.trc:
ORA-00600: 内部错误代码, 参数: [ktspgfb-1], [], [], [], [], [], [], []

过一段时间报,然后实例直接crash

Tue Apr 09 07:47:35 2019
ORACLE Instance XFF (pid = 18) - Error 600 encountered while recovering transaction (1, 1) on object 113718002.
Tue Apr 09 07:47:35 2019
Errors in file c:\oracle\product\10.2.0\admin\XFF\bdump\XFF_smon_12948.trc:
ORA-00600: internal error code, arguments: [kcbgcur_3], [168454497], [8], [4], [0], [], [], []
Tue Apr 09 07:55:23 2019
Errors in file c:\oracle\product\10.2.0\admin\XFF\bdump\XFF_pmon_22652.trc:
ORA-00474: SMON process terminated with error
Tue Apr 09 07:55:24 2019
PMON: terminating instance due to error 474
Tue Apr 09 07:55:24 2019
Errors in file c:\oracle\product\10.2.0\admin\XFF\bdump\XFF_lgwr_28608.trc:
ORA-00474: SMON process terminated with error
Tue Apr 09 07:55:34 2019
Errors in file c:\oracle\product\10.2.0\admin\XFF\bdump\XFF_psp0_12544.trc:
ORA-00474: SMON process terminated with error
Tue Apr 09 07:55:34 2019
Errors in file c:\oracle\product\10.2.0\admin\XFF\bdump\XFF_j000_5216.trc:
ORA-00474: SMON process terminated with error
Tue Apr 09 07:55:35 2019
Errors in file c:\oracle\product\10.2.0\admin\XFF\bdump\XFF_ckpt_28204.trc:
ORA-00474: SMON process terminated with error
Tue Apr 09 07:55:36 2019
Errors in file c:\oracle\product\10.2.0\admin\XFF\bdump\XFF_mman_9320.trc:
ORA-00474: SMON process terminated with error
Tue Apr 09 07:55:44 2019
Errors in file c:\oracle\product\10.2.0\admin\XFF\bdump\XFF_q002_24384.trc:
ORA-00474: SMON process terminated with error
Tue Apr 09 07:55:53 2019
Errors in file c:\oracle\product\10.2.0\admin\XFF\bdump\XFF_reco_24124.trc:
ORA-00474: SMON process terminated with error

根据以上报错,数据库crash的原因是由于undo异常导致,通过对undo进行重建,解决掉异常undo,但是业务运行之后,一样的问题又重现,最后通过分析确认是对象异常导致

SQL> create table XFF.T_XIFENFEI_xff as select * from XFF.T_XIFENFEI;
create table XFF.T_XIFENFEI_xff as select * from XFF.T_XIFENFEI
                                                           *
ERROR at line 1:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ], [0], [0], [1], [], [], [], []
屏蔽相关block obj的check之后
SQL> create table XFF.T_XIFENFEI_xff as select * from XFF.T_XIFENFEI;
create table XFF.T_XIFENFEI_xff as select * from XFF.T_XIFENFEI
                                                           *
ERROR at line 1:
ORA-00600: internal error code, arguments: [ktspScanInit-l1], [], [], [], [],[], [], []

比较明显该表对象出现逻辑异常,通过基于rowid的方式对该表数据进行抽取

SQL> create table XFF.T_XIFENFEI_new
  2  as
  3  select * from XFF.T_XIFENFEI where 1=0;
Table created.
SQL> set serveroutput on
SQL> set concat off
SQL> DECLARE
  2   nrows number;
  3   rid rowid;
  4   dobj number;
  5   ROWSPERBLOCK number;
  6  BEGIN
  7   ROWSPERBLOCK:=1000;
  8   nrows:=0;
  9   select data_object_id  into dobj
 10   from dba_objects
 11   where owner = 'XFF'
 12   and object_name = 'T_XIFENFEI'
 13   ;
 14   for i in (select relative_fno, block_id, block_id+blocks-1 totblocks
 15             from dba_extents
 16             where owner = 'XFF'
 17               and segment_name = 'T_XIFENFEI'
 18            order by extent_id)
 19   loop
 20     for br in i.block_id..i.totblocks loop
 21      for j in 1..ROWSPERBLOCK loop
 22      begin
 23        rid := dbms_rowid.ROWID_CREATE(1,dobj,i.relative_fno, br , j-1);
 24        insert into XFF.T_XIFENFEI_NEW
 25        select /*+ ROWID(A) */ *
 26        from XFF.T_XIFENFEI A
 27        where rowid = rid;
 28        if sql%rowcount = 1 then nrows:=nrows+1; end if;
 29        if (mod(nrows,10000)=0) then commit; end if;
 30      exception when others then null;
 31      end;
 32      end loop;
 33    end loop;
 34   end loop;
 35   COMMIT;
 36   dbms_output.put_line('Total rows: '||to_char(nrows));
 37  END;
 38  /
Total rows: 227000
PL/SQL procedure successfully completed.

再次观察数据库恢复正常,也不再crash和报错,恢复完成

linux资源限制导致数据库异常

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:linux资源限制导致数据库异常

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

一起由于liunx系统资源限制导致数据库无法启动案例分享
数据库启动报ORA-01157错

SQL> startup
ORACLE instance started.
Total System Global Area 3340451840 bytes
Fixed Size		    2217952 bytes
Variable Size		 1862273056 bytes
Database Buffers	 1459617792 bytes
Redo Buffers		   16343040 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: '/home/oracle/oradata/XIFENFEI.dbf'

该错误一般是由于文件丢失或者路径错误导致

alert日志显示

Sun Apr 07 20:57:03 2019
ALTER DATABASE OPEN
Sun Apr 07 20:57:03 2019
Errors in file /dbdata/oracle/diag/rdbms/orcl/orcl/trace/orcl_dbw0_2681.trc:
ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: '/home/oracle/oradata/XIFENFEI.dbf'
ORA-27092: size of file exceeds file size limit of the process
Additional information: 262144
Additional information: 262145
Errors in file /dbdata/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2802.trc:
ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: '/home/oracle/oradata/XIFENFEI.dbf'
ORA-1157 signalled during: ALTER DATABASE OPEN...
Sun Apr 07 20:57:04 2019
Errors in file /dbdata/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_2804.trc  (incident=38578):
ORA-00600: internal error code, arguments: [kcidr_io_check_common_6], [10],
     [/home/oracle/oradata/XIFENFEI.dbf], [8192], [2], [5], [], [], [], [], [], []
ORA-27092: size of file exceeds file size limit of the process

这里看到提示ORA-27092: size of file exceeds file size limit of the process
查看系统limit配置

[oracle@XFF ~]$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) 640000
scheduling priority             (-e) 0
file size               (blocks, -f) 2097152
pending signals                 (-i) 128489
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 131072
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 131072
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

一般操作系统block size为1k,这里限制文件大小为2097152=(2G)
查看文件

[oracle@XFF ~]$ ls -l /home/oracle/oradata/XIFENFEI.dbf
-rw-r-----. 1 oracle oinstall 2147491840 Apr  7 19:04 /home/oracle/oradata/XIFENFEI.dbf

文件大小为2097160>2097152,导致异常

设置系统对文件大小限制2097152kb

[root@XFF ~]# ulimit -f 102400000
[root@XFF ~]# su - oracle
[oracle@XFF ~]$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) 640000
scheduling priority             (-e) 0
file size               (blocks, -f) 102400000
pending signals                 (-i) 128489
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 131072
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 131072
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

重启数据库,open成功

SQL> shutdown immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 3340451840 bytes
Fixed Size		    2217952 bytes
Variable Size		 1862273056 bytes
Database Buffers	 1459617792 bytes
Redo Buffers		   16343040 bytes
Database mounted.
Database opened.

非归档数据库异常恢复一例

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:非归档数据库异常恢复一例

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

由于存储故障,数据库为非归档模式,通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)收集信息确认数据库redo异常
2


Thu Mar 28 11:36:13 2019
ALTER DATABASE RECOVER    CONTINUE DEFAULT
Media Recovery Log /u01/app/oracle/flash_recovery_area/ORCL/archivelog/2019_03_28/o1_mf_1_5397869_%u_.arc
Thu Mar 28 11:36:13 2019
Errors with log /u01/app/oracle/flash_recovery_area/ORCL/archivelog/2019_03_28/o1_mf_1_5397869_%u_.arc
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_pr00_17611.trc:
ORA-00308:cannot open archived log
    '/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2019_03_28/o1_mf_1_5397869_%u_.arc'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT
Media Recovery Log /u01/app/oracle/flash_recovery_area/ORCL/archivelog/2019_03_28/o1_mf_1_5397869_%u_.arc
Errors with log /u01/app/oracle/flash_recovery_area/ORCL/archivelog/2019_03_28/o1_mf_1_5397869_%u_.arc
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_pr00_17611.trc:
ORA-00308:cannot open archived log
   '/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2019_03_28/o1_mf_1_5397869_%u_.arc'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
Thu Mar 28 11:38:44 2019
ALTER DATABASE RECOVER  datafile 5,6
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 3 Seq 5397870 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/orcl/redo03.log
ORA-279 signalled during: ALTER DATABASE RECOVER  datafile 5,6  ...
Thu Mar 28 11:39:08 2019
ALTER DATABASE RECOVER    CONTINUE DEFAULT
Media Recovery Log /u01/app/oracle/flash_recovery_area/ORCL/archivelog/2019_03_28/o1_mf_1_5397870_%u_.arc
Errors with log /u01/app/oracle/flash_recovery_area/ORCL/archivelog/2019_03_28/o1_mf_1_5397870_%u_.arc
ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER CANCEL
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_17605.trc  (incident=365041):
ORA-00600: internal error code, arguments: [3051], [82], [], [], [], [], [], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE RECOVER CANCEL ...

通过屏蔽一致性,强制打开库,报kgegpa错误,实例启动失败

Database Characterset is ZHS16GBK
No Resource Manager plan active
Exception[type:SIGSEGV, Address not mapped to object][ADDR:0x319C0CF3] [PC:0x2297740, kgegpa()+40] [flags: 0x0, count:1]
Exception[type:SIGSEGV, Address not mapped to object][ADDR:0x319C0CF3] [PC:0x229596B, kgebse()+279][flags: 0x2, count:2]
Exception[type:SIGSEGV, Address not mapped to object][ADDR:0x319C0CF3] [PC:0x229596B, kgebse()+279][flags: 0x2, count:2]
Thu Mar 28 11:43:15 2019
PMON (ospid: 17939): terminating the instance due to error 397
Instance terminated by PMON, pid = 17939

处理上述错误相关undo,启动数据库报ORA-00600 4193,ORA-00600 4137, ORA-00600 6006

Thu Mar 28 11:50:37 2019
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p001_18267.trc  (incident=373059):
ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_373059/orcl_p001_18267_i373059.trc
Stopping background process MMON
Trace dumping is performing id=[cdmp_20190328115038]
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_18247.trc  (incident=372995):
ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_372995/orcl_smon_18247_i372995.trc
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_18247.trc  (incident=372996):
ORA-00600: internal error code, arguments: [4137], [34.22.4206895], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_372996/orcl_smon_18247_i372996.trc
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_18263.trc  (incident=373044):
ORA-00600: internal error code, arguments: [4193], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_373044/orcl_ora_18263_i373044.trc
ORACLE Instance orcl (pid = 16) - Error 600 encountered while recovering transaction (34, 22).
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_18247.trc:
ORA-00600: internal error code, arguments: [4137], [34.22.4206895], [0], [0], [], [], [], [], [], [], [], []

通过重建undo,相关报错消失,安排数据导出重建库